Object Classification Method Based on Weakly Supervised E2LSH and Saliency Map Weighting
ZHAO Yongwei, LI Bicheng, KE Shengcai
(Institute of Information System Engineering, Information Engineering University, Zhengzhou 450002, China)
Abstract: The most popular approach to object classification is based on the bag-of-visual-words model. However, several fundamental problems restrict the performance of this approach, such as low time efficiency, the synonymy and polysemy of visual words, and the lack of spatial information between visual words. To address these problems, an object classification method based on weakly supervised Exact Euclidean Locality Sensitive Hashing (E2LSH) and saliency map weighting is proposed. First, E2LSH is employed to generate a group of visual dictionaries by clustering the SIFT features of the training dataset, and the selection of hash functions is supervised, following the idea of random forests, to reduce the randomness of E2LSH. Second, the Graph-Based Visual Saliency (GBVS) algorithm is applied to compute the saliency map of each image, and visual words are weighted according to this saliency prior. Finally, a saliency-map-weighted visual language model is used to perform object classification. Experimental results on the Caltech-256 and Pascal VOC 2007 datasets indicate that the distinguishability of objects is effectively improved and that the proposed method outperforms state-of-the-art object classification methods.
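
The first two steps of the abstract can be illustrated with a minimal sketch. It assumes the standard p-stable (Gaussian) form of E2LSH, h(v) = floor((a·v + b) / w), treats every non-empty hash bucket as one visual word, and adds each feature's saliency value to the histogram bin of its word. All function names, the parameters `k` and `w`, and the `saliency` input are illustrative assumptions, not taken from the paper. The weakly supervised selection of hash functions and the GBVS computation itself are not reproduced here.

```python
import numpy as np

def make_e2lsh(dim, k, w, rng):
    """Draw k Gaussian (2-stable) hash functions h(v) = floor((a.v + b) / w)."""
    a = rng.standard_normal((k, dim))   # random projection directions
    b = rng.uniform(0.0, w, size=k)     # random offsets in [0, w)
    return a, b

def e2lsh_keys(features, a, b, w):
    """Map each feature vector to a k-tuple of integers (its bucket key)."""
    return np.floor((features @ a.T + b) / w).astype(np.int64)

def build_dictionary(train_features, a, b, w):
    """Assumption: each non-empty bucket is one visual word (key -> word index)."""
    keys = e2lsh_keys(train_features, a, b, w)
    unique_keys = np.unique(keys, axis=0)
    return {tuple(key): idx for idx, key in enumerate(unique_keys)}

def weighted_histogram(features, saliency, dictionary, a, b, w):
    """Add each feature's saliency value to the bin of its visual word."""
    hist = np.zeros(len(dictionary))
    for key, s in zip(e2lsh_keys(features, a, b, w), saliency):
        idx = dictionary.get(tuple(key))
        if idx is not None:             # features in unseen buckets are skipped
            hist[idx] += s
    total = hist.sum()
    return hist / total if total > 0 else hist

# Usage with synthetic stand-ins for 128-dimensional SIFT descriptors.
rng = np.random.default_rng(0)
train = rng.standard_normal((200, 128))
a, b = make_e2lsh(dim=128, k=4, w=4.0, rng=rng)
dictionary = build_dictionary(train, a, b, w=4.0)
hist = weighted_histogram(train[:50], np.ones(50), dictionary, a, b, w=4.0)
```

In this sketch a single hash table yields a single dictionary; a group of dictionaries, as described in the abstract, would correspond to repeating `make_e2lsh` with independent random draws.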
Received: 23 March 2015
Published: 17 November 2015
Fund: The National Natural Science Foundation of China (60872142, 61301232)
Corresponding Author: ZHAO Yongwei
E-mail: zhaoyongwei369@163.com