Unsupervised Feature Learning with Sparse Autoencoders in YUV Space
LI Zuhe①② FAN Yangyu① WANG Fengqin②
①(School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China) ②(School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China)
Existing unsupervised feature learning algorithms usually extract features in the RGB color space, whereas image and video compression standards widely adopt the YUV color space. To exploit human visual characteristics and avoid the computational cost of color space conversion, an unsupervised feature learning approach based on sparse autoencoders in YUV space is presented. Image patches in YUV space are first randomly sampled and whitened, and then fed into sparse autoencoders to learn local features in an unsupervised way. Since the luminance channel is independent of the chrominance channels in YUV space, a whitening method that treats luminance and chrominance separately is proposed for the pre-processing step. Finally, the features learned over local image patches are convolved with large images to obtain global feature activations, which are then sent to image classification systems for performance testing. Experimental results reveal that unsupervised feature learning in YUV space achieves similar or even slightly better color image classification performance than learning in RGB space, as long as the luminance component is whitened properly.
LI Zuhe, FAN Yangyu, WANG Fengqin. Unsupervised feature learning with sparse autoencoders in YUV space[J]. Journal of Electronics & Information Technology, 2016, 38(1): 29-37.
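The channel-separate whitening described in the abstract can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the ZCA form of whitening, the patch layout (the three channels concatenated as [Y | U | V] per row), and the regularization constant `eps` are all assumptions.

```python
# Sketch: ZCA whitening applied separately to the luminance (Y) and
# chrominance (U, V) parts of sampled YUV patches, mirroring the idea
# that luminance and chrominance are treated independently.
import numpy as np

def zca_whiten(X, eps=1e-2):
    """ZCA-whiten the rows of X (n_samples x n_features)."""
    X = X - X.mean(axis=0)                 # zero-mean each feature
    cov = X.T @ X / X.shape[0]             # feature covariance
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # ZCA transform
    return X @ W

def whiten_yuv_patches(patches):
    """patches: n x (3 * p * p) array with channels ordered [Y | U | V]."""
    n, d = patches.shape
    p2 = d // 3                            # pixels per channel
    y = zca_whiten(patches[:, :p2])        # luminance whitened on its own
    uv = zca_whiten(patches[:, p2:])       # chrominance whitened jointly
    return np.concatenate([y, uv], axis=1)

# tiny demo on random stand-in "patches" (200 patches of 8x8x3 values)
rng = np.random.default_rng(0)
out = whiten_yuv_patches(rng.normal(size=(200, 3 * 64)))
print(out.shape)  # (200, 192)
```

The whitened patches would then be fed to the sparse autoencoder; whitening Y alone lets its decorrelation strength be tuned independently of the chrominance channels.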