Facial Expression Recognition Based on the Fusion of Spatio-temporal Features in Video Sequences
WANG Xiaohua①② XIA Chen① HU Min① REN Fuji①③
①(School of Computer and Information, Hefei University of Technology, Hefei 230009, China) ②(Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaian 223001, China) ③(Graduate School of Advanced Technology & Science, University of Tokushima, Tokushima 7708502, Japan)
Abstract: For facial expression recognition in video sequences, dynamic descriptors capture how facial regions change along the time axis more effectively than static descriptors. This paper proposes an expression recognition method based on dynamic texture and motion information. Drawing on the principle of Local Binary Patterns on Three Orthogonal Planes (LBP-TOP), a Spatio-Temporal Weber Local Descriptor (STWLD) is proposed to describe the dynamic texture of the facial expression sequence. In addition, Block-based Histogram of Optical Flow (BHOF) features are used to describe the motion information. The dynamic texture and motion features are then combined, and a Support Vector Machine (SVM) performs the final expression classification. Cross-validation experiments on the CK+ and MMI expression databases show that the method outperforms methods that use either descriptor alone. Comparison experiments with other related methods also demonstrate the superiority of the proposed method.
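To make the block-based motion feature more concrete, the following is a minimal Python sketch of a block-wise histogram of optical flow, written under stated assumptions rather than taken from the paper. It assumes the flow field is already available as two equally sized 2-D lists `u` and `v` (horizontal and vertical components), that each block histogram is magnitude-weighted and normalized to sum to one, and that fusion is plain concatenation. The names `block_flow_histogram` and `fuse`, the 2×2 block grid, and the 8 orientation bins are all hypothetical choices for illustration; the optical flow estimation, the STWLD texture descriptor, and the SVM classifier are omitted.

```python
import math


def block_flow_histogram(u, v, blocks=(2, 2), bins=8):
    """Concatenate per-block orientation histograms of a flow field.

    u, v   : 2-D lists with the horizontal and vertical flow components.
    blocks : (rows, cols) of the block grid (illustrative default).
    bins   : number of orientation bins over [0, 2*pi) (illustrative default).
    """
    rows, cols = len(u), len(u[0])
    block_rows, block_cols = blocks
    feature = []
    for i in range(block_rows):
        for j in range(block_cols):
            # Pixel range covered by block (i, j).
            r0, r1 = i * rows // block_rows, (i + 1) * rows // block_rows
            c0, c1 = j * cols // block_cols, (j + 1) * cols // block_cols
            hist = [0.0] * bins
            for r in range(r0, r1):
                for c in range(c0, c1):
                    magnitude = math.hypot(u[r][c], v[r][c])
                    if magnitude == 0.0:
                        continue  # a zero vector has no direction
                    angle = math.atan2(v[r][c], u[r][c]) % (2 * math.pi)
                    k = min(int(angle / (2 * math.pi) * bins), bins - 1)
                    hist[k] += magnitude  # magnitude-weighted vote
            total = sum(hist)
            if total > 0:
                hist = [h / total for h in hist]  # normalize each block
            feature.extend(hist)
    return feature


def fuse(texture_feature, motion_feature):
    """Assumed fusion rule: concatenate the two feature vectors."""
    return list(texture_feature) + list(motion_feature)
```

For a 4×4 flow field split into 2×2 blocks with 8 bins, `block_flow_histogram` returns a 32-dimensional vector. The fused vector from `fuse` would then be passed to a classifier such as an SVM.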
WANG Xiaohua, XIA Chen, HU Min, REN Fuji. Facial Expression Recognition Based on the Fusion of Spatio-temporal Features in Video Sequences[J]. Journal of Electronics & Information Technology (JEIT), 2018, 40(3): 626-632.