An Improved Scheme of Visual Words Description and Action Recognition Using Local Enhanced Distribution Information

doi:10.11999/JEIT150410


Abstract

The traditional Bag-Of-Words (BOW) model tends to confuse different action classes because it lacks information about how features are distributed, and the size of the BOW model has a large effect on the recognition rate. To capture the distribution of interest points, the positional relationships among interest points within a local spatio-temporal region are computed as distribution consistency features. These are fused with appearance features to build an enhanced BOW model. A Support Vector Machine (SVM) is used for multi-class recognition. Experiments are carried out on the KTH dataset for single-person action recognition and on the UT-Interaction dataset for multi-person abnormal action recognition. Compared with the traditional BOW model, the enhanced BOW algorithm substantially improves the recognition rate and also reduces the influence of the BOW model's size on the recognition rate. The experimental results demonstrate the validity and good performance of the proposed algorithm.
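The idea of fusing a word histogram with local distribution information can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not define the distribution feature, so the sketch assumes a simple stand-in (the number of neighboring interest points within a spatio-temporal radius, averaged per visual word). The names `nearest_word`, `local_density`, `enhanced_bow`, and the `radius` parameter are all hypothetical, and the SVM classification step is omitted.

```python
from math import dist

def nearest_word(descriptor, codebook):
    """Index of the codeword closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))

def local_density(points, radius):
    """For each (x, y, t) interest point, count the other points within `radius`.

    Assumption: neighbor counts stand in for the paper's distribution feature.
    """
    return [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]

def enhanced_bow(points, descriptors, codebook, radius):
    """Concatenate a normalized word histogram with per-word mean neighbor counts."""
    k = len(codebook)
    counts = [0] * k
    density_sum = [0.0] * k
    density = local_density(points, radius)
    for desc, d in zip(descriptors, density):
        w = nearest_word(desc, codebook)
        counts[w] += 1
        density_sum[w] += d
    total = sum(counts) or 1  # avoid division by zero for empty input
    hist = [c / total for c in counts]
    mean_density = [s / c if c else 0.0 for s, c in zip(density_sum, counts)]
    return hist + mean_density

# Toy input: three interest points, 2-D appearance descriptors, two codewords.
points = [(0, 0, 0), (1, 0, 0), (10, 10, 10)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (0.0, 0.2)]
codebook = [(0.0, 0.0), (1.0, 1.0)]
feature = enhanced_bow(points, descriptors, codebook, radius=2.0)
```

Two videos with the same word histogram but different spatial arrangements of interest points would receive different vectors under this scheme, which is the kind of ambiguity the enhanced model is meant to resolve. In a full pipeline the resulting vector would be passed to a multi-class classifier such as an SVM.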

Key words: Human action recognition; Local distribution features; Enhanced Bag-Of-Words (BOW) model; Support Vector Machine (SVM)

Received: 08 April 2015 Published: 22 January 2016

PACS:

TP391

Fund:

The National Natural Science Foundation of China (61179045)

Corresponding Authors: ZHANG Liang E-mail: stonemark@vip.163.com


Cite this article:

ZHANG Liang, LU Mengmeng, JIANG Hua. An Improved Scheme of Visual Words Description and Action Recognition Using Local Enhanced Distribution Information[J]. JEIT, 2016, 38(3): 549-556.

URL:

http://jeit.ie.ac.cn/EN/10.11999/JEIT150410 OR http://jeit.ie.ac.cn/EN/Y2016/V38/I3/549
