|
|
Robust Visual Tracking via Perceptive Deep Neural Network |
HOU Zhiqiang DAI Bo HU Dan YU Wangsheng CHEN Chen FAN Shunyi |
(Institute of Information and Navigation, Air Force Engineering University, Xi’an 710077, China) |
|
|
Abstract In a visual tracking system, the feature description plays the most important role. Multi-cue fusion is an effective way to solve the tracking problem under many complex conditions. Therefore, a perceptive deep neural network based on multi parallel networks which can be triggered adaptively is proposed. Then, using the multi-cue fusion, a new tracking method based on deep learning is established, in which the target can be adaptively fragmented. The fragment decreases the input dimension, thus reducing the computation complexity. During the tracking process, the model can dynamically adjust the weights of fragments according to the reliability of them, which is able to improve the flexibility of the tracker to deal with some complex circumstances, such as target posture change, light change and occluded by other objects. Qualitative and quantitative analysis on challenging benchmark video sequences show that the proposed tracking method is robust and can track the moving target robustly.
|
Received: 22 December 2015
Published: 31 May 2016
|
|
Fund: The National Natural Science Foundation of China (61175029, 61473309), The Natural Science Foundation of Shaanxi Province (2015JM6269, 2015JM6269, 2016JM6050) |
Corresponding Authors:
DAI Bo
E-mail: daybright_david@163.com
|
|
|
|
[1] |
侯志强, 韩崇昭. 视觉跟踪技术综述[J]. 自动化学报, 2006, 32(4): 603-617.
|
|
HOU Zhiqiang and HAN Chongzhao. A Survey of visual tracking[J]. Acta Automatica Sinica, 2006, 32(4): 603-617.
|
[2] |
WANG Naiyan, SHI Jianping, YEUNG Dityan, et al. Understanding and diagnosing visual tracking systems[C]. International Conference on Computer Vision, Santiago, Chile, 2015: 11-18.
|
[3] |
BABENKO B, YANG M, and BELONGIE S. Visual tracking with online multiple instance learning[C]. International Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009: 983-990. doi: 10.1109/CVPR.2009. 5206737.
|
[4] |
KALAL Z, MIKOLAJCZYK K, and MATAS J. Tracking learning detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409-1422. doi: 10.1109/TPAMI.2011.239.
|
[5] |
HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification[C]. International Conference on Computer Vision, Santiago, Chile, 2015: 1026-1034.
|
[6] |
COURBARIAUX M, BENGIO Y, and DAVID J P. Binary Connect: training deep neural networks with binary weights during propagations[C]. Advances in Neural Information Processing Systems, Montréal, Quebec, Canada, 2015: 3105-3113.
|
[7] |
SAINATH T N, VINYALS O, SENIOR A, et al. Convolutional, long short term memory, fully connected deep neural networks[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia, 2015: 4580-4584. doi: 10.1109/ICASSP.2015.7178838.
|
[8] |
PARKHI O M, VEDALDI A, and ZISSERMAN A. Deep face recognition[J]. Proceedings of the British Machine Vision, 2015, 1(3): 6.
|
[9] |
WANG Naiyan and YEUNG Dityan. Learning a deep compact image representation for visual tracking[C]. Advances in Neural Information Processing Systems, South Lake Tahoe, Nevada, USA, 2013: 809-817.
|
[10] |
李寰宇, 毕笃彦, 杨源, 等. 基于深度特征表达与学习的视觉跟踪算法研究[J]. 电子与信息学报, 2015, 37(9): 2033-2039.
|
|
LI Huanyu, BI Duyan, YANG Yuan, et al. Research on visual tracking algorithm based on deep feature expression and learning[J]. Journal of Electronics & Information Technology, 2015, 37(9): 2033-2039. doi: 10.11999/JEIT150031.
|
[11] |
RUSSAKOVSKY O, DENG J, SU H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/ s11263-015-0816-y.
|
[12] |
VINCENT P, LAROCHELLE H, LAJOIE I, et al. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion[J]. Journal of Machine Learning Research, 2010, 11(11): 3371-3408.
|
[13] |
HINTON G E and SALAKHUTDINOV R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507. doi: 10.1126/science.1127647.
|
[14] |
ADAM A, RIVLIN E, and SHIMSHONI I. Robust fragments-based tracking using the integral histogram[C]. International Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 2006: 798-805. doi: 10.1109/CVPR.2006.256.
|
[15] |
JULIER S J and UHLM J U. Unscented filtering and nonlinear estimation[J]. Proceedings of IEEE, 2004, 192(3): 401-422. doi: 10.1109/JPROC.2003.823141.
|
[16] |
YILMAZ A, JAVED O, and SHAH M. Object tracking: a survey[J]. ACM Computer Survey, 2006, 38(4): 1-45.
|
[17] |
NICKEL K and STIEFELHAGEN R. Dynamic integration of generalized cues for person tracking[C]. European Conference on Computer Vision, Marseille, France, 2008: 514-526. doi: 10.1007/978-3-540-88693-8_38.
|
[18] |
SPENGLER M and SCHIELE B. Towards robust multi-cue integration for visual tracking[J]. Machine Vision and Applications, 2003, 14(1): 50-58. doi: 10.1007/s00138-002- 0095-9.
|
[19] |
WU Yi, LIM Jongwoo, and YANG Minghsuan. Online object tracking: a benchmark[C]. International Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 2013: 2411-2418.
|
[20] |
ZHANG Kaihua, ZHANG Lei, and YANG Minghsuan. Real-time compressive tracking[C]. European Conference on Computer Vision, Florence, Italy, 2012: 866-879. doi: 10.1007/978-3-642-33712-3_62.
|
[21] |
SEVILLA-LARA L and LEARNED-MILLER E. Distribution fields for tracking[C]. International Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 2012: 1910-1917. doi: 10.1109/CVPR.2012.6247891.
|
[22] |
LI Hanxi, LI Yi, and PORIKLI Fatih. Deeptrack: learning discriminative feature representations by convolutional neural networks for visual tracking[C]. Proceedings of the British Machine Vision Conference, Nottingham, UK, 2014: 110-119. doi: 10.1109/TIP.2015.2510583.
|
|
|
|