Teager Energy Operator and Empirical Mode Decomposition Based Voice Activity Detection Method
SHEN Xizhong, ZHENG Xiaoxiu
(Electrical and Electronical Engineering School, Shanghai Institute of Technology, Shanghai 201418, China)
Abstract The Teager Energy Operator (TEO) is a nonlinear operator noted for its ability to track time-varying signals. This paper combines the TEO with Empirical Mode Decomposition (EMD) and proposes a new voice activity detection method that locates the start and end points of speech. EMD is further exploited by constructing selection conditions that retain only the valid intrinsic mode functions, so the method can handle noisy speech. In addition, each intrinsic mode function produced by EMD is a single-mode component, which meets the TEO's requirement for a single-frequency-component input. Finally, the Hilbert transform is added to address mode mixing, an inherent problem of EMD. As a result, the proposed method can identify unvoiced sounds in noise and performs better than the direct TEO method and the double-threshold method. Experiments show the validity of the proposed method.
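The single-frequency requirement mentioned in the abstract can be illustrated with the standard discrete form of the operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The following is a minimal sketch of that textbook formula only, not the authors' detection method; the function name `teager_energy` and the zero-padding of the two boundary samples are assumptions made for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The first and last samples have no two-sided neighbours, so they are
    left at 0.0 (an illustrative choice, not taken from the paper).
    """
    x = [float(v) for v in x]
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


# For a pure tone A*cos(w*n + phi) the operator is constant and equal to
# (A*sin(w))^2, which is why a single-frequency input is convenient.
A, w = 2.0, 0.3
tone = [A * math.cos(w * n + 0.5) for n in range(200)]
psi = teager_energy(tone)
expected = (A * math.sin(w)) ** 2
```

For a signal containing several frequency components the output also contains cross terms between them, which is one way to see why applying the operator to individual intrinsic mode functions is attractive.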
Received: 30 October 2017
Published: 11 May 2018
Fund: Foundation of the Science and Technology Commission of Shanghai Municipality (15ZR1440700)
Corresponding Author: SHEN Xizhong, E-mail: xzshen@yeah.net