Existing dual-microphone Voice Activity Detection (VAD) algorithms normally use a fixed threshold, which cannot provide accurate detection across varying noise environments. The resulting misdetections degrade voice quality, particularly in handset applications. This paper proposes a new VAD algorithm based on a Neural Network (NN) that uses both the sub-band power level difference and the inter-microphone cross-correlation as features. The NN-based VAD is then combined with the inter-microphone signal power ratio method to obtain a new voice and noise activity detection algorithm. The algorithm is further applied to noise suppression in handsets to avoid the performance degradation caused by VAD misjudgment. Experimental results show that the proposed method provides better noise suppression performance and lower speech distortion than the existing method.
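As an illustration of the two feature types named in the abstract, the following minimal sketch computes a per-band power level difference and an inter-microphone cross-correlation for one pair of frames. The abstract does not give the exact definitions, so everything here is an assumption: the normalized form of the power level difference, the zero-lag correlation, the number of bands, and all function names are hypothetical and are not taken from the paper.

```python
import cmath
import math


def dft_power(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_pld(primary, secondary, num_bands=4, eps=1e-12):
    """Assumed feature: normalized power level difference per sub-band.

    Values near +1 mean the primary microphone dominates the band,
    values near 0 mean both microphones carry similar power.
    """
    p1 = dft_power(primary)
    p2 = dft_power(secondary)
    band_size = len(p1) // num_bands
    features = []
    for b in range(num_bands):
        lo = b * band_size
        # The last band absorbs any leftover bins.
        hi = len(p1) if b == num_bands - 1 else lo + band_size
        e1 = sum(p1[lo:hi])
        e2 = sum(p2[lo:hi])
        features.append((e1 - e2) / (e1 + e2 + eps))
    return features


def zero_lag_correlation(primary, secondary, eps=1e-12):
    """Assumed feature: normalized cross-correlation at lag 0, in [-1, 1]."""
    num = sum(a * b for a, b in zip(primary, secondary))
    den = math.sqrt(sum(a * a for a in primary) *
                    sum(b * b for b in secondary))
    return num / (den + eps)


def frame_features(primary, secondary, num_bands=4):
    """Concatenate the sub-band features and the correlation feature."""
    return (subband_pld(primary, secondary, num_bands)
            + [zero_lag_correlation(primary, secondary)])


# Toy near-field case: the secondary microphone sees the same tone at half amplitude.
n = 64
primary = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
secondary = [0.5 * x for x in primary]
features = frame_features(primary, secondary)
```

In this toy case the tone falls in the lowest band, where the normalized power level difference is about 0.6 (power ratio 1 to 0.25), and the zero-lag correlation is about 1. A feature vector of this kind is what a classifier such as the NN described above could take as input; the network itself and the power-ratio stage are not sketched here.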
ZHANG Luofei, ZHANG Ming, LI Chen. A New Voice and Noise Activity Detection Algorithm and Its Application to Dual Microphone Noise Suppression System for Handset[J]. Journal of Electronics & Information Technology, 2016, 38(8): 2020-2026.