A New Voice and Noise Activity Detection Algorithm and Its Application to Dual Microphone Noise Suppression System for Handset

doi:10.11999/JEIT151302


Abstract (translated from the Chinese):

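Existing dual-channel Voice Activity Detection (VAD) algorithms rely on fixed thresholds, so they struggle to detect speech and noise accurately across different noise environments; in handset noise suppression systems this leads to speech distortion or poor noise removal. To address this, this paper proposes a neural-network-based VAD algorithm that uses the sub-band energy difference and the normalized inter-channel correlation as features and classifies frames as speech or noise with a neural network. Building on this, the neural-network VAD is combined with a VAD based on the inter-channel signal power ratio to form a new voice and noise activity detection algorithm for handset noise suppression systems. The algorithm detects speech and noise separately, and noise suppression is performed on the basis of these decisions, which reduces the performance degradation caused by VAD misjudgments. Experimental results show that the method outperforms existing noise suppression algorithms in suppressing background noise and reducing speech distortion, and that it also suppresses directional speech interference well.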
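The abstract names two per-frame features for the neural-network VAD: the sub-band energy (power level) difference between the two microphones and the normalized inter-channel correlation. The following is a minimal sketch of one plausible way to compute them in pure Python. It is not the authors' implementation: the function names, the naive DFT, the band edges, the frame length, and the lag search range `max_lag` are all illustrative assumptions.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|X[k]|^2 for k = 0..N/2 using a naive O(N^2) DFT (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_power_difference_db(x1: Sequence[float], x2: Sequence[float],
                                bands: Sequence[Tuple[int, int]],
                                eps: float = 1e-12) -> List[float]:
    """Per-band level difference 10*log10(E1/E2), primary mic x1 vs secondary mic x2."""
    p1, p2 = power_spectrum(x1), power_spectrum(x2)
    diffs = []
    for lo, hi in bands:  # hypothetical half-open bin ranges [lo, hi)
        e1 = sum(p1[lo:hi]) + eps
        e2 = sum(p2[lo:hi]) + eps
        diffs.append(10.0 * math.log10(e1 / e2))
    return diffs


def normalized_cross_correlation(x1: Sequence[float], x2: Sequence[float],
                                 max_lag: int = 4, eps: float = 1e-12) -> float:
    """Peak |sum_t x1[t] * x2[t - lag]| / sqrt(E1 * E2) over lags -max_lag..max_lag."""
    n = len(x1)
    denom = math.sqrt(sum(a * a for a in x1) * sum(b * b for b in x2)) + eps
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(x1[t] * x2[t - lag] for t in range(max(0, lag), min(n, n + lag)))
        best = max(best, abs(s) / denom)
    return best


def frame_features(x1: Sequence[float], x2: Sequence[float],
                   bands: Sequence[Tuple[int, int]]) -> List[float]:
    """Classifier input: per-band level differences followed by one correlation value."""
    return (subband_power_difference_db(x1, x2, bands)
            + [normalized_cross_correlation(x1, x2)])


if __name__ == "__main__":
    N = 64
    tone = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]  # energy in bin 4
    bands = [(0, 8), (8, 16), (16, 33)]  # hypothetical band split
    feats = frame_features(tone, [0.5 * v for v in tone], bands)
    print([round(f, 2) for f in feats])
```

For a tone that reaches the secondary microphone at half amplitude, the level difference in the band containing the tone is about 6.02 dB (a power ratio of 4) and the correlation is close to 1. The intuition behind such features is that near-field speech at the primary microphone typically yields a larger level difference than diffuse background noise; a real system would use an FFT library and the band layout chosen by the designers.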

Keywords: voice activity detection; speech enhancement; neural network

Abstract:

Existing dual-microphone Voice Activity Detection (VAD) algorithms normally use a fixed threshold, which cannot provide accurate VAD across various noise environments. In handset applications this degrades voice quality. This paper proposes a new VAD algorithm based on a Neural Network (NN) that uses both the sub-band power level difference and the inter-microphone cross-correlation as features. The NN-based VAD is then combined with a method based on the inter-microphone signal power ratio to obtain a new voice and noise activity detection algorithm. The algorithm is further applied to noise suppression in handsets to avoid the performance degradation caused by VAD misjudgment. Experimental results show that, compared with the existing method, the proposed method provides better noise suppression and lower speech distortion.
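The abstract describes combining the NN-based VAD with an inter-microphone power-ratio VAD so that speech and noise are detected separately, and then using those decisions to drive noise suppression. The exact combination rule is not given here, so the following is a hypothetical sketch only: a one-hidden-layer MLP forward pass (its weights would come from training), a frame power ratio, arbitrary placeholder thresholds `p_hi`, `p_lo`, `r_speech`, `r_noise`, and a recursive noise-PSD update gated on the "noise" label. All function names and parameter values are assumptions, not the paper's.

```python
import math
from typing import List, Sequence


def mlp_speech_probability(features: Sequence[float],
                           w1: Sequence[Sequence[float]], b1: Sequence[float],
                           w2: Sequence[float], b2: float) -> float:
    """One-hidden-layer MLP forward pass: tanh hidden units, sigmoid output = P(speech)."""
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    # Numerically stable logistic sigmoid (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def inter_mic_power_ratio(x1: Sequence[float], x2: Sequence[float],
                          eps: float = 1e-12) -> float:
    """Frame power of the primary mic divided by that of the secondary mic."""
    return sum(a * a for a in x1) / (sum(b * b for b in x2) + eps)


def classify_frame(p_speech: float, ratio: float, p_hi: float = 0.7, p_lo: float = 0.3,
                   r_speech: float = 2.0, r_noise: float = 1.5) -> str:
    """Separate speech and noise decisions; all thresholds are hypothetical placeholders."""
    if p_speech > p_hi and ratio > r_speech:
        return "speech"
    if p_speech < p_lo and ratio < r_noise:
        return "noise"
    return "uncertain"  # neither detector is confident about this frame


def update_noise_psd(noise_psd: Sequence[float], frame_psd: Sequence[float],
                     label: str, alpha: float = 0.9) -> List[float]:
    """Recursive noise-PSD smoothing, updated only on frames labelled 'noise'."""
    if label != "noise":
        return list(noise_psd)  # freeze the estimate on speech/uncertain frames
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(noise_psd, frame_psd)]
```

Keeping an explicit "uncertain" state means ambiguous frames neither update the noise estimate nor count as confirmed speech, which is one way to limit the damage a single VAD misjudgment can do to the suppression stage.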

Key words: Voice Activity Detection (VAD); Speech enhancement; Neural Network (NN)

Received: 2015-11-23; Published: 2016-05-31

CLC number:

TN912.35

Funding:

Natural Science Foundation of Jiangsu Province; Fund of the Jiangsu Key Laboratory of Audio Technology Engineering (BE2014139)

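Corresponding author: ZHANG Luofei. E-mail: lincover@126.com

Author biographies: ZHANG Luofei: female, born in 1990, Ph.D. candidate; research interests include signal processing, speech enhancement, speech recognition, and speech source localization. ZHANG Ming: male, born in 1963, doctoral supervisor and distinguished professor; research interests include signal processing, speech enhancement, and speech recognition. LI Chen: female, born in 1980, Ph.D.; research interests include signal processing, speech enhancement, speech recognition, and speech source localization.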

Cite this article:

ZHANG Luofei, ZHANG Ming, LI Chen. A New Voice and Noise Activity Detection Algorithm and Its Application to Dual Microphone Noise Suppression System for Handset[J]. Journal of Electronics & Information Technology (JEIT), 2016, 38(8): 2020-2026.

Article link:

http://jeit.ie.ac.cn/CN/10.11999/JEIT151302 or http://jeit.ie.ac.cn/CN/Y2016/V38/I8/2020
