Voice Activity Detection Based on Fisher Linear Discriminant Analysis

doi:10.11999/JEIT141122

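Abstract

Traditional voice activity detection (speech endpoint detection) methods separate consonants poorly from background noise, especially unvoiced segments corrupted by noise. To address this problem, this paper proposes an endpoint detection method based on Mel-frequency cepstral coefficients with Fisher linear discriminant analysis (F-MFCC). Separating unvoiced speech from background noise is treated as a two-class classification problem, and the Fisher criterion is used to find the most discriminative projection direction, so that the projected feature parameters have minimum within-class scatter and maximum between-class scatter. This increases the separability of unvoiced speech and background noise. Experimental results on different speech corpora show that F-MFCC improves endpoint detection accuracy under different signal-to-noise ratios and background noise conditions.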
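
The projection step described in the abstract can be illustrated with the textbook two-class Fisher discriminant. The sketch below is not the authors' F-MFCC implementation, which this page does not give. It assumes 12-dimensional feature vectors, substitutes synthetic Gaussian data for real MFCC frames, and uses hypothetical names (`fisher_direction`, `fisher_ratio`) plus a small ridge term that is not taken from the paper.

```python
import numpy as np


def fisher_direction(class_a, class_b):
    """Return a unit-norm Fisher projection vector for two classes.

    class_a, class_b: arrays of shape (n_frames, n_features), for example
    feature vectors of unvoiced-speech frames and background-noise frames.
    """
    mean_a = class_a.mean(axis=0)
    mean_b = class_b.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    centered_a = class_a - mean_a
    centered_b = class_b - mean_b
    s_w = centered_a.T @ centered_a + centered_b.T @ centered_b
    # Small ridge term (an assumption, not from the paper) keeps s_w invertible.
    s_w = s_w + 1e-6 * np.eye(s_w.shape[0])
    # Closed-form maximizer of the Fisher criterion: w ∝ S_w^{-1} (m_a - m_b).
    w = np.linalg.solve(s_w, mean_a - mean_b)
    return w / np.linalg.norm(w)


def fisher_ratio(proj_a, proj_b):
    """Between-class over within-class scatter of 1-D projected samples."""
    between = (proj_a.mean() - proj_b.mean()) ** 2
    within = proj_a.var() * len(proj_a) + proj_b.var() * len(proj_b)
    return between / within


rng = np.random.default_rng(0)
n_features = 12  # assumed feature dimension, chosen for illustration only
# Synthetic stand-ins for the two classes (not real speech data).
unvoiced = rng.normal(loc=0.5, scale=1.0, size=(500, n_features))
noise = rng.normal(loc=0.0, scale=1.0, size=(500, n_features))

w = fisher_direction(unvoiced, noise)
ratio_projected = fisher_ratio(unvoiced @ w, noise @ w)
ratio_single = fisher_ratio(unvoiced[:, 0], noise[:, 0])
print(f"Fisher ratio along w: {ratio_projected:.4f}")
print(f"Fisher ratio along the first raw coordinate: {ratio_single:.4f}")
```

On this synthetic data the ratio along `w` exceeds the ratio along any single raw coordinate, which is the sense in which the projection increases separability between the two classes.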

Keywords: speech processing, voice activity detection, Mel-frequency cepstral coefficients, Fisher linear discriminant analysis

Abstract:

Traditional Voice Activity Detection (VAD) approaches cannot effectively detect consonants, especially unvoiced consonants corrupted by noise. To address this problem, this paper proposes a VAD approach based on Fisher linear discriminant analysis of Mel Frequency Cepstrum Coefficients (F-MFCC), treating the separation of unvoiced consonants from background noise as a two-class problem. The Fisher criterion is used to solve for the optimal projection vector, along which the within-class scatter is minimized and the between-class scatter is maximized, which enhances the separability between unvoiced consonants and background noise. Experiments on different speech corpora evaluate the performance of F-MFCC. The results demonstrate that, under different SNR and noise conditions, the proposed approach achieves higher VAD accuracy.

Key words: Speech processing; Voice Activity Detection (VAD); Mel Frequency Cepstrum Coefficient (MFCC); Fisher linear discriminant analysis

Received: 2014-08-29

PACS:

TN912.34

Corresponding author: Zhang Er-hua, E-mail: speechstudio@163.com

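About the authors: Wang Ming-he: male, born 1970, Ph.D. candidate; research interests include signal processing, speech recognition, and speaker recognition. Zhang Er-hua: male, born 1967, associate professor; main research interests include signal processing, speech recognition, and 3D data visualization. Tang Zhen-min: male, born 1961, professor and doctoral supervisor; main research interests include speech recognition, image processing, and intelligent robots.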

Cite this article:

Wang Ming-he, Zhang Er-hua, Tang Zhen-min, Xu Hao. Voice Activity Detection Based on Fisher Linear Discriminant Analysis[J]. Journal of Electronics & Information Technology (JEIT), 2015, 37(6): 1343-1349.

Link to this article:

http://jeit.ie.ac.cn/CN/10.11999/JEIT141122 or http://jeit.ie.ac.cn/CN/Y2015/V37/I6/1343
