Voiced/Unvoiced Classification and Pitch Estimation Based on Amplitude Compression Filter

doi:10.11999/JEIT150778


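Summary

Conventional algorithms are prone to voiced/unvoiced misclassification and pitch estimation errors in real environments (different noise types and signal-to-noise ratios). To address this, the paper proposes a voiced/unvoiced classification and pitch estimation method based on the Pitch Estimation Filter with Amplitude Compression (PEFAC). First, PEFAC attenuates the low-frequency noise in the speech and extracts the pitch harmonics. Then, a Symmetric average magnitude sum function weighted Impulse-train Matching (SIM) algorithm determines the harmonic number. Finally, dynamic programming estimates the pitch, and a Gaussian mixture model based on a 3-element feature vector classifies speech as voiced or unvoiced. Simulation results show that, in real environments, the proposed method effectively suppresses voiced/unvoiced misclassification and pitch estimation errors and outperforms conventional methods.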
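The dynamic-programming pitch selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `track_pitch`, the per-octave `jump_penalty`, and the candidate scores are all assumptions made for illustration. Each frame supplies several (frequency, score) candidates, and a Viterbi-style search picks the path that balances candidate score against smoothness of the pitch contour.

```python
import math


def track_pitch(candidates, jump_penalty=2.0):
    """Select one pitch candidate per frame by dynamic programming.

    candidates: list of frames, each a non-empty list of (frequency_hz, score).
    jump_penalty: hypothetical cost per octave of pitch change between frames.
    Returns the list of selected frequencies, one per frame.
    """
    # best[t][j]: best cumulative score of any path ending at candidate j of frame t
    best = [[score for _, score in candidates[0]]]
    # back[t][j]: index of the predecessor candidate in frame t - 1
    back = [[-1] * len(candidates[0])]

    for t in range(1, len(candidates)):
        row, pointers = [], []
        for freq, score in candidates[t]:
            # Cumulative score of reaching this candidate from each predecessor,
            # penalising jumps measured in octaves (log2 of the frequency ratio).
            options = [
                best[t - 1][i] - jump_penalty * abs(math.log2(freq / prev_freq))
                for i, (prev_freq, _) in enumerate(candidates[t - 1])
            ]
            i_best = max(range(len(options)), key=options.__getitem__)
            row.append(options[i_best] + score)
            pointers.append(i_best)
        best.append(row)
        back.append(pointers)

    # Backtrack from the best final candidate.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j][0])
        j = back[t][j]
    path.reverse()
    return path


# Toy input: in the middle frame the octave-doubled candidate has the higher score.
frames = [
    [(100.0, 1.0), (200.0, 0.9)],
    [(102.0, 0.8), (204.0, 1.0)],
    [(101.0, 1.0), (202.0, 0.7)],
]
smooth_path = track_pitch(frames)
```

In this toy input, choosing the highest-scoring candidate in each frame independently would jump up an octave in the middle frame; the transition penalty keeps the tracked contour near 100 Hz.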


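Keywords: speech signal processing, pitch, Pitch Estimation Filter with Amplitude Compression (PEFAC), symmetric average magnitude sum function, Gaussian mixture model, noisy speech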

Abstract:

This paper proposes a method of voiced/unvoiced classification and pitch estimation based on the Pitch Estimation Filter with Amplitude Compression (PEFAC). The method first uses PEFAC to attenuate strong noise components at low frequencies and extracts the pitch harmonics from noisy speech in the log-frequency domain. The harmonic number associated with the pitch harmonics is then determined in the time domain by a Symmetric average magnitude sum function weighted Impulse-train Matching (SIM) scheme. A pitch tracking scheme using dynamic programming selects among the pitch candidates, and a voiced speech probability is computed from the likelihood ratio of Gaussian Mixture Model (GMM) classifiers based on a 3-element feature vector. Simulation results show that the proposed method efficiently reduces voiced/unvoiced classification errors and pitch estimation errors, and that it outperforms some state-of-the-art methods in real environments.
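As a rough illustration of how a voiced speech probability can follow from a GMM likelihood ratio, the sketch below scores a 3-element feature vector under two diagonal-covariance GMMs and converts the log-likelihood ratio into a posterior probability. The abstract does not specify the features, the covariance structure, the priors, or any parameters, so everything here (function names, diagonal covariances, equal priors, toy component values) is an assumption for illustration only.

```python
import math


def gmm_log_likelihood(x, components):
    """Return log p(x) under a diagonal-covariance Gaussian mixture.

    components: list of (weight, means, variances), one tuple per mixture component.
    """
    logs = []
    for weight, means, variances in components:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))


def voiced_probability(x, voiced_gmm, unvoiced_gmm, prior_voiced=0.5):
    """Posterior probability of 'voiced' from the GMM log-likelihood ratio."""
    log_ratio = gmm_log_likelihood(x, voiced_gmm) - gmm_log_likelihood(x, unvoiced_gmm)
    log_odds = log_ratio + math.log(prior_voiced / (1.0 - prior_voiced))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Toy single-component models over a hypothetical 3-element feature vector.
voiced_gmm = [(1.0, (1.0, 1.0, 1.0), (1.0, 1.0, 1.0))]
unvoiced_gmm = [(1.0, (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))]

p_mid = voiced_probability((0.0, 0.0, 0.0), voiced_gmm, unvoiced_gmm)
p_voiced = voiced_probability((1.0, 1.0, 1.0), voiced_gmm, unvoiced_gmm)
p_unvoiced = voiced_probability((-1.0, -1.0, -1.0), voiced_gmm, unvoiced_gmm)
```

Working in the log domain avoids underflow when a feature vector lies far from one of the models; a frame would then be labelled voiced when the probability exceeds a chosen threshold such as 0.5.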

Key words: Speech signal processing; Pitch; Pitch Estimation Filter with Amplitude Compression (PEFAC); Symmetric average magnitude sum function; Gaussian Mixture Model (GMM); Noisy speech

Received: 2015-06-29; Published: 2016-02-03

PACS:

TN912.3

Funding:

National Natural Science Foundation of China (61271248); Huzhou Municipal Natural Science Foundation (2015YZ04)

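Corresponding author: ZHAO Xiaoqun, male, born 1962, doctoral supervisor; research interests: communication and information theory. E-mail: zhao_xiaoqun@tongji.edu.cn

About the authors: XU Jingyun, male, born 1980, Ph.D. candidate; research interests: speech signal processing and speech coding. ZHAO Xiaoqun, male, born 1962, doctoral supervisor; research interests: communication and information theory. WANG Qiao, female, born 1990, master's student; research interests: speech coding. WANG Digang, male, born 1988, Ph.D. candidate; research interests: error-tolerant decoding of general-purpose compressed files.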

Cite this article:

XU Jingyun, ZHAO Xiaoqun, WANG Qiao, WANG Digang. Voiced/Unvoiced Classification and Pitch Estimation Based on Amplitude Compression Filter[J]. Journal of Electronics & Information Technology (电子与信息学报, JEIT), 2016, 38(3): 586-593.

Link to this article:

http://jeit.ie.ac.cn/CN/10.11999/JEIT150778 or http://jeit.ie.ac.cn/CN/Y2016/V38/I3/586
