Adapted Stopping Residue Error Based Sparse Representation for Speech Denoising
ZHOU Weili, HE Qianhua, WANG Yalou, PANG Wenfeng
(School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China)
Abstract: A sparse representation speech denoising method based on an adapted stopping residue error is proposed. First, an overcomplete dictionary of the clean speech power spectrum is learned with the K-Singular Value Decomposition (K-SVD) algorithm. In the sparse representation stage, the stopping residue error is set adaptively from the estimated cross terms and the noise spectrum, which is adjusted by a weighting factor, and Orthogonal Matching Pursuit (OMP) is applied to reconstruct the clean speech spectrum from the noisy speech. Finally, the clean speech is resynthesized via the inverse Fourier transform from the reconstructed speech spectrum and the noisy speech phase. Experimental results show that the proposed method outperforms standard spectral subtraction, a sparse representation based speech denoising algorithm, and the AutoRegressive Hidden Markov Model (AR-HMM) based speech denoising method in terms of both subjective and objective measures.
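The sparse representation stage can be illustrated with a minimal sketch of OMP that stops once the residual norm falls below a threshold. This is an illustration only, not the authors' implementation: the function name `omp_residual_stop` and its parameters are hypothetical, and the threshold `eps` is simply passed in, because the abstract does not give the formula that derives it from the cross terms and the weighted noise spectrum.

```python
import numpy as np

def omp_residual_stop(D, y, eps, max_atoms=None):
    """Greedy OMP: add dictionary atoms until ||y - D x||_2 <= eps.

    D   : (n_features, n_atoms) dictionary, columns assumed unit-norm
    y   : (n_features,) signal to approximate (e.g. one spectral frame)
    eps : stopping residual error; here an assumed, externally supplied value
    """
    n_atoms = D.shape[1]
    if max_atoms is None:
        max_atoms = min(D.shape)
    x = np.zeros(n_atoms)
    support = []
    coeffs = np.zeros(0)
    residual = np.asarray(y, dtype=float).copy()
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom helps; avoid looping forever
        support.append(k)
        # Re-fit all selected atoms jointly (the "orthogonal" step).
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    if support:
        x[support] = coeffs
    return x
```

A larger `eps` ends the pursuit earlier and leaves more of the input unexplained, which is the lever a noise-dependent threshold uses to avoid fitting noise. With an identity dictionary and `y = [0, 3, 0, -2, 0]`, a near-zero `eps` recovers both nonzero entries, while `eps = 2.5` stops after the first atom.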
Received: 18 April 2016
Published: 21 October 2016
Fund: The National Natural Science Foundation of China (61571192), The Science and Technology Foundation of Guangdong Province (2015A010103003)
Corresponding Authors:
HE Qianhua
E-mail: eeqhhe@scut.edu.cn