Speech Bandwidth Extension Based on Restricted Boltzmann Machines

doi:10.11999/JEIT151034


Abstract

Speech Bandwidth Extension (BWE) is a technique that attempts to improve speech quality by recovering the missing High Frequency (HF) components, using the correlation between the Low Frequency (LF) and HF parts of the wide-band speech signal. Gaussian Mixture Model (GMM) based methods are widely used, but they recover the missing HF components on the assumption that the LF and HF parts obey a Gaussian distribution and that their relationship is linear, which distorts the reconstructed speech. This study proposes a new speech BWE method that uses two Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBMs) to extract the high-order statistical characteristics of the LF and HF spectral envelopes, respectively. The high-order features of the LF are then mapped to those of the HF by a Feedforward Neural Network (FNN). By extracting these high-order statistical characteristics, the proposed method learns a deep relationship between the LF and HF spectral envelopes and models their distribution more precisely. Objective and subjective test results show that the proposed method outperforms the conventional GMM-based method.
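The pipeline described in the abstract (one GBRBM per band, then an FNN between the two hidden-feature spaces) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the unit-variance visible units, one-step contrastive divergence training, the single tanh hidden layer in the FNN, all layer sizes and learning rates, and the synthetic "envelope" data are assumptions chosen only to keep the example short and runnable. The names `GBRBM`, `FNN`, and `extend_bandwidth` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GBRBM:
    """Gaussian-Bernoulli RBM; visible units assumed to have unit variance."""
    def __init__(self, n_vis, n_hid, rng):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b = np.zeros(n_vis)  # visible bias
        self.c = np.zeros(n_hid)  # hidden bias
        self.rng = rng

    def hidden_probs(self, v):
        # p(h = 1 | v): the "high-order features" of an envelope
        return sigmoid(v @ self.W + self.c)

    def visible_mean(self, h):
        # Mean of the Gaussian p(v | h)
        return h @ self.W.T + self.b

    def cd1_step(self, v0, lr=0.01):
        # One contrastive-divergence (CD-1) update on a batch
        h0 = self.hidden_probs(v0)
        h_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_mean(h_sample)  # mean-field reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (h0 - h1).mean(axis=0)

class FNN:
    """One-hidden-layer network mapping LF features to HF features."""
    def __init__(self, n_in, n_hid, n_out, rng):
        self.W1 = 0.1 * rng.standard_normal((n_in, n_hid))
        self.b1 = np.zeros(n_hid)
        self.W2 = 0.1 * rng.standard_normal((n_hid, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        self.a = np.tanh(x @ self.W1 + self.b1)
        return sigmoid(self.a @ self.W2 + self.b2)

    def train_step(self, x, t, lr=0.5):
        # Gradient descent on the mean squared error between y and t
        y = self.forward(x)
        dz2 = (y - t) * y * (1.0 - y) / x.shape[0]
        dz1 = (dz2 @ self.W2.T) * (1.0 - self.a ** 2)
        self.W2 -= lr * self.a.T @ dz2
        self.b2 -= lr * dz2.sum(axis=0)
        self.W1 -= lr * x.T @ dz1
        self.b1 -= lr * dz1.sum(axis=0)

def extend_bandwidth(lf, rbm_lf, fnn, rbm_hf):
    # LF envelope -> LF features -> predicted HF features -> HF envelope
    return rbm_hf.visible_mean(fnn.forward(rbm_lf.hidden_probs(lf)))

# Synthetic stand-ins for standardized LF/HF spectral envelopes (assumption).
rng = np.random.default_rng(0)
lf = rng.standard_normal((512, 12))
hf = np.tanh(lf[:, :6] * lf[:, 6:]) + 0.1 * rng.standard_normal((512, 6))
hf = (hf - hf.mean(axis=0)) / hf.std(axis=0)

rbm_lf = GBRBM(12, 32, rng)
rbm_hf = GBRBM(6, 16, rng)
for _ in range(30):  # unsupervised training of each band's GBRBM
    rbm_lf.cd1_step(lf)
    rbm_hf.cd1_step(hf)

fnn = FNN(32, 24, 16, rng)
feat_lf = rbm_lf.hidden_probs(lf)
feat_hf = rbm_hf.hidden_probs(hf)
for _ in range(100):  # supervised mapping between the feature spaces
    fnn.train_step(feat_lf, feat_hf)

hf_est = extend_bandwidth(lf, rbm_lf, fnn, rbm_hf)
```

Real spectral-envelope features would take the place of the synthetic arrays. They would need to be standardized first, because the sketch fixes the visible-unit variance at one.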

Key words: Speech bandwidth extension; Restricted Boltzmann machines; Feedforward Neural Networks (FNN); Gaussian mixture model

Received: 14 September 2015 Published: 14 April 2016

CLC number:

TN912.3

Corresponding Authors: ZHAO Shenghui E-mail: shzhao@bit.edu.cn

Authors: WANG Yingxue, ZHAO Shenghui, YU Yingying, KUANG Jingming

Cite this article:

WANG Yingxue, ZHAO Shenghui, YU Yingying, et al. Speech Bandwidth Extension Based on Restricted Boltzmann Machines[J]. JEIT, 2016, 38(7): 1717-1723.

URL:

http://jeit.ie.ac.cn/EN/10.11999/JEIT151034 OR http://jeit.ie.ac.cn/EN/Y2016/V38/I7/1717
