Dynamic Channel Compensation Based on Statistical Model for Mandarin Speech Recognition over Telephone


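Abstract (Chinese version): Compared with the desktop environment, speech recognition accuracy over telephone networks is still relatively low, so improving it is an urgent priority for bringing telephone speech recognition into practical use. Previous studies have shown that the marked drop in telephone speech recognition accuracy is usually caused by data mismatch arising from the different telephone channels in the test and training environments. This paper therefore proposes a statistical-model-based dynamic channel compensation (SMDC) algorithm to reduce that mismatch, using Bayesian estimation to dynamically track the time-varying characteristics of the telephone channel. Experimental results show a relative reduction of about 27% in character error rate (CER) for large-vocabulary continuous speech recognition and about 30% in word error rate (WER) for isolated words. The algorithm also has a small structural delay (about 200 ms on average) and low computational complexity, so it can readily be embedded in practical telephone speech recognition applications.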

Authors: Han Zhao-bing, Zhang Hua-yun, Zhang Shu-wu, Xu Bo


Abstract: Automatic speech recognition over telecommunication networks still achieves lower accuracy than its desktop counterparts, and improving the performance of telephone-quality speech recognition is an urgent problem for its use in practical applications. Previous work has shown that the main cause of this degradation is the mismatch introduced by the different telephone channels in the test and training sets. In this paper, the authors propose an efficient implementation that dynamically compensates for this mismatch, based on a phone-conditioned prior statistical model of the channel bias. The algorithm uses Bayes' rule to estimate the telephone channel and dynamically follows its variations over time. In experiments on Mandarin large-vocabulary continuous speech recognition (LVCSR) over telephone lines, applying the algorithm reduces the average character error rate (CER) by more than 27%; in a short-utterance test, the word error rate (WER) is reduced by 30% relative. The structural delay and computational cost of the algorithm are small, with an average delay of about 200 ms, so it can be embedded in practical telephone-based applications.
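The abstract only names the ingredients (a phone-conditioned prior model of the channel bias, Bayesian estimation, time tracking), so the following is a minimal, generic sketch of that family of techniques, not the authors' SMDC algorithm. It assumes the common cepstral-domain model in which a convolutive telephone channel becomes an additive bias, y_t = x_t + h; that each frame's phone class is already known (for example from a first-pass decode) and supplies a diagonal Gaussian mean `mu` and variance `var` for the clean cepstrum; and that the bias has a Gaussian prior. Time variation is followed with a hypothetical exponential forgetting factor `forget`. The class and parameter names are illustrative only.

```python
from typing import List, Sequence


class DynamicBiasTracker:
    """Hypothetical MAP tracker for an additive cepstral channel bias h.

    Assumed model, per cepstral dimension i:
        y_t[i] = x_t[i] + h[i],  x_t[i] ~ N(mu_k[i], var_k[i]) for phone class k,
        prior  h[i] ~ N(prior_mean[i], prior_var[i]).
    """

    def __init__(self, prior_mean: Sequence[float], prior_var: Sequence[float],
                 forget: float = 0.99) -> None:
        self.prior_mean = list(prior_mean)
        self.prior_var = list(prior_var)
        self.forget = forget  # 0 < forget <= 1; smaller values adapt faster
        self.num = [0.0] * len(self.prior_mean)  # decayed sum of (y - mu) / var
        self.den = [0.0] * len(self.prior_mean)  # decayed sum of 1 / var

    def update(self, y: Sequence[float], mu: Sequence[float],
               var: Sequence[float]) -> List[float]:
        """Add one frame with its phone-conditioned statistics; return the MAP bias."""
        for i in range(len(self.num)):
            # Old evidence is down-weighted so the estimate can follow a drifting channel.
            self.num[i] = self.forget * self.num[i] + (y[i] - mu[i]) / var[i]
            self.den[i] = self.forget * self.den[i] + 1.0 / var[i]
        return self.bias()

    def bias(self) -> List[float]:
        # Gaussian prior times Gaussian likelihood gives a Gaussian posterior;
        # its mean (precision-weighted average) is the MAP estimate of h.
        return [
            (m / pv + n) / (1.0 / pv + d)
            for m, pv, n, d in zip(self.prior_mean, self.prior_var, self.num, self.den)
        ]

    def compensate(self, y: Sequence[float]) -> List[float]:
        """Subtract the current bias estimate from an observed cepstral frame."""
        return [yi - hi for yi, hi in zip(y, self.bias())]


tracker = DynamicBiasTracker(prior_mean=[0.0, 0.0], prior_var=[1.0, 1.0], forget=1.0)
print(tracker.update(y=[2.0, -1.0], mu=[0.0, 0.0], var=[1.0, 1.0]))  # [1.0, -0.5]
print(tracker.compensate([2.0, -1.0]))  # [1.0, -0.5]
```

With only one frame, the prior pulls the estimate halfway toward zero; as more frames arrive, the data term dominates. A forgetting factor below 1 caps the effective amount of evidence, trading estimation noise for faster tracking of time-varying channels, which is the kind of trade-off the ML/MAP keywords of this paper point to.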

Key words: Telephone speech recognition; Dynamic channel compensation; Maximum-likelihood (ML) estimation; Maximum a posteriori (MAP) estimation

Received: 2003-06-12

CLC number: TP391.42

Cite this article:

Han Zhao-bing, Zhang Hua-yun, Zhang Shu-wu, Xu Bo. Dynamic Channel Compensation Based on Statistical Model for Mandarin Speech Recognition over Telephone[J]. Journal of Electronics & Information Technology (电子与信息学报), 2004, 26(11): 1714-1720.

Link to this article:

http://jeit.ie.ac.cn/CN/ or http://jeit.ie.ac.cn/CN/Y2004/V26/I11/1714