DBN-Based Multi-stream Asynchronous Triphone Model for Audio-Visual Speech Recognition and Phone Segmentation
Lü Guo-yun①; Jiang Dong-mei①; Fan Yang-yu①; Zhao Rong-chun①; H. Sahli②; W. Verhelst②
①Northwestern Polytechnical University, Xi’an 710072, China; ②Department ETRO, Vrije Universiteit Brussel, Brussel, B-1050, Belgium
Abstract In this paper, a novel Multi-stream Multi-state Asynchronous Dynamic Bayesian Network based context-dependent triphone (MM-ADBN-TRI) model is proposed for audio-visual speech recognition and phone segmentation. The model loosens the asynchrony between the audio and visual streams to the word level. In both the audio stream and the visual stream, a word-triphone-state topology is used. Essentially, the MM-ADBN-TRI model is a triphone model whose basic recognition units are triphones, which captures the variations in real continuous speech spectra more accurately. Recognition and segmentation experiments are carried out on a continuous-digit audio-visual speech database. The results show that the MM-ADBN-TRI model achieves the best overall performance in word accuracy and in phone segmentation with time boundaries, as well as a more reasonable asynchrony between the audio and visual speech.
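The word-level asynchrony constraint described above can be illustrated with a minimal sketch (not the paper's implementation; all names are hypothetical): each stream traverses its own word-triphone-state hierarchy, so the two streams may occupy different triphones or states at a given frame, but they must agree on the current word.

```python
# Toy illustration of the word-level asynchrony constraint of an
# MM-ADBN-TRI-style model. Each frame of a stream's decoded path is a
# (word_idx, triphone_idx, state_idx) triple. Asynchrony is permitted
# within a word (triphone/state indices may differ between streams),
# but the streams must be synchronous at the word level.

def word_synchronous(audio_path, visual_path):
    """Return True iff the audio and visual paths agree on the word
    index at every frame, i.e. asynchrony stays within word boundaries."""
    return all(a[0] == v[0] for a, v in zip(audio_path, visual_path))

# Example: the visual stream lags inside word 0, then resynchronizes
# with the audio stream at the boundary of word 1.
audio  = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
visual = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 0)]
print(word_synchronous(audio, visual))  # True
```

In a full DBN decoder this constraint would be encoded in the conditional probability tables governing word-transition variables, rather than checked after the fact as in this sketch.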
Received: 23 July 2007