A Viseme-Based Continuous Speech Recognition System and Talking Head

Abstract: This paper presents a continuous speech recognition system for a talking head. The system is based on hidden Markov models (HMMs) of visemes (the basic speech units in the visual domain) and segments speech into mouth-shape sequences with timing boundaries. Trisemes are formalized to model viseme context. Based on the 3D talking-head images, a viseme similarity weight (VSW) is defined, and 166 visual questions are designed to build triseme decision trees; these trees tie the states of trisemes with similar contexts so that they share the same parameters. For system evaluation, two image-related measures are used in addition to the recognition rate: the "viseme similarity weighted accuracy", which accounts for mismatches between the recognized viseme sequence and its reference, and "jerky points" in lip-rounding and VSW graphs, which help evaluate the smoothness of the resulting viseme image sequences. Results show that the viseme-based speech recognition system produces smoother and more plausible mouth shapes.
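Trisemes play the same role for visemes that triphones play for phonemes: each viseme is modeled together with its left and right neighbors. The sketch below expands a viseme sequence into context-dependent labels. It assumes the HTK-style "left-centre+right" naming used for triphones; the abstract does not state the authors' exact label format, and the viseme names in the example (`sil`, `V1`, `V2`) are hypothetical.

```python
def to_trisemes(visemes):
    """Expand a viseme sequence into context-dependent triseme labels.

    Assumption: HTK-style 'left-centre+right' notation, as used for
    triphones. Units at the sequence boundaries keep only the context
    that actually exists.
    """
    trisemes = []
    n = len(visemes)
    for i, centre in enumerate(visemes):
        label = centre
        if i > 0:
            # Prepend the left context.
            label = f"{visemes[i - 1]}-{label}"
        if i < n - 1:
            # Append the right context.
            label = f"{label}+{visemes[i + 1]}"
        trisemes.append(label)
    return trisemes


# Hypothetical viseme labels, for illustration only.
print(to_trisemes(["sil", "V1", "V2", "sil"]))
```

For the example input this gives `['sil+V1', 'sil-V1+V2', 'V1-V2+sil', 'V2-sil']`. Because many such context-dependent units occur rarely in training data, decision-tree clustering (here driven by the 166 visual questions) is used to tie states of trisemes with similar contexts.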

Key words: Talking head; Viseme; Triseme decision trees; Viseme similarity weighted accuracy; Lip-rounding and VSW graphs

Received: 25 July 2002

CLC number: TP391.42

	Jiang Dong-mei
	Xie Lei
	Ilse Ravyse
	Zhao Rong-chun
	Hichem Sahli
	Jan Cornelis

Cite this article:

Jiang Dong-mei, Xie Lei, Ilse Ravyse, et al. The Viseme Based Continuous Speech Recognition System for a Talking Head[J]. 2004, 26(3): 375-381.

URL:

http://jeit.ie.ac.cn/EN/ or http://jeit.ie.ac.cn/EN/Y2004/V26/I3/375