The Viseme Based Continuous Speech Recognition System for a Talking Head
Jiang Dong-mei①; Xie Lei①; Ilse Ravyse②; Zhao Rong-chun①; Hichem Sahli②; Jan Cornelis②
①Dept. of Computer Sci. & Eng., Northwestern Polytechnical Univ., Xi’an 710072, China; ②Dept. ETRO, Free University Brussels, Pleinlaan 2, B-1050 Brussels, Belgium

Abstract A continuous speech recognition system for a talking head is presented in this paper. It is based on HMMs of visemes (the basic speech units in the visual domain) and segments speech into mouth shape sequences with timing boundaries. Trisemes are formalized to take viseme contexts into account. Based on the 3D talking head images, the viseme similarity weight (VSW) is defined, and 166 visual questions are designed for building the triseme decision trees, which tie the states of trisemes with similar contexts so that they can share the same parameters. For system evaluation, besides the recognition rate, an image-related measurement, the ‘viseme similarity weighted accuracy’, accounts for the mismatches between the recognized viseme sequence and its reference, and ‘jerky points’ in the lip-rounding and VSW graphs help evaluate the smoothness of the resulting viseme image sequences. Results show that the viseme based speech recognition system gives smoother and more plausible mouth shapes.
Received: 25 July 2002