A WORD SEGMENTATION ALGORITHM FOR CHINESE LANGUAGE BASED ON N-GRAM MODELS AND MACHINE LEARNING
Wu Yingliang①; Wei Gang②; Li Haizhou②
①School of Business Administration, South China Univ. of Tech., Guangzhou 510641, China; ②Dept. of Electronic and Info. Eng., Guangzhou 510641, China
|
|
Abstract: Automatic word segmentation is a fundamental and difficult problem in computer processing of Chinese-language information. This paper presents a new method for segmenting an input Chinese sentence into words that combines a character-based N-gram model with an efficient Viterbi search algorithm. In addition, two ratios for evaluating the performance of word segmentation algorithms, Recall and Precision, are discussed. The effectiveness of the method has been confirmed by evaluation experiments on closed-text and open-text corpora.
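The abstract names two components: a Viterbi search over candidate segmentations, and Recall/Precision scoring. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. It substitutes a hypothetical word log-probability table (`logp`) for the paper's character-based N-gram model, caps candidate words at an assumed `max_len` of 4 characters, and gives any out-of-table single character an assumed penalty `UNKNOWN_LOGP`. For scoring, a word counts as correct only when its character span exactly matches a reference word; Precision divides by the number of words the system output, Recall by the number of words in the reference.

```python
import math
from typing import Dict, List, Set, Tuple

# Assumed penalty for a single character missing from the (hypothetical) table,
# so that every position in the sentence remains reachable.
UNKNOWN_LOGP = -20.0


def viterbi_segment(sentence: str, logp: Dict[str, float], max_len: int = 4) -> List[str]:
    """Best-path (Viterbi-style) search under a hypothetical word-unigram score.

    This stands in for the paper's character-based N-gram model, which is not
    reproduced here.
    """
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best total log-prob of sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            if word in logp:
                score = logp[word]
            elif end - start == 1:
                score = UNKNOWN_LOGP
            else:
                continue  # multi-character strings outside the table are not words
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Trace the back-pointers from the end of the sentence to recover the words.
    words: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        words.append(sentence[start:end])
        end = start
    return words[::-1]


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word sequence into a set of (start, end) character offsets."""
    result, pos = set(), 0
    for w in words:
        result.add((pos, pos + len(w)))
        pos += len(w)
    return result


def precision_recall(predicted: List[str], reference: List[str]) -> Tuple[float, float]:
    """Precision = correct / predicted words; Recall = correct / reference words."""
    pred, ref = spans(predicted), spans(reference)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical log-probabilities chosen only for illustration.
    logp = {"研究": -3.0, "研究生": -4.0, "生命": -3.5, "命": -6.0, "起源": -4.0}
    print(viterbi_segment("研究生命起源", logp))  # ['研究', '生命', '起源']
    print(precision_recall(["研究", "生", "命", "起源"], ["研究", "生命", "起源"]))
    # (0.5, 0.6666666666666666)
```

With these illustrative scores, 研究/生命/起源 (total -10.5) beats the competing path 研究生/命/起源 (total -14.0). In the scoring example the over-segmented output gets 2 of its 4 words right (Precision 0.5) and recovers 2 of the 3 reference words (Recall 2/3), which shows why the two ratios are reported separately.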
Received: 29 September 1999