Abstract: Automatic word segmentation is a fundamental and difficult problem in Chinese language information processing. This paper presents a new method for segmenting input Chinese sentences into words, which combines a character-based N-gram model with an efficient Viterbi search algorithm. In addition, two performance evaluation metrics for word segmentation algorithms, Recall and Precision, are discussed. The effectiveness of the method has been confirmed by evaluation experiments on closed-text and open-text corpora.
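As an illustration of the two evaluation metrics, the following minimal sketch computes Precision and Recall for one segmented sentence. The abstract does not give the paper's exact definitions, so this assumes the common convention: a predicted word counts as correct only if both of its boundaries match a word in the reference segmentation. The function names `to_spans` and `precision_recall` and the example sentence are hypothetical, chosen for illustration only.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def precision_recall(predicted, reference):
    """Return (precision, recall) for one sentence.

    Assumed definitions (not taken from the paper):
      precision = correct predicted words / all predicted words
      recall    = correct predicted words / all reference words
    """
    if "".join(predicted) != "".join(reference):
        raise ValueError("both segmentations must cover the same characters")
    pred, ref = to_spans(predicted), to_spans(reference)
    correct = len(pred & ref)  # words whose two boundaries both match
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical example: the segmenter wrongly splits the last word.
reference = ["研究", "生命", "的", "起源"]
predicted = ["研究", "生命", "的", "起", "源"]
p, r = precision_recall(predicted, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```

Comparing character spans rather than word strings keeps a word that occurs twice in a sentence from being matched at the wrong position.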
Wu Yingliang; Wei Gang; Li Haizhou. A Word Segmentation Algorithm for Chinese Language Based on N-gram Models and Machine Learning [J]. Journal of Electronics & Information Technology (电子与信息学报), 2001, 23(11): 1148-1153.