Parsing Chinese Based on Lexicalized Model
Cao Hai-long; Zhao Tie-jun; Li Sheng
MOE-MS Key Lab. of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China
Abstract In order to process large-scale real-world text, we propose a method for building a Chinese parser based on a lexicalized model. First, we propose a unified approach to word segmentation and part-of-speech (POS) tagging based on a hidden Markov model (HMM). The method preserves the simplicity and efficiency of the HMM while improving tagging accuracy. Then the head-driven model is used to recognize phrases. The head-driven model is a well-known English parsing model; we combine it with the segmentation and POS tagging model to build a Chinese parser that operates at the character level. Evaluated on the standard test set, the parser achieves 77.57% precision and 74.96% recall, significantly outperforming the only previous comparable work.
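The abstract does not spell out how segmentation and POS tagging are unified in a single HMM. One common way to do this, assumed here and not necessarily the authors' formulation, is to use composite hidden states that pair a character's position in its word (B = begin, M = middle, E = end, S = single-character word) with a POS tag, so that one Viterbi pass over the characters yields both word boundaries and tags. The sketch below illustrates that idea; the state set, the functions `viterbi`, `to_words` and `segment_and_tag`, and all probabilities are hypothetical toy values, not the paper's trained model.

```python
import math
from typing import Dict, List, Tuple

Prob = Dict[str, Dict[str, float]]


def _log(p: float) -> float:
    """Log-probability, mapping 0 to -inf so impossible paths are never chosen."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(chars: List[str], states: List[str], start_p: Dict[str, float],
            trans_p: Prob, emit_p: Prob) -> List[str]:
    """Most probable composite-state sequence for `chars` (log-space Viterbi)."""
    if not chars:
        return []
    # score[t][s]: best log-probability of any path ending in state s at position t
    score = [{s: _log(start_p.get(s, 0.0)) + _log(emit_p[s].get(chars[0], 0.0))
              for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(chars)):
        score.append({})
        back.append({})
        for s in states:
            # best predecessor r of state s, scored via the transition r -> s
            prev = max(states, key=lambda r: score[t - 1][r] + _log(trans_p[r].get(s, 0.0)))
            score[t][s] = (score[t - 1][prev] + _log(trans_p[prev].get(s, 0.0))
                           + _log(emit_p[s].get(chars[t], 0.0)))
            back[t][s] = prev
    # a sentence must end on a word boundary, so only E-* or S-* states may be final
    last = max((s for s in states if s[0] in "ES"), key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(chars) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def to_words(chars: List[str], path: List[str]) -> List[Tuple[str, str]]:
    """Group characters into (word, POS) pairs; a word closes at an E-* or S-* state."""
    words, buf = [], ""
    for ch, state in zip(chars, path):
        position, pos = state.split("-", 1)
        buf += ch
        if position in ("E", "S"):
            words.append((buf, pos))
            buf = ""
    return words


# Hypothetical toy model with hand-picked (not trained) probabilities.
# M-* states are omitted because the toy vocabulary has no words longer than two characters.
STATES = ["S-r", "S-v", "B-ns", "E-ns", "S-n"]
START = {"S-r": 0.5, "S-v": 0.2, "B-ns": 0.2, "S-n": 0.1}
TRANS = {
    "S-r": {"S-v": 0.6, "B-ns": 0.2, "S-n": 0.2},
    "S-v": {"B-ns": 0.5, "S-n": 0.4, "S-r": 0.1},
    "B-ns": {"E-ns": 1.0},  # in this toy model B-ns can only be followed by E-ns
    "E-ns": {"S-v": 0.5, "S-r": 0.3, "B-ns": 0.1, "S-n": 0.1},
    "S-n": {"S-v": 0.5, "S-n": 0.2, "B-ns": 0.2, "S-r": 0.1},
}
EMIT = {
    "S-r": {"我": 0.8, "你": 0.2},
    "S-v": {"爱": 0.9, "是": 0.1},
    "B-ns": {"北": 0.6, "上": 0.4},
    "E-ns": {"京": 0.6, "海": 0.4},
    "S-n": {"北": 0.3, "京": 0.2, "爱": 0.1, "书": 0.4},
}


def segment_and_tag(sentence: str) -> List[Tuple[str, str]]:
    """Jointly segment and POS-tag a sentence with the toy composite-state HMM."""
    chars = list(sentence)
    return to_words(chars, viterbi(chars, STATES, START, TRANS, EMIT))


if __name__ == "__main__":
    print(segment_and_tag("我爱北京"))  # [('我', 'r'), ('爱', 'v'), ('北京', 'ns')]
```

Working in log space avoids numerical underflow on long sentences, and zero transition probabilities (such as B-ns to anything but E-ns) keep the composite states consistent with a valid segmentation, so the tagger never has to fix up word boundaries after decoding.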
Received: 23 January 2006
|
|
|
|
|
|
|
|