Chinese Word Sense Disambiguation Based on Bayesian Model Improved by Information Gain
Fan Dong-mei① Lu Zhi-mao① Zhang Ru-bo① Pan Shu-shen②
①(Harbin Engineering University, Harbin 150001, China) ②(Harbin Institute of Technology, Harbin 150001, China)
Abstract Word sense disambiguation (WSD) is a key and difficult problem in natural language processing. WSD is usually treated as a pattern classification problem, in which feature selection is an important component. In light of the Naïve Bayesian Model (NBM) assumption, this paper proposes a feature selection method based on information gain to improve the NBM. Information gain is used to mine the positional information hidden in the context of an ambiguous word, which improves the knowledge acquisition efficiency of the Bayesian model and thereby the word-sense classification. Eight ambiguous words are tested in the experiment. The results show that the improved Bayesian model is, on average, 3.5 percentage points more accurate than the NBM. This gain indicates that the proposed method is effective.
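The idea of scoring context positions by information gain can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the sketch assumes the standard definition IG = H(sense) - H(sense | word at a position). The function names, the threshold-based selection rule, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def position_information_gain(samples, offset):
    """IG(sense; word at `offset`) = H(sense) - H(sense | word at offset).

    `samples` is a list of (context, sense) pairs, where `context` maps a
    position offset relative to the ambiguous word (e.g. -1, +1) to the
    word found at that position. This data layout is an assumption.
    """
    sense_counts = Counter(sense for _, sense in samples)
    senses_by_word = defaultdict(Counter)
    for context, sense in samples:
        senses_by_word[context.get(offset)][sense] += 1
    n = len(samples)
    conditional = sum(
        sum(c.values()) / n * entropy(c.values())
        for c in senses_by_word.values()
    )
    return entropy(sense_counts.values()) - conditional


def select_positions(samples, offsets, threshold):
    """Keep the offsets whose information gain exceeds `threshold`.

    The thresholding rule is a hypothetical selection criterion.
    """
    return [
        o for o in offsets
        if position_information_gain(samples, o) > threshold
    ]


# Toy data (invented for illustration): the word at offset -1 determines
# the sense, while the word at offset +1 carries no information about it.
toy_samples = [
    ({-1: "a", 1: "x"}, "sense1"),
    ({-1: "a", 1: "y"}, "sense1"),
    ({-1: "b", 1: "x"}, "sense2"),
    ({-1: "b", 1: "y"}, "sense2"),
]
selected = select_positions(toy_samples, [-1, 1], threshold=0.5)
```

On the toy data, only offset -1 is selected, since it alone separates the two senses. A Naïve Bayes classifier trained only on the selected positions would then ignore the uninformative one.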
Received: 04 June 2007
Corresponding Author: Fan Dong-mei