A Context Tree Kernel Based on Latent Semantic Topic
Xu Chao, Zhou Yi-min, Shen Lei
School of Computer, Beihang University, Beijing 100191, China
Abstract: The lack of semantic information is a critical weakness of the context tree kernel for text representation. This paper proposes a context tree kernel based on latent topics. First, words are mapped to a latent topic space using Latent Dirichlet Allocation (LDA). Then, context tree models are built over the latent topics. Finally, the context tree kernel for text is defined through the mutual information between the models. In this approach, document generative models are defined over semantic classes instead of words, which addresses the problem of sparse statistical data. Clustering experiments on a text data set show that the proposed context tree kernel is a better measure of topic similarity between documents and that text clustering performance is greatly improved.
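The first two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `word_topic` table is a hypothetical stand-in for the word-to-topic assignments that a trained LDA model would provide, the fixed maximum context depth is an assumption, and the helper names `to_topics` and `context_counts` are invented for illustration. The kernel itself is not shown, because the abstract does not give its exact definition.

```python
from collections import Counter, defaultdict


def to_topics(tokens, word_topic):
    """Replace each word with its latent topic id; words without a topic are dropped."""
    return [word_topic[w] for w in tokens if w in word_topic]


def context_counts(seq, depth=2):
    """Count next-symbol occurrences for every preceding context of length 0..depth."""
    counts = defaultdict(Counter)
    for i, sym in enumerate(seq):
        # Near the start of the sequence only shorter contexts are available.
        for d in range(min(depth, i) + 1):
            context = tuple(seq[i - d:i])
            counts[context][sym] += 1
    return counts


# Hypothetical word-to-topic assignments (assumption: in practice these come from LDA).
word_topic = {"stock": 0, "market": 0, "shares": 0,
              "goal": 1, "match": 1, "team": 1}

doc = "stock market shares team match goal".split()
topics = to_topics(doc, word_topic)      # six distinct words collapse to two topic symbols
tree = context_counts(topics, depth=2)   # context statistics over topics, not words
```

Because six distinct words collapse to two topic symbols, each context is observed more often than it would be over raw words, which is the sparsity argument the abstract makes.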
Received: 20 November 2009
Corresponding Author: Xu Chao
E-mail: chaoxu@263.net