An Automatic Chinese Text Classification System Based on a Rough Set Reduction Algorithm

Abstract Most previous automatic Text Classification (TC) methods rely on the construction of document vectors. Because each term corresponds to one component of the vector, a document is mapped into a very high-dimensional space, possibly of tens of thousands of dimensions, which makes computation very expensive. Traditional dimension-reduction algorithms based on term frequency and threshold filtering often lose useful information. This paper therefore presents a new TC system that introduces rough set theory and uses a reduction algorithm to greatly reduce the dimension of the document vectors. Experimental results show that the system not only reduces the dimensionality effectively, but also achieves higher accuracy and loses less information than the usual reduction methods.
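To make the idea of attribute reduction on a decision table concrete, the following is a minimal sketch of the standard greedy Quick-Reduce procedure from the rough set literature. It is not the improved variant developed in this paper, whose details are not given in the abstract. The sketch assumes each document has already been turned into a tuple of discretized term features (here 0/1 for absence/presence) with one class label per document; the function names `partition`, `dependency`, and `quick_reduce` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)  # block lies in the positive region
    return positive / len(rows)


def quick_reduce(rows, labels):
    """Greedily add the attribute that raises the dependency degree most.

    Stops when the selected subset discerns the classes as well as the full
    attribute set, or when no single attribute improves the degree (a known
    limitation of the greedy search).
    """
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    current = dependency(rows, labels, reduct)
    while current < target:
        best_attr, best_gamma = None, current
        for a in all_attrs:
            if a in reduct:
                continue
            gamma = dependency(rows, labels, reduct + [a])
            if gamma > best_gamma:
                best_attr, best_gamma = a, gamma
        if best_attr is None:
            break
        reduct.append(best_attr)
        current = best_gamma
    return sorted(reduct)


# Toy decision table: 6 documents, 4 binary term features (assumed data).
rows = [
    (1, 0, 1, 0),
    (1, 1, 0, 0),
    (0, 1, 0, 1),
    (0, 0, 1, 1),
    (1, 0, 0, 0),
    (0, 1, 1, 1),
]
labels = ["sports", "sports", "finance", "finance", "sports", "finance"]

print(quick_reduce(rows, labels))  # [0]
```

In this toy table a single term feature already separates the two classes, so the four-dimensional vectors shrink to one dimension with no loss of discernibility. On a real corpus the attribute set would contain thousands of terms, and the cost of recomputing the dependency degree for every candidate is the part a practical system would most need to optimize.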

Key words: Automatic classification; Rough set; Decision table; Reduction algorithm

Received: 19 February 2004

PACS: TP391

Authors: Sheng Xiao-wei, Jiang Ming-hu

Cite this article:

Sheng Xiao-wei, Jiang Ming-hu. Automatic Classification of Chinese Documents Based on Rough Set and Improved Quick-Reduce Algorithm[J]. , 2005, 27(7): 1047-1052.

URL: http://jeit.ie.ac.cn/EN/ or http://jeit.ie.ac.cn/EN/Y2005/V27/I7/1047