To address the problems of representing documents of different sizes, selecting document features, and encoding document features in document classification, this paper proposes Rough-CC4, a corner classification neural network based on rough sets. Documents are represented by equivalence classes formed from synonyms, which reduces the dimensionality of the document representation, resolves the accuracy problems caused by differing document sizes, and blurs the distinctions between synonyms. Encoding the document features with a binary coding method improves the accuracy of Rough-CC4 while reducing its space complexity. Rough-CC4 can be widely applied to the automatic classification of large document collections.
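The representation step can be illustrated with a minimal Python sketch. It is not the paper's implementation: the synonym table `SYNONYM_CLASS`, the constant `NUM_CLASSES`, and the function `encode_document` are hypothetical names invented for this example, and the one-bit-per-class encoding is an assumption, since the abstract does not specify the exact coding scheme. The sketch only shows how mapping words to synonym equivalence classes and recording presence as bits can yield a fixed-length binary vector that does not depend on document length.

```python
from typing import Dict, List

# Hypothetical synonym table (an assumption for illustration):
# each word maps to the ID of its synonym equivalence class.
SYNONYM_CLASS: Dict[str, int] = {
    "car": 0, "automobile": 0, "vehicle": 0,
    "fast": 1, "quick": 1, "rapid": 1,
    "buy": 2, "purchase": 2,
}
NUM_CLASSES = 3


def encode_document(words: List[str]) -> List[int]:
    """Return a binary vector with one bit per synonym equivalence class."""
    vector = [0] * NUM_CLASSES
    for word in words:
        class_id = SYNONYM_CLASS.get(word.lower())
        if class_id is not None:  # words outside the table are ignored
            vector[class_id] = 1  # record presence only, not frequency
    return vector


short_doc = "buy a fast car".split()
long_doc = "purchase a quick automobile , a rapid vehicle , quick quick".split()

print(encode_document(short_doc))
print(encode_document(long_doc))
```

Under these assumptions, the short and the long document receive the same vector: synonyms collapse into one class, and recording presence instead of counts removes the effect of document length. The vector has one entry per equivalence class, not one per distinct word, which is where the reduction in dimensionality comes from.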