Paper excerpt
Feature selection is critical for text categorization. This paper first analyzes several classic feature selection methods and summarizes their deficiencies, and then presents a new feature selection method based on Category Correlation and Identification Set. To implement the new method, a category correlation measure combining document frequency and word frequency is proposed to filter out noise words and refine the feature space, and an attribute reduction algorithm based on discernible sets is applied to eliminate redundancy. Experimental comparison with classic feature selection methods shows that the proposed method obtains more representative feature subsets.
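The two stages summarized above (a correlation filter followed by a discernibility-based reduction) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the score (a term's document-frequency share multiplied by its word-frequency share in its best category), the `threshold` parameter, the greedy covering strategy used for the reduction, and all function names. It should not be read as the authors' algorithm.

```python
from collections import Counter
from itertools import combinations


def category_correlation(docs, labels):
    """Hypothetical score: for each term, the best per-category product of
    document-frequency share and word-frequency share."""
    df = {}  # term -> Counter(category -> number of documents containing the term)
    tf = {}  # term -> Counter(category -> total occurrences of the term)
    for tokens, label in zip(docs, labels):
        for term, n in Counter(tokens).items():
            df.setdefault(term, Counter())[label] += 1
            tf.setdefault(term, Counter())[label] += n
    scores = {}
    for term in df:
        df_total = sum(df[term].values())
        tf_total = sum(tf[term].values())
        scores[term] = max(
            (df[term][c] / df_total) * (tf[term][c] / tf_total) for c in df[term]
        )
    return scores


def reduce_by_discernibility(docs, labels, candidates):
    """Greedy reduction (an assumed strategy): for every pair of documents with
    different labels, collect the candidate terms that occur in exactly one of
    them, then repeatedly pick the term covering the most uncovered pairs."""
    doc_sets = [set(tokens) & candidates for tokens in docs]
    disc = []
    for i, j in combinations(range(len(docs)), 2):
        if labels[i] != labels[j]:
            d = doc_sets[i] ^ doc_sets[j]  # terms that tell the two documents apart
            if d:
                disc.append(d)
    selected = []
    while disc:
        counts = Counter(t for d in disc for t in d)
        best = max(sorted(counts), key=counts.get)  # ties broken alphabetically
        selected.append(best)
        disc = [d for d in disc if best not in d]
    return selected


def select_features(docs, labels, threshold=0.5):
    """Stage 1 filters low-scoring (noise) terms; stage 2 removes redundancy."""
    scores = category_correlation(docs, labels)
    candidates = {t for t, s in scores.items() if s >= threshold}
    return reduce_by_discernibility(docs, labels, candidates)


# Toy corpus, invented purely for illustration.
docs = [
    ["goal", "match", "the", "goal"],
    ["match", "team", "the"],
    ["stock", "market", "the"],
    ["market", "price", "the", "market"],
]
labels = ["sport", "sport", "finance", "finance"]
print(select_features(docs, labels))
```

In this toy corpus the word "the" is spread evenly across both categories, so the assumed score filters it out in the first stage, while the second stage keeps only as many of the remaining terms as are needed to tell documents of different categories apart.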