Feature Selection Method Based on Category Correlation and Discernible Sets

Source: 2014 National Annual Conference on Theoretical Computer Science (2014全国理论计算机科学学术年会)
  Feature selection is very important for text categorization. In this paper, several classic feature selection methods are first analyzed and their deficiencies summarized, and then a new feature selection method based on category correlation and discernible sets is presented. To implement the new method, a category correlation measure combining document frequency and word frequency is proposed to filter out noise words and refine the feature space, and an attribute reduction algorithm based on discernible sets is applied to eliminate redundancy. Experimental comparison with classic feature selection methods shows that the presented method obtains more representative feature subsets.
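The abstract does not give the formulas, so the following is only a minimal sketch of the two-stage idea, not the authors' method. It assumes a hypothetical correlation score (the share of a category's documents that contain a term, multiplied by the share of the term's occurrences that fall in that category) and a standard greedy cover over discernible sets built from binary term presence. The names `category_scores`, `filter_terms`, `greedy_reduct` and the `threshold` parameter are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def category_scores(docs, labels):
    """Assumed score per term: max over categories of
    (document-frequency share in c) * (share of the term's occurrences in c)."""
    cats = set(labels)
    n_docs = Counter(labels)
    df = {c: Counter() for c in cats}  # document frequency per category
    tf = {c: Counter() for c in cats}  # word frequency per category
    for tokens, c in zip(docs, labels):
        tf[c].update(tokens)
        df[c].update(set(tokens))
    total_tf = Counter()
    for c in cats:
        total_tf.update(tf[c])
    return {
        t: max((df[c][t] / n_docs[c]) * (tf[c][t] / total_tf[t]) for c in cats)
        for t in total_tf
    }


def filter_terms(docs, labels, threshold):
    """Stage 1: drop noise words whose assumed score is below the threshold."""
    scores = category_scores(docs, labels)
    return {t for t, s in scores.items() if s >= threshold}


def greedy_reduct(docs, labels, candidates):
    """Stage 2: greedy attribute reduction over discernible sets.

    Each pair of documents with different labels yields the set of candidate
    terms present in exactly one of the two documents. Terms are picked
    greedily until every non-empty discernible set contains a selected term.
    """
    present = [set(d) & candidates for d in docs]
    disc_sets = []
    for i, j in combinations(range(len(docs)), 2):
        if labels[i] != labels[j]:
            diff = present[i] ^ present[j]
            if diff:
                disc_sets.append(diff)
    selected = []
    while disc_sets:
        counts = Counter(t for s in disc_sets for t in s)
        best = max(sorted(counts), key=counts.get)  # ties broken alphabetically
        selected.append(best)
        disc_sets = [s for s in disc_sets if best not in s]
    return selected


docs = [
    ["goal", "match", "the"],
    ["match", "team", "the"],
    ["stock", "market", "the"],
    ["market", "price", "the"],
]
labels = ["sport", "sport", "finance", "finance"]

kept = filter_terms(docs, labels, threshold=0.6)
reduct = greedy_reduct(docs, labels, kept)
print(sorted(kept), reduct)
```

On this toy corpus the first stage keeps only the two terms concentrated in a single category, and the second stage drops one of them because a single term already separates every cross-category document pair. A greedy cover is not guaranteed to find a minimal reduct; it is used here only because it is short.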