Imbalanced data classification is a current research hotspot in machine learning. Traditional classification algorithms usually assume that the data set is balanced, so they cannot be applied directly to learning from imbalanced data. To address this problem, this paper proposes an improved boosting algorithm for imbalanced classification based on feature selection. The algorithm weighs how important each attribute of the data set, across its different attribute types, is to the minority-class samples, and selects the attributes that are more meaningful for correctly predicting the minority class, which also reduces the dimensionality of the data. It then applies an imbalanced classification algorithm to bring the data to a balanced state. Finally, it introduces a new modification that addresses the overly rapid growth of the weights of misclassified samples in the original algorithm and effectively restrains that growth. Experimental results show that the algorithm effectively improves classification performance on imbalanced data, especially for the minority class.
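The abstract does not give the weight-update formula, so the following is only a minimal sketch of the general idea of slowing the weight growth of misclassified samples in a boosting round. It assumes the original algorithm uses an AdaBoost-style exponential update. The `damping` parameter and the function names are hypothetical illustrations, not the paper's method.

```python
import math


def adaboost_alpha(weights, misclassified):
    """Classifier weight from the weighted error rate (standard AdaBoost)."""
    error = sum(w for w, m in zip(weights, misclassified) if m)
    error = min(max(error, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    return 0.5 * math.log((1 - error) / error)


def update_weights(weights, misclassified, alpha, damping=1.0):
    """Reweight samples after one boosting round and renormalize.

    damping=1.0 reproduces the standard AdaBoost update.
    damping<1.0 is an ASSUMED modification (not taken from the paper):
    it shrinks the exponent applied to misclassified samples, so their
    weights grow more slowly from round to round.
    """
    raw = [
        w * math.exp(damping * alpha if m else -alpha)
        for w, m in zip(weights, misclassified)
    ]
    total = sum(raw)
    return [w / total for w in raw]


# Four equally weighted samples; only the first one is misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
misclassified = [True, False, False, False]

alpha = adaboost_alpha(weights, misclassified)
standard = update_weights(weights, misclassified, alpha)
damped = update_weights(weights, misclassified, alpha, damping=0.5)
```

In this toy round the standard update raises the misclassified sample's weight from 0.25 to about 0.5, while the damped variant raises it by less. A rule of this kind keeps a few hard or noisy samples from dominating later rounds, which is the problem the abstract attributes to the original algorithm.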