Traditional classification algorithms are mostly designed for balanced data sets, and their performance degrades on imbalanced data. Practice has shown that ensemble selection can effectively improve classification performance on imbalanced data sets. This paper therefore approaches the class-imbalance learning problem from the perspective of ensemble selection and proposes a new ensemble pruning method to improve the performance of ensemble classifiers on imbalanced data. Bagging is used to build a classifier library, and the positive-class (minority-class) instances are used directly as the pruning set. Guided by the MBM metric and the pruning set, an optimal or near-optimal sub-ensemble is selected from the library as the target classifier and used to predict unseen instances. Experimental results on 12 UCI data sets show that, compared with EasyEnsemble, Bagging, and C4.5, the method not only substantially improves the recall of the ensemble classifier on the positive class but also improves overall accuracy.
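The pipeline described in the abstract (build a classifier library with Bagging, use the positive instances as the pruning set, then select a sub-ensemble) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the MBM metric, so the sketch substitutes plain positive-class recall as the selection score. It also uses one-feature decision stumps in place of C4.5 trees, greedy forward selection as the search strategy, and synthetic toy data. All function names (`train_stump`, `bagging`, `vote`, `positive_recall`, `greedy_prune`) are hypothetical.

```python
import random

def train_stump(data):
    """Fit a one-feature threshold classifier (assumed stand-in for C4.5)."""
    best = None
    for f in range(len(data[0][0])):
        for x, _ in data:
            t = x[f]
            for sign in (1, -1):
                # The stump predicts 1 when sign * (value - t) >= 0.
                errors = sum(
                    (1 if sign * (xi[f] - t) >= 0 else 0) != yi
                    for xi, yi in data
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    _, f, t, sign = best
    return lambda x: 1 if sign * (x[f] - t) >= 0 else 0

def bagging(data, n_estimators, rng):
    """Train each base classifier on a bootstrap sample of the training data."""
    return [
        train_stump([rng.choice(data) for _ in data])
        for _ in range(n_estimators)
    ]

def vote(ensemble, x):
    """Majority vote; ties go to the positive (minority) class."""
    positives = sum(clf(x) for clf in ensemble)
    return 1 if 2 * positives >= len(ensemble) else 0

def positive_recall(ensemble, pruning_set):
    """Assumed stand-in for MBM: fraction of positives the ensemble recovers."""
    return sum(vote(ensemble, x) for x in pruning_set) / len(pruning_set)

def greedy_prune(library, pruning_set, target_size):
    """Forward selection: repeatedly add the classifier that maximizes the score."""
    selected, remaining = [], list(library)
    while remaining and len(selected) < target_size:
        best = max(
            remaining,
            key=lambda c: positive_recall(selected + [c], pruning_set),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic, imbalanced toy data (90 negatives, 10 positives); not from the paper.
rng = random.Random(0)
negatives = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0) for _ in range(90)]
positives = [((rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)), 1) for _ in range(10)]
train = negatives + positives

library = bagging(train, n_estimators=15, rng=rng)
pruning_set = [x for x, y in train if y == 1]  # positive instances only
pruned = greedy_prune(library, pruning_set, target_size=5)

print("full library recall on positives:", positive_recall(library, pruning_set))
print("pruned ensemble recall on positives:", positive_recall(pruned, pruning_set))
```

Scoring on positive recall alone would favor classifiers that label everything positive. A metric that also accounts for performance on the majority class, which is presumably the role MBM plays, would guard against that. This sketch does not attempt to reproduce it.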