To address semi-supervised classification of imbalanced data, this paper proposes an imbalanced semi-supervised classification algorithm based on Biased-SVM. The method first trains a Biased-SVM model, which is designed to handle imbalanced data, on the initial labeled sample set. The trained model is then used to label the unlabeled samples, the newly labeled samples are added to the initial labeled sample set, and the Biased-SVM model is retrained; finally, the retrained model is evaluated on the test set. Experiments are conducted on several datasets from public databases. Results on two-class imbalanced datasets show that, when labeled samples account for 20% to 80% of the data, the proposed method raises the F-value of the minority class without lowering the overall G-mean of the dataset, and does so with high stability. Results on multi-class imbalanced datasets show that, over the same 20% to 80% range, the proposed method improves the recognition rate of the minority class without lowering the overall EG-mean of the dataset, again with high stability.
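The abstract reports G-mean, F-value, and EG-mean without defining them. The sketch below shows how such measures are commonly computed, under assumptions that are not taken from the paper: G-mean is taken as the geometric mean of the per-class recalls, EG-mean is assumed to be the same quantity extended to more than two classes, and F-value is taken as the F-beta score of the minority class with beta = 1 by default. The function names and the toy labels are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that occurs in y_true."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / total[c] for c in total}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls.

    With two classes this is the usual G-mean; with more classes it is
    the assumed form of EG-mean (an assumption, not the paper's definition).
    """
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1.0 / len(recalls))


def f_value(y_true, y_pred, minority, beta=1.0):
    """F-beta score of the minority class (beta = 1 is assumed here)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    if tp == 0:
        return 0.0  # no minority sample was recognized
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data: 4 minority samples (label 1) and 16 majority samples (label 0).
y_true = [1] * 4 + [0] * 16
# 3 of 4 minority samples recognized; 2 majority samples mislabeled as 1.
y_pred = [1, 1, 1, 0] + [1, 1] + [0] * 14

print("G-mean :", g_mean(y_true, y_pred))
print("F-value:", f_value(y_true, y_pred, minority=1))
```

Because G-mean multiplies the recalls of all classes, a classifier that ignores the minority class scores close to zero even when its overall accuracy is high, which is why a method can be judged on whether it raises the minority-class F-value without lowering G-mean.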