To address the robustness and stability problems of feature selection algorithms, and to make use of the large amounts of cheap unlabeled data available in real-world applications, a semi-supervised feature selection algorithm based on a double fusion strategy is proposed. The method combines weak classifier fusion with the cluster structure information contained in the unlabeled data to expand the labeled data set. Different feature selection algorithms are then applied to the resulting labeled data set, and their results are merged by a simple fusion operation to obtain the final feature subset. Experimental results on several public data sets and on toxicity prediction data sets show that the method is promising for improving learning accuracy.
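The two fusion stages described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the rule for accepting a pseudo-label or the fusion operator, so the sketch assumes that an unlabeled sample is accepted only when a strict majority of the weak classifiers agrees with the majority label of its cluster, and that feature rankings are fused by summed rank position. The names `expand_labels`, `fuse_rankings`, and `min_agreement` are hypothetical.

```python
from collections import Counter


def expand_labels(votes, clusters, cluster_labels, min_agreement=0.5):
    """First fusion stage (assumed rule): pseudo-label unlabeled samples.

    votes[i]          -- labels predicted for unlabeled sample i by each weak classifier
    clusters[i]       -- cluster id assigned to unlabeled sample i
    cluster_labels[c] -- majority label of the labeled samples in cluster c
    Returns {sample index: pseudo-label} for the accepted samples.
    """
    accepted = {}
    for i, sample_votes in enumerate(votes):
        label, count = Counter(sample_votes).most_common(1)[0]
        # Require a strict majority among the weak classifiers.
        if count / len(sample_votes) <= min_agreement:
            continue
        # Require agreement with the cluster structure.
        if cluster_labels.get(clusters[i]) == label:
            accepted[i] = label
    return accepted


def fuse_rankings(rankings, k):
    """Second fusion stage (assumed operator): sum rank positions, keep top k.

    Each ranking lists the same features, best first.
    Ties are broken by feature name so the result is deterministic.
    """
    total = {feature: 0 for feature in rankings[0]}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            total[feature] += position
    return sorted(total, key=lambda f: (total[f], f))[:k]


# Toy inputs: three weak classifiers, four unlabeled samples, two clusters.
votes = [[1, 1, 0], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
clusters = [0, 0, 1, 1]
cluster_labels = {0: 1, 1: 0}
pseudo_labels = expand_labels(votes, clusters, cluster_labels)

# Toy rankings from three different feature selection algorithms.
rankings = [
    ["f2", "f0", "f1", "f3"],
    ["f0", "f2", "f3", "f1"],
    ["f2", "f1", "f0", "f3"],
]
selected = fuse_rankings(rankings, k=2)
```

In this toy run, only the samples on which the classifier vote and the cluster label agree are added to the labeled set, and the final subset consists of the features ranked consistently high by all three selectors. In a real pipeline the votes, cluster assignments, and rankings would come from trained classifiers, a clustering algorithm, and feature selectors of the reader's choice.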