Semi-supervised learning has been an important research direction in machine learning in recent years. Because the quality of the supervisory information strongly affects the results of semi-supervised clustering, it is worthwhile to actively learn high-quality supervisory information. This paper proposes an error-correcting method for actively learning pairwise constraints. The algorithm seeks pairwise constraints that the clustering algorithm cannot discover by itself, introduces them into spectral clustering, and uses them to adjust the point-to-point distance matrix used by spectral clustering. A two-way search over the sorted point-to-point distances lets the learner perform active learning even when it receives unlabeled data, so that good clustering results are obtained with relatively few constraints. The algorithm also reduces computational complexity and resolves the singularity problem of pairwise constraints during clustering. Experiments on UCI benchmark datasets and synthetic datasets show that the algorithm outperforms the related algorithms it is compared with, and also outperforms spectral clustering that uses randomly selected supervisory information.
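The abstract gives no formulas, so the Python sketch below is only one possible reading of two of its ideas: picking candidate pairs by scanning the distance-sorted pairs from both ends, and using confirmed constraints to adjust the distance matrix before spectral clustering. The function names, the adjustment rule (a must-link distance becomes 0, a cannot-link distance becomes the largest observed distance), and the interpretation of the two-way search are assumptions made for illustration, not the authors' method. In an active-learning setting an oracle would confirm each queried pair; the sketch simply assumes every candidate is confirmed.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Symmetric Euclidean distance matrix as a list of lists."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = math.dist(points[i], points[j])
        dist[i][j] = dist[j][i] = d
    return dist


def candidate_queries(dist, labels, budget):
    """Assumed two-way scan over pairs sorted by distance.

    From the near end: close pairs the current clustering separated
    (candidate must-link). From the far end: distant pairs the current
    clustering grouped together (candidate cannot-link).
    """
    pairs = sorted(combinations(range(len(dist)), 2),
                   key=lambda p: dist[p[0]][p[1]])
    near = [p for p in pairs if labels[p[0]] != labels[p[1]]]
    far = [p for p in reversed(pairs) if labels[p[0]] == labels[p[1]]]
    return near[:budget], far[:budget]


def apply_constraints(dist, must_link, cannot_link):
    """Return an adjusted copy of dist (assumed rule, see lead-in)."""
    d_max = max(max(row) for row in dist)
    adjusted = [row[:] for row in dist]
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 0.0
    for i, j in cannot_link:
        adjusted[i][j] = adjusted[j][i] = d_max
    return adjusted


# Toy data: two tight groups, deliberately mis-clustered by `labels`.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
labels = [0, 1, 0, 1]

dist = pairwise_distances(points)
near, far = candidate_queries(dist, labels, budget=2)
# Assumption: the oracle confirms every candidate pair.
adjusted = apply_constraints(dist, must_link=near, cannot_link=far)
```

The adjusted matrix would then be turned into an affinity matrix and passed to a spectral clustering routine in place of the raw distances; that step is omitted here because the abstract does not specify it.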