Determining the "best number of clusters" has long been a difficult problem for clustering algorithms. To find a family of reasonable cluster numbers rather than a single one, an algorithm based on spectral analysis is proposed; it can also handle relatively complex data sets. The algorithm builds a similarity graph over the data points, propagates similarity with a "random walk" on the graph at different granularities of analysis, and uses a new criterion, the "generalized eigengap", to find the family of cluster numbers. Experimental results show that the algorithm effectively determines the cluster numbers when the number of clusters is not unique, and that it has better computational complexity than several other algorithms.
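The pipeline summarized above (similarity graph, random walk at several granularities, a spectral gap criterion) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define its new criterion, so the sketch substitutes the ordinary eigengap of the t-step random-walk matrix. The Gaussian similarity, the parameter `sigma`, the step counts, and every function name here are assumptions made for illustration.

```python
import numpy as np


def make_demo_points():
    """Hypothetical toy data: four tight blobs arranged as two distant pairs."""
    offsets = [(0.0, 0.0), (0.2, 0.0), (-0.2, 0.0), (0.0, 0.2), (0.0, -0.2)]
    centers = [(0.0, 0.0), (3.0, 0.0), (20.0, 0.0), (23.0, 0.0)]
    return np.array([(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets])


def transition_eigenvalues(points, sigma=1.0):
    """Eigenvalues, in descending order, of the random-walk matrix P = D^-1 W."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    # Assumed similarity: Gaussian kernel on Euclidean distance.
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))
    degrees = weights.sum(axis=1)
    # P is similar to the symmetric matrix D^-1/2 W D^-1/2, so both share
    # the same real eigenvalues; the symmetric form allows eigvalsh.
    sym = weights / np.sqrt(np.outer(degrees, degrees))
    eigvals = np.linalg.eigvalsh(sym)  # returned in ascending order
    # Clip rounding noise so that powers of the eigenvalues stay in [0, 1].
    return np.clip(eigvals[::-1], 0.0, 1.0)


def candidate_cluster_counts(points, sigma=1.0, steps=(1, 10, 100, 1000), max_k=10):
    """Map each random-walk length t to the k with the largest eigengap of P^t."""
    lam = transition_eigenvalues(points, sigma)
    max_k = min(max_k, len(lam) - 1)
    counts = {}
    for t in steps:
        lam_t = lam ** t  # eigenvalues of the t-step walk P^t
        # gaps[k - 1] is the gap between the k-th and (k+1)-th eigenvalue.
        gaps = lam_t[:max_k] - lam_t[1:max_k + 1]
        counts[t] = int(np.argmax(gaps)) + 1
    return counts


if __name__ == "__main__":
    print(candidate_cluster_counts(make_demo_points()))
```

On the toy data, a one-step walk puts the largest gap after the fourth eigenvalue (four blobs), while a 1000-step walk puts it after the second (two pairs of blobs). Collecting the distinct values over all step counts gives a family of cluster numbers instead of a single answer, which is the behaviour the abstract describes.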