K-means can handle only spherical or well-separated data sets and cannot handle data sets of arbitrary shape. Moreover, because the initial centers are chosen at random, its clustering results are unstable. To address these problems, a new clustering algorithm is proposed. First, the data set is partitioned multiple times with K-means, the frequency with which each pair of points falls in the same cluster is computed, and noise points are identified and discarded, yielding refined clusters. Next, clusters containing few points are reassigned and clusters with large distance variance are split, yielding stable clusters. The stable clusters are then merged using a Bayesian connectivity criterion to produce the user-specified number of clusters. Finally, the discarded noise points are assigned to their nearest-neighbor clusters. In experiments on several synthetic data sets, the proposed method is significantly more accurate than the original K-means and other traditional methods such as DBSCAN and single-linkage.
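The first step, running K-means several times and counting how often each pair of points lands in the same cluster, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no number of runs, no number of clusters per run, and no rule for turning the frequencies into noise decisions, so the function names (kmeans_labels, co_association) and the parameters k, runs, and seed are assumptions made here for illustration. Only the pair-frequency computation is shown; noise removal, cluster splitting, and the Bayesian merging step are left out because the abstract does not describe them in enough detail.

```python
import random


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_association(points, k, runs=20, seed=0):
    """Fraction of K-means runs in which each pair of points shares a cluster.

    `runs` and `k` are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


# Toy data: two small groups of 2-D points (made up for this example).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
freq = co_association(points, k=2, runs=10, seed=0)
```

Each entry freq[i][j] lies between 0 and 1. Pairs that almost always share a cluster are candidates for the same refined cluster, while a point whose frequencies are low against every other point is a plausible noise candidate. Where to set those thresholds is a choice the abstract leaves open.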