Most spectral clustering algorithms cannot determine the number of clusters on their own. To address this, a word-document spectral clustering method is proposed that determines the optimal number of clusters automatically. Starting from the word-document matrix of a multi-document collection, the method transforms and filters the matrix with morphological operations and then determines the optimal number of clusters from the eigengap. The procedure has three stages. The first stage converts the word-document matrix into a cluster-number trend image. The second stage filters the resulting grayscale image with image processing techniques. The third stage obtains the final optimal number of clusters by locating the first local maximum of the eigengap of the filtered grayscale matrix. Experiments show that the method not only estimates the optimal number of clusters but also improves, to some extent, the accuracy of word-document spectral clustering.
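The third stage relies on the eigengap heuristic, which can be illustrated with a minimal Python sketch. This is not the authors' implementation. It skips the trend image and the morphological filtering described in the abstract, and it assumes (purely for illustration) that document affinity is measured by cosine similarity between the columns of the word-document matrix. The function name `estimate_num_clusters` and the parameters `max_k` and `min_gap` are hypothetical.

```python
import numpy as np


def estimate_num_clusters(A, max_k=None, min_gap=1e-6):
    """Estimate the number of document clusters from a word-document matrix.

    A has one row per word and one column per document. The estimate is the
    position of the first local maximum of the eigengap sequence of a
    normalised document-affinity matrix (an illustrative assumption).
    """
    A = np.asarray(A, dtype=float)

    # Cosine similarity between document columns (assumed affinity measure).
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    X = A / norms
    S = X.T @ X

    # Symmetric normalisation D^{-1/2} S D^{-1/2}.
    d = S.sum(axis=1)
    d[d == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Eigenvalues in descending order; M is symmetric, so eigvalsh applies.
    eigvals = np.sort(np.linalg.eigvalsh(M))[::-1]

    if max_k is None:
        max_k = len(eigvals) - 1
    # gaps[i] is the difference between eigenvalue i+1 and eigenvalue i+2.
    gaps = eigvals[:max_k] - eigvals[1:max_k + 1]

    # Return the first local maximum that exceeds the noise threshold.
    for i in range(len(gaps)):
        left = gaps[i - 1] if i > 0 else -np.inf
        right = gaps[i + 1] if i + 1 < len(gaps) else -np.inf
        if gaps[i] >= min_gap and gaps[i] > left and gaps[i] >= right:
            return i + 1

    # Fallback: no clear local maximum, so use the largest gap.
    return int(np.argmax(gaps)) + 1


if __name__ == "__main__":
    # Toy matrix: 6 words x 6 documents, three topics with disjoint vocabularies.
    demo = np.array([
        [3, 2, 0, 0, 0, 0],
        [1, 2, 0, 0, 0, 0],
        [0, 0, 3, 2, 0, 0],
        [0, 0, 1, 2, 0, 0],
        [0, 0, 0, 0, 3, 2],
        [0, 0, 0, 0, 1, 2],
    ])
    print(estimate_num_clusters(demo))
```

The `min_gap` threshold is there so that floating-point noise among near-equal leading eigenvalues is not mistaken for a local maximum. In the method described above, the matrix whose eigengap is examined would be the filtered grayscale matrix rather than the raw cosine-similarity matrix used here.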