In modern information technology, finding the information users actually need quickly, accurately, and comprehensively has become a central research problem. Building on the theory of text classification and addressing the shortcomings of the KNN algorithm, this paper designs a text classification algorithm based on clustering density, which determines the category of a text to be classified by computing its similarity and weight values. The algorithm is validated in three experiments. The results show that the clustering-density-based classification algorithm outperforms the KNN classification algorithm across different feature selection methods and different numbers of feature words, and that, among several similarity measures, Jensen-Shannon divergence is the best suited to the clustering density algorithm.
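The abstract names Jensen-Shannon divergence as the preferred similarity measure but does not give the formulation used in the paper. The following is a minimal sketch of the standard definition only, assuming each text is represented as a term-frequency distribution over a shared vocabulary. The function names (`normalize`, `kl_divergence`, `js_divergence`) and the choice of base-2 logarithms are illustrative assumptions, not taken from the paper.

```python
import math


def normalize(counts):
    """Turn raw term counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero count vector")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions, in bits.

    Defined as the average KL divergence of p and q from their midpoint m.
    With base-2 logarithms the value lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same vocabulary")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical term counts for two texts over the same four-word vocabulary.
doc_a = normalize([3, 1, 0, 0])
doc_b = normalize([1, 1, 2, 0])
divergence = js_divergence(doc_a, doc_b)
```

A smaller divergence means the two term distributions are closer, so a classifier that needs a similarity score could use, for example, `1 - js_divergence(p, q)`. How the paper combines this measure with its clustering-density weights is not stated in the abstract.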