[Objective] This paper improves hot spot discovery to address two weaknesses of traditional methods: insufficient semantic understanding and the limitations of their clustering algorithms. [Methods] Texts are represented from a semantic-analysis perspective, with the term-document matrix built using information gain and latent semantic analysis. A two-stage clustering scheme is proposed to discover and update hot spots, and the optimal hot spots are selected by similarity strength. [Results] The method achieves a recall of 91.3% and a precision of 92.9%, an improvement in clustering quality over earlier methods. It can also update the data, which reduces experimental complexity. [Limitations] The experimental data cover a short time span, so the effect of the hot spot update step is not very pronounced. [Conclusions] The proposed hot spot discovery method has good accuracy.
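To illustrate the feature-selection step named in the method, the following is a minimal sketch of information gain for a single term. It assumes binary term presence and documents that already carry topic labels; the abstract does not say how such labels are obtained, so this is an assumption. The function names `entropy` and `information_gain` and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(term, docs, labels):
    """IG(t) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t).

    docs   : list of token lists (hypothetical tokenised documents)
    labels : one class label per document (assumed to be available)
    """
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Toy corpus, invented for illustration only.
docs = [
    ["earthquake", "rescue", "report"],
    ["earthquake", "relief", "report"],
    ["election", "vote", "report"],
    ["election", "poll", "report"],
]
labels = ["disaster", "disaster", "politics", "politics"]

# Score every distinct term in the corpus.
vocabulary = {word for doc in docs for word in doc}
ig_scores = {term: information_gain(term, docs, labels) for term in vocabulary}
```

In this toy corpus a term such as `earthquake`, which separates the two labels perfectly, receives the highest score, while `report`, which occurs in every document, scores zero. Under this sketch, only the high-scoring terms would be kept as rows of the term-document matrix before latent semantic analysis is applied; the selection threshold used in the paper is not given in the abstract.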