[Purpose/Significance] In the era of big data, users facing massive volumes of data can be at a loss. As a result, a growing number of researchers are turning their attention to algorithms for discovering hot topics on the Internet, which help users find hot topics quickly. [Method/Process] Building on the DBSCAN algorithm, the approach optimizes the algorithm by dynamically adjusting its parameters to perform hot topic discovery. A hot topic filtering model, built from an analysis of syntactic structure and inter-sentence relations, filters out ordinary topics that merely contain hot terms. [Result/Conclusion] Experiments were conducted on news datasets from mainstream websites, and the algorithm's effectiveness was evaluated with metrics such as the false detection rate and the missed detection rate. The results show that the improved algorithm performs better and can offer information users an efficient way to study network data scientifically.
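To make the method concrete, the following is a minimal sketch of density-based clustering with a data-driven neighborhood radius, plus the two evaluation rates named above. It is not the paper's implementation. The abstract does not say how the parameters are adjusted, so `estimate_eps` (median k-th nearest-neighbor distance) is a hypothetical stand-in. The toy 2D points stand in for document vectors, and the rate definitions (missed hot documents over all hot documents, falsely flagged documents over all non-hot documents) are assumed rather than taken from the paper.

```python
import math

def estimate_eps(points, min_pts):
    """Hypothetical heuristic (an assumption, not the paper's rule):
    use the median distance from each point to its (min_pts - 1)-th
    nearest other point as the DBSCAN radius."""
    k = min_pts - 1
    kth = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(dists[k - 1])
    kth.sort()
    return kth[len(kth) // 2]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Neighborhood includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels

def miss_and_false_rates(predicted, actual):
    """Assumed definitions: miss rate = missed hot items / all hot items,
    false detection rate = wrongly flagged items / all non-hot items."""
    pairs = list(zip(predicted, actual))
    misses = sum(1 for p, a in pairs if a and not p)
    false_hits = sum(1 for p, a in pairs if p and not a)
    positives = sum(1 for _, a in pairs if a)
    negatives = len(pairs) - positives
    miss_rate = misses / positives if positives else 0.0
    false_rate = false_hits / negatives if negatives else 0.0
    return miss_rate, false_rate

# Toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
eps = estimate_eps(points, min_pts=3)
labels = dbscan(points, eps, min_pts=3)
print(eps, labels)
```

In this toy run the two dense groups become separate clusters and the isolated point is labeled as noise. With real news text, the points would be document vectors and the distance function would likely be replaced by a text similarity measure; both choices are outside what the abstract specifies.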