To address topic drift and the large amount of noise in Weibo microblogs, this paper proposes a dynamic topic model for microblog streams that combines a dynamic topic model with microblog information entropy. First, the dynamic topic model is applied throughout the tracking process to strengthen the description of the tracked topic from both the positive and the negative side, which further mitigates topic drift. However, Weibo contains a large number of intermediate-class microblogs. To handle them, a microblog information entropy is defined to measure how important a single microblog is for reporting on the topic. This measure is then incorporated into the dynamic topic model to distinguish news-class microblogs from intermediate-class ones. Topic tracking was performed on 12 million microblogs posted by more than 170,000 users. The experimental results show that the proposed algorithm is more effective than the traditional dynamic topic model and that its tracking results contain less noise.
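The abstract does not give the formula for microblog information entropy. The following is a minimal sketch built on two assumptions. First, it assumes the measure is the Shannon entropy of a post's empirical word distribution, H = Σ p(w) log2(1/p(w)). Second, it assumes a hypothetical fixed threshold separates news-class posts (varied, informative wording) from intermediate-class posts (repetitive chatter). The names `microblog_entropy`, `classify_post`, and `threshold` are illustrative and are not the authors' definitions.

```python
import math
from collections import Counter
from typing import List


def microblog_entropy(tokens: List[str]) -> float:
    """Shannon entropy (in bits) of one post's empirical word distribution.

    Assumption: this stands in for the paper's "microblog information
    entropy", whose exact definition is not given in the abstract.
    """
    if not tokens:
        return 0.0
    n = len(tokens)
    counts = Counter(tokens)
    # H = sum over distinct words of p * log2(1 / p)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def classify_post(tokens: List[str], threshold: float) -> str:
    """Hypothetical rule: posts at or above the entropy threshold count as news."""
    return "news" if microblog_entropy(tokens) >= threshold else "intermediate"


if __name__ == "__main__":
    news_like = ["earthquake", "magnitude", "city", "rescue"]
    chatter = ["haha", "haha", "haha", "haha"]
    print(microblog_entropy(news_like), classify_post(news_like, threshold=1.0))
    print(microblog_entropy(chatter), classify_post(chatter, threshold=1.0))
```

Four distinct words give the maximum entropy for a four-token post (2.0 bits), while a single repeated word gives 0.0 bits. In a real tracker, the threshold (or an entropy-based weight) would have to be tuned on labeled data rather than fixed by hand.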