【Objective】 To detect valuable event information in massive, noisy microblog data using data mining algorithms. 【Method】 For a representative microblogging site in China, microblog posts carrying geographic coordinates were collected efficiently through the site's open API. Three data mining algorithms (K-means, KNN, and decision trees) were used to construct the geographic regularity features of microblogs from five indicators: number of posts, number of reposts, number of comments, user activity, and mobility intensity. The day-to-day microblog features of a region were then compared with the geographic regularity of that region's microblog features to determine whether an event had occurred there. 【Result】 With microblog data from April 15 and 16, 2015 as the test corpus, the proposed microblog event detection framework successfully detected the “Beijing Sandstorm” event. 【Limitations】 The sample used to extract the geographic regularity features was relatively small, which affected the performance of the event detection framework to some extent. 【Conclusion】 The microblog event detection framework based on geographic coordinates is effective in practice. The extracted event information can help users find news about events of interest, and it can also assist government departments in public opinion management and administrative decision-making.
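The comparison step described in the Method can be illustrated with a minimal sketch. It is not the framework from the paper: instead of K-means, KNN, and decision trees, it uses a plain per-indicator z-score against a region's historical baseline, which is a simplifying assumption made here for illustration. The record layout, the function names, and the `threshold` and `min_indicators` values are likewise hypothetical; only the five indicator names come from the abstract.

```python
from statistics import mean, stdev

# Indicator names follow the abstract; the dict-based record layout is an assumption.
INDICATORS = ("posts", "reposts", "comments", "user_activity", "mobility_intensity")


def build_baseline(history):
    """Summarize a region's usual behaviour as (mean, std) per indicator.

    `history` is a list of at least two daily records, each a dict keyed by INDICATORS.
    """
    baseline = {}
    for name in INDICATORS:
        values = [day[name] for day in history]
        baseline[name] = (mean(values), stdev(values))
    return baseline


def deviation_scores(baseline, today):
    """Return the z-score of each indicator for one day's record."""
    scores = {}
    for name in INDICATORS:
        mu, sigma = baseline[name]
        # Guard against an indicator that never varied in the history.
        scores[name] = 0.0 if sigma == 0 else (today[name] - mu) / sigma
    return scores


def is_event(baseline, today, threshold=3.0, min_indicators=2):
    """Flag a day when enough indicators deviate strongly from the baseline.

    `threshold` and `min_indicators` are illustrative values, not taken from the paper.
    """
    scores = deviation_scores(baseline, today)
    unusual = [name for name, z in scores.items() if abs(z) >= threshold]
    return len(unusual) >= min_indicators
```

In use, `build_baseline` would be called once per region on its historical daily records, and `is_event` would then be called on each new day's record for that region. Requiring several indicators to deviate at once is one way to reduce false alarms caused by a single noisy indicator.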