At present, most of the crowd-generated data on the Internet is concentrated in social networks such as microblogs and forums. Cross-language analysis of public opinion is an active research topic in intelligent information processing in China. Uyghur is one of China's major minority languages, so acquiring Uyghur microblog data is especially important for building a good cross-language public opinion analysis system. The main difficulty in acquiring such data is that the microblog operators do not provide an API. Taking the "Guduk" microblog, a platform centered on technology and economy, as the research object, this paper proposes a crawler system scheme for acquiring Uyghur microblog data based on user relationships, which overcomes the difficulty of acquiring data when no API is available. The work provides cross-language public opinion analysis systems with a large body of Uyghur social network data, together with methods and techniques for acquiring it.
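The abstract does not spell out how the user-relationship crawler works, so the following is only a minimal sketch of one plausible reading: a breadth-first traversal that starts from a few seed users and follows user relationships to discover further accounts, collecting each account's posts along the way. The function name `crawl_by_user_relations` and the two callbacks `fetch_related_users` and `fetch_posts` are hypothetical; in a real crawler they would download and parse the site's HTML pages, since no API is available. None of this is taken from the paper itself.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def crawl_by_user_relations(
    seed_users: Iterable[str],
    fetch_related_users: Callable[[str], Iterable[str]],
    fetch_posts: Callable[[str], List[str]],
    max_users: int = 1000,
) -> Dict[str, List[str]]:
    """Breadth-first traversal of a user-relationship graph.

    Both callbacks are hypothetical stand-ins for page scraping:
    fetch_related_users(user) yields users linked to `user`,
    fetch_posts(user) returns that user's posts as text.
    """
    queue: deque = deque()
    seen: Set[str] = set()
    for user in seed_users:
        if user not in seen:
            seen.add(user)
            queue.append(user)

    posts_by_user: Dict[str, List[str]] = {}
    # Stop when the frontier is empty or the user budget is used up.
    while queue and len(posts_by_user) < max_users:
        user = queue.popleft()
        posts_by_user[user] = fetch_posts(user)
        for other in fetch_related_users(user):
            if other not in seen:  # visit each account at most once
                seen.add(other)
                queue.append(other)
    return posts_by_user


if __name__ == "__main__":
    # Toy in-memory graph in place of real pages (illustrative data only).
    graph = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["a"]}
    posts = {u: [f"post by {u}"] for u in graph}
    result = crawl_by_user_relations(
        ["a"],
        lambda u: graph.get(u, []),
        lambda u: posts.get(u, []),
    )
    print(list(result))
```

Passing the page-fetching logic in as callbacks keeps the traversal testable without network access; a production crawler would also need rate limiting, retries, and persistent storage, which this sketch omits.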