Detecting changes in geographic features has become an important component of the national geographic information "Twelfth Five-Year Plan" and of the national census of geographic conditions. Web text contains a vast amount of information about geographic features. In particular, text on news, government, and social media sites is updated frequently and can serve as an up-to-date data source for change detection. Based on the linguistic patterns used to describe geographic feature changes in web text, this paper constructs a semantic knowledge base for expressing such changes and designs a web crawler that combines a search engine with general topic-focused crawling, so that relevant web text can be acquired efficiently. A rule-based model and a conditional random field (CRF) model are each used to extract change information from the web text, including the feature name, location (place name), time, and attributes. Experimental results show that the crawler is effective at acquiring relevant web text and that the accuracy of change information extraction can exceed 70%. However, the completeness of the semantic knowledge base has a considerable influence on extraction performance. The results indicate that acquiring geographic feature change information from web text offers a new way to detect changes quickly. The approach complements field surveying and remote sensing image detection, and can serve as a strong auxiliary means of addressing the continuous and real-time updating of geographic features.
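To make the rule-based extraction step concrete, the following is a minimal sketch of how a sentence might be mapped to a change record (feature name, change type, place name, time). It is an illustration under stated assumptions, not the model used in this paper: the trigger phrases, the gazetteer, the feature-type words, the date format, the example sentence, and all identifiers (`CHANGE_TRIGGERS`, `PLACE_NAMES`, `extract_change`, `ChangeRecord`) are hypothetical stand-ins for the semantic knowledge base and rules described above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical miniature "knowledge base": trigger phrase -> change type.
CHANGE_TRIGGERS = {
    "opened to traffic": "new",
    "was demolished": "removed",
    "was renamed": "renamed",
}

# Hypothetical toy gazetteer of place names.
PLACE_NAMES = ("Wuhan", "Nanjing", "Beijing")

# Assumed rule: a feature name is a run of capitalized words
# ending in a feature-type word.
FEATURE_RE = re.compile(
    r"\b(?:[A-Z][a-z]+\s)+(?:Bridge|Road|Station|Expressway)\b"
)

# Assumed rule: times are written as ISO dates (YYYY-MM-DD).
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


@dataclass
class ChangeRecord:
    feature: str
    change: str
    place: Optional[str]
    time: Optional[str]


def extract_change(sentence: str) -> Optional[ChangeRecord]:
    """Return a change record, or None if no trigger or feature is found."""
    lowered = sentence.lower()
    change = next(
        (kind for phrase, kind in CHANGE_TRIGGERS.items() if phrase in lowered),
        None,
    )
    feature = FEATURE_RE.search(sentence)
    if change is None or feature is None:
        return None
    place = next(
        (p for p in PLACE_NAMES if re.search(rf"\b{re.escape(p)}\b", sentence)),
        None,
    )
    date = DATE_RE.search(sentence)
    return ChangeRecord(
        feature=feature.group(0),
        change=change,
        place=place,
        time=date.group(0) if date else None,
    )


# Made-up example sentence, for illustration only.
record = extract_change(
    "On 2014-12-28, the Riverside Bridge in Wuhan opened to traffic."
)
print(record)
```

A sentence without a trigger phrase yields no record, which reflects the dependence on knowledge base completeness noted above: a change described with a phrase missing from the lexicon is simply not detected. A CRF-based extractor would replace these hand-written patterns with a sequence labeler trained on annotated sentences.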