This paper studies the theory and methods of applying Web information extraction to actively search the Internet for cooperation partners. It proposes an overall architecture for a Chinese Web information acquisition system oriented to partner selection and analyzes the three key technologies needed to implement it: meta-search-based web page collection, enterprise homepage filtering based on the common features of sample pages, and pattern-based enterprise information extraction. Each of the three is described in detail. Finally, following the proposed approach, a prototype Chinese Web information acquisition system for partner selection was implemented and used to verify the feasibility of the method and to demonstrate its accuracy.
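The three stages named in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names, the reciprocal-rank merging scheme, the overlap thresholds, and the regular expressions are all hypothetical, and whitespace tokenization stands in for the Chinese word segmentation a real system would need.

```python
import re
from collections import Counter, defaultdict


def merge_results(result_lists):
    """Stage 1 (meta-search): merge ranked URL lists from several engines.

    Assumed scheme: sum reciprocal ranks per URL, which also deduplicates.
    """
    scores = defaultdict(float)
    for urls in result_lists:
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))


def common_features(sample_pages, min_share=0.8):
    """Stage 2a: terms present in at least min_share of sample homepages."""
    counts = Counter()
    for text in sample_pages:
        # Whitespace split is a placeholder for a Chinese segmenter.
        counts.update(set(text.lower().split()))
    threshold = min_share * len(sample_pages)
    return {term for term, n in counts.items() if n >= threshold}


def is_enterprise_homepage(text, features, min_overlap=0.5):
    """Stage 2b: keep a page if it shares enough of the common features."""
    if not features:
        return False
    tokens = set(text.lower().split())
    return len(tokens & features) / len(features) >= min_overlap


# Stage 3: hypothetical extraction patterns, one per target field.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"(?:Tel|电话)\s*[:：]\s*([\d\-+() ]{7,})"),
}


def extract_info(text):
    """Return the first match for each pattern, or None if absent."""
    info = {}
    for field, pattern in PATTERNS.items():
        m = pattern.search(text)
        # Use the capture group when the pattern defines one.
        info[field] = m.group(m.lastindex or 0).strip() if m else None
    return info


if __name__ == "__main__":
    urls = merge_results([["a.com", "b.com"], ["b.com", "c.com"]])
    samples = ["products about contact", "about contact news",
               "contact about careers"]
    features = common_features(samples)
    page = "about us contact Tel: 010-1234567 Email: sales@example.com"
    if is_enterprise_homepage(page, features):
        print(urls, extract_info(page))
```

Each stage is kept as a pure function so that the merging rule, the feature threshold, or the pattern set can be swapped without touching the others; the actual system may differ in all three.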