Focused crawlers collect information resources from a specific domain and can therefore meet users' individual needs. When catalogers perform original cataloging, finding corresponding cataloging data on the web to use as a reference can greatly improve cataloging efficiency. Treating such cataloging data as a class of topical information resources and harvesting it with a focused crawler for catalogers' use is therefore a feasible approach. Starting from the concept and basic components of the focused crawler, this paper analyzes the techniques for using a focused crawler to collect cataloging data and builds a cataloging data collection model that incorporates focused crawler technology.