General-purpose search engines such as Google and Yahoo have significant shortcomings when it comes to the deep web: each of them covers less than one third of the total deep web data, and, unlike on the surface web, combining several search engines leaves the covered amount essentially unchanged. Many deep web sites provide large amounts of high-quality information, and the deep web is gradually becoming one of the most important information resources. This paper proposes a three-classifier framework for automatically identifying domain-specific deep web entry points. Once the query interfaces have been obtained, they can be integrated, and a unified interface can be presented to users to make querying easier. Eight groups of large-scale experiments verify that the proposed method discovers domain-specific deep web entry points accurately and efficiently.
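The abstract does not say what the three classifiers are or how they are trained, so the following is only a minimal sketch of how a three-stage cascade for finding deep web entry points might be organized. The stage names (`page_is_on_topic`, `form_is_searchable`, `form_is_in_domain`), the keyword rules, and the `min_hits` threshold are all hypothetical stand-ins chosen for illustration; a real system would presumably use learned classifiers at each stage.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Set


@dataclass
class Form:
    action: str = ""
    fields: List[dict] = field(default_factory=list)  # one dict per input/select/textarea
    text: str = ""                                    # visible text inside the form


class FormExtractor(HTMLParser):
    """Collects the page text and every <form> with its fields."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: List[Form] = []
        self.page_text: List[str] = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "form":
            self._current = Form(action=attr_map.get("action") or "")
        elif self._current is not None and tag in ("input", "select", "textarea"):
            attr_map["tag"] = tag
            self._current.fields.append(attr_map)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

    def handle_data(self, data):
        self.page_text.append(data)
        if self._current is not None:
            self._current.text += " " + data


# Stage 1 (assumed): is the page about the target domain at all?
def page_is_on_topic(page_text: str, keywords: Set[str], min_hits: int = 2) -> bool:
    text = page_text.lower()
    return sum(1 for k in keywords if k in text) >= min_hits


# Stage 2 (assumed): does the form look like a query interface rather than a login form?
def form_is_searchable(form: Form) -> bool:
    types = [(f.get("type") or "text").lower() for f in form.fields]
    if "password" in types:
        return False
    has_query_field = any(
        f["tag"] in ("select", "textarea") or t in ("text", "search")
        for f, t in zip(form.fields, types)
    )
    return has_query_field


# Stage 3 (assumed): does the form itself belong to the target domain?
def form_is_in_domain(form: Form, keywords: Set[str]) -> bool:
    names = " ".join((f.get("name") or "") for f in form.fields)
    text = (form.text + " " + names).lower()
    return any(k in text for k in keywords)


def find_entries(html: str, keywords: Set[str]) -> List[Form]:
    """Runs the three stages in order and returns the forms that pass all of them."""
    parser = FormExtractor()
    parser.feed(html)
    if not page_is_on_topic(" ".join(parser.page_text), keywords):
        return []
    return [
        f for f in parser.forms
        if form_is_searchable(f) and form_is_in_domain(f, keywords)
    ]
```

Ordering the stages as a cascade means the cheap page-level check discards off-topic pages before any form is inspected; whether the paper arranges its classifiers this way is not stated in the abstract.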