Reading Notes on Selected Parts of the Paper
Google combines techniques such as parallel processing, index barrels, data compression, and the PageRank algorithm in a complex architecture made up of five parts: the crawler, the Repository, the indexing system (the indexer, the barrels, the document index, and so on), the Sorter, and the Searcher. Google's ranking system combines several factors: term frequency, hit type (for example, whether a word occurs in the title, in anchor text, or in body text), proximity, and page importance. The most notable of these is PageRank, the algorithm that computes page importance. It applies the citation theory of bibliographic retrieval to the Web: a page is important if many pages link to it, or if some important pages link to it. PageRank greatly improved the quality of search results.
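The PageRank idea above can be sketched with power iteration. This is a minimal illustration under stated assumptions, not Google's implementation. It uses the normalized formulation PR(p) = (1 - d)/N + d * sum(PR(q)/C(q)), where the sum runs over the pages q that link to p, C(q) is the number of out-links of q, and N is the number of pages. It also sets the damping factor d to 0.85, the value the original paper reports usually using. Three things here are illustrative assumptions: the function name `pagerank`, the toy graph `TOY_LINKS`, and the choice to spread the rank of dangling pages (pages with no out-links) evenly over all pages.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from the uniform distribution over all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Assumption: rank held by dangling pages (no out-links) is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes its rank in equal shares along its out-links,
        # so a link from a high-ranked page is worth more than one from a low-ranked page.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share

        # Stop once the total (L1) change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: C has many in-links; D has a single in-link from the
# important page C; G has a single in-link from the unimportant page F.
TOY_LINKS = {
    "A": ["C"],
    "B": ["C"],
    "E": ["C"],
    "C": ["D"],
    "D": ["C"],
    "F": ["G"],
}

if __name__ == "__main__":
    ranks = pagerank(TOY_LINKS)
    for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{page}: {score:.3f}")
```

In this toy graph C ranks highest because many pages link to it. D has only one in-link, but that link comes from the important page C, so D ranks far above G, whose single in-link comes from the unimportant page F. This mirrors the two conditions in the paragraph above.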