An ideal peer-to-peer (P2P) search algorithm should offer both information-retrieval-grade query quality and efficient search performance. Existing search algorithms, however, do not satisfy both requirements well at the same time. With these two goals in mind, this paper proposes a Distributed Hierarchical Clustering (DHC) search algorithm built on hierarchical clustering. The algorithm first uses the vector space model to represent the content of each file as a vector, then applies hierarchical clustering to obtain a hierarchy tree over the file vectors of the entire network. The tree information is stored in a distributed manner across the network and serves as the routing guide, so the routing depth never exceeds the height of the tree. Preliminary simulation experiments show that the algorithm achieves a recall above 80% with search and update costs of logarithmic order.
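The pipeline described in the abstract (vectorize files, cluster them into a tree, route a query down the tree) can be illustrated with a minimal, centralized sketch. It is not the paper's distributed implementation: the abstract does not specify the term weighting, the linkage criterion, or how tree nodes are assigned to peers, so raw term frequencies, centroid-based agglomerative merging, and every name below (`vectorize`, `Node`, `build_tree`, `route`) are assumptions made for illustration only.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector as a sparse mapping (a simple vector space model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Node:
    """Tree node: a leaf holds one document, an inner node holds two children."""

    def __init__(self, centroid, size, doc=None, children=()):
        self.centroid = centroid
        self.size = size
        self.doc = doc
        self.children = children


def merge(a, b):
    """Join two clusters; the new centroid is the size-weighted mean."""
    total = a.size + b.size
    centroid = Counter()
    for n in (a, b):
        for t, w in n.centroid.items():
            centroid[t] += w * n.size / total
    return Node(centroid, total, children=(a, b))


def build_tree(docs):
    """Agglomerative clustering: repeatedly merge the two most similar clusters."""
    nodes = [Node(vectorize(d), 1, doc=d) for d in docs]
    while len(nodes) > 1:
        pairs = ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes)))
        i, j = max(pairs, key=lambda p: cosine(nodes[p[0]].centroid, nodes[p[1]].centroid))
        merged = merge(nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def height(node):
    """Number of edges on the longest root-to-leaf path."""
    return 0 if not node.children else 1 + max(height(c) for c in node.children)


def route(root, query):
    """Descend toward the child whose centroid best matches the query."""
    q = vectorize(query)
    node, hops = root, 0
    while node.children:
        node = max(node.children, key=lambda c: cosine(q, c.centroid))
        hops += 1
    return node.doc, hops


docs = [
    "peer to peer file sharing network",
    "peer to peer overlay network routing",
    "chocolate cake baking recipe",
    "vanilla cake baking recipe",
]
root = build_tree(docs)
doc, hops = route(root, "chocolate cake recipe")
print(doc, hops, height(root))
```

Because each routing step moves one level down, the number of hops cannot exceed the tree height, which mirrors the bound stated in the abstract. In a distributed setting each node's centroid would live on some peer and a hop would be a network message; that mapping is left out here.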