The link relationships among web pages reflect how closely the pages are related, and this closeness is an important basis for web page clustering. First, based on an analysis of the characteristics of web page link structure, the paper introduces the concepts of the basic set, extended set, radius, neighborhood, density, and path tree of a web page node. Second, it measures the distance between web pages using their shared in-degree and out-degree together with their dissimilarity, and combines this with the link information in the extended set to design a model for computing web page similarity. Finally, it clusters the web pages according to their density distribution. Experimental results show that the algorithm clusters web pages well.
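The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of a link-based distance and a neighborhood density. It assumes a Jaccard overlap of shared in-links and shared out-links, weighted equally, as a stand-in for the paper's similarity model; the function names, the equal weighting, and the `radius` threshold are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set

# Hypothetical representation: page -> set of pages it links to.
Graph = Dict[str, Set[str]]


def in_links(graph: Graph) -> Graph:
    """Invert the out-link map to get, for each page, the pages linking to it."""
    incoming: Graph = {page: set() for page in graph}
    for src, targets in graph.items():
        for dst in targets:
            incoming.setdefault(dst, set()).add(src)
    return incoming


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two link sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dissimilarity(graph: Graph, incoming: Graph, p: str, q: str) -> float:
    """Assumed distance: 1 minus the mean of shared in-link and out-link overlap."""
    shared_out = jaccard(graph.get(p, set()), graph.get(q, set()))
    shared_in = jaccard(incoming.get(p, set()), incoming.get(q, set()))
    return 1.0 - 0.5 * (shared_in + shared_out)


def density(graph: Graph, incoming: Graph, p: str, radius: float) -> int:
    """Assumed density: number of other pages within `radius` of page p."""
    pages = set(graph) | set(incoming)
    return sum(
        1
        for q in pages
        if q != p and dissimilarity(graph, incoming, p, q) <= radius
    )


# Toy link graph: pages "a" and "b" link to the same targets, "c" does not.
example_graph: Graph = {
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"z"},
    "x": set(),
    "y": set(),
    "z": set(),
}
example_incoming = in_links(example_graph)
```

In this toy graph, pages that share out-links (or in-links) end up closer to each other than pages that share neither. A density-based clustering step could then group pages whose neighborhoods within the chosen radius are sufficiently populated; how the paper defines that step is not stated in the abstract.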