To study how internet users access websites, one full day of logs from the authoritative servers for the CN domain was selected, drawing on the national domain name system resources managed by the China Internet Network Information Center (CNNIC). A domain name reduction method is proposed that merges the domain names in the logs into second-level domain names, or into third-level domain names under the 41 category and administrative-region domains of CN. The method preserves the information on users' access to websites while also compressing the data. The k-means algorithm is then used to cluster the temporal behavior feature vectors extracted for IP addresses and domain names. The results show that, according to their temporal behavior patterns, IP addresses fall into three main classes: attackers, recursive servers of major ISPs, and non-mainstream recursive servers. Domain names fall into four main classes; further classifying the heavily accessed domain names among them yields the set of domain names that truly reflects the network access needs of the vast majority of users.
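The domain name reduction step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `reduce_domain` and `reduce_log` are hypothetical, and the list of 41 reserved labels (7 category labels plus 34 administrative-region labels) is an assumption that matches the count given in the abstract but is not taken from it.

```python
from collections import Counter

# Assumption: the 41 reserved second-level labels under .cn
# (7 category labels + 34 administrative-region labels).
CATEGORY_LABELS = {"ac", "com", "edu", "gov", "mil", "net", "org"}
REGION_LABELS = {
    "bj", "sh", "tj", "cq", "he", "sx", "nm", "ln", "jl", "hl", "js", "zj",
    "ah", "fj", "jx", "sd", "ha", "hb", "hn", "gd", "gx", "hi", "sc", "gz",
    "yn", "xz", "sn", "gs", "qh", "nx", "xj", "tw", "hk", "mo",
}
RESERVED = CATEGORY_LABELS | REGION_LABELS


def reduce_domain(qname: str) -> str:
    """Merge a queried name into its second-level .cn domain, or into its
    third-level domain when the second-level label is a reserved one."""
    labels = qname.strip().rstrip(".").lower().split(".")
    # Names outside .cn (or the bare TLD) are returned normalized but unmerged.
    if len(labels) < 2 or labels[-1] != "cn":
        return ".".join(labels)
    # e.g. mail.example.com.cn -> example.com.cn
    if labels[-2] in RESERVED and len(labels) >= 3:
        return ".".join(labels[-3:])
    # e.g. www.example.cn -> example.cn
    return ".".join(labels[-2:])


def reduce_log(qnames):
    """Count queries per reduced domain; many raw names collapse into one key."""
    return Counter(reduce_domain(q) for q in qnames)
```

For example, `reduce_log(["www.example.cn.", "img.example.cn", "mail.a.example.com.cn"])` groups the first two names under `example.cn` and the third under `example.com.cn`, which shows how the reduction keeps the site-level access information while shrinking the number of distinct keys.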