Probabilistic Top-k Query: Model and Application on Web Traffic Analysis

Source: China Communications (中国通信)
Top-k ranking of websites by traffic volume is important for Internet Service Providers (ISPs) to understand network status and optimize network resources. However, the ranking result deviates substantially from the actual rank because of unknown web traffic, which cannot be identified accurately with current techniques. In this paper, we introduce a novel method to approximate the actual rank. The method associates unknown web traffic with websites according to statistical probabilities. We then construct a probabilistic top-k query model to rank websites. We conduct several experiments using real HTTP traffic traces collected from a commercial ISP covering an entire city in northern China. Experimental results show that the proposed techniques greatly reduce the deviation between the ground truth and the ranking results. In addition, we find that websites providing video services have a higher ratio of unknown IP addresses, as well as a higher ratio of unknown traffic, than websites providing text web pages. Specifically, the top-3 video websites have more than 90% unknown web traffic. These findings help ISPs understand network status and deploy Content Delivery Networks (CDNs).
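
The idea of attributing unknown traffic to websites by probability and then ranking can be illustrated with a minimal sketch. It is not the paper's model, which the abstract does not specify. It assumes each unidentified flow carries a hypothetical probability distribution over candidate websites, and it ranks sites by expected traffic volume, which is only one simple probabilistic top-k semantics. The function name, site names, and byte counts are all invented for illustration.

```python
from collections import defaultdict
import heapq


def expected_top_k(known_bytes, unknown_flows, k):
    """Return the k sites with the largest expected traffic volume.

    known_bytes:   dict mapping site -> bytes attributed with certainty.
    unknown_flows: list of (bytes, {site: probability}) pairs for traffic
                   that could not be identified (hypothetical input format).
    """
    # Start from the traffic that is already attributed to each site.
    expected = defaultdict(float, known_bytes)
    # Spread each unknown flow over its candidate sites by probability.
    for volume, site_probs in unknown_flows:
        for site, prob in site_probs.items():
            expected[site] += volume * prob
    return heapq.nlargest(k, expected.items(), key=lambda item: item[1])


# Illustrative data only: a video site whose traffic is mostly unidentified.
known = {"video.example": 100, "news.example": 300, "shop.example": 250}
unknown = [
    (800, {"video.example": 0.9, "news.example": 0.1}),
    (200, {"video.example": 0.5, "shop.example": 0.5}),
]

naive_ranking = heapq.nlargest(2, known.items(), key=lambda item: item[1])
adjusted_ranking = expected_top_k(known, unknown, 2)
print([site for site, _ in naive_ranking])
print([site for site, _ in adjusted_ranking])
```

In this toy data the video site is absent from the top-2 when only identified traffic is counted, and moves to first place once unknown traffic is attributed. This is the kind of deviation the abstract describes.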