PathMarker: protecting web contents against inside crawlers

Source: Cyberspace Security Science and Technology (English edition)


Authors: Shengye Wan, Yue Li, Kun Sun

Affiliation: Department of Computer Science, College of William and Mary, Williamsburg 23187-8795, VA, USA; Depar


Publication date: 2018, Issue 3

Keywords: Anti-Crawler mechanism; Stealthy distributed inside crawler; Confidential Website


Abstract

Web crawlers have been misused for several malicious purposes, such as downloading server data without permission from the website administrator. Moreover, armoured crawlers are evolving against new anti-crawler mechanisms in the arms race between crawler developers and crawler defenders. In this paper, based on the observation that normal users and malicious crawlers have different short-term and long-term download behaviours, we develop a new anti-crawler mechanism called PathMarker to detect and constrain persistent distributed crawlers. By adding a marker to each Uniform Resource Locator (URL), we can trace the page that leads to the access of this URL and the identity of the user who accesses it. With this supporting information, we can not only perform more accurate heuristic detection using path-related features, but also develop a Support Vector Machine (SVM) based machine learning detection model that distinguishes malicious crawlers from normal users by inspecting their different patterns of URL visiting paths and URL visiting timings. In addition to effectively detecting crawlers at the earliest stage, PathMarker can dramatically suppress the scraping efficiency of crawlers before they are detected. We deploy our approach on an online forum website, and the evaluation results show that PathMarker can quickly capture all 6 open-source and in-house crawlers, plus two external crawlers (i.e., Googlebots and Yahoo Slurp).
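The URL-marking idea can be illustrated with a minimal sketch. The abstract does not say how the marker is built or protected, so the code below uses an HMAC-signed query parameter as a stand-in technique. The parameter name `pm`, the key `SECRET_KEY`, and the functions `add_marker` and `read_marker` are all hypothetical names chosen for illustration, not part of PathMarker itself.

```python
import base64
import hashlib
import hmac
import json
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

SECRET_KEY = b"demo-secret"  # hypothetical server-side key (assumption)
MARKER_PARAM = "pm"          # hypothetical query parameter name (assumption)


def _split(url):
    """Return (URL with any marker removed and query normalized, marker or None)."""
    parts = urlparse(url)
    pairs = parse_qsl(parts.query)
    marker = next((v for k, v in pairs if k == MARKER_PARAM), None)
    rest = urlencode([(k, v) for k, v in pairs if k != MARKER_PARAM])
    return urlunparse(parts._replace(query=rest)), marker


def _sign(clean_url, payload):
    """HMAC over the target URL and the payload, so a marker cannot be reused elsewhere."""
    message = clean_url.encode() + b"\n" + payload
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def add_marker(url, parent_url, user_id):
    """Append a marker recording which page linked to `url` and for which user."""
    clean_url, _ = _split(url)
    payload = json.dumps([user_id, parent_url]).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    marker = f"{body}.{_sign(clean_url, payload)}"
    parts = urlparse(clean_url)
    query = parse_qsl(parts.query) + [(MARKER_PARAM, marker)]
    return urlunparse(parts._replace(query=urlencode(query)))


def read_marker(marked_url):
    """Return (user_id, parent_url) if the marker is present and valid, else None."""
    clean_url, marker = _split(marked_url)
    if marker is None or "." not in marker:
        return None
    body, tag = marker.rsplit(".", 1)
    try:
        payload = base64.urlsafe_b64decode(body.encode())
    except ValueError:
        return None
    expected = _sign(clean_url, payload)
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    user_id, parent_url = json.loads(payload)
    return user_id, parent_url
```

With a scheme like this, the server would call `add_marker` on every link it renders for a logged-in user and `read_marker` on every incoming request. Each access could then be attributed to a user and a parent page, which is the kind of information that path-based detection needs.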
