The era of big data has driven exponential growth in Internet traffic. To manage network resources effectively and improve network security, network traffic must be classified quickly and accurately, which places higher demands on the real-time performance of traffic classification techniques. At present, most research on network traffic classification, both in China and abroad, is carried out in single-machine environments, where limited computing resources make it difficult to handle (quasi-)real-time traffic classification tasks in high-speed networks. Building on existing research results and drawing on recent ideas and techniques, this paper proposes a quasi-real-time classification method for large-scale network traffic. The method is based on the Spark platform and combines its stream processing framework, Spark Streaming, with its machine learning library, MLlib. Experimental results show that the method maintains high classification accuracy while also offering good real-time classification capability, and that it can meet the real-time requirements of traffic classification tasks in real networks.
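The general pattern behind such a quasi-real-time pipeline can be illustrated with a minimal plain-Python sketch: a model is trained offline, and incoming flows are then classified one micro-batch at a time. This is not the paper's implementation and does not use the Spark API. The two flow features, the class labels, the sample values, and the nearest-centroid classifier are all hypothetical stand-ins chosen for illustration; the abstract does not name the features or the classifier used. Spark Streaming forms micro-batches by time interval, whereas this sketch cuts them by count for simplicity.

```python
from collections import defaultdict
from itertools import islice
from math import dist

# Hypothetical labelled flows: (mean packet size in bytes, mean inter-arrival time in ms).
# These values are invented for illustration and are not data from the paper.
TRAINING_FLOWS = [
    ((1200.0, 5.0), "video"),
    ((1100.0, 8.0), "video"),
    ((200.0, 40.0), "chat"),
    ((150.0, 55.0), "chat"),
]


def train_centroids(labelled_flows):
    """Offline step: average the feature vectors of each class."""
    grouped = defaultdict(list)
    for features, label in labelled_flows:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def micro_batches(stream, batch_size):
    """Cut an iterable of flows into consecutive batches of at most batch_size."""
    iterator = iter(stream)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def classify_batch(centroids, batch):
    """Online step: label each flow in the batch with its nearest class centroid."""
    return [
        min(centroids, key=lambda label: dist(centroids[label], features))
        for features in batch
    ]


centroids = train_centroids(TRAINING_FLOWS)

# Hypothetical incoming flows standing in for a live traffic stream.
incoming = [(1150.0, 6.0), (180.0, 50.0), (1000.0, 10.0), (210.0, 35.0), (170.0, 60.0)]

results = [classify_batch(centroids, batch) for batch in micro_batches(incoming, 2)]
for number, labels in enumerate(results, start=1):
    print(f"batch {number}: {labels}")
```

In a distributed setting, each micro-batch would be partitioned across worker nodes, which is what allows throughput to scale beyond a single machine; the sketch only shows the train-once, classify-per-batch control flow.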