A Network Traffic Prediction Method Based on LSTM

Source: ZTE Communications
  Abstract: As network sizes continue to increase, network traffic grows exponentially. In this situation, how to accurately predict network traffic to serve customers better has become one of the issues that Internet service providers care most about. Traditional network models cannot predict network traffic that behaves as a nonlinear system. In this paper, a long short-term memory (LSTM) neural network model is proposed to predict such traffic. According to the autocorrelation characteristics of the traffic, an autocorrelation coefficient is added to the model to improve its prediction accuracy. Several experiments were conducted on real-world data, showing the effectiveness of the LSTM model and the improved accuracy when autocorrelation is considered. The experimental results show that the proposed model is efficient and suitable for real-world network traffic prediction.
  Keywords: recurrent neural networks; time series; network traffic prediction
  1 Introduction
  As Transmission Control Protocol/Internet Protocol (TCP/IP) networks become more and more important in modern society, how to better understand and correctly predict the behavior of the network is a vital point in the development of information technology. For medium and large network providers, network prediction has become an important task that receives more and more attention [1]. By improving the accuracy of network prediction, network providers can better optimize resources and provide better service quality [2]. Moreover, network traffic prediction can help detect malicious attacks in the network; for example, denial-of-service or spam attacks can be detected by comparing real traffic with predicted traffic [3]. The earlier these problems are detected, the more reliable the network services that can be provided [4], [5].
  With the development of computational science, some relatively reliable methods of network traffic prediction have been proposed to replace intuitive prediction, especially in the field of time series prediction.
  Nancy and George [6] used the time series analysis method to make an accurate prediction of the National Science Foundation Network (NSFNET) traffic data, which was followed by a series of studies on time series methods, especially for network traffic data prediction [7], [8]. Modeling methods include the autoregressive moving average (ARMA) model, the autoregressive integrated moving average (ARIMA) model, the fractional autoregressive integrated moving average (FARIMA) model, etc. A large-scale network is a complex nonlinear system influenced by many external factors. As a result, its macroscopic traffic behavior is often complex and changeable, and the data contains many kinds of periodic fluctuations. In addition, the traffic shows nonlinear trends and contains uncertain random factors, which may prevent a model with linear characteristics from predicting it accurately. Therefore, how to select and optimize a nonlinear model has been the key to network traffic prediction in recent years. The support vector machine (SVM), least squares support vector machine (LSSVM), artificial neural network, echo state network, and so on all improve the prediction accuracy to some extent. However, although these models consider the nonlinear characteristics of the data, they ignore its autocorrelation. Therefore, how to predict network traffic from nonlinear time series data while taking its autocorrelation characteristics into account is a problem that urgently needs to be solved.
  2 Flow Prediction Model
  This section describes recurrent neural networks (RNN), which are used for network traffic prediction, and also introduces a special type of RNN, the long short-term memory (LSTM) network. On this basis, we put forward a model that combines LSTM with the autocorrelation characteristics of network traffic.
  2.1 Recurrent Neural Networks
  RNN is a popular learning method in the fields of machine learning and deep learning in recent years, and it differs from the traditional feedforward neural network (FNN). FNN neurons transmit information through the connections of the input layer, hidden layer, and output layer. Each input item is independent of the others, and there is no connection between neurons in the same layer. However, RNN introduces a recurrent structure in the network and establishes connections from a neuron to itself. Through this circular structure, neurons can “remember” information from the previous moment and influence the output of the current moment. Therefore, RNN can better deal with time series data, and in the prediction of such data it often performs better than FNN. The structure diagram of RNN is shown in Fig. 1.
  In Algorithm 1, T is the length of the input data, v_t is the input at time t, and z_t is the output at time t. W_hv, W_hh, and W_oh are the connection matrices from the input layer to the hidden layer, from the hidden layer to the hidden layer, and from the hidden layer to the output layer, respectively. Functions e and g are the activation functions of the hidden layer and the output layer, respectively.
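The forward pass described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; tanh and the identity function are assumed for the activations e and g.

```python
import numpy as np

def rnn_forward(v, W_hv, W_hh, W_oh, e=np.tanh, g=lambda x: x):
    """Forward pass of a simple RNN, as in Algorithm 1 (a sketch).

    v: input sequence of shape (T, input_dim).
    e, g: hidden and output activations (tanh and identity assumed here).
    """
    T = v.shape[0]
    h = np.zeros(W_hh.shape[0])          # hidden state, initialized to zero
    outputs = []
    for t in range(T):
        h = e(W_hv @ v[t] + W_hh @ h)    # hidden state carries past information
        outputs.append(g(W_oh @ h))      # z_t, the output at time t
    return np.array(outputs)
```

The recurrent term `W_hh @ h` is what lets the output at time t depend on all earlier inputs, unlike an FNN.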
  There are some differences between RNN training and FNN training. FNN training is implemented by the back propagation (BP) algorithm. In an RNN, however, the error at the output layer is also affected by the hidden layers of previous moments, so the backward-propagated results need to be propagated along the time dimension as well; this is the back propagation through time (BPTT) algorithm. The BPTT of RNN first defines the partial derivative of the loss function with respect to the input value of neuron j at time t, and then solves it with the chain rule (Algorithm 2).
  The partial derivative between the loss function and a neuron is affected by the output layer at the current time t and the hidden layer at the next moment t+1. For each time step, we use the chain rule to sum up all the results along the time dimension and get the partial derivative of the loss function with respect to the weights w of the neural network. The weights of the recurrent neural network are updated by the gradient descent method until the stopping condition is met. During RNN training, the gradient is multiplied repeatedly in the propagation process. If the eigenvalues of W_hh are greater than 1, this results in exploding gradients; if they are less than 1, this results in vanishing gradients [9]-[12]. To address this problem, Hochreiter et al. [13] put forward the LSTM neural network.
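For the exploding-gradient side of this problem, a common practical mitigation (separate from the LSTM solution discussed next, and not part of the paper's model) is to rescale gradients whose global norm exceeds a threshold:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm
    is at most max_norm; a standard remedy for exploding gradients
    in BPTT (the threshold 5.0 is an illustrative assumption)."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]
```

Clipping bounds the update magnitude but does not help with vanishing gradients, which is why gated architectures such as LSTM are still needed.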
  2.2 Long Short Term Memory
  The LSTM neural network is a variant of RNN. The key idea is to replace the neurons with cell states. The cell state is passed along the time chain with only a few linear interactions, so information is easily maintained in the cell units. Each cell contains one or more memory cells and three nonlinear summation units. A nonlinear summation unit is also called a “gate”, and there are three kinds: the “input gate”, “output gate”, and “forget gate”. They control the input and output of the memory cells by element-wise multiplication. The structure diagram of LSTM is shown in Fig. 2 [14].
  Eq. (1) gives the parameter of the forget gate, where 1 means “completely retain” and 0 means “completely discard”. Eqs. (2) and (3) calculate the value of the input gate. Eq. (4) is then used to discard the information that should be forgotten and add the useful information entered at time t. Eqs. (5) and (6) determine which part of the cell state is output to the next layer and the next time step.
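One step of the LSTM cell described by Eqs. (1)-(6) can be sketched as below; the weight layout (one matrix per gate acting on the concatenation of the previous hidden state and the current input) is a standard convention assumed here, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following Eqs. (1)-(6).

    W, b: dicts with keys 'f', 'i', 'c', 'o' (an assumed layout);
    each gate matrix acts on [h_prev; x_t].
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])          # forget gate, Eq. (1)
    i = sigmoid(W['i'] @ z + b['i'])          # input gate, Eq. (2)
    c_tilde = np.tanh(W['c'] @ z + b['c'])    # candidate state, Eq. (3)
    c = f * c_prev + i * c_tilde              # new cell state, Eq. (4)
    o = sigmoid(W['o'] @ z + b['o'])          # output gate, Eq. (5)
    h = o * np.tanh(c)                        # new hidden state, Eq. (6)
    return h, c
```

Note that the gates act by element-wise multiplication, and the cell state update in Eq. (4) is linear in c_prev, which is what lets gradients flow over long time spans.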
  The BPTT algorithm of LSTM is similar to that of RNN. It starts from the end of the time series (time T), works backwards to obtain the gradient of each parameter, and then updates the network parameters with the gradient of each time step. The partial derivative with respect to the output of the memory cells is calculated first, then the partial derivative of the output gate, followed by the corresponding partial derivatives of the memory cell state, the forget gate, and the input gate. Finally, the connection weights of the model are updated using the gradient descent method.
  2.3 Long Short-Term Memory Modeling Applicable to Autocorrelation Sequences
  A time series is a data set ordered in time, where each data point corresponds to a particular time. The time series model assumes that the patterns of the past can be applied to the future, so we can predict future data by studying the information in historical data.
  In a conventional time series prediction problem, let the time series be y_1, y_2, y_3, ..., y_m, where y_t represents the data value at time t. We predict the value of y_{m+1} by using y_{m-k+1}, y_{m-k+2}, ..., y_m, where k represents the number of steps used for each prediction.
  Four types of prediction can be defined according to the granularity of time [15]: current-time prediction, short-term prediction, medium-term prediction, and long-term prediction. Current-time prediction requires the shortest time interval between data points and can be used to build an online real-time forecasting system. The short-term interval is usually between an hour and a few hours, often used for optimal control or anomaly detection. The medium-term forecast interval is one day, which can be used to guide resource planning. The long-term interval is between months and years and can serve as a reference for strategy and economic investment.
  The autocorrelation coefficient r_k indicates the autocorrelation between the time series and its own data lagged by k periods [15]; in the standard sample form,
  r_k = Σ_{t=1}^{N-k} (y_t - ȳ)(y_{t+k} - ȳ) / Σ_{t=1}^{N} (y_t - ȳ)²,                    (7)
  where ȳ is the mean of the series and N is its length.
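The sample autocorrelation coefficient can be computed directly from this definition:

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation coefficient r_k of series y at lag k."""
    y = np.asarray(y, dtype=float)
    y_bar = y.mean()
    num = np.sum((y[:-k] - y_bar) * (y[k:] - y_bar))  # lagged cross-products
    den = np.sum((y - y_bar) ** 2)                    # total variance term
    return num / den
```

For example, a series with a strong 24-step period yields r_24 close to 1, which is exactly the kind of peak the paper later reports at k = 24 and k = 168 for hourly traffic.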
  The traditional RNN sequence prediction model predicts the value of the next moment by learning the changing trend of the sequence. According to our experience, network traffic data has a certain autocorrelation over 24-hour and 7-day cycles. This means that the data observed 24 hours or 7 days before a certain time represents the data characteristics of the present moment very well. However, because the data of 24 hours or 7 days ago does not form a continuous sequential relationship with the T time steps immediately before the present, the traditional RNN does not apply to modeling these autocorrelation characteristics.
  We therefore use an improved LSTM + deep neural network (DNN) structure: keeping the original LSTM network unchanged, the LSTM's predicted value and several values from one autocorrelation cycle earlier form the input of the DNN. By training the DNN, the autocorrelation characteristics and the temporal characteristics of network traffic are combined to achieve accurate prediction. Fig. 3 shows the neural network structure of LSTM and DNN.
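The fusion step can be sketched as follows. The LSTM branch is treated as a black box producing one prediction; a one-hidden-layer MLP with ReLU is assumed for the DNN part, and the lag indices mentioned in the experiments (e.g. x_{j-287}, x_{j-288}, x_{j-289} for 5-minute data) are the illustrative inputs.

```python
import numpy as np

def combine_with_dnn(lstm_pred, lagged, W1, b1, W2, b2):
    """Fuse the LSTM prediction with autocorrelated lagged values
    through a small DNN (a sketch; the hidden size, activation,
    and weight shapes are assumptions, not the paper's Table 2).

    lstm_pred: scalar prediction from the LSTM branch.
    lagged: traffic values one autocorrelation cycle back.
    """
    z = np.concatenate([[lstm_pred], lagged])   # DNN input vector
    h = np.maximum(0.0, W1 @ z + b1)            # hidden layer (ReLU)
    return float(W2 @ h + b2)                   # fused traffic prediction
```

In training, the DNN weights would be fit so that this fused output matches the true traffic value, letting the model exploit both the recent trend and the periodic structure.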
  3 Network Traffic Prediction
  In order to verify the accuracy of the network traffic prediction model, three real network traffic data sets were selected for the experiments. Data set A comes from the historical network traffic data of a network service provider in 11 European cities; it was collected from 06:57 am on June 7, 2004 to 11:17 am on June 29, 2004, every 30 seconds. Data set B comes from the web traffic history data of the United Kingdom Education and Research Networking Association (UKERNA) website for academic research; it was collected from 9:30 am November 2004 to 11:11 am on January 27, 2005, every five minutes. Data set C comes from the network traffic collected in 2012 at a backbone node of China's education network, Beijing University of Posts and Telecommunications, every minute.
  Based on the period and time span of the three experimental data sets, this paper only considers the first two prediction types. Following Cortez's selection of time granularity [16], 5 minutes and 1 hour were used as the time granularities of current-time prediction and short-term prediction. When dividing the data, we use the first two thirds of each data set for training and the remaining one third to test the accuracy of the model.
  In order to evaluate the accuracy of the prediction model, the mean absolute percentage error (MAPE) is used as the evaluation indicator [17]:
  MAPE = (1/N) Σ_{n=1}^{N} |y_n - ŷ_n| / y_n × 100%.                    (8)
  The mean absolute percentage error is a commonly used index for estimating the accuracy of a model. In Eq. (8), y_n represents the real value of the data, ŷ_n represents the predicted value, and N is the test set size. The smaller the error, the better the prediction.
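The MAPE indicator is straightforward to compute:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, Eq. (8), as a percentage.
    Assumes all true values are positive, as traffic volumes are."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / y_true) * 100.0)
```

For instance, true values of 100 and 200 predicted as 110 and 180 are each off by 10%, giving a MAPE of 10.0.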
  Fig. 4 shows the autocorrelation analysis of these data sets. For the data with 5-minute granularity, the degree of autocorrelation is high when k = 288. For the data with 1-hour granularity, k = 24 and k = 168 are the two peaks. This indicates that the autocorrelation cycles of network traffic are 24 hours and 7 days, which is consistent with the characteristics of network traffic data described above.
  In this paper, the echo state network (ESN) is used as a comparison for the LSTM model. The ESN is also a form of RNN, consisting of an input layer, a hidden layer, and an output layer, with connections inside the hidden layer that retain information from previous moments. Unlike a standard RNN, the connections among hidden-layer neurons are generated randomly and their weights are fixed; during training, only the connection weights from the hidden layer to the output layer need to be trained. After experiments, the parameters of the ESN model in Table 1 were finally selected.
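The defining property of the ESN baseline, that only the readout is trained while the random recurrent weights stay frozen, can be sketched as below. The reservoir size, spectral radius, and ridge penalty here are illustrative assumptions, not the values of Table 1.

```python
import numpy as np

def esn_readout(u, n_reservoir=50, spectral_radius=0.9, ridge=1e-6, seed=0):
    """Train only the readout of an echo state network (a sketch).

    u: 1-D input series; the task is one-step-ahead prediction of
    u[t+1] from the reservoir state at time t. Hyperparameters are
    illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale the frozen recurrent weights to the desired spectral radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))

    # Drive the reservoir and collect its states; W_in and W are never trained.
    x = np.zeros(n_reservoir)
    states = []
    for t in range(len(u) - 1):
        x = np.tanh(W_in * u[t] + W @ x)
        states.append(x.copy())
    X = np.array(states)                       # (T-1, n_reservoir)
    y = u[1:]                                  # one-step-ahead targets

    # Ridge regression fits the readout weights only.
    W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_reservoir), X.T @ y)
    return X @ W_out, y                        # predictions and targets
```

Because only a linear readout is fit, ESN training is much cheaper than BPTT, which is why it is a natural baseline for the LSTM model.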
  The neurons in the LSTM model were treated with dropout in order to prevent overfitting. Dropout was first proposed by Hinton et al. [18]; the main idea is to let the neural network randomly disable some neurons during training. Experiments have shown that this can effectively prevent overfitting and improve the generalization ability of the model. Zaremba et al. [19] improved dropout so that it could be applied to RNN. In the model of this paper, the probability that each neuron is disabled during training is 10%. Table 2 shows the LSTM neural network parameters.
  In the LSTM-DNN model, the parameters of the LSTM neural network are consistent with Table 2. Depending on the time granularity of the data set, the values at different autocorrelation lags and the value calculated by the LSTM feedforward pass were selected and fed into the DNN. For example, in a data set whose time granularity is 5 minutes, x_{j-287}, x_{j-288}, and x_{j-289} indicate the data characteristics well; these data are fed into the current-time network traffic prediction model, and the result is the predicted traffic. In a data set with a time granularity of 1 hour, the data x_{j-23}, x_{j-24}, and x_{j-25} are fed into the current-time network traffic prediction model to predict x_j.
  The average absolute percentage errors of the three algorithms are shown in Table 3.
  The relationship between loss function and training frequency in LSTM and LSTM?DNN is shown in Fig. 5.
  Due to the limited length of data set C, the test set divided from it does not satisfy the requirements of LSTM-DNN training and prediction. It can be seen from Table 3 that LSTM can effectively predict network traffic and performs well at both time granularities on the two data sets. With the optimized LSTM-DNN model, which adds the autocorrelation of the data, the accuracy increased slightly when the time granularity was 5 minutes, but improved significantly when the time granularity was 1 hour. This indicates that the autocorrelation makes up for the insufficient training data that LSTM requires. The LSTM-DNN model therefore has advantages over traditional LSTM in the case of coarse time granularity or a small data volume. We can see from Fig. 5 that, as the complexity and the number of neural network layers increase, LSTM-DNN converges more slowly and with more difficulty than LSTM during training. Figs. 6-9 show the prediction results of LSTM-DNN.
  4 Conclusions
  Aiming at predicting network traffic with autocorrelation, this paper proposes a neural network model that combines LSTM with DNN. Our experiments on several real data sets show that LSTM serves well as a time series forecasting model and can provide better performance than the other traditional models. By making use of the autocorrelation features, the combined LSTM and DNN network has certain advantages in accuracy on large-granularity data sets, although it places a requirement on the data size. High-accuracy network traffic prediction provides support for dealing with possible network congestion, abnormal attacks, etc. LSTM is just one of the many variations of RNN; some other RNN models, such as the gated recurrent unit (GRU), are also popular and perform well on many problems. The next step is to explore different RNN structures, combined with the characteristics of network traffic data, to further improve the prediction accuracy.
  References
  [1] BABIARZ R, BEDO J. Internet Traffic Midterm Forecasting: a Pragmatic Approach Using Statistical Analysis Tools [M]//BOAVIDA F, PLAGEMANN T, STILLER B, et al. (eds). Networking 2006. Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications Systems. Lecture Notes in Computer Science: vol 3976. Berlin, Heidelberg, Germany: Springer, 2006: 111-121
  [2] ALARCON-AQUINO V, BARRIA J A. Multiresolution FIR Neural-Network-Based Learning Algorithm Applied to Network Traffic Prediction [J]. IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), 2006, 36(2): 208-220. DOI: 10.1109/tsmcc.2004.843217
  [3] KRISHNAMURTHY B, SEN S, ZHANG Y, et al. Sketch-Based Change Detection: Methods, Evaluation, and Applications [C]//3rd ACM SIGCOMM Conference on Internet Measurement. Miami Beach, USA, 2003: 234-247. DOI: 10.1145/948205.948236
  [4] YANG M M, ZHU T Q, ZHOU W L, et al. Attacks and Countermeasures in Social Network Data Publishing [J]. ZTE Communications, 2016, 14(S0): 2-9. DOI: 10.3969/j.issn.1673-5188.2016.S0.001
  [5] CHRISTIAN J, MOHAMED B. A Software-Defined Approach to IoT Networking [J]. ZTE Communications, 2016, 14(1): 61-68. DOI: 10.3969/j.issn.1673-5188.2016.01.009
  [6] GROSCHWITZ N K, POLYZOS G C. A Time Series Model of Long-Term NSFNET Backbone Traffic [C]//International Conference on Communications. New Orleans, USA, 1994: 1400-1404. DOI: 10.1109/ICC.1994.368876
  [7] BASU S, MUKHERJEE A, KLIVANSKY S. Time Series Models for Internet Traffic [C]//Conference on Computer Communications. San Francisco, USA, 1996: 611-620. DOI: 10.1109/INFCOM.1996.493355
  [8] SHU Y T, WANG L, ZHANG L F, et al. Internet Network Business Forecast Based on FARIMA Model [J]. Chinese Journal of Computers, 2001, 24(1): 46-54
  [9] LIU X W, FANG X M, QIN Z H, et al. A Short-Term Forecasting Algorithm for Network Traffic Based on Chaos Theory and SVM [J]. Journal of Network and Systems Management, 2011, 19(4): 427-447. DOI: 10.1007/s10922-010-9188-3
  [10] WANG H F, HU D J. Comparison of SVM and LS-SVM for Regression [C]//International Conference on Neural Networks and Brain. Beijing, China, 2005: 279-283. DOI: 10.1109/ICNNB.2005.1614615
  [11] BENGIO Y, SIMARD P, FRASCONI P. Learning Long-Term Dependencies with Gradient Descent is Difficult [J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166. DOI: 10.1109/72.279181
  [12] HOCHREITER S. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions [J]. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 1998, 6(2): 107-116. DOI: 10.1142/s0218488598000094
  [13] HOCHREITER S, SCHMIDHUBER J. Long Short-Term Memory [J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735
  [14] COLAH. Understanding LSTM Networks [EB/OL]. (2015-08-27). http://colah.github.io/posts/2015-08-Understanding-LSTMs
  [15] DING X, CANU S, DENOEUX T. Neural Network Based Models for Forecasting [C]//Proc. ADT'95. New York, USA: Wiley and Sons, 1995: 243-252
  [16] CORTEZ P, RIO M, ROCHA M, et al. Multi-Scale Internet Traffic Forecasting Using Neural Networks and Time Series Methods [J]. Expert Systems, 2010. DOI: 10.1111/j.1468-0394.2010.00568.x
  [17] FARIA A E Jr. Review of Forecasting: Methods and Applications (by MAKRIDAKIS S, WHEELWRIGHT S C, HYNDMAN R J, Third Edition) [J]. International Journal of Forecasting, 2002, 18(1): 158-159. DOI: 10.1016/s0169-2070(01)00130-3
  [18] HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al. Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors [EB/OL]. (2012-07-03). https://arxiv.org/abs/1207.0580
  [19] ZAREMBA W, SUTSKEVER I, VINYALS O. Recurrent Neural Network Regularization [EB/OL]. (2014-09-08). https://arxiv.org/abs/1409.2329