As the volume of user access information on the Web keeps growing, and in particular as Web servers make large log files available, it becomes possible to mine these large data sets for knowledge, for example to predict users' future accesses. This paper proposes a method that uses server log files and an N-gram prediction model to predict the Web requests a user is likely to make next. The model predicts selectively, only for those requests that are predictable, which greatly improves prediction accuracy. Experiments show that the N-gram prediction model, which is widely applicable in natural language, is equally applicable to Web page prediction. An effective simplification technique is also adopted that greatly compresses the model, so that a 5-gram model is about the same size as a traditional 2-gram model while its prediction accuracy is doubled. The results can be applied broadly on the Web, including page pre-sending, prefetching, recommendation, and Web caching mechanisms. The experiments are based on real Web logs, and the algorithm outperforms previous algorithms in both prediction accuracy and applicability.
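The abstract does not spell out the model's details, so the following is only a minimal sketch of the general idea: count N-grams over logged sessions, predict the most frequent next page for a context, and abstain when the evidence is weak. The class name `NGramPredictor`, the `min_support` and `min_confidence` thresholds, and the pruning rule are all assumptions made for illustration; they are not taken from the paper and may differ from its selective-prediction and model-compression methods.

```python
from collections import Counter, defaultdict


class NGramPredictor:
    """Toy N-gram next-page predictor (hypothetical, not the paper's code)."""

    def __init__(self, n=3, min_support=2, min_confidence=0.5):
        self.n = n                            # N-gram order; context is n-1 pages
        self.min_support = min_support        # assumed threshold: min times a context was seen
        self.min_confidence = min_confidence  # assumed threshold: min share of the top next page
        self.table = defaultdict(Counter)     # context tuple -> Counter of next pages

    def train(self, sessions):
        """Count (context, next page) pairs over sessions given as page lists."""
        k = self.n - 1
        for pages in sessions:
            for i in range(len(pages) - k):
                context = tuple(pages[i:i + k])
                self.table[context][pages[i + k]] += 1

    def prune(self):
        """Drop rarely seen contexts to shrink the model (an assumed simplification)."""
        kept = {ctx: nxt for ctx, nxt in self.table.items()
                if sum(nxt.values()) >= self.min_support}
        self.table = defaultdict(Counter, kept)

    def predict(self, history):
        """Return the likely next page, or None to abstain."""
        k = self.n - 1
        if len(history) < k:
            return None
        counts = self.table.get(tuple(history[-k:]))
        if not counts:
            return None
        total = sum(counts.values())
        page, hits = counts.most_common(1)[0]
        if total < self.min_support or hits / total < self.min_confidence:
            return None
        return page


# Made-up sessions standing in for parsed server log entries.
sessions = [
    ["/", "/news", "/news/1"],
    ["/", "/news", "/news/1"],
    ["/", "/news", "/news/2"],
    ["/about", "/team", "/jobs"],
]
model = NGramPredictor(n=3, min_support=2, min_confidence=0.6)
model.train(sessions)
model.prune()
print(model.predict(["/", "/news"]))       # "/news/1": seen 2 of 3 times after this context
print(model.predict(["/about", "/team"]))  # None: context was seen only once and got pruned
```

Abstaining on weak contexts trades coverage for accuracy: fewer requests receive a prediction, but those that do rest on more evidence. Raising `n` makes contexts more specific and the table larger, which is why some form of pruning matters for higher-order models.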