To cope with big-data settings in which data volumes keep growing while response-time requirements tighten, and to address the low efficiency and high application complexity of serial Bayesian classifiers, this paper proposes a parallel tree-augmented naive Bayes (TAN) algorithm based on MapReduce. The algorithm adopts TAN, which relaxes the attribute-independence assumption of naive Bayes, to obtain higher classification accuracy. To reduce response time, it introduces the MapReduce model, converting the algorithm from serial to parallel execution and thereby increasing processing speed. Experimental results show that the proposed algorithm runs more efficiently than the traditional TAN algorithm, and that its speedup increases as the number of data nodes grows.
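The abstract does not spell out how the computation is split into map and reduce phases. As a rough illustration only, the sketch below assumes the standard TAN structure-learning procedure (conditional mutual information between attribute pairs given the class, followed by a maximum spanning tree) and simulates the map and reduce phases inside a single Python process rather than on a cluster. The function names, the key layout, and the record format `(attribute_tuple, class_label)` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log


def map_record(record):
    """Map phase (assumed design): emit (key, 1) for every count TAN needs."""
    attrs, c = record
    yield ("c", c), 1                                   # class count
    for i, v in enumerate(attrs):
        yield ("a", i, v, c), 1                         # attribute-class count
    for (i, vi), (j, vj) in combinations(enumerate(attrs), 2):
        yield ("p", i, j, vi, vj, c), 1                 # attribute-pair-class count


def reduce_counts(pairs):
    """Reduce phase: sum the values emitted for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def conditional_mutual_information(counts, n, i, j):
    """I(X_i; X_j | C) estimated from the aggregated counts (requires i < j)."""
    cmi = 0.0
    for key, n_ijc in counts.items():
        if key[0] != "p" or key[1] != i or key[2] != j:
            continue
        _, _, _, vi, vj, c = key
        n_c = counts[("c", c)]
        n_ic = counts[("a", i, vi, c)]
        n_jc = counts[("a", j, vj, c)]
        # P(x, y, c) * log( P(x, y | c) / (P(x | c) * P(y | c)) )
        cmi += (n_ijc / n) * log((n_ijc * n_c) / (n_ic * n_jc))
    return cmi


def maximum_spanning_tree(num_attrs, weight):
    """Prim's algorithm rooted at attribute 0; returns (parent, child) edges."""
    in_tree = {0}
    edges = []
    while len(in_tree) < num_attrs:
        parent, child = max(
            ((u, v) for u in in_tree for v in range(num_attrs) if v not in in_tree),
            key=lambda e: weight[min(e), max(e)],
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges


def learn_tan_structure(partitions):
    """Run the simulated map and reduce phases, then build the TAN tree."""
    # On a real cluster each partition would go to its own mapper;
    # here the partitions are simply processed one after another.
    emitted = (pair for part in partitions for rec in part for pair in map_record(rec))
    counts = reduce_counts(emitted)
    n = sum(v for k, v in counts.items() if k[0] == "c")
    num_attrs = 1 + max(k[1] for k in counts if k[0] == "a")
    weight = {
        (i, j): conditional_mutual_information(counts, n, i, j)
        for i, j in combinations(range(num_attrs), 2)
    }
    return maximum_spanning_tree(num_attrs, weight), weight


if __name__ == "__main__":
    # Toy data: attributes 0 and 1 always agree, attribute 2 varies independently.
    rows = [((a, a, b), c) for c in (0, 1) for a in (0, 1) for b in (0, 1)]
    tree, weights = learn_tan_structure([rows[:4], rows[4:]])
    print(tree)
    print(weights)
```

Because every mapper emits only local counts and the reducer only adds them, the counting step is independent of how the records are partitioned, which is the property that makes this kind of classifier training a natural fit for MapReduce. The tree construction at the end works on the small table of aggregated counts and is kept serial in this sketch.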