In the era of big data, a growing number of fields place high demands on big data computing. Because much of the data produced across industries takes the form of dynamic streams, real-time, fast, and efficient computation and analysis of big data streams is increasingly urgent. Online machine learning algorithms are an effective solution for real-time analysis of big data streams. In machine learning, kernel learning can yield an effective kernel function, and the choice of kernel function in turn strongly affects the performance of the kernel learner. Combining online machine learning with kernel functions, this paper studies a multi-task online learning algorithm suited to big data stream environments, discusses the perturbation terms that may arise during the algorithm, and applies a data-dependent kernel construction method to broaden the algorithm's applicability. The algorithm does not need to store or rescan the historical data stream. It only needs to select one sample of the data set, and when analyzing new streaming big data it can update the current kernel function directly to the most suitable kernel function within an acceptable time, which makes it well suited to kernel learning problems in streaming big data environments.
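The idea of adapting a kernel to a stream from a fixed sample, without storing or rescanning history, can be illustrated with a minimal Python sketch. The abstract does not state the update rule, so everything below is an assumption made for illustration: the conformal form k(x, y) = q(x) q(y) k0(x, y) for the data-dependent kernel, the Gaussian base kernel k0, the squared loss with a plain stochastic gradient step, and the names `StreamingKernelLearner`, `anchors`, `gamma`, and `lr`. The sketch is single-task and ignores the perturbation terms the paper discusses.

```python
import numpy as np


def rbf_features(x, anchors, gamma):
    """Base Gaussian kernel k0(x, a_i) evaluated against every anchor a_i."""
    diff = anchors - np.asarray(x, dtype=float)
    return np.exp(-gamma * np.sum(diff * diff, axis=1))


class StreamingKernelLearner:
    """Hypothetical sketch, not the paper's algorithm.

    Data-dependent kernel: k(x, y) = q(x) * q(y) * k0(x, y), where
    q(x) = alpha[0] + sum_i alpha[i + 1] * k0(x, anchors[i]).
    Predictor: f(x) = sum_i w[i] * k(x, anchors[i]).
    Memory is fixed: the anchor sample, alpha, and w. No stream history is kept.
    """

    def __init__(self, anchors, gamma=1.0, lr=0.05):
        self.anchors = np.atleast_2d(np.asarray(anchors, dtype=float))
        self.gamma = gamma
        self.lr = lr
        m = len(self.anchors)
        # k0 between anchors, computed once; its size never grows.
        self.K_anchor = np.stack(
            [rbf_features(a, self.anchors, gamma) for a in self.anchors]
        )
        self.alpha = np.zeros(m + 1)
        self.alpha[0] = 1.0  # q(x) = 1, so the kernel starts as k0
        self.w = np.zeros(m)

    def _q(self, phi):
        return self.alpha[0] + self.alpha[1:] @ phi

    def _q_anchor(self):
        return self.alpha[0] + self.K_anchor @ self.alpha[1:]

    def kernel(self, x, y):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        k0 = float(np.exp(-self.gamma * np.sum((x - y) ** 2)))
        qx = self._q(rbf_features(x, self.anchors, self.gamma))
        qy = self._q(rbf_features(y, self.anchors, self.gamma))
        return float(qx * qy * k0)

    def predict(self, x):
        phi = rbf_features(x, self.anchors, self.gamma)
        return float(self._q(phi) * np.dot(self.w * self._q_anchor(), phi))

    def partial_fit(self, x, y):
        """One gradient step on 0.5 * (y - f(x))^2; returns the pre-update error."""
        phi = rbf_features(x, self.anchors, self.gamma)
        q_x = self._q(phi)
        q_a = self._q_anchor()
        s = np.dot(self.w * q_a, phi)  # f(x) = q_x * s
        err = y - q_x * s
        grad_w = q_x * q_a * phi  # df/dw
        w_phi = self.w * phi
        grad_alpha = s * np.concatenate(([1.0], phi)) + q_x * np.concatenate(
            ([w_phi.sum()], self.K_anchor @ w_phi)
        )  # df/dalpha, through q(x) and through q(anchors)
        self.w += self.lr * err * grad_w
        self.alpha += self.lr * err * grad_alpha
        return float(err)


# Usage: feed points one at a time; each point is discarded after its update.
rng = np.random.default_rng(0)
learner = StreamingKernelLearner(anchors=[[0.0], [0.5], [1.0]], gamma=4.0, lr=0.05)
for _ in range(200):
    x_t = rng.uniform(0.0, 1.0, size=1)
    learner.partial_fit(x_t, float(np.sin(2.0 * np.pi * x_t[0])))
```

Because q multiplies the base kernel symmetrically, the adapted kernel remains positive semidefinite for any real `alpha`, which is one reason this form is a convenient stand-in here. A plain gradient step of this kind can diverge for large `lr`, so the step size in the sketch is kept small.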