Join queries between data streams and stored tables arise frequently in the maintenance of active data warehouses. Such joins differ from join computation in traditional relational databases in two ways. First, fast stream processing rules out writing the stream to disk before computing. Second, main memory cannot hold a stream that grows without bound. Stream queries therefore process tuples first and store only the results. An algorithm that joins a data stream with a stored table must address two problems: memory overhead and processing rate. The MESHJOIN algorithm was the first to propose dividing the stored table into several data blocks and bringing the blocks into memory in turn, joining each one with the data stream window. Building on the MESHJOIN idea, the in-memory data block of the stored table is further divided into several logical partitions, and each join step replaces only one logical partition. This effectively reduces the I/O cost incurred by the sliding window over the data stream and thereby increases the rate at which the sliding window is computed. Finally, experiments compare the two algorithms in terms of memory overhead and computation rate.
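The block-rotation idea described above can be sketched as a small in-memory simulation. This is a minimal sketch under stated assumptions, not the paper's implementation. Table and stream tuples are hypothetical `(id, key)` pairs, and the helper names `split_units` and `sliding_join` are illustrative. With `buffer_units=1`, each step loads one whole block, in the spirit of MESHJOIN. With `buffer_units=p`, the memory buffer holds p logical partitions and each step replaces only the oldest one. The rule that a newly admitted stream batch probes the whole buffer while already-resident stream tuples probe only the newly loaded partition is an assumption made here so that every match is produced exactly once; the abstract does not specify it.

```python
from collections import defaultdict, deque


def split_units(relation, num_units):
    """Cut the stored table into num_units contiguous, near-equal units."""
    size, extra = divmod(len(relation), num_units)
    units, start = [], 0
    for i in range(num_units):
        end = start + size + (1 if i < extra else 0)
        units.append(relation[start:end])
        start = end
    return units


def sliding_join(stream, units, buffer_units, batch_size):
    """Equi-join stream tuples (sid, key) with table tuples (rid, key).

    The table is read cyclically, one unit per step, and memory keeps the
    most recent `buffer_units` units. buffer_units=1 mimics MESHJOIN; a
    larger value models a block split into logical partitions, of which
    only the oldest is replaced at each step (hypothetical variant).
    Returns (matches, table_tuples_read).
    """
    num_units = len(units)
    assert 1 <= buffer_units <= num_units
    resident = defaultdict(list)  # join key -> ids of resident stream tuples
    queue = deque()               # [batch, units_seen], oldest batch first
    buffer = deque()              # table units currently in memory
    matches, pos, next_unit, tuples_read = [], 0, 0, 0

    while pos < len(stream) or queue:
        # 1. Load the next unit from "disk"; evict the oldest unit if full.
        unit = units[next_unit]
        next_unit = (next_unit + 1) % num_units
        tuples_read += len(unit)
        buffer.append(unit)
        if len(buffer) > buffer_units:
            buffer.popleft()

        # 2. Resident stream tuples probe only the newly loaded unit.
        for rid, key in unit:
            for sid in resident.get(key, ()):
                matches.append((sid, rid))
        for entry in queue:
            entry[1] += 1

        # 3. Admit the next stream batch; it probes every unit in memory.
        batch = stream[pos:pos + batch_size]
        pos += len(batch)
        if batch:
            by_key = defaultdict(list)
            for sid, key in batch:
                by_key[key].append(sid)
            for resident_unit in buffer:
                for rid, key in resident_unit:
                    for sid in by_key.get(key, ()):
                        matches.append((sid, rid))
            for sid, key in batch:
                resident[key].append(sid)
            queue.append([batch, len(buffer)])

        # 4. Expel batches that have now been joined with every unit.
        while queue and queue[0][1] >= num_units:
            expired, _ = queue.popleft()
            for sid, key in expired:
                resident[key].remove(sid)
                if not resident[key]:
                    del resident[key]

    return matches, tuples_read
```

For example, `split_units(relation, 4)` with `buffer_units=1` rotates four blocks through memory. By contrast, `split_units(relation, 12)` with `buffer_units=3` keeps roughly the same fraction of the table resident but reads only about one third of a block per step. The second return value counts table tuples read from "disk", which can be used to compare per-step I/O under these assumptions.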