Massive stream data is characterized by large volume, fast updates, multiple dimensions and multiple attributes, and its storage and querying have been a research hotspot in both academia and industry in recent years. HBase provides a highly scalable set of techniques and a system platform for storing and managing massive stream data. However, HBase supports only a primary-key index, so queries on non-primary-key data are inefficient, especially for multi-dimensional data. Targeting the traffic flow data scenario, this paper proposes TA-index, an index structure with high insertion and query efficiency. TA-index takes the temporal and spatial locality of data access into account in order to capture the characteristics of the data more accurately. By indexing time and space under separate classification schemes, it reduces the amount of index data and provides real-time data analysis capability. Experiments show that the algorithm is more efficient than existing algorithms and is highly scalable, supporting both high throughput and efficient multi-dimensional queries.
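The abstract does not describe TA-index's internal layout, so the following is not the authors' method. It is a hypothetical sketch of the general problem the abstract names: a store that indexes only the primary key (the HBase rowkey) must scan every row to answer a query by time and location, whereas a secondary index keyed by time bucket and spatial grid cell restricts the search to the buckets that overlap the query. The `Record` fields, the `TimeSpaceIndex` class, the 300 s bucket width and the 1000-unit cell size are all assumptions, and a plain Python dict stands in for an HBase table.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    rowkey: str   # primary key: the only key HBase indexes natively
    ts: int       # event time in seconds
    x: float      # hypothetical planar coordinates (e.g. metres)
    y: float
    speed: float  # example traffic attribute, not used by the index


class TimeSpaceIndex:
    """Hypothetical secondary index mapping (time bucket, grid cell) -> rowkeys.

    A plain dict stands in for the HBase table; this is an illustration,
    not the TA-index structure from the paper.
    """

    def __init__(self, time_bucket_s: int = 300, cell_size: float = 1000.0):
        self.time_bucket_s = time_bucket_s
        self.cell_size = cell_size
        self.table = {}                  # rowkey -> Record (stand-in for the table)
        self.index = defaultdict(list)   # (bucket, cx, cy) -> [rowkey, ...]

    def _key(self, ts: int, x: float, y: float) -> tuple[int, int, int]:
        # Discretise time into fixed-width buckets and space into square cells.
        return (ts // self.time_bucket_s,
                math.floor(x / self.cell_size),
                math.floor(y / self.cell_size))

    def insert(self, r: Record) -> None:
        # One write to the table plus one append to the index.
        self.table[r.rowkey] = r
        self.index[self._key(r.ts, r.x, r.y)].append(r.rowkey)

    def query(self, t0: int, t1: int, x0: float, y0: float,
              x1: float, y1: float) -> list[Record]:
        """Records with t0 <= ts <= t1 inside the box [x0, x1] x [y0, y1]."""
        tb0, cx0, cy0 = self._key(t0, x0, y0)
        tb1, cx1, cy1 = self._key(t1, x1, y1)
        hits = []
        # Visit only the buckets/cells overlapping the query, not the whole table.
        for tb in range(tb0, tb1 + 1):
            for cx in range(cx0, cx1 + 1):
                for cy in range(cy0, cy1 + 1):
                    for rk in self.index.get((tb, cx, cy), ()):
                        r = self.table[rk]
                        # Exact filter: a bucket can contain rows just outside the query.
                        if t0 <= r.ts <= t1 and x0 <= r.x <= x1 and y0 <= r.y <= y1:
                            hits.append(r)
        return hits


if __name__ == "__main__":
    idx = TimeSpaceIndex(time_bucket_s=300, cell_size=1000.0)
    idx.insert(Record("r1", ts=100, x=500.0, y=500.0, speed=42.0))
    idx.insert(Record("r2", ts=700, x=500.0, y=500.0, speed=38.0))
    idx.insert(Record("r3", ts=120, x=5500.0, y=200.0, speed=55.0))
    print([r.rowkey for r in idx.query(0, 299, 0.0, 0.0, 999.0, 999.0)])  # ['r1']
```

In this sketch, coarser buckets and cells keep the index smaller but hand more candidates to the exact filter; finer ones shrink the candidate set at the cost of more index entries and more buckets visited per query.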