To address the high energy cost of running MapReduce programs over small files stored on the Hadoop Distributed File System (HDFS), an energy-consumption model of a Hadoop node cluster is built and analyzed. The derivation proves that, on the Hadoop platform, there exists an optimal file size that minimizes the energy cost of program execution. On this basis, and drawing on the marginal analysis theory of economics, an optimal file size decision strategy is proposed that accounts for both energy cost and access cost. The strategy performs a cost-benefit calculation for merging the small files stored on HDFS and merges them into the cost-optimal file size so as to obtain the best overall benefit. Experiments confirm the existence of an energy-optimal data block size and demonstrate the rationality and effectiveness of combining cost and benefit through marginal analysis to determine the data block size.
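To make the marginal-analysis idea concrete, the following is a minimal, hypothetical Java sketch, not the paper's actual model: it assumes an illustrative energyCost(s) that falls as the merged file size s grows (fewer map tasks per job) and an illustrative accessCost(s) that rises with s, and it takes the cost-optimal size as the point where the marginal benefit of further merging no longer exceeds the marginal cost. The cost functions, constants, and size range are all assumptions introduced only for illustration.

// Hypothetical sketch of determining a cost-optimal merged file size by marginal analysis.
// energyCost(s) and accessCost(s) are assumed shapes, not the cost model from the paper.
public class OptimalFileSizeSketch {

    // Assumed: energy cost per job falls as the merged file size grows (fewer map tasks).
    static double energyCost(double sizeMb) {
        return 5000.0 / sizeMb;          // illustrative shape only
    }

    // Assumed: access cost rises with merged file size (more data read per access).
    static double accessCost(double sizeMb) {
        return 0.8 * sizeMb;             // illustrative shape only
    }

    public static void main(String[] args) {
        double step = 1.0;               // evaluate candidate sizes in 1 MB steps
        double best = 1.0;
        for (double s = 1.0; s <= 512.0; s += step) {
            // Marginal benefit: energy saved by growing the merged file by one step.
            double marginalBenefit = energyCost(s) - energyCost(s + step);
            // Marginal cost: extra access cost incurred by the same step.
            double marginalCost = accessCost(s + step) - accessCost(s);
            if (marginalBenefit <= marginalCost) {
                best = s;                // stop where the two marginal curves cross
                break;
            }
            best = s + step;
        }
        System.out.println("Cost-optimal merged file size (illustrative): " + best + " MB");
    }
}

Under these assumed cost curves the crossover lands near 79 MB; with the real energy and access cost models the same stopping rule would yield the file size to which the small files on HDFS should be merged.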