PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs

Source: Journal of Computer Science and Technology (English edition)
GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense Technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerant mechanisms to offer the reliability guarantees required by many HPC applications. By analyzing the SIMT (single-instruction, multiple-thread) characteristics of programs running on GPGPUs, we have developed PartialRC, a new checkpoint-based compiler-directed partial recomputing method, for achieving efficient fault recovery by leveraging the phenomenal computing power of GPGPUs. In this paper, we introduce our PartialRC method that recovers from errors detected in a code region by partially re-computing the region, describe a checkpoint-based fault-tolerance framework developed on PartialRC, and discuss an implementation on the CUDA platform. Validation using a range of representative CUDA programs on NVIDIA GPGPUs against FullRC (a traditional full-recomputing Checkpoint-Rollback-Restart fault recovery method for CPUs) shows that PartialRC significantly reduces the fault recovery overheads incurred by FullRC, on average by 73.5% when errors occur earlier during execution and by 74.6% when errors occur later. In addition, PartialRC also reduces the error detection overheads incurred by FullRC during fault recovery while incurring negligible performance overheads when no fault happens.
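The contrast between full and partial recomputing can be illustrated with a small host-side simulation. This is only a sketch of the general idea and not the PartialRC implementation: it assumes that thread blocks are independent and that errors can be attributed to individual blocks, and every name in it (`run_block`, `launch`, `detect_errors`, `recover_full`, `recover_partial`, `BLOCK_SIZE`) is hypothetical. The detector is a stand-in that compares each block against a redundant fault-free execution.

```python
# Hypothetical sketch: simulates "thread blocks" on the host to contrast
# full recomputation with partial recomputation after an error is detected.
# None of these names come from the paper or from CUDA.

BLOCK_SIZE = 4  # assumed number of elements handled by one simulated block


def run_block(block_id, data, inject_fault=False):
    """Compute one block's slice of the output (here: element-wise squares)."""
    start = block_id * BLOCK_SIZE
    out = [x * x for x in data[start:start + BLOCK_SIZE]]
    if inject_fault:
        out[0] += 1  # simulate a silent data corruption in this block
    return out


def launch(data, block_ids, faulty=()):
    """Run the given blocks; blocks listed in `faulty` produce a corrupted result."""
    return {b: run_block(b, data, inject_fault=(b in faulty)) for b in block_ids}


def detect_errors(results, data):
    """Stand-in detector: compare each block with a redundant fault-free run."""
    return [b for b, out in results.items() if out != run_block(b, data)]


def recover_full(data, num_blocks):
    """Full recomputing: roll back and re-run every block in the region."""
    return launch(data, range(num_blocks)), num_blocks


def recover_partial(results, data, bad_blocks):
    """Partial recomputing: re-run only the blocks flagged as erroneous."""
    repaired = dict(results)
    repaired.update(launch(data, bad_blocks))
    return repaired, len(bad_blocks)


data = list(range(16))
num_blocks = len(data) // BLOCK_SIZE
expected = launch(data, range(num_blocks))

results = launch(data, range(num_blocks), faulty={2})  # block 2 is corrupted
bad = detect_errors(results, data)

full_result, full_cost = recover_full(data, num_blocks)
repaired, partial_cost = recover_partial(results, data, bad)
```

In this toy setting both strategies restore the correct output, but the partial strategy re-executes only the flagged block while the full strategy re-executes all of them. The overhead reductions reported above are measurements from the paper and are not reproduced by this sketch.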