PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs

Source: Journal of Computer Science and Technology (English edition)

Abstract: GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, i

Authors: Xin-Hai Xu, Xue-Jun Yang, Jing-L

Affiliation: National Laboratory for Parallel and Distributed Processing, Programming Languages and Compilers Grou

Published in: Journal of Computer Science and Technology (English edition)

Publication date: 2004 (issue not stated)

Keywords: GPGPU; partial recomputing; fault tolerance; CUDA; checkpointing

Funding: National Natural Science Foundation of China


Excerpt from the paper

GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense Technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerant mechanisms to offer the reliability guarantees required by many HPC applications. By analyzing the SIMT (single-instruction, multiple-thread) characteristics of programs running on GPGPUs, we have developed PartialRC, a new checkpoint-based, compiler-directed partial recomputing method for achieving efficient fault recovery by leveraging the phenomenal computing power of GPGPUs. In this paper, we introduce our PartialRC method, which recovers from errors detected in a code region by partially recomputing the region, describe a checkpoint-based fault-tolerance framework developed on PartialRC, and discuss an implementation on the CUDA platform. Validation using a range of representative CUDA programs on NVIDIA GPGPUs against FullRC (a traditional full-recomputing Checkpoint-Rollback-Restart fault recovery method for CPUs) shows that PartialRC significantly reduces the fault recovery overheads incurred by FullRC, by an average of 73.5% when errors occur earlier during execution and 74.6% when errors occur later. In addition, PartialRC also reduces the error detection overheads incurred by FullRC during fault recovery while incurring negligible performance overheads when no fault happens.
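The contrast between full and partial recomputing described in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the paper's implementation: it assumes that work in a code region is split into independent thread blocks, that error detection reports which blocks are faulty, and that inputs saved at the region entry act as the checkpoint. The names `BLOCK_SIZE`, `run_blocks`, `full_rc`, and `partial_rc` are hypothetical, and the number of blocks re-executed stands in for recovery cost.

```python
from typing import Callable, Iterable, List, Set

BLOCK_SIZE = 4  # hypothetical number of elements handled by one thread block


def run_blocks(kernel: Callable[[int], int], checkpoint: List[int],
               out: List[int], block_ids: Iterable[int]) -> int:
    """Run `kernel` for the given blocks, reading inputs from the checkpoint.

    Returns the number of blocks executed, a stand-in for recovery cost.
    """
    executed = 0
    for b in block_ids:
        start = b * BLOCK_SIZE
        for i in range(start, min(start + BLOCK_SIZE, len(checkpoint))):
            out[i] = kernel(checkpoint[i])
        executed += 1
    return executed


def full_rc(kernel: Callable[[int], int], checkpoint: List[int],
            out: List[int], num_blocks: int, faulty: Set[int]) -> int:
    """FullRC-style recovery: roll back and recompute every block of the region."""
    return run_blocks(kernel, checkpoint, out, range(num_blocks))


def partial_rc(kernel: Callable[[int], int], checkpoint: List[int],
               out: List[int], num_blocks: int, faulty: Set[int]) -> int:
    """PartialRC-style recovery (assumed granularity): recompute faulty blocks only."""
    return run_blocks(kernel, checkpoint, out, sorted(faulty))


def square(x: int) -> int:
    return x * x


checkpoint = list(range(16))           # inputs saved at the region entry
num_blocks = len(checkpoint) // BLOCK_SIZE
expected = [square(x) for x in checkpoint]


def corrupted_output() -> List[int]:
    out = list(expected)
    out[5] = -1                        # simulated fault inside block 1
    return out


out_full = corrupted_output()
cost_full = full_rc(square, checkpoint, out_full, num_blocks, {1})

out_partial = corrupted_output()
cost_partial = partial_rc(square, checkpoint, out_partial, num_blocks, {1})

print(cost_full, cost_partial)
```

Both strategies restore the same output here, but the partial variant re-executes one block instead of four. The saving depends entirely on the assumption that blocks do not depend on each other's results within the region; if they did, the set of blocks to recompute would have to grow accordingly.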
