[Affiliation]:
School of Computer Science, National University of Defense Technology, Changsha 410073, China
Excerpt from the paper
In this paper we present the programming of the Linpack benchmark on the TianHe-1 system, the first petascale supercomputer system of China and the largest GPU-accelerated heterogeneous system attempted so far. A hybrid programming model consisting of MPI, OpenMP, and streaming computing is described to exploit the task, thread, and data parallelism of Linpack. We explain how we optimized the load distribution across the CPUs and GPUs using a two-level adaptive method and describe the implementation in detail. To overcome the low bandwidth of CPU-GPU communication, we present a software pipelining technique that hides the communication overhead. Combined with other traditional optimizations, the Linpack we developed achieved 196.7 GFLOPS on a single compute element of TianHe-1. This result is 70.1% of the peak compute capability and 3.3 times faster than the result obtained with the vendor's library. On the full configuration of TianHe-1, our optimizations resulted in a Linpack performance of 0.563 PFLOPS, which made TianHe-1 the 5th fastest supercomputer on the Top500 list in November 2009.
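The abstract mentions adaptive load distribution across CPUs and GPUs but does not give the details of the two-level method. As a rough illustration of the general idea only, the following Python sketch rebalances a work split from measured step times so that both devices would finish together. It is a single-level toy, not the authors' algorithm; the function names `update_gpu_fraction` and `simulate`, the clamping bounds, and the fixed device speeds are all hypothetical.

```python
def update_gpu_fraction(gpu_fraction, gpu_time, cpu_time,
                        min_frac=0.05, max_frac=0.95):
    """Return a new GPU work fraction based on the last measured step.

    gpu_time and cpu_time are the seconds each device needed when the GPU
    handled gpu_fraction of the work and the CPU handled the rest.
    The bounds min_frac and max_frac are illustrative assumptions.
    """
    if gpu_time <= 0 or cpu_time <= 0:
        return gpu_fraction  # no usable measurement; keep the current split
    gpu_rate = gpu_fraction / gpu_time          # work per second on the GPU
    cpu_rate = (1.0 - gpu_fraction) / cpu_time  # work per second on the CPU
    # Split the work in proportion to the observed rates.
    new_fraction = gpu_rate / (gpu_rate + cpu_rate)
    return min(max_frac, max(min_frac, new_fraction))


def simulate(gpu_speed, cpu_speed, steps=5, start=0.5):
    """Toy driver: both devices run at fixed, hypothetical speeds."""
    frac = start
    history = [frac]
    for _ in range(steps):
        gpu_time = frac / gpu_speed
        cpu_time = (1.0 - frac) / cpu_speed
        frac = update_gpu_fraction(frac, gpu_time, cpu_time)
        history.append(frac)
    return history


if __name__ == "__main__":
    # A GPU assumed to be three times faster than the CPU.
    print(simulate(gpu_speed=3.0, cpu_speed=1.0))
```

With constant speeds the split settles after one update; on real hardware the measured rates vary with problem size, which is presumably why an adaptive scheme is used rather than a fixed ratio.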