Towards Efficient Short-Range Pair Interaction on Sunway Many-Core Architecture

Source: Journal of Computer Science and Technology (English edition)
The short-range pair interaction consumes most of the CPU time in molecular dynamics (MD) simulations. The inherent computation sparsity makes it challenging to achieve high-performance kernel on the emerging many-core architecture. In this paper, we present a highly efficient short-range force kernel on the Sunway, a novel many-core architecture with many unique features. The parallel efficiency of this algorithm on the Sunway many-core processor is strongly limited by the poor data locality and write conflicts. To enhance the data locality, we adopt a super cluster based neighbor list with an appropriate granularity that fits in the local memory of computing cores. In the absence of a low overhead locking mechanism, using data-privatization force array is a more feasible method to avoid write conflicts, but results in the large overhead of data reduction. We adopt a dual-slice partitioning scheme for both hardware resources and computing tasks, which utilizes the on-chip data communication to reduce data reduction overhead and provide load balancing. Moreover, we exploit the single instruction multiple data (SIMD) parallelism and perform instruction reordering of the force kernel on this many-core processor. The experimental results show that the optimized force kernel obtains a performance speedup of 226x compared with the reference implementation and achieves 20% of peak flop rate on the Sunway many-core processor.
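The data-privatization idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's Sunway kernel: it is plain Python, the workers are simulated sequentially, the Lennard-Jones potential (with epsilon = sigma = 1) is an assumed stand-in for the pair interaction, and an all-pairs search replaces the super cluster based neighbor list. All function names are hypothetical. The sketch only shows why private force arrays remove write conflicts and why they introduce a reduction step whose cost grows with the number of workers.

```python
import random


def jittered_lattice(n_side, spacing=1.2, jitter=0.1, seed=0):
    """Hypothetical helper: particles on a cubic lattice with small random offsets."""
    rng = random.Random(seed)
    return [
        [spacing * c + rng.uniform(-jitter, jitter) for c in (x, y, z)]
        for x in range(n_side) for y in range(n_side) for z in range(n_side)
    ]


def build_pairs(pos, cutoff):
    """All-pairs O(N^2) search; a stand-in for a real cluster-based neighbor list."""
    c2 = cutoff * cutoff
    pairs = []
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
            if r2 < c2:
                pairs.append((i, j))
    return pairs


def lj_pair_force(dx, dy, dz):
    """Lennard-Jones force on particle i due to j, with (dx, dy, dz) = r_i - r_j."""
    inv_r2 = 1.0 / (dx * dx + dy * dy + dz * dz)
    inv_r6 = inv_r2 ** 3
    scale = 24.0 * inv_r6 * (2.0 * inv_r6 - 1.0) * inv_r2
    return scale * dx, scale * dy, scale * dz


def forces_privatized(pos, pairs, n_workers):
    """Accumulate forces into one private array per worker, then reduce."""
    n = len(pos)
    # Each (simulated) worker writes only to its own copy, so no locking is needed.
    private = [[[0.0, 0.0, 0.0] for _ in range(n)] for _ in range(n_workers)]
    for w in range(n_workers):
        for i, j in pairs[w::n_workers]:  # round-robin split of the pair list
            dx, dy, dz = (pos[i][k] - pos[j][k] for k in range(3))
            f = lj_pair_force(dx, dy, dz)
            for k in range(3):
                private[w][i][k] += f[k]
                private[w][j][k] -= f[k]  # Newton's third law
    # Reduction step: this is the overhead that privatization introduces.
    total = [[0.0, 0.0, 0.0] for _ in range(n)]
    for w in range(n_workers):
        for i in range(n):
            for k in range(3):
                total[i][k] += private[w][i][k]
    return total


if __name__ == "__main__":
    pos = jittered_lattice(3)
    pairs = build_pairs(pos, cutoff=2.5)
    forces = forces_privatized(pos, pairs, n_workers=4)
    print(len(pairs), "pairs;", "force on particle 0:", forces[0])
```

The reduction loop touches `n_workers * n` entries regardless of how sparse the interactions are, which is the kind of overhead the abstract says its dual-slice partitioning scheme is meant to reduce.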