Optimizing non-coalesced memory access for irregular applications with GPU computing

Source: Frontiers of Information Technology & Electronic Engineering (English edition)
General purpose graphics processing units (GPGPUs) can considerably improve computing performance for regular applications. However, irregular memory access exists in many applications, and the benefits of graphics processing units (GPUs) are less substantial for irregular applications. In recent years, several studies have presented solutions to remove static irregular memory access. However, eliminating dynamic irregular memory access with software remains a serious challenge. A pure software solution without hardware extensions or offline profiling is proposed to eliminate dynamic irregular memory access, especially indirect memory access. Data reordering and index redirection are suggested to reduce the number of memory transactions, thereby improving the performance of GPU kernels. To improve the efficiency of data reordering, the reordering operation is offloaded to the GPU to reduce overhead and data transfer. By concurrently executing the compute unified device architecture (CUDA) streams of data reordering and the data processing kernel, the overhead of data reordering can be reduced. After these optimizations, the volume of memory transactions can be reduced by 16.7%–50% compared with CUSPARSE-based benchmarks, and the performance of irregular kernels can be improved by 9.64%–34.9% on an NVIDIA Tesla P4 GPU.
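To make the idea of data reordering and index redirection concrete, the following is a minimal CPU-side sketch in Python, not the authors' GPU implementation. It models an indirect access `data[indices[i]]`, counts how many memory segments each group of 32 consecutive threads would touch, and then copies the data into first-use order while rewriting the indices. The function names, the warp size of 32, and the 8-element segment size (a 32-byte segment of 4-byte values) are assumptions chosen for illustration.

```python
WARP_SIZE = 32       # assumed number of threads that issue loads together
SEGMENT_ELEMS = 8    # assumed: 32-byte memory segment / 4-byte element


def count_transactions(indices, warp_size=WARP_SIZE, segment_elems=SEGMENT_ELEMS):
    """Count distinct memory segments touched by each warp, summed over warps."""
    total = 0
    for start in range(0, len(indices), warp_size):
        warp = indices[start:start + warp_size]
        total += len({i // segment_elems for i in warp})
    return total


def reorder_and_redirect(data, indices):
    """Copy data into first-use order and rewrite indices to the new positions."""
    new_pos = {}        # old index -> position in the reordered array
    reordered = []
    new_indices = []
    for i in indices:
        if i not in new_pos:
            new_pos[i] = len(reordered)
            reordered.append(data[i])
        new_indices.append(new_pos[i])
    return reordered, new_indices


# Hypothetical workload: a strided gather over 1024 elements.
data = [float(v) for v in range(1024)]
indices = [(i * 37) % 1024 for i in range(256)]

reordered, new_indices = reorder_and_redirect(data, indices)

before = count_transactions(indices)
after = count_transactions(new_indices)

# The redirected access must return exactly the same values as the original.
original_values = [data[i] for i in indices]
redirected_values = [reordered[j] for j in new_indices]
```

In this toy model, the scattered indices make every thread in a warp hit a different segment, while the redirected indices are contiguous and share segments. The sketch ignores the cost of the reordering itself, which the abstract addresses by running the reordering on the GPU in a CUDA stream concurrent with the processing kernel.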