Automatic tuning of sparse matrix-vector multiplication on multicore clusters

Source: Science China (Information Sciences)
To achieve good performance and scalability on multicore clusters, parallel applications must be carefully optimized to exploit intra-node parallelism and reduce inter-node communication. This paper investigates automatic tuning of the sparse matrix-vector multiplication (SpMV) kernel implemented in a partitioned global address space language that supports a hybrid thread- and process-based communication layer for multicore systems. One-sided communication is used for inter-node data exchange, while intra-node communication uses a mix of process shared memory and multithreading. We develop performance models that help select the best hybrid configuration of threads and processes, as well as the best communication pattern for SpMV. As a result, our tuned SpMV in the hybrid runtime environment consumes less memory and reduces inter-node communication volume without harming data locality. Experiments are conducted on 12 real sparse matrices. On 16-node Xeon and 8-node Opteron clusters, our tuned SpMV kernel achieves average performance improvements of 1.4x and 1.5x, respectively, over a well-optimized process-based message-passing implementation.
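For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV and of one quantity the abstract mentions, the volume of remote data a partition must fetch. It is not the authors' implementation: the CSR storage format, the contiguous row-block partitioning, the assumption that the vector x is partitioned the same way as the rows, and all function names (`spmv_csr`, `block_bounds`, `remote_x_counts`) are illustrative assumptions.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i live in positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def block_bounds(n, parts):
    """Split range(n) into `parts` contiguous blocks; return parts+1 boundaries."""
    return [(n * p) // parts for p in range(parts + 1)]


def remote_x_counts(row_ptr, col_idx, parts):
    """For each part, count the distinct x entries it reads but does not own.

    Hypothetical setup: a square matrix whose rows and x entries are both
    distributed by the same contiguous blocks.
    """
    n = len(row_ptr) - 1
    bounds = block_bounds(n, parts)
    counts = []
    for p in range(parts):
        lo, hi = bounds[p], bounds[p + 1]
        needed = {col_idx[k] for k in range(row_ptr[lo], row_ptr[hi])}
        counts.append(sum(1 for j in needed if not lo <= j < hi))
    return counts


# A small 4x4 example matrix:
# [[2, 0, 0, 1],
#  [0, 3, 0, 0],
#  [0, 1, 4, 0],
#  [5, 0, 0, 6]]
row_ptr = [0, 2, 3, 5, 7]
col_idx = [0, 3, 1, 1, 2, 0, 3]
values = [2, 1, 3, 1, 4, 5, 6]
x = [1, 2, 3, 4]

print(spmv_csr(row_ptr, col_idx, values, x))   # [6.0, 6.0, 14.0, 29.0]
print(remote_x_counts(row_ptr, col_idx, 2))    # [1, 2]
print(remote_x_counts(row_ptr, col_idx, 4))    # [1, 0, 1, 1]
```

Counts of this kind, evaluated for different numbers of partitions, are the sort of input a performance model could use when comparing thread and process configurations; the paper's actual models are not reproduced here.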