Automatic tuning of sparse matrix-vector multiplication on multicore clusters

Source: Science China (Information Sciences)
To achieve good performance and scalability on multicore clusters, parallel applications must be carefully optimized to exploit intra-node parallelism and reduce inter-node communication. This paper investigates the automatic tuning of the sparse matrix-vector multiplication (SpMV) kernel implemented in a partitioned global address space (PGAS) language that supports a hybrid thread- and process-based communication layer for multicore systems. One-sided communication is used for inter-node data exchange, while intra-node communication uses a mix of process shared memory and multithreading. We develop performance models to help select the best hybrid configuration of threads and processes, as well as the best communication pattern for SpMV. As a result, our tuned SpMV in the hybrid runtime environment consumes less memory and reduces inter-node communication volume without harming data locality. Experiments are conducted on 12 real sparse matrices. On 16-node Xeon and 8-node Opteron clusters, our tuned SpMV kernel achieves average performance improvements of 1.4X and 1.5X, respectively, over a well-optimized process-based message-passing implementation.
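For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV and of why coarser partitions can lower communication volume. It is not the paper's implementation: the CSR storage format, the contiguous row-block partitioning of a square matrix, and the function names `spmv_csr` and `remote_entries` are all assumptions made for illustration. Each part owns a block of rows and the matching block of the input vector, and must fetch every other vector entry its rows touch.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def remote_entries(row_ptr, col_idx, n_parts):
    """Count the distinct x entries each part must fetch from other parts.

    Hypothetical model: a square matrix whose rows, and the matching
    entries of x, are split into n_parts contiguous blocks.
    """
    n = len(row_ptr) - 1
    block = -(-n // n_parts)  # ceiling division
    counts = []
    for p in range(n_parts):
        lo, hi = p * block, min((p + 1) * block, n)
        needed = set()
        for i in range(lo, hi):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if not (lo <= j < hi):  # x[j] is owned by another part
                    needed.add(j)
        counts.append(len(needed))
    return counts


# Toy 4x4 matrix:
# [[2, 0, 0, 1],
#  [0, 3, 0, 0],
#  [1, 0, 4, 0],
#  [0, 0, 5, 6]]
row_ptr = [0, 2, 3, 5, 7]
col_idx = [0, 3, 1, 0, 2, 2, 3]
vals = [2.0, 1.0, 3.0, 1.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_csr(row_ptr, col_idx, vals, x)
per_part_4 = remote_entries(row_ptr, col_idx, 4)  # one part per row
per_part_2 = remote_entries(row_ptr, col_idx, 2)  # two coarser parts
```

On this toy matrix, four single-row parts fetch three remote vector entries in total, while two two-row parts fetch two. This loosely mirrors the abstract's point that sharing data within a node, rather than exchanging it between many separate processes, reduces inter-node communication volume.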