Memory Efficient Two-Pass 3D FFT Algorithm for Intel® Xeon Phi™ Coprocessor

Source: Journal of Computer Science and Technology (English edition) | Cited by: 0


【Authors】: Liu Yiqun (刘益群), Li Yan (李焱), Zhang Yunquan (张云泉), Zhang Xianyi (张先轶)

【Published in】: Journal of Computer Science and Technology (English edition)

【Publication date】: 2014, Issue 6

【Keywords】: 3D-FFT, memory efficient, many-core, Many Integrated Core, Intel® Xeon Phi™


Excerpt from the paper

Equipped with 512-bit wide SIMD instructions and a large number of computing cores, the emerging x86-based Intel® Many Integrated Core (MIC) Architecture provides not only high floating-point performance but also substantial off-chip memory bandwidth. The 3D FFT (three-dimensional fast Fourier transform) is a widely studied algorithm; however, the conventional algorithm needs to traverse the data array three times. In each pass, it computes multiple 1D FFTs along one of the three dimensions, giving rise to many non-unit-stride memory accesses. In this paper, we propose a two-pass 3D FFT algorithm, which mainly aims to reduce the amount of explicit data transfer between the memory and the on-chip cache. The main idea is to split one dimension into two sub-dimensions, and then combine the transform along each sub-dimension with the transform along one of the two remaining dimensions. The difference in the number of TLB misses resulting from decomposition along different dimensions is analyzed in detail. Multi-level parallelism is leveraged on the many-core system to achieve a high degree of parallelism and better data reuse in the local cache. On top of this, a number of optimization techniques, such as memory padding, loop transformation and vectorization, are employed in our implementation to further enhance performance. We evaluate the algorithm on the Intel® Xeon Phi™ coprocessor 7110P and achieve a maximum performance of 136 Gflops with 240 threads in offload mode, which outperforms the vendor-specific Intel® MKL library by a factor of up to 2.22.
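The dimension split described in the abstract can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it assumes that the z dimension is the one being split (the abstract does not say which dimension is chosen), uses a standard Cooley-Tukey index decomposition z = z1*Z2 + z2 with twiddle factors, and substitutes a naive O(n^2) DFT for an optimized 1D FFT kernel. All function names are hypothetical. Pass 1 transforms along x together with the sub-dimension z1, and pass 2 transforms along y together with the sub-dimension z2, so the array is swept twice instead of three times.

```python
import cmath

def dft(v):
    """Naive O(n^2) 1D DFT; a stand-in for an optimized 1D FFT kernel."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def three_pass(a, nx, ny, nz):
    """Conventional scheme: one sweep over the whole array per dimension."""
    a = [[row[:] for row in plane] for plane in a]   # work on a copy
    for x in range(nx):                              # sweep 1: along z
        for y in range(ny):
            a[x][y] = dft(a[x][y])
    for x in range(nx):                              # sweep 2: along y
        for z in range(nz):
            col = dft([a[x][y][z] for y in range(ny)])
            for y in range(ny):
                a[x][y][z] = col[y]
    for y in range(ny):                              # sweep 3: along x
        for z in range(nz):
            col = dft([a[x][y][z] for x in range(nx)])
            for x in range(nx):
                a[x][y][z] = col[x]
    return a

def two_pass(a, nx, ny, z1n, z2n):
    """Two sweeps, assuming z is split as z = z1*z2n + z2 (an assumption)."""
    nz = z1n * z2n
    # Pass 1: each (y, z2) slab of size nx*z1n is transformed along z1 and x.
    b = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for y in range(ny):
        for z2 in range(z2n):
            slab = [[a[x][y][z1 * z2n + z2] for z1 in range(z1n)]
                    for x in range(nx)]
            slab = [dft(row) for row in slab]                 # along z1 -> k1
            for k1 in range(z1n):
                col = dft([slab[x][k1] for x in range(nx)])   # along x -> kx
                tw = cmath.exp(-2j * cmath.pi * k1 * z2 / nz) # twiddle factor
                for kx in range(nx):
                    b[kx][y][k1 * z2n + z2] = col[kx] * tw
    # Pass 2: each (kx, k1) slab of size ny*z2n is transformed along z2 and y.
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for kx in range(nx):
        for k1 in range(z1n):
            slab = [[b[kx][y][k1 * z2n + z2] for z2 in range(z2n)]
                    for y in range(ny)]
            slab = [dft(row) for row in slab]                 # along z2 -> k2
            for k2 in range(z2n):
                col = dft([slab[y][k2] for y in range(ny)])   # along y -> ky
                for ky in range(ny):
                    out[kx][ky][k1 + z1n * k2] = col[ky]
    return out

def make_input(nx, ny, nz):
    """Small deterministic complex test array a[x][y][z]."""
    return [[[complex(x + 2 * y - z, x * z - y) for z in range(nz)]
             for y in range(ny)] for x in range(nx)]

def max_abs_diff(p, q):
    """Largest element-wise difference between two 3D arrays."""
    return max(abs(u - v) for pp, qq in zip(p, q)
               for pr, qr in zip(pp, qq) for u, v in zip(pr, qr))

a = make_input(2, 3, 6)
err = max_abs_diff(three_pass(a, 2, 3, 6), two_pass(a, 2, 3, 2, 3))
print("max difference between the two schemes:", err)
```

On this 2 x 3 x 6 input the two schemes should agree up to floating-point rounding. In the sketch, each slab handled in pass 1 or pass 2 is much smaller than the full array, which is the property that would let a tuned implementation keep a slab in on-chip cache while both of its transforms are applied.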
