An FFT Performance Model for Optimizing General-Purpose Processor Architecture

Source: Journal of Computer Science & Technology
The general-purpose processor (GPP) is an important platform for the fast Fourier transform (FFT) due to its flexibility, reliability and practicality. FFT is a representative application that is intensive in both computation and memory access, so optimizing the FFT performance of a GPP also benefits the performance of many other applications. To facilitate the analysis of FFT, this paper proposes a theoretical model of FFT processing. The model gives a tight lower bound on the runtime of FFT on a GPP and also guides architecture optimization for GPPs. Based on the model, two theorems on the optimization of architecture parameters are deduced, which give lower bounds on the number of registers and the memory bandwidth. Experimental results on different processor architectures (including Intel Core i7 and Godson-3B) validate the performance model. These investigations were adopted in the development of Godson-3B, an industrial GPP. The optimization techniques deduced from our performance model improve FFT performance by about 40% while incurring only 0.8% additional area cost. Consequently, Godson-3B solves the 1024-point single-precision complex FFT in 0.368 μs with about 40 W power consumption, and has the highest performance per watt in complex FFT among processors as far as we know. This work could benefit the optimization of other GPPs as well.
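To put the reported runtime in perspective, the following minimal sketch converts it into a nominal throughput figure. It assumes the conventional 5 N log2 N operation count that is commonly used when benchmarking complex FFTs; this convention is an assumption made here for illustration and is not the performance model proposed in the paper. The function names are hypothetical.

```python
def nominal_fft_flops(n: int) -> int:
    """Nominal flop count 5 * N * log2(N) for an N-point complex FFT.

    This is a benchmarking convention (an assumption here), not the
    runtime lower bound derived in the paper.
    """
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two and at least 2")
    log2_n = n.bit_length() - 1  # exact log2 for a power of two
    return 5 * n * log2_n


def nominal_gflops(n: int, runtime_seconds: float) -> float:
    """Nominal throughput in GFLOPS for one N-point FFT of given runtime."""
    return nominal_fft_flops(n) / runtime_seconds / 1e9


# Figures taken from the abstract: 1024 points, 0.368 microseconds, about 40 W.
rate = nominal_gflops(1024, 0.368e-6)
rate_per_watt = rate / 40.0
print(f"nominal rate: {rate:.1f} GFLOPS, {rate_per_watt:.2f} GFLOPS/W")
```

Under this convention, the 1024-point transform counts as 51,200 nominal operations, so the figures in the abstract correspond to roughly 139 nominal GFLOPS, or about 3.5 nominal GFLOPS per watt.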