The advent of multi-core/many-core chip technology offers both an extraordinary opportunity and a profound challenge. In particular, computer architects and system software designers have a unique opportunity to introduce new architectural features together with adequate compiler technology; combined, these may have a profound impact. This paper presents a case study (using the 1-D Jacobi computation) of compiler-amenable performance optimization techniques on a many-core architecture, Godson-T. The Godson-T architecture has several unique features that were chosen for this study: 1) chip-level globally addressable memory, in particular the scratchpad memories (SPM) local to the processing cores; 2) fine-grain memory-based synchronization (e.g., full-empty bits). Leveraging state-of-the-art performance optimization methods for 1-D stencil parallelization (e.g., timed tiling and variants), we developed and implemented a number of many-core-based optimizations for Godson-T. Our experimental study shows good performance in both execution-time speedup and scalability, validates the value of globally accessible SPM and the fine-grain synchronization mechanism (full-empty bits) on Godson-T, and provides some useful guidelines for future compiler technology for many-core chip architectures.
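For readers unfamiliar with the kernel under study, the following is a minimal sequential sketch of a 1-D Jacobi computation. It is not the Godson-T implementation: the three-point averaging stencil, the fixed boundary values, and the name `jacobi_1d` are assumptions chosen for illustration, and no tiling, SPM placement, or full-empty-bit synchronization is modeled.

```python
def jacobi_1d(a, steps):
    """Run `steps` Jacobi sweeps over the sequence `a`.

    Assumed stencil (illustrative): each interior point becomes the
    average of itself and its two neighbours from the previous sweep.
    The two boundary values are held fixed.
    """
    cur = list(a)   # values from the previous time step
    nxt = list(a)   # values being computed for the next time step
    for _ in range(steps):
        for i in range(1, len(cur) - 1):
            # Reads only `cur`, so all points of a sweep are independent.
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur, nxt = nxt, cur  # swap buffers between sweeps
    return cur


if __name__ == "__main__":
    print(jacobi_1d([0.0, 3.0, 0.0], 1))
```

Within a sweep every point depends only on the previous time step, which is what makes the inner loop parallel; the dependence between successive sweeps is what tiling-based optimizations of this kernel have to respect.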