Matrix multiplication algorithms are widely used in scientific computing. This paper compares and analyzes the performance of typical matrix multiplication algorithms on the Dawning 1000 (曙光1000). For the SUMMA algorithm, it studies how the block size affects communication performance and shows that the block size is an important factor. The original algorithm does not specify how the block size should be chosen; through theoretical and experimental analysis, this paper proposes a criterion for selecting the optimal block size. Experimental results show that, with the block size chosen according to this criterion, the performance of SUMMA improves substantially, reaching 50.7% of the machine's peak performance.
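To make the role of the block size concrete, the following is a minimal single-process sketch of the SUMMA panel loop. It is an illustration under stated assumptions, not the paper's implementation: the function names `summa_simulated` and `matmul_reference`, the square `grid` x `grid` logical process mesh, and the restriction that the panel width `b` divides each process's local block are all choices made here for simplicity. No real message passing takes place; the returned step count stands in for the number of broadcast rounds a distributed run would need.

```python
def matmul_reference(A, B):
    """Plain triple-loop product, used only to check the simulation."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def summa_simulated(A, B, grid, b):
    """Simulate SUMMA for n x n matrices on a grid x grid logical process mesh.

    Hypothetical helper for illustration. Returns (C, steps), where steps is
    the number of panel rounds; in a distributed run each round would involve
    one broadcast of an A panel along every process row and one broadcast of
    a B panel along every process column.
    """
    n = len(A)
    if n % grid != 0 or (n // grid) % b != 0:
        raise ValueError("assumed: grid divides n and b divides n // grid")
    local = n // grid  # side length of the block owned by each process
    C = [[0] * n for _ in range(n)]
    steps = 0
    for k0 in range(0, n, b):  # one round per panel of width b
        for pi in range(grid):
            r0 = pi * local
            # Panel of A that process row pi would receive in this round.
            a_panel = [A[r][k0:k0 + b] for r in range(r0, r0 + local)]
            for pj in range(grid):
                c0 = pj * local
                # Panel of B that process column pj would receive.
                b_panel = [B[k][c0:c0 + local] for k in range(k0, k0 + b)]
                # Local rank-b update performed by process (pi, pj).
                for r in range(local):
                    for c in range(local):
                        C[r0 + r][c0 + c] += sum(
                            a_panel[r][k] * b_panel[k][c] for k in range(b))
        steps += 1
    return C, steps


if __name__ == "__main__":
    n = 8
    A = [[i * n + j for j in range(n)] for i in range(n)]
    B = [[(i + 2 * j) % 5 for j in range(n)] for i in range(n)]
    for width in (1, 2, 4):
        C, steps = summa_simulated(A, B, grid=2, b=width)
        print(width, steps, C == matmul_reference(A, B))
```

In this sketch the number of rounds is n / b, so a small panel width means many small broadcasts while a large one means few large broadcasts. That trade-off is the general reason a block size has to be tuned; the specific selection criterion proposed in the paper is not reproduced here.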