Matrix computation is one of the core problems in high-performance computing, matrix decomposition is an important way to increase the parallelism of matrix computation, and the rapid development of FPGAs provides strong platform support for parallel computing architectures. This paper implements Cholesky decomposition based on a unified sub-matrix update algorithm and designs a corresponding parallel architecture on an FPGA. Experimental results show that, compared with a software implementation on a general-purpose processor, the FPGA parallel implementation of Cholesky decomposition achieves a speedup of more than 10x in core computing performance, and that the algorithm offers higher data parallelism and pipeline parallelism in the matrix triangularization process.
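For readers unfamiliar with the sub-matrix update form of Cholesky decomposition, the following is a minimal software sketch in Python. It is not the FPGA design described in this paper. It assumes the standard right-looking formulation, in which each step takes the square root of the pivot, scales the column below it, and then applies a rank-1 update to the trailing sub-matrix. The function name `cholesky_submatrix_update` is hypothetical and chosen only for illustration.

```python
import math


def cholesky_submatrix_update(a):
    """Return lower-triangular L with L * L^T = a (right-looking variant).

    Illustrative sketch only: `a` is assumed to be a symmetric
    positive-definite matrix given as a list of lists of floats.
    """
    n = len(a)
    w = [row[:] for row in a]  # work on a copy so the input is untouched

    for k in range(n):
        if w[k][k] <= 0.0:
            raise ValueError("matrix is not positive definite")

        # Step 1: square root of the pivot element.
        pivot = math.sqrt(w[k][k])
        w[k][k] = pivot

        # Step 2: scale the column below the pivot.
        for i in range(k + 1, n):
            w[i][k] /= pivot

        # Step 3: rank-1 update of the trailing sub-matrix (lower part only).
        # Every (i, j) update here is independent of the others, which is
        # the source of the data parallelism in this formulation.
        for j in range(k + 1, n):
            for i in range(j, n):
                w[i][j] -= w[i][k] * w[j][k]

    # Zero the strictly upper triangle so only L is returned.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    for row in cholesky_submatrix_update(A):
        print(row)
```

In this formulation the inner updates in step 3 do not depend on one another, so in hardware they could in principle be spread across parallel processing elements, while successive values of `k` form the stages that a pipeline would overlap. How the paper's architecture actually maps these steps is not specified in the abstract.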