This paper proposes a method for computing the DCT of the AVS video standard on a VLIW-architecture DSP. The DCT transform matrix is first decomposed, and the resulting matrix multiplications are carried out with complex multiplications. By organizing the data appropriately, the packed coefficients of the transform matrix are reused, which reduces register usage and makes the algorithm better suited to loop unrolling and software pipelining. This in turn yields a higher degree of parallelism and an effective increase in execution speed. In terms of computational efficiency, the proposed method is 4.28 times faster than the fast algorithm in the AVS standard and takes 31.1% less computation time than the existing method.
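The idea of carrying out matrix multiplication with complex multiplications can be illustrated with a minimal sketch. It relies only on the identity (a + jb)(c + jd) = (ac - bd) + j(ad + bc), which equals the 2x2 product [[c, -d], [d, c]] applied to the vector [a, b]. The abstract does not give the actual decomposition, so this is not the paper's kernel: the helper names (`pack16`, `cmul_packed`, `rotate_pairs`), the 16-bit packing layout, and any coefficient values passed in are assumptions made for illustration, and the packed complex multiply is emulated in portable C rather than issued as a DSP instruction.

```c
#include <stdint.h>

/* Hypothetical layout: two signed 16-bit values in one 32-bit word,
 * "real" part in bits 31..16 and "imaginary" part in bits 15..0. */
static inline uint32_t pack16(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

static inline int16_t hi16(uint32_t w) { return (int16_t)(w >> 16); }
static inline int16_t lo16(uint32_t w) { return (int16_t)(w & 0xFFFFu); }

/* 32-bit result pair of one complex multiplication. */
typedef struct {
    int32_t re;
    int32_t im;
} cplx32;

/* Emulated packed complex multiply: (a + jb) * (c + jd).
 * re = a*c - b*d, im = a*d + b*c, i.e. the product of the
 * matrix [[c, -d], [d, c]] with the column vector [a, b]. */
static inline cplx32 cmul_packed(uint32_t x, uint32_t coef) {
    int32_t a = hi16(x), b = lo16(x);
    int32_t c = hi16(coef), d = lo16(coef);
    cplx32 r;
    r.re = a * c - b * d;
    r.im = a * d + b * c;
    return r;
}

/* Apply one packed coefficient word to n packed input pairs.
 * The coefficient is passed in once and reused for every pair,
 * so it can stay in a single register across the whole loop,
 * which keeps the loop body small and regular. */
static void rotate_pairs(const uint32_t *in, cplx32 *out, int n, uint32_t coef) {
    for (int i = 0; i < n; ++i) {
        out[i] = cmul_packed(in[i], coef);
    }
}
```

Because the loop body in `rotate_pairs` has no data-dependent branches and holds only one coefficient word live, it is the kind of loop a VLIW compiler can unroll and software-pipeline easily. How many such 2x2 blocks appear, and with which coefficients, depends on the matrix decomposition described in the full paper.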