Routers today must not only meet transmission bandwidth demands but also guarantee quality of service (QoS) for users. The traffic manager is a key chip in a router: it schedules network flows and provides QoS guarantees to users. The bandwidth that a single traffic manager can support is limited by the hardware's operating frequency, data bus, fabrication process, and similar factors, which makes it difficult to meet the requirements of high-speed interfaces such as 100 Gb/s Ethernet. Moreover, space inside a router is very tight; deploying multiple traffic manager chips on one line card to meet high bandwidth demands would enlarge the board and raise cost and power consumption accordingly. To improve the processing capability of a single traffic manager chip, this paper designs a parallel scheduler that supports multi-threaded processing. The parallel scheduler adopts a two-level scheduling strategy and can multiply the scheduling speed of the traffic manager without raising the system's internal clock frequency. In addition, the parallel scheduler shares off-chip memory, which makes full use of memory bandwidth and reduces the number of memories required. Performance evaluation shows that a traffic manager built with a 4-thread parallel scheduler can raise its maximum supported bandwidth by 3 times. Compared with deploying 4 traffic manager chips, the 4-thread design reduces on-chip memory overhead, logic overhead, and the number of memories used by 7.1%, 36.2%, and 75%, respectively, and reduces both deployment space and memory power consumption by 75%.
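The 75% and "3 times" figures can be read as simple ratios. The sketch below is only an illustration under two assumptions that the abstract does not state: that each standalone traffic manager chip needs its own set of off-chip memories while the 4-thread design shares a single set, and that throughput scales ideally with the number of threads at an unchanged clock frequency. The helper name `relative_reduction` and all variable names are hypothetical.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Fractional saving of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline


THREADS = 4  # thread count evaluated in the abstract

# Assumption: one set of off-chip memories per standalone chip,
# versus a single shared set for the 4-thread scheduler.
multi_chip_memory_sets = THREADS
shared_memory_sets = 1
memory_saving = relative_reduction(multi_chip_memory_sets, shared_memory_sets)

# Assumption: ideal scaling, so 4 threads give 4x the single-thread
# bandwidth, which is an increase of 3 times over the baseline.
bandwidth_multiple = THREADS
bandwidth_increase = bandwidth_multiple - 1

print(f"memory sets saved: {memory_saving:.0%}")
print(f"bandwidth increase: {bandwidth_increase} times the baseline")
```

Under these assumptions the memory-count saving and the bandwidth gain follow directly from the thread count; the 7.1% and 36.2% figures for on-chip memory and logic depend on implementation details that the abstract does not give, so they are not modeled here.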