During the trial operation of the supercomputer system, we found that when users submit large numbers of jobs (especially when many of them are parallel jobs and the computational load is excessive), contention for resources among user jobs occurs frequently. Some jobs consume large amounts of CPU time, while others require large amounts of memory. As different jobs compete for resources, the result is often an excessive paging rate, heavy disk I/O, and an overloaded system: system response slows down, job run times grow, and computational efficiency falls. In severe cases the console deadlocks, and the system may even crash, causing users considerable inconvenience. A supercomputer system therefore needs a complete job and resource management system that performs job monitoring, scheduling, and management, together with resource analysis and statistics, so that the computing resources