The purpose of task allocation across a set of interconnected processors (computers) is to maximize the effective use of resources and thereby shorten the turnaround time of a job. This paper proposes a simple and effective method for allocating tasks in a multicomputer system that minimizes interprocessor communication cost subject to resource constraints set by the system and the designer. Because the execution time of each task, the number of available processors, the processor speeds, and the storage capacities are known to the system or the designer, these constraints can be regarded as arising from load balancing. As the number of processors grows, so does the probability that a failure occurs somewhere in the system at any given time, yet few established task allocation models take reliability into account. For a multicomputer system, we define system reliability as the probability that the system can run its tasks successfully. Once a (non-redundant) task scheduling policy has been determined, the tasks are statically and redundantly reallocated to the processors. This is a form of time redundancy: if some processors fail during execution, all tasks can still be completed on the remaining processors, though in a longer time. Because tasks are statically preallocated, the method is simpler, and therefore more practical, than the well-known dynamic reconfiguration and rollback recovery techniques for multicomputer systems. We verified the effectiveness of hardware fault-tolerant task allocation and reallocation by applying the method to a variety of examples and to a real communication-network multiprocessor system.
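The two ideas in the abstract, minimizing interprocessor communication under a load-balancing constraint and statically precomputing a fallback allocation for each processor failure, can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify: it uses exhaustive search, and the task names, traffic volumes, `load_cap`, and the `slack` factor that relaxes the cap after a failure are all hypothetical values chosen for illustration.

```python
from itertools import product

def comm_cost(assign, traffic):
    """Sum the traffic between task pairs placed on different processors."""
    return sum(vol for (a, b), vol in traffic.items() if assign[a] != assign[b])

def best_assignment(exec_time, traffic, processors, load_cap):
    """Exhaustive search (an assumption, not the paper's method).

    Returns (cost, assignment) with minimum communication cost among
    assignments whose per-processor load stays within load_cap,
    or None if no assignment is feasible.
    """
    tasks = sorted(exec_time)
    best = None
    for choice in product(processors, repeat=len(tasks)):
        assign = dict(zip(tasks, choice))
        load = {p: 0 for p in processors}
        for task, proc in assign.items():
            load[proc] += exec_time[task]
        if max(load.values()) > load_cap:
            continue  # violates the load-balancing constraint
        cost = comm_cost(assign, traffic)
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best

def static_backups(exec_time, traffic, processors, load_cap, slack):
    """Precompute one fallback assignment per single-processor failure.

    The cap is relaxed by `slack` (hypothetical) because the surviving
    processors must absorb the failed processor's work, so the job
    takes longer: the time-redundancy idea from the abstract.
    """
    backups = {}
    for failed in processors:
        survivors = [p for p in processors if p != failed]
        backups[failed] = best_assignment(
            exec_time, traffic, survivors, load_cap * slack
        )
    return backups

# Hypothetical example data: execution times and pairwise traffic volumes.
exec_time = {"t1": 4, "t2": 3, "t3": 2, "t4": 3}
traffic = {("t1", "t2"): 5, ("t2", "t3"): 1, ("t3", "t4"): 4, ("t1", "t4"): 1}
processors = ["P1", "P2", "P3"]

primary = best_assignment(exec_time, traffic, processors, load_cap=6)
backups = static_backups(exec_time, traffic, processors, load_cap=6, slack=1.5)

print("primary cost:", primary[0])
for failed, plan in backups.items():
    print("if", failed, "fails, fallback cost:", plan[0])
```

With these numbers the fallback plans place more work on each surviving processor than the primary cap of 6 allows, which is the trade the abstract describes: the job still completes after a failure, but over a longer time. Exhaustive search grows exponentially with the number of tasks, so it is only suitable for toy instances like this one.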