Workstation clusters have become one of the mainstream directions in distributed parallel processing. As their application domains broaden and their scale grows, the demands placed on their reliability keep rising. Designing a highly reliable cluster system therefore requires particular attention to system-level fault tolerance. This paper describes rollback recovery and checkpoint derivation in a parallel heterogeneous environment, which together provide transparent, portable fault tolerance and load balancing. A globally consistent state can be formed without adjusting checkpoints. BSP applications not only gain autonomous fault tolerance but can also migrate between clusters, keeping the system load balanced. The paper focuses on the techniques of checkpoint placement, checkpoint derivation, rollback, and process migration.
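The link between BSP and globally consistent checkpoints can be illustrated with a minimal sketch. It assumes that checkpoints are taken at superstep barriers, where no messages are in flight; the abstract does not say where the paper places its checkpoints, so this is only one plausible reading. The functions `superstep` and `run`, the `fail_at` parameter, and the use of JSON as a stand-in for a machine-independent checkpoint format are all hypothetical, and the processes are simulated sequentially in one Python process rather than on a real cluster.

```python
import json


def superstep(states, step):
    """One simulated BSP superstep: local compute, then neighbour exchange."""
    n = len(states)
    # Local computation phase: each process updates its own value.
    computed = [s["value"] + step + pid for pid, s in enumerate(states)]
    # Communication phase: each process adds its left neighbour's result.
    return [{"value": computed[pid] + computed[(pid - 1) % n]} for pid in range(n)]


def run(num_procs, num_steps, checkpoint_every=2, fail_at=None):
    """Run the simulation; optionally inject one failure before step `fail_at`."""
    states = [{"value": pid} for pid in range(num_procs)]
    # Assumption: JSON text stands in for a portable checkpoint format.
    checkpoint = json.dumps({"step": 0, "states": states})
    step = 0
    rollbacks = 0
    while step < num_steps:
        if fail_at is not None and step == fail_at:
            # Simulated failure: drop volatile state and restore the last checkpoint.
            saved = json.loads(checkpoint)
            step, states = saved["step"], saved["states"]
            fail_at = None  # inject the failure only once
            rollbacks += 1
            continue
        states = superstep(states, step)
        step += 1
        # Barrier at the end of the superstep: every process is at the same
        # step, so saving all states here needs no further coordination.
        if step % checkpoint_every == 0:
            checkpoint = json.dumps({"step": step, "states": states})
    return [s["value"] for s in states], rollbacks
```

Calling `run(4, 6)` and `run(4, 6, fail_at=5)` should return the same final values, with the second call reporting one rollback: the failed run restarts from the checkpoint written after step 4 and re-executes the lost superstep. Because the checkpoint is plain text rather than a memory image, the same saved state could in principle be restored on a different machine, which is the property that process migration relies on.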