Many reinforcement learning methods built on a game-theoretic framework suffer, in multi-agent tasks in complex environments, from a lack of rationality, convergence that is hard to guarantee, high computational complexity, and low efficiency. To address these problems, this paper proposes an improved CE-Q reinforcement learning algorithm that builds on the basic theory of CE-Q and adds an immediate reward for the action process. The algorithm effectively mitigates the problems above and guides the agents while they carry out the task, which improves system efficiency. Finally, simulation experiments are run on the Matlab platform with multi-agent foraging as the task, and a comparison with the standard CE-Q and FF-Q algorithms verifies the effectiveness and superiority of the proposed algorithm for multi-agent systems in complex environments.
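The abstract does not give the form of the immediate reward, so the following is only a minimal Python sketch of the general idea, not the paper's method. It assumes a distance-based bonus for moving toward food in a foraging task. The names `shaped_reward`, `q_update`, `bonus`, and `next_value` are hypothetical. The correlated-equilibrium computation that CE-Q performs at the next state is left out, and its result is passed in as `next_value`.

```python
from collections import defaultdict


def shaped_reward(task_reward, prev_dist, new_dist, bonus=0.1):
    """Task reward plus a hypothetical immediate reward for the move itself.

    Assumption: the immediate reward pays for progress toward the nearest
    food item, measured as the reduction in distance after the action.
    """
    return task_reward + bonus * (prev_dist - new_dist)


def q_update(Q, state, joint_action, reward, next_value, alpha=0.1, gamma=0.9):
    """One tabular Q-update over a joint action.

    next_value stands in for the agent's equilibrium value at the next state;
    computing that value (a linear program in CE-Q) is omitted in this sketch.
    """
    key = (state, joint_action)
    Q[key] = (1 - alpha) * Q[key] + alpha * (reward + gamma * next_value)
    return Q[key]


# Example: an agent moves one cell closer to food without collecting it yet.
Q = defaultdict(float)  # unseen (state, joint_action) pairs start at 0.0
r = shaped_reward(task_reward=0.0, prev_dist=3, new_dist=2)
new_q = q_update(Q, state="s0", joint_action=("east", "north"),
                 reward=r, next_value=1.0)
```

In this sketch the agent receives a small positive signal on every step that brings it closer to food, instead of a signal only when the food is collected. That is one plausible way an immediate reward could guide agents during the task, under the assumptions stated above.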