Discrete-time dynamic graphical games: model-free reinforcement learning solution

Source: Control Theory and Technology
This paper introduces a model-free reinforcement learning technique for solving a class of dynamic games known as dynamic graphical games. The graphical game results from multi-agent dynamical systems, where pinning control is used to make all the agents synchronize to the state of a command generator or leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games. Hamiltonian mechanics is used to derive the necessary conditions for optimality. The solution of the dynamic graphical game is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein. A Nash equilibrium solution for the graphical game is given in terms of the solution to the underlying coupled Hamilton-Jacobi-Bellman equations. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game. This algorithm does not require any knowledge of the agents’ dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption about the inter-connectivity properties of the graph. A gradient descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
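The synchronization objective described above can be illustrated with a small sketch. It assumes the local neighborhood tracking error that is common in the multi-agent synchronization literature, eps_i = sum_j e_ij (x_j - x_i) + g_i (x_0 - x_i); this notation, the scalar agent states, the function name `tracking_errors`, and the three-agent graph are all assumptions made for illustration and are not taken from the text shown here.

```python
def tracking_errors(states, leader_state, edge_weights, pinning_gains):
    """Local neighborhood tracking error for each agent (scalar states).

    Assumed form: eps_i = sum_j e_ij * (x_j - x_i) + g_i * (x_0 - x_i),
    where e_ij is the weight of the edge from agent j to agent i and
    g_i > 0 only for agents pinned to the leader.
    """
    n = len(states)
    errors = []
    for i in range(n):
        # Disagreement with the neighbors agent i listens to.
        neighbor_term = sum(
            edge_weights[i][j] * (states[j] - states[i]) for j in range(n)
        )
        # Disagreement with the leader, only if agent i is pinned.
        pinning_term = pinning_gains[i] * (leader_state - states[i])
        errors.append(neighbor_term + pinning_term)
    return errors


# Hypothetical 3-agent chain: agent 0 is pinned to the leader,
# agent 1 listens to agent 0, agent 2 listens to agent 1.
edge_weights = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
pinning_gains = [1.0, 0.0, 0.0]

errors = tracking_errors([1.0, 2.0, 4.0], 0.0, edge_weights, pinning_gains)
synced = tracking_errors([3.0, 3.0, 3.0], 3.0, edge_weights, pinning_gains)
print(errors)  # nonzero while the agents disagree with the leader
print(synced)  # all zeros once every agent matches the leader
```

Under this assumed formulation, every tracking error vanishes when all agents hold the leader's state, which is why driving these errors to zero serves as the synchronization objective in each agent's cost.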