Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-bas

Source: Control Theory and Applications (English Edition)
A primary challenge of agent-based policy learning in complex and uncertain environments is that computational complexity escalates with the size of the task space (action choices and world states) and the number of agents. Nonetheless, there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems both from brain circuits for hierarchical representation of state and action spaces and learned policies, and from constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning in order to improve learning efficiency. Analytical results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multirobot foraging, shows that the theoretical bounds hold and that acceptable policies emerge, which reduce task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and without social knowledge.
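As a rough illustration of how state abstraction and constrained action selection shrink a Q-table, the sketch below runs tabular Q-learning on a toy 1-D corridor. Everything in it is an assumption made for illustration and is not taken from the paper: the corridor task, the names `abstract_state`, `allowed_actions`, and `train`, the sign-only abstraction, and the "no idling away from the goal" rule, which merely stands in for a socially derived constraint. The abstraction is deliberately lossy, so the learned values are approximate; the point is only the size of the table.

```python
import random

ACTIONS = (-1, 0, 1)      # move left, stay, move right
N_CELLS, GOAL = 11, 5     # hypothetical corridor with the goal in the middle

def abstract_state(pos):
    """State abstraction (assumed): keep only which side the goal lies on."""
    return (GOAL > pos) - (GOAL < pos)   # +1 goal to the right, -1 left, 0 at goal

def allowed_actions(s):
    """Stand-in for a social constraint (assumed): idling is pruned off-goal."""
    return (0,) if s == 0 else (-1, 1)

def train(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = {}                                # (abstract state, action) -> value
    for _episode in range(episodes):
        pos = rng.choice([p for p in range(N_CELLS) if p != GOAL])
        for _step in range(max_steps):
            s = abstract_state(pos)
            acts = allowed_actions(s)
            if rng.random() < epsilon:    # epsilon-greedy over allowed actions only
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: q.get((s, x), 0.0))
            pos = min(max(pos + a, 0), N_CELLS - 1)   # walls clip the move
            done = pos == GOAL
            s2 = abstract_state(pos)
            # No bootstrapping from the terminal state.
            future = 0.0 if done else max(
                q.get((s2, b), 0.0) for b in allowed_actions(s2))
            old = q.get((s, a), 0.0)
            # Standard Q-learning update with a step cost of -1.
            q[(s, a)] = old + alpha * (-1.0 + gamma * future - old)
            if done:
                break
    return q

if __name__ == "__main__":
    q = train()
    raw_size = N_CELLS * len(ACTIONS)     # table size without abstraction or pruning
    print("entries used:", len(q), "of a raw table of", raw_size)
```

In this toy setting the learner can touch at most two non-terminal abstract states with two allowed actions each, whereas a table over raw positions and all actions would have `N_CELLS * len(ACTIONS)` entries.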