Hybrid MDP based integrated hierarchical Q-learning

Source: Science China (Information Sciences)
As a widely used reinforcement learning method, Q-learning is bedeviled by the curse of dimensionality: the computational complexity grows dramatically with the size of the state-action space. To combat this difficulty, an integrated hierarchical Q-learning framework is proposed, based on a hybrid Markov decision process (MDP) that uses temporal abstraction instead of a simple MDP. The learning process is naturally organized into multiple levels of learning, e.g., a quantitative (lower) level and a qualitative (upper) level, which are modeled as an MDP and a semi-MDP (SMDP), respectively. This hierarchical control architecture constitutes a hybrid MDP as the model of hierarchical Q-learning, which bridges the two levels of learning. The proposed hierarchical Q-learning scales well to large state-action spaces and speeds up learning through the upper-level learning process. Hence this approach is an effective integrated learning and control scheme for complex problems. Several experiments are carried out using a puzzle problem in a gridworld environment and a navigation control problem for a mobile robot. The experimental results demonstrate the effectiveness and efficiency of the proposed approach.
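To make the distinction between the two levels concrete, the following is a minimal sketch of the standard tabular update rules usually associated with each kind of model: one-step Q-learning for an MDP and SMDP Q-learning for temporally extended actions. It is not the authors' algorithm, which the abstract does not specify. The function names, the dictionary-based Q-table, and the parameters `alpha` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning update, as used for an ordinary MDP (lower level).

    Q is a mapping from (state, action) to a value (hypothetical layout).
    """
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


def smdp_q_update(Q, s, o, rewards, s_next, options, alpha=0.1, gamma=0.9):
    """SMDP Q-learning update for a temporally extended action (upper level).

    `rewards` lists the rewards collected while option `o` was running,
    so its length tau is the duration of the option in primitive steps.
    """
    tau = len(rewards)
    # Discounted return accumulated during the option's execution.
    ret = sum((gamma ** k) * r for k, r in enumerate(rewards))
    best_next = max(Q[(s_next, p)] for p in options)
    # The bootstrap term is discounted by gamma**tau instead of gamma.
    Q[(s, o)] += alpha * (ret + (gamma ** tau) * best_next - Q[(s, o)])
    return Q[(s, o)]


if __name__ == "__main__":
    q_low = defaultdict(float)   # lower-level table over primitive actions
    q_high = defaultdict(float)  # upper-level table over options
    q_update(q_low, 0, "right", 1.0, 1, ["left", "right"])
    smdp_q_update(q_high, "room_A", "go_to_door", [0.0, 0.0, 1.0],
                  "room_B", ["go_to_door", "stay"])
    print(dict(q_low), dict(q_high))
```

When an option lasts exactly one step, the SMDP update reduces to the one-step update, which is one way to see how a single framework can span both levels.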