Hybrid MDP based integrated hierarchical Q-learning

Source: Science China (Information Sciences)

【Author】: TARN Tzyh-Jong

【Affiliation】: Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130

【Published in】: Science China (Information Sciences)

【Publication date】: 2011, Issue 11

【Keywords】: hierarchical qualitative dimensionality robot navigation instead puzzle difficul

Excerpt from the paper

As a widely used reinforcement learning method, Q-learning is bedeviled by the curse of dimensionality: the computational complexity grows dramatically with the size of the state-action space. To combat this difficulty, an integrated hierarchical Q-learning framework is proposed based on the hybrid Markov decision process (MDP), using temporal abstraction instead of the simple MDP. The learning process is naturally organized into multiple levels of learning, e.g., a quantitative (lower) level and a qualitative (upper) level, which are modeled as an MDP and a semi-MDP (SMDP), respectively. This hierarchical control architecture constitutes a hybrid MDP as the model of hierarchical Q-learning, which bridges the two levels of learning. The proposed hierarchical Q-learning can scale up very well and speed up learning with the upper-level learning process. Hence, this approach is an effective integral learning and control scheme for complex problems. Several experiments are carried out using a puzzle problem in a gridworld environment and a navigation control problem for a mobile robot. The experimental results demonstrate the effectiveness and efficiency of the proposed approach.
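
To make the two levels of learning concrete, the sketch below contrasts the standard one-step Q-learning update (for a level modeled as an MDP) with the standard SMDP Q-learning update (for a level whose actions are temporally extended). It is only an illustration under assumptions: the excerpt does not give the paper's actual update rules, so these are the textbook forms, and every name here (`q_update`, `smdp_q_update`, the state and option labels) is hypothetical.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Textbook one-step Q-learning update (lower level, modeled as an MDP)."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def smdp_q_update(Q, s, o, rewards, s_next, options, alpha=0.1, gamma=0.9):
    """Textbook SMDP Q-learning update (upper level, modeled as an SMDP).

    `rewards` holds the per-step rewards collected while the temporally
    extended action `o` was running; its length is the duration tau.
    """
    tau = len(rewards)
    # Discounted return accumulated during the tau primitive steps.
    discounted = sum(gamma ** k * r for k, r in enumerate(rewards))
    best_next = max(Q[(s_next, p)] for p in options)
    # The bootstrap term is discounted by gamma ** tau, not by gamma.
    Q[(s, o)] += alpha * (discounted + gamma ** tau * best_next - Q[(s, o)])


# Hypothetical usage: a primitive move at the lower level and a
# three-step "go to the door" behavior at the upper level.
Q_low = defaultdict(float)
q_update(Q_low, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])

Q_high = defaultdict(float)
smdp_q_update(Q_high, s="room_A", o="go_to_door", rewards=[0.0, 0.0, 1.0],
              s_next="room_B", options=["go_to_door", "stay"])
```

The only structural difference between the two updates is the treatment of time: the upper level accumulates discounted reward over the whole duration of its action and discounts the bootstrap value by the number of elapsed steps, which is what lets one upper-level decision summarize many lower-level ones.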
