Greedy feature replacement for online value function approximation

来源 :Journal of Zhejiang University-Science C(Computers & Electro | 被引量 : 0次 | 上传用户:shishaofei
下载到本地 , 更方便阅读
声明 : 本文档内容版权归属内容提供方 , 如果您对本文有版权争议 , 可与客服联系进行内容授权或下架
论文部分内容阅读
Reinforcement learning(RL) in real-world problems requires function approximations that depend on selecting the appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques function well only for low dimensional problems. In this paper, we present the greedy feature replacement(GFR), a novel online expansion technique, for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally. New feature dependencies are added automatically to the current representation and conjunctive features are used to replace current features greedily. The virtual temporal difference(TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. Correctness guarantees and computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and has the capability to handle large-scale problems. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques function well only for low dimensional problems. In this paper, we present the greedy feature replacement (GFR), a novel online expansion technique, for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally. New feature dependencies are added automatically to the current representation and conjunctive features are used to replace current features greedily. The virtual temporal difference (TD) error is recorded for each conjunctive feature to judge whether the replacement can improve improve the approximation. Correctness guarantees and computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and has the capability to handle large-scale problems.
其他文献
现在回忆起来,我们当时宣讲党课的主要内容大致为:什么是共产主义、什么是社会主义、什么是共产党、怎样做一个共产党员、过渡时期党的总路线、党的民族政策、党在牧区的各项
在上学学习的时候,总觉得一些教育理论用处不大,经过几年的教学实践,我深切体会到:要想不断提高自己的教学水平,就不能没有一个明确的教育思想作为指导,现在,我认为应当用教育思想来贯穿教学过程的始终。  一、把爱给学生  没有爱就没有教育。如果教师没有对祖国和人民的爱,就无法培养学生的高尚情操;没有对生活和事业的爱,就无法引导学生对生活充满爱;没有对家人、朋友的爱,就不可能塑造学生善良的心;没有对学生的
红景天苷是景天科红景天属植物最为重要的药效成分,由于其具有多种重要的药用功效而成为当今天然产物研究的热点之一。在全面总结前人研究成果的基础上,综述了红景天苷生物合
在纪念伟大的五四运动80周年,重温五四运动及其后来的历史经验时,不能不强烈地感受到:中国革命和建设事业的不断胜利,中国青年运动的健康发展,须臾离不开中国共产党坚强正确的领导;要
六、担任国家进出口委副主任的江泽民,与广东省委负责同志研究决定,国家拿出3000万元贷款,专供开发深圳经济特区用。荒土变成了金子。 特区应该怎么建? 圈出一块地方,搞一个
过去气候变化是全球变化研究的重要组成部分。探讨地球环境在地质时期的变化规律是预测朱来气候变化、应对当今和将来日益严竣的环境问题的迫切需求。由于我国的气候环境变化
请下载后查看,本文暂不支持在线获取查看简介。 Please download to view, this article does not support online access to view profile.
爱数控的博客该博客创建于公元2012年,致力于分享数控操作,机床维修,系统维护等方面的内容。由机床参数引起的无报警故障。一台FANUC 18i-W慢走丝,开机后CRT显示X、Y、U、V坐
商业是城市最重要的功能之一,始终是城市经济、社会生活的最基本内容。 商业网点布局规划工作是各国政府普遍关注的一个重要问题之一,合理的商业网点布局不仅会促进地区商业
1932年6月下旬,蒋介石纠集15万兵力,对湘鄂西革命根据地进行大举进攻,红军被迫离开根据地作战略转移。9月,敌人对洪湖地区进行大规模的清剿,湘鄂西省委政治秘书长兼文化部副