Greedy feature replacement for online value function approximation

Source: Journal of Zhejiang University-Science C(Computers & Electro
Reinforcement learning (RL) in real-world problems requires function approximation, whose quality depends on selecting appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques work well only for low-dimensional problems. In this paper, we present greedy feature replacement (GFR), a novel online expansion technique for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally. New feature dependencies are added automatically to the current representation, and conjunctive features are used to replace current features greedily. The virtual temporal difference (TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. Correctness guarantees and a computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and can handle large-scale problems.
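To make the setting in the abstract concrete, the sketch below shows a linear value function over binary features, the TD error it produces, and how candidate conjunctive features can be formed from pairs of active features. It is not the GFR algorithm itself: the abstract does not define the virtual TD error or the replacement rule, so the per-pair accumulation of TD error used here is only an assumed stand-in, and all names (`td_error`, `td_update`, `candidate_conjunctions`, `relevance`) are hypothetical.

```python
from itertools import combinations


def td_error(weights, active, reward, next_active, gamma=0.9):
    """TD error for a linear value function over binary features.

    `active` and `next_active` are sets holding the indices of the
    features equal to 1 in the current and next state, so the value
    of a state is the sum of the weights of its active features.
    """
    v = sum(weights[i] for i in active)
    v_next = sum(weights[i] for i in next_active)
    return reward + gamma * v_next - v


def td_update(weights, active, delta, alpha=0.1):
    """Gradient-style TD(0) update: only active features change."""
    for i in active:
        weights[i] += alpha * delta


def candidate_conjunctions(active):
    """All pairwise conjunctions of the currently active features."""
    return [frozenset(pair) for pair in combinations(sorted(active), 2)]


# Three initial binary features, all weights starting at zero.
weights = {0: 0.0, 1: 0.0, 2: 0.0}
# ASSUMPTION: accumulated TD error per candidate conjunction, used here
# as a placeholder statistic; the paper's virtual TD error may differ.
relevance = {}

# One observed transition: features 0 and 2 active, then feature 1.
active, next_active = {0, 2}, {1}
delta = td_error(weights, active, reward=1.0, next_active=next_active)
td_update(weights, active, delta)
for pair in candidate_conjunctions(active):
    relevance[pair] = relevance.get(pair, 0.0) + delta
```

In a sketch like this, a candidate whose accumulated statistic grows large would be a natural choice for expanding the representation; how GFR actually decides to replace existing features is specified in the paper, not here.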