To address reinforcement learning in continuous spaces, an incremental nearest-neighbor temporal difference (TD) learning framework based on locally weighted learning is proposed. A subset of the observed states is selected online, in an incremental manner, to build an instance dictionary. The value function and policy of a newly observed state are approximated from its range-nearest-neighbor instances in the dictionary, and the TD algorithm is used to iteratively update the value function and eligibility trace of each instance. Several design options are given for each main component of the framework, and its convergence is analyzed theoretically. Simulation experiments on 24 combinations of these options show that the SNDN combination achieves good learning performance and computational efficiency.
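The general idea can be illustrated with a minimal Python sketch. It is not the paper's SNDN scheme or any of its 24 combinations, whose details the abstract does not give. It covers value prediction only (no policy), and every name and setting in it is an assumption made for illustration: the class `NearestNeighborTD`, the distance threshold `add_radius` for adding instances, the neighborhood `query_radius`, the Gaussian kernel with `bandwidth`, accumulating eligibility traces, and the toy chain task in `run_chain_demo`.

```python
import math


class NearestNeighborTD:
    """Illustrative TD(lambda) value prediction over an incrementally built instance dictionary."""

    def __init__(self, add_radius=0.05, query_radius=0.15, bandwidth=0.075,
                 alpha=0.1, gamma=0.95, lam=0.8):
        # Assumption: add_radius <= query_radius, so any state passed to
        # maybe_add() always has at least one instance inside its query range.
        assert add_radius <= query_radius
        self.add_radius = add_radius
        self.query_radius = query_radius
        self.bandwidth = bandwidth
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.states, self.values, self.traces = [], [], []

    def maybe_add(self, s):
        """Store s as a new instance only if no stored instance lies within add_radius."""
        if all(math.dist(s, x) > self.add_radius for x in self.states):
            self.states.append(tuple(s))
            self.values.append(0.0)
            self.traces.append(0.0)

    def weights(self, s):
        """Normalized Gaussian weights of the instances within query_radius of s."""
        near = []
        for i, x in enumerate(self.states):
            d = math.dist(s, x)
            if d <= self.query_radius:
                near.append((i, math.exp(-(d / self.bandwidth) ** 2)))
        total = sum(w for _, w in near)
        return [(i, w / total) for i, w in near]

    def value(self, s):
        """Locally weighted estimate of V(s); 0.0 if no instance is in range."""
        return sum(w * self.values[i] for i, w in self.weights(s))

    def reset_traces(self):
        """Clear all eligibility traces at the start of an episode."""
        self.traces = [0.0] * len(self.traces)

    def update(self, s, r, s_next, done):
        """One TD(lambda) step for the transition (s, r, s_next); returns the TD error."""
        self.maybe_add(s)
        if not done:
            self.maybe_add(s_next)
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        # Decay every trace, then credit the neighbors of s by their weights.
        decay = self.gamma * self.lam
        self.traces = [decay * e for e in self.traces]
        for i, w in self.weights(s):
            self.traces[i] += w
        # Move each instance value along its trace.
        self.values = [v + self.alpha * delta * e
                       for v, e in zip(self.values, self.traces)]
        return delta


def run_chain_demo(episodes=300):
    """Hypothetical toy task: walk right from 0.0 in steps of 0.1; reward 1 on reaching 1.0."""
    learner = NearestNeighborTD()
    for _ in range(episodes):
        learner.reset_traces()
        for k in range(10):
            done = (k == 9)
            learner.update((k / 10,), 1.0 if done else 0.0, ((k + 1) / 10,), done)
    return learner
```

In this sketch the dictionary grows only when a state is farther than `add_radius` from every stored instance, so its size depends on how much of the state space is visited, not on the number of samples. Calling `run_chain_demo()` and comparing `learner.value((0.9,))` with `learner.value((0.0,))` shows the estimate rising toward the rewarded end of the chain.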