The basic Q-learning algorithm always selects actions according to the current optimal policy, which makes it prone to falling into local optima. Building on simulated-annealing reinforcement learning, this paper proposes a Q-learning algorithm based on exploration-region expansion. An in-place exploration strategy is added to raise the efficiency of finding the target; an exploration-region expansion strategy is introduced, which avoids the blindness of exploring the entire environment from the outset and improves learning efficiency; and a self-determined termination condition is added, which avoids repeated learning after the optimal path has been found and saves learning time. Simulation experiments verify the effectiveness of the algorithm.
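To make the idea concrete, below is a minimal Python sketch of the simulated-annealing exploration baseline the proposed method builds on: a Metropolis-style action choice whose temperature decays across episodes, so early episodes explore widely and later ones follow the learned greedy policy. The grid size, reward scheme, cooling schedule, and episode limits are illustrative assumptions, not the paper's settings, and the exploration-region expansion and self-termination refinements are not reproduced here.

```python
import math
import random

# Sketch: Q-learning on a grid world with simulated-annealing exploration.
# All constants below are illustrative assumptions.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
SIZE = 10
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ALPHA, GAMMA = 0.1, 0.9      # learning rate, discount factor
T0, COOLING = 1.0, 0.99      # initial temperature, per-episode cooling rate

Q = {}  # Q[(state, action_index)] -> value, default 0.0

def q(s, a):
    return Q.get((s, a), 0.0)

def step(s, a):
    """Apply action a in state s; reward 1 at the goal, 0 elsewhere."""
    dx, dy = ACTIONS[a]
    nxt = (min(max(s[0] + dx, 0), SIZE - 1),
           min(max(s[1] + dy, 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0)

def select_action(s, temperature):
    """Metropolis-style choice: accept a random action over the greedy one
    with probability exp((Q_rand - Q_greedy) / T)."""
    greedy = max(range(len(ACTIONS)), key=lambda a: q(s, a))
    rand = random.randrange(len(ACTIONS))
    if rand == greedy:
        return greedy
    accept = math.exp((q(s, rand) - q(s, greedy)) / max(temperature, 1e-6))
    return rand if random.random() < accept else greedy

temperature = T0
for episode in range(500):
    s = START
    for _ in range(200):                  # cap episode length
        a = select_action(s, temperature)
        nxt, r = step(s, a)
        best_next = max(q(nxt, b) for b in range(len(ACTIONS)))
        Q[(s, a)] = q(s, a) + ALPHA * (r + GAMMA * best_next - q(s, a))
        s = nxt
        if s == GOAL:
            break
    temperature *= COOLING                # anneal toward greedy behavior
```

Because the acceptance probability shrinks as the temperature falls, this baseline gradually shifts from exploration to exploitation; the paper's contribution is to restrict where that exploration happens (an expanding region around the start) and to stop learning once the optimal path has stabilized.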