Options is a reinforcement learning algorithm that introduces temporal abstraction and is closely related to the semi-Markov decision process (SMDP) model. An important and still unresolved problem for this algorithm is how to let the agent discover suitable options on its own. This paper first proposes a subgoal-discovery algorithm based on the rate of change of the visitation gap. The algorithm overcomes the low accuracy of existing algorithms and their partial dependence on human input. Building on it, the paper then proposes a procedure for constructing options and applies it to maze problems. Experimental results show that the options generated in the experiments can greatly speed up learning.
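The abstract does not spell out how an option is built once a subgoal is known. As a rough illustration only, the sketch below uses the standard definition of an option as a triple of initiation set I, policy π and termination condition β (Sutton, Precup and Singh, 1999), and builds one for a single subgoal in a deterministic grid maze. Everything else is an assumption, not the paper's procedure: the subgoal is taken as given (here the doorway of a two-room maze, the kind of state a subgoal finder might return), and the names `build_option`, `run_option`, `max_dist` and `MAZE` are hypothetical. I is every free cell within `max_dist` steps of the subgoal, and π is a shortest path found by breadth-first search.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

State = Tuple[int, int]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Hypothetical two-room maze: '#' is a wall, the doorway is at (2, 5).
MAZE = [
    ".....#.....",
    ".....#.....",
    "...........",
    ".....#.....",
    ".....#.....",
]


@dataclass(frozen=True)
class Option:
    """An option <I, pi, beta>; beta is 1 at the subgoal or outside I."""
    initiation: FrozenSet[State]  # I: states where the option may be started
    policy: Dict[State, str]      # pi: deterministic action for each state in I
    subgoal: State

    def can_start(self, s: State) -> bool:
        return s in self.initiation

    def beta(self, s: State) -> float:
        # Terminate on reaching the subgoal or on leaving the initiation set.
        return 1.0 if s == self.subgoal or s not in self.initiation else 0.0


def build_option(grid: List[str], subgoal: State, max_dist: int) -> Option:
    """Hypothetical construction: BFS outward from the subgoal; each reached
    free cell joins I, and pi takes the step that moves one cell closer."""
    rows, cols = len(grid), len(grid[0])
    dist = {subgoal: 0}
    policy: Dict[State, str] = {}
    queue = deque([subgoal])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_dist:
            continue
        for name, (dr, dc) in ACTIONS.items():
            prev = (cur[0] - dr, cur[1] - dc)  # taking `name` at prev leads to cur
            r, c = prev
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == "." and prev not in dist:
                dist[prev] = dist[cur] + 1
                policy[prev] = name
                queue.append(prev)
    initiation = frozenset(s for s in dist if s != subgoal)
    return Option(initiation, policy, subgoal)


def step(grid: List[str], s: State, action: str) -> State:
    """Deterministic grid move; bumping into a wall or the border stays put."""
    dr, dc = ACTIONS[action]
    r, c = s[0] + dr, s[1] + dc
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == ".":
        return (r, c)
    return s


def run_option(grid: List[str], option: Option, s: State) -> Tuple[State, int]:
    """Follow pi until beta(s) == 1; return the final state and primitive steps."""
    steps = 0
    while option.beta(s) < 1.0:
        s = step(grid, s, option.policy[s])
        steps += 1
    return s, steps


if __name__ == "__main__":
    doorway = build_option(MAZE, subgoal=(2, 5), max_dist=20)
    print(run_option(MAZE, doorway, (0, 0)))
```

Executed from the corner (0, 0), the option walks 7 primitive steps to the doorway. To the higher-level learner this counts as a single SMDP action, which is where the speed-up from options comes from. Bounding I with `max_dist` is one simple way to keep an option local to its subgoal; other choices are possible.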