Policy iteration, which evaluates and improves the control policy iteratively, is a reinforcement learning method. Policy evaluation with the least-squares method can draw more useful information from the empirical data and therefore improve the data validity
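
To make the evaluate-then-improve loop concrete, here is a minimal sketch of policy iteration with a least-squares evaluation step (in the style of LSTD-Q), written in Python with NumPy. Everything specific in it is an assumption made for illustration and does not come from the paper: the four-state chain environment, the one-hot features, the discount factor, and the helper names `step`, `phi`, `lstdq`, and `lspi`. The evaluation step reuses one fixed batch of transitions for every policy, which is one way of reading the claim that least-squares evaluation draws more information from the empirical data.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): a 4-state chain, 2 actions.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9


def step(s, a):
    """Action 0 moves left, action 1 moves right; reward 1 when the next state is the last one."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    return s_next, float(s_next == N_STATES - 1)


def phi(s, a):
    """One-hot feature vector for the state-action pair (s, a)."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


def lstdq(samples, policy):
    """Least-squares policy evaluation: solve A w = b for the Q-function weights of `policy`."""
    k = N_STATES * N_ACTIONS
    A, b = np.zeros((k, k)), np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy[s_next])  # action the evaluated policy takes next
        A += np.outer(f, f - GAMMA * f_next)
        b += f * r
    return np.linalg.solve(A, b)


def lspi(samples, max_iter=20):
    """Alternate least-squares evaluation and greedy improvement until the policy is stable."""
    policy = np.zeros(N_STATES, dtype=int)  # start from "always move left"
    for _ in range(max_iter):
        w = lstdq(samples, policy)
        new_policy = w.reshape(N_STATES, N_ACTIONS).argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, w


# One fixed batch of transitions (each state-action pair once), reused at every iteration.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        s_next, r = step(s, a)
        samples.append((s, a, r, s_next))

policy, w = lspi(samples)
print(policy)
```

With one-hot features the least-squares solution coincides with exact tabular evaluation on this batch, so the sketch mainly shows the structure of the loop; with a smaller feature set the same `lstdq` routine would return an approximate Q-function instead.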