Markov decision processes (MDPs) offer a general framework for modelling sequential decision making where outcomes are random. In particular, they serve as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
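For orientation, the classical finite-horizon dynamic programming that the qMDP algorithms extend can be sketched as backward induction. This is a minimal sketch for an ordinary MDP, not the paper's qMDP algorithm; the function name `backward_induction`, the toy transition table `P`, and the reward table `R` are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

def backward_induction(
    P: List[List[List[float]]],  # P[a][s][s2]: probability of s -> s2 under action a
    R: List[List[float]],        # R[a][s]: expected immediate reward for action a in state s
    horizon: int,
) -> Tuple[List[float], List[List[int]]]:
    """Classical finite-horizon value iteration (assumed setting, not the qMDP case)."""
    n_actions = len(R)
    n_states = len(R[0])
    V = [0.0] * n_states          # value after the final step is taken to be zero
    policy: List[List[int]] = []  # policy[t][s]: action chosen at time t in state s

    # Work backwards from the last decision step to the first.
    for _ in range(horizon):
        new_V = []
        step_policy = []
        for s in range(n_states):
            best_a, best_q = 0, float("-inf")
            for a in range(n_actions):
                q = R[a][s] + sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
                if q > best_q:    # strict comparison: ties go to the lowest action index
                    best_a, best_q = a, q
            new_V.append(best_q)
            step_policy.append(best_a)
        V = new_V
        policy.insert(0, step_policy)  # earlier time steps go to the front
    return V, policy

# Hypothetical toy problem: two states, action 0 = stay, action 1 = switch.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # stay: remain in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # switch: move to the other state
]
R = [
    [0.0, 1.0],  # staying in state 1 earns reward 1
    [0.0, 0.0],  # switching earns nothing
]

values, plan = backward_induction(P, R, horizon=3)
print(values)  # optimal expected total reward from each start state
print(plan)    # time-indexed optimal actions
```

In this toy problem the optimal plan from state 0 is to switch once and then stay in state 1. Policy evaluation for a fixed policy follows the same backward recursion with the maximisation over actions replaced by the policy's own choice.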