论文部分内容阅读
In this talk we introduce a mean-variance problem for Markov decision processes (MDPs).Different from the usual goal in MDPs toward an optimal policy for a specified class of policies, we aim to obtain a so-called mean-variance optimal policy that minimizes the variance over a set of policies with a given expected reward.