论文部分内容阅读
Traditional reinforcement learning (RL) uses the return, also known as the expected value of cumulative random rewards, for training an agent to learn an optimal policy. However, recent research indicates that learning the distribution over returns has di