Convergence analysis of an incremental approach to online inverse reinforcement learning

Source: Journal of Zhejiang University-Science C(Computers & Electro
Interest has recently increased in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert. This paper deals with an incremental approach to online IRL. First, the convergence property of the incremental method for the IRL problem was investigated, and bounds on both the number of mistakes made during the learning process and the regret were established with a detailed proof. Then an online algorithm based on incremental error correcting was derived to deal with the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs. This leads to an estimate that approaches a target optimal value. The proposed method was tested in a driving simulation experiment and found to be able to efficiently recover an adequate reward function.
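The key idea stated above (add an increment to the reward estimate whenever the learner's action differs from the expert's) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a linear reward estimate `w · phi(state, action)`, a one-step greedy learner in place of full MDP planning, and a perceptron-style increment. The names `features`, `expert`, `greedy_action`, `incremental_update`, and `run_demo`, as well as the toy problem itself, are hypothetical and chosen only for illustration.

```python
ACTIONS = [0, 1, 2]

def features(state, action):
    # Hypothetical one-hot feature map: index depends on (action - state) mod 3.
    phi = [0.0, 0.0, 0.0]
    phi[(action - state) % 3] = 1.0
    return phi

def expert(state):
    # Toy expert: always picks the action whose feature index is 1.
    return (state + 1) % 3

def greedy_action(w, features, state, actions):
    # Action with the highest estimated reward w . phi(state, action);
    # ties are broken in favor of the earliest action in the list.
    return max(
        actions,
        key=lambda a: sum(wi * fi for wi, fi in zip(w, features(state, a))),
    )

def incremental_update(w, features, state, expert_action, actions, step=1.0):
    # Error-correcting step (assumed perceptron-style): change w only on a mismatch.
    predicted = greedy_action(w, features, state, actions)
    if predicted == expert_action:
        return w, False
    phi_e = features(state, expert_action)
    phi_p = features(state, predicted)
    new_w = [wi + step * (e - p) for wi, e, p in zip(w, phi_e, phi_p)]
    return new_w, True

def run_demo(passes=3):
    # Stream expert demonstrations online and count action mismatches.
    w = [0.0, 0.0, 0.0]
    mistakes = 0
    for _ in range(passes):
        for s in range(6):
            w, mismatch = incremental_update(w, features, s, expert(s), ACTIONS)
            mistakes += int(mismatch)
    return w, mistakes

if __name__ == "__main__":
    w, mistakes = run_demo()
    print("estimated weights:", w)
    print("mistakes:", mistakes)
```

In this sketch the estimate changes only when a mismatch occurs, so the count returned by `run_demo` plays the role of the mistake number that the abstract says is bounded. The actual method, its increment rule, and its bounds are given in the paper itself.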