Convergence analysis of an incremental approach to online inverse reinforcement learning

Source: Journal of Zhejiang University-Science C(Computers & Electro

Publication date: 2011, Issue 1

Keywords: incremental reinforcement inverse correcting bounds proof mistake recover demons

Excerpt from the paper

Interest in inverse reinforcement learning (IRL) has recently increased, that is, interest in the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert. This paper deals with an incremental approach to online IRL. First, the convergence property of the incremental method for the IRL problem was investigated, and bounds on both the number of mistakes made during the learning process and the regret were provided with a detailed proof. Then an online algorithm based on incremental error correcting was derived to deal with the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs. This leads to an estimate that approaches a target optimal value. The proposed method was tested in a driving simulation experiment and found to be able to efficiently recover an adequate reward function.
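The excerpt does not state the update rule itself, so the following is only a minimal sketch of the error-correcting idea under stated assumptions, not the paper's algorithm. It assumes the reward is linear in a weight vector w, that each action in a state comes with a feature vector, and that the learner acts greedily on the current estimate. On each action mismatch with the expert, an increment is added to w (a perceptron-style correction, swapped in here for illustration). All names (`greedy_action`, `incremental_irl`, `demos`, `step`) and the toy driving features are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# One demonstration: per-action feature vectors for a state, plus the
# index of the action the expert chose there (hypothetical data layout).
Demo = Tuple[Sequence[Vector], int]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def greedy_action(w: Vector, action_features: Sequence[Vector]) -> int:
    """Index of the action with the highest estimated value w . phi.

    Ties go to the lowest index, since max() returns the first maximum.
    """
    scores = [dot(w, phi) for phi in action_features]
    return max(range(len(scores)), key=scores.__getitem__)


def incremental_irl(
    demos: Sequence[Demo], dim: int, step: float = 1.0, max_passes: int = 100
) -> Tuple[Vector, int]:
    """Replay the demonstrations, correcting w on every action mismatch.

    Returns the final weight estimate and the total number of mistakes.
    """
    w = [0.0] * dim
    mistakes = 0
    for _ in range(max_passes):
        changed = False
        for action_features, expert_action in demos:
            predicted = greedy_action(w, action_features)
            if predicted != expert_action:
                # Assumed increment: move w toward the expert's action
                # features and away from the wrongly preferred action.
                phi_e = action_features[expert_action]
                phi_p = action_features[predicted]
                w = [wi + step * (e - p) for wi, e, p in zip(w, phi_e, phi_p)]
                mistakes += 1
                changed = True
        if not changed:  # a full pass with no mismatch: stop
            break
    return w, mistakes


# Toy features per action: [progress, collision risk] (made up for illustration).
demos = [
    ([[0.0, 1.0], [1.0, 0.0]], 1),
    ([[1.0, 1.0], [0.0, 0.0], [1.0, 0.0]], 2),
    ([[0.0, 0.0], [0.0, 1.0]], 0),
]
w, mistakes = incremental_irl(demos, dim=2)
print(w, mistakes)
```

On this toy data the estimate ends up rewarding progress and penalizing collision risk, and the greedy learner then agrees with the expert on every demonstration. The mistake counter is the quantity that a mistake bound of the kind mentioned in the abstract would limit; the sketch makes no claim about the actual bounds proved in the paper.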
