Experience Replay for Least-Squares Policy Iteration

Source: Acta Automatica Sinica (English Edition)

Abstract: Policy iteration, which evaluates and improves the control policy iteratively, is a reinforcement learning method. Policy evaluation with the least-squares method

Authors: Quan Liu, Xin Zhou, Fei Zhu, Qimi

Affiliation: the School of Computer Science and Technology, the Key Laboratory of Symbolic Computation and Knowledge Engineering

Published in: Acta Automatica Sinica (English Edition)

Publication date: 2014, Issue 3


Excerpt from the paper

Policy iteration, which evaluates and improves the control policy iteratively, is a reinforcement learning method. Policy evaluation with the least-squares method can draw more useful information from the empirical data and therefore improve the data validit
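The evaluate-then-improve loop described in the excerpt can be illustrated with a minimal sketch of generic least-squares policy iteration. This is not the algorithm proposed in the paper. It only shows how a fixed batch of stored transitions can be reused at every iteration to evaluate the current policy by least squares. The 3-state chain, the `step` dynamics, the one-hot features `phi`, the discount factor, and all function names are assumptions made for illustration.

```python
GAMMA = 0.9                   # assumed discount factor
N_STATES, N_ACTIONS = 3, 2    # hypothetical chain; action 0 = left, 1 = right


def step(s, a):
    """Toy deterministic dynamics (an assumption, not taken from the paper)."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    v = [0.0] * (N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lstdq(samples, policy):
    """Policy evaluation: accumulate A and b over the samples, solve A w = b."""
    n = N_STATES * N_ACTIONS
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s, a, r, s2 in samples:
        f, f2 = phi(s, a), phi(s2, policy[s2])
        for i in range(n):
            b[i] += f[i] * r
            for j in range(n):
                A[i][j] += f[i] * (f[j] - GAMMA * f2[j])
    return solve(A, b)


def greedy(w):
    """Policy improvement: pick the action with the largest estimated Q-value."""
    return [max(range(N_ACTIONS), key=lambda a: w[s * N_ACTIONS + a])
            for s in range(N_STATES)]


def lspi(samples, policy, max_iters=20):
    """Alternate evaluation and improvement until the policy stops changing."""
    for _ in range(max_iters):
        w = lstdq(samples, policy)
        new_policy = greedy(w)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, w


# Stored experience: one transition per (state, action), reused every iteration.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        s2, r = step(s, a)
        samples.append((s, a, r, s2))

policy, w = lspi(samples, [0, 0, 0])
print(policy)
```

With one-hot features the least-squares solution coincides with the exact action values of the evaluated policy on this toy chain, so the loop settles on moving right in every state. A realistic setting would use a compact feature map and sampled, possibly noisy transitions.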

