The task of learning the value function under a fixed policy in continuous Markov decision processes (MDPs) is considered. Although ELM-based approximators can achieve better performance on learning prediction problems than traditional methods (such as LSTD with radial basis functions) for high-dimensional problems, the random initialization of ELM parameters leads to fluctuating performance. To overcome this problem, a least-squares temporal difference algorithm with eligibility traces based on the regularized extreme learning machine (RELM-LSTD(λ)) is proposed. The regularized extreme learning machine (RELM) is used to approximate value functions, and an eligibility-trace term is introduced to increase data efficiency. In experiments, the performance of the proposed algorithm is demonstrated and compared with that of LSTD and ELM-LSTD. Experimental results show that the proposed algorithm achieves more stable and better performance in approximating the value function under a fixed policy.
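The combination described above can be sketched in a few lines: LSTD(λ) statistics are accumulated over the features of a random (ELM-style) hidden layer, and the final solve is ridge-regularized, which is the stabilizing role RELM plays here. This is a minimal illustrative sketch, not the paper's implementation; the function name `relm_lstd_lambda`, the `tanh` hidden activation, and all parameter defaults are assumptions for illustration.

```python
import numpy as np

def relm_lstd_lambda(transitions, n_hidden=50, gamma=0.95, lam=0.8,
                     reg=1e-3, seed=0):
    """Sketch of LSTD(lambda) policy evaluation with a random
    ELM-style feature map and a ridge-regularized solve.

    transitions: list of (s, r, s_next, done), s and s_next 1-D arrays.
    Returns a callable approximating the value function V(s).
    """
    rng = np.random.default_rng(seed)
    dim = transitions[0][0].shape[0]
    # ELM hidden layer: random input weights, fixed after initialization
    W = rng.standard_normal((n_hidden, dim))
    b = rng.standard_normal(n_hidden)
    phi = lambda s: np.tanh(W @ s + b)

    A = np.zeros((n_hidden, n_hidden))
    c = np.zeros(n_hidden)
    z = np.zeros(n_hidden)                      # eligibility trace
    for s, r, s_next, done in transitions:
        f = phi(s)
        f_next = np.zeros(n_hidden) if done else phi(s_next)
        z = gamma * lam * z + f                 # decay and accumulate trace
        A += np.outer(z, f - gamma * f_next)    # LSTD(lambda) statistics
        c += z * r
        if done:
            z = np.zeros(n_hidden)              # reset trace at episode end
    # Ridge (Tikhonov) term: the "regularized" part of RELM, which damps
    # the sensitivity to the random hidden-layer initialization
    theta = np.linalg.solve(A + reg * np.eye(n_hidden), c)
    return lambda s: phi(s) @ theta
```

Without the `reg * np.eye(n_hidden)` term this reduces to plain ELM-LSTD(λ), whose solution can vary sharply with the random draw of `W` and `b`; the abstract's stability claim corresponds to adding that regularizer.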