Least-Squares Temporal Difference Learning with Eligibility Traces Based on Regularized Extreme Learning Machine

Source: The 26th Chinese Process Control Conference | Citations: 0 | Uploaded by: xufei037
The task of learning the value function under a fixed policy in continuous Markov decision processes (MDPs) is considered. Although approximators based on the extreme learning machine (ELM) can outperform traditional methods (such as LSTD with radial basis functions) on high-dimensional prediction problems, the random initialization of ELM parameters results in fluctuating performance. To overcome this problem, a least-squares temporal difference algorithm with eligibility traces based on a regularized extreme learning machine (RELM-LSTD(λ)) is proposed. The regularized extreme learning machine (RELM) is used to approximate value functions, and an eligibility trace term is introduced to increase data efficiency. In experiments, the performance of the proposed algorithm is demonstrated and compared with that of LSTD and ELM-LSTD. The results show that the proposed algorithm achieves more stable and better performance in approximating the value function under a fixed policy.
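The abstract does not give the update rules, so the following is only a minimal NumPy sketch of the general idea under stated assumptions, not the paper's exact method. ELM features are modeled as a fixed, randomly initialized sigmoid hidden layer. LSTD(λ) accumulates A = Σ z_t (φ_t − γ φ_{t+1})^T and b = Σ z_t r_t with the eligibility trace z_t = γλ z_{t−1} + φ_t along one continuing trajectory. The regularization is assumed to be a ridge term β·I added to A before solving for the output weights θ. The names `ELMFeatures`, `lstd_lambda_stats`, `relm_lstd_lambda`, the N(0, 1) initialization, and the random-walk demo are all hypothetical illustrations.

```python
import numpy as np


class ELMFeatures:
    """ELM-style feature map: a fixed, randomly initialized sigmoid hidden layer.

    Input weights and biases are drawn once from N(0, 1) (an assumed choice)
    and are never trained; only the linear output weights are fitted.
    """

    def __init__(self, state_dim, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(state_dim, n_hidden))
        self.bias = rng.normal(size=n_hidden)

    def __call__(self, states):
        # Accept a batch of shape (n, state_dim) or, for 1-D states, shape (n,).
        x = np.asarray(states, dtype=float).reshape(-1, self.W.shape[0])
        return 1.0 / (1.0 + np.exp(-(x @ self.W + self.bias)))


def lstd_lambda_stats(phi, phi_next, rewards, gamma, lam):
    """Accumulate LSTD(lambda) statistics A and b along one continuing trajectory."""
    k = phi.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    z = np.zeros(k)  # eligibility trace
    for t in range(len(rewards)):
        z = gamma * lam * z + phi[t]                    # decay trace, add current features
        A += np.outer(z, phi[t] - gamma * phi_next[t])  # z_t (phi_t - gamma * phi_{t+1})^T
        b += z * rewards[t]                             # z_t * r_t
    return A, b


def relm_lstd_lambda(phi, phi_next, rewards, gamma=0.95, lam=0.5, beta=1e-3):
    """Output weights theta solving (A + beta * I) theta = b.

    The ridge term beta * I is an assumed form of regularization.
    """
    A, b = lstd_lambda_stats(phi, phi_next, rewards, gamma, lam)
    return np.linalg.solve(A + beta * np.eye(A.shape[0]), b)


if __name__ == "__main__":
    # Hypothetical demo: a clipped random walk on [-1, 1] under a fixed policy,
    # with reward -|s'| on each transition.
    rng = np.random.default_rng(1)
    s = np.empty(2001)
    s[0] = 0.0
    for t in range(2000):
        s[t + 1] = np.clip(s[t] + rng.normal(scale=0.1), -1.0, 1.0)
    rewards = -np.abs(s[1:])
    feats = ELMFeatures(state_dim=1, n_hidden=30, seed=0)
    theta = relm_lstd_lambda(feats(s[:-1]), feats(s[1:]), rewards,
                             gamma=0.9, lam=0.7, beta=1e-2)
    print(feats([0.0, 0.5, 1.0]) @ theta)  # estimated V at three states
```

In this sketch, `lam=0` reduces the statistics to plain LSTD(0), and `beta=0` gives an unregularized ELM-LSTD(λ) fit. Re-running with different `seed` values is one way to observe the initialization sensitivity the abstract describes, which is what the regularization term is meant to address.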