Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning

Source: Acta Automatica Sinica (English edition)
In this work, we combine model-based reinforcement learning (MBRL) and model-free reinforcement learning (MFRL) to stabilize a biped robot (NAO robot) on a rotating platform, where the angular velocity of the platform is unknown to the proposed learning algorithm and is treated as an external disturbance. Nonparametric Gaussian processes normally require a large number of training data points to deal with the discontinuity of the estimated model. Although some improved methods, such as probabilistic inference for learning control (PILCO), do not require an explicit global model because the actions are obtained by directly searching the policy space, overfitting and a lack of model complexity may still result in a large deviation between the prediction and the real system. Besides, none of these approaches considers the data error during training or the measurement noise during testing. We propose a hierarchical Gaussian process (GP) model, containing two layers of independent GPs, from which a physically continuous probability transition model of the robot is obtained. Owing to the physically continuous estimation, the algorithm overcomes the overfitting problem with a guaranteed model complexity, and the number of training data points is also reduced. The policy for any given initial state is generated automatically by minimizing the expected cost according to the predefined cost function and the obtained probability distribution of the state. Furthermore, a novel Q(λ)-based MFRL scheme is employed to improve the policy. Simulation results show that the proposed RL algorithm is able to balance the NAO robot on a rotating platform and that it is capable of adapting to a platform with varying angular velocity.
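To make the model-free part concrete, the following is a minimal sketch of a textbook tabular Watkins's Q(λ) update with replacing eligibility traces. It is not the novel Q(λ) scheme of the paper, whose details the abstract does not give. The toy task is an assumption made purely for illustration: tilt is discretized into five bins, bin 2 is upright, and with probability `disturb_prob` an unknown drift (standing in for the rotating platform) pushes the state one bin to the right. All names (`step`, `choose`, `train_q_lambda`, `greedy_policy`) and all parameter values are hypothetical.

```python
import random

N_STATES = 5          # hypothetical discretized tilt bins; bin 2 is "upright"
ACTIONS = (-1, 0, 1)  # lean left, hold, lean right
TARGET = 2


def step(state, action_idx, rng, disturb_prob=0.2):
    """Apply the action plus a random drift that the learner does not model."""
    drift = 1 if rng.random() < disturb_prob else 0
    nxt = min(max(state + ACTIONS[action_idx] + drift, 0), N_STATES - 1)
    return nxt, -abs(nxt - TARGET)  # reward is the negative distance from upright


def choose(q, state, rng, epsilon):
    """Epsilon-greedy action selection (first index wins ties)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])


def train_q_lambda(episodes=2000, steps=30, alpha=0.1, gamma=0.9,
                   lam=0.8, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        trace = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
        s = rng.randrange(N_STATES)
        a = choose(q, s, rng, epsilon)
        for _ in range(steps):
            s2, r = step(s, a, rng)
            a2 = choose(q, s2, rng, epsilon)
            best = max(q[s2])
            delta = r + gamma * best - q[s][a]   # one-step TD error
            trace[s][a] = 1.0                    # replacing trace
            keep = q[s2][a2] == best             # Watkins: cut traces after exploring
            for i in range(N_STATES):
                for j in range(len(ACTIONS)):
                    q[i][j] += alpha * delta * trace[i][j]
                    trace[i][j] = gamma * lam * trace[i][j] if keep else 0.0
            s, a = s2, a2
    return q


def greedy_policy(q):
    """Return the greedy move (-1, 0, or +1) for each tilt bin."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    print(greedy_policy(train_q_lambda()))
```

In this toy setting the learned greedy policy should lean back toward the upright bin from either extreme. The traces let a single temporal-difference error update every recently visited state-action pair, which is the general reason Q(λ) tends to need fewer samples than one-step Q-learning.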