Conflict-Aware Safe Reinforcement Learning: A Meta-Cognitive Learning Framework

Source: Acta Automatica Sinica (English edition) | Citations: 0
Excerpt from the paper
In this paper, a data-driven conflict-aware safe reinforcement learning (CAS-RL) algorithm is presented for control of autonomous systems. Existing safe RL results with pre-defined performance functions and safe sets can only provide safety and performance guarantees for a single environment or circumstance. By contrast, the presented CAS-RL algorithm provides safety and performance guarantees across a variety of circumstances that the system might encounter. This is achieved by utilizing a bilevel learning control architecture: a higher meta-cognitive layer leverages a data-driven receding-horizon attentional controller (RHAC) to adapt the relative attention given to the system's different safety and performance requirements, and a lower-layer RL controller designs control actuation signals for the system. The presented RHAC makes its meta decisions based on the reaction curve of the lower-layer RL controller using a meta-model or knowledge. More specifically, it leverages a prediction meta-model (PMM) which spans the space of all future meta trajectories using a given finite number of past meta trajectories. RHAC adapts the system's aspiration towards performance metrics (e.g., performance weights) as well as safety boundaries to resolve conflicts that arise as mission scenarios develop. This guarantees safety and feasibility (i.e., performance boundedness) of the lower-layer RL-based control solution. It is shown that the interplay between the RHAC and the lower-layer RL controller is a bilevel optimization problem in which the leader (RHAC) operates at a lower rate than the follower (RL-based controller), and whose solution guarantees feasibility and safety of the control solution. The effectiveness of the proposed framework is verified through a simulation example.
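The two-rate leader-follower structure described in the abstract can be illustrated with a toy loop. This is only a sketch under stated assumptions and not the paper's CAS-RL algorithm: the scalar dynamics, every function name, and all numbers are hypothetical. A scalar LQR gain stands in for the lower-layer RL controller, and a simple threshold rule stands in for the RHAC, which in the paper uses a prediction meta-model instead.

```python
# Toy sketch of a two-rate bilevel loop; NOT the paper's CAS-RL algorithm.
# Dynamics, names, and numbers are illustrative assumptions.

def lqr_gain(a, b, q, r, iters=200):
    """Scalar discrete-time LQR gain via Riccati iteration.
    Stand-in for the lower-layer RL controller (the follower)."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def adapt_weight(q, recent_states, bound, margin=0.5, factor=2.0):
    """Hypothetical leader rule: raise the state weight when the recent
    trajectory comes within `margin * bound` of the safety bound.
    The paper's RHAC uses a prediction meta-model instead of this rule."""
    if max(abs(x) for x in recent_states) > margin * bound:
        return q * factor
    return q


def simulate(steps=60, leader_period=10, a=1.1, b=1.0, r=1.0,
             q=0.1, x0=0.9, bound=1.0):
    """Run x[k+1] = a*x[k] + b*u[k] with a fast follower and a slow leader."""
    x = x0
    history = [x0]   # state trajectory
    weights = [q]    # performance weight chosen by the leader over time
    gain = lqr_gain(a, b, q, r)
    for k in range(1, steps + 1):
        u = -gain * x                 # follower acts at every step
        x = a * x + b * u
        history.append(x)
        if k % leader_period == 0:    # leader acts at a lower rate
            q = adapt_weight(q, history[-leader_period:], bound)
            gain = lqr_gain(a, b, q, r)   # follower re-solves for the new weight
            weights.append(q)
    return history, weights


if __name__ == "__main__":
    history, weights = simulate()
    print("largest |x|:", max(abs(x) for x in history))
    print("weights chosen by the leader:", weights)
```

The leader only sees a window of past states and changes the follower's objective, never the control input itself, which mirrors the division of roles the abstract describes.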
Other literature
With the booming of cyber attacks and cyber criminals against cyber-physical systems (CPSs), detecting these attacks remains challenging. It might be the worst of times, but it might be the best of times because of opportunities brought by machine learnin
The paper deals with the consensus problem in a leaderless network of agents that have to reach a common velocity while forming a uniformly spaced string. Moreover, the final common velocity (reference velocity) is determined by the agents in a distribute
The event-triggered fault accommodation problem for a class of nonlinear uncertain systems is considered in this paper. The control signal transmission from the controller to the system is determined by an event-triggering scheme with relative and constan
Sliding mode control (SMC) has been studied since the 1950s and widely used in practical applications due to its insensitivity to matched disturbances. The aim of this paper is to present a review of SMC describing the key developments and examining the n
In this paper, we review and analyze intrusion detection systems for Agriculture 4.0 cyber security. Specifically, we present cyber security threats and evaluation metrics used in the performance evaluation of an intrusion detection system for Agriculture
This paper investigates the stabilization of underactuated vehicles moving in a three-dimensional vector space. The vehicle's model is established on the matrix Lie group SE(3), which describes the configuration of rigid bodies globally and uniquely. W
Sampling-based planning algorithms play an important role in high degree-of-freedom motion planning (MP) problems, in which rapidly-exploring random tree (RRT) and the faster bidirectional RRT (named RRT-Connect) algorithms have achieved good results in m
This paper presents learning-enabled barrier-certified safe controllers for systems that operate in a shared environment for which multiple systems with uncertain dynamics and behaviors interact. That is, safety constraints are imposed by not only the ego
This paper shows that the aerodynamic effects can be compensated in a quadrotor system by means of a control allocation approach using neural networks. Thus, the system performance can be improved by replacing the classic allocation matrix, without using
Traditional cubature Kalman filter (CKF) is a preferable tool for the inertial navigation system (INS)/global positioning system (GPS) integration under Gaussian noises. The CKF, however, may provide a significantly biased estimate when the INS/GPS system