1. Zhao M, Wang D, Qiao J. Neural-network-based accelerated safe Q-learning for optimal control of discrete-time nonlinear systems with state constraints. Neural Netw 2025;186:107249. [PMID: 39955957] [DOI: 10.1016/j.neunet.2025.107249]
Abstract
For unknown nonlinear systems with state constraints, it is difficult to achieve safe optimal control using Q-learning methods based on traditional quadratic utility functions. To solve this problem, this article proposes an accelerated safe Q-learning (SQL) technique that addresses the concurrent requirements of safety and optimality for discrete-time nonlinear systems within an integrated framework. First, an adjustable control barrier function is designed and integrated into the cost function to transform the constrained optimal control problem into an unconstrained one. The augmented cost function is closely linked to the next state, enabling the state to move away from the constraint boundaries more quickly. Second, leveraging offline data that adhere to safety constraints, we introduce an off-policy value-iteration SQL approach to search for a safe optimal policy, thus mitigating the risk of unsafe interactions that may result from suboptimal iterative policies. Third, the vast amount of offline data and the complex augmented cost function can hinder the learning speed of the algorithm. To address this issue, we integrate historical iteration information into the current iteration step to accelerate policy evaluation, and introduce the Nesterov momentum technique to expedite policy improvement. Theoretical analysis demonstrates the convergence, optimality, and safety of the SQL algorithm. Finally, under the influence of different parameters, simulation results on two nonlinear systems with state constraints reveal the efficacy and advantages of the accelerated SQL approach: the proposed method requires fewer iterations while enabling the system state to converge to the equilibrium point more rapidly.
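As a concrete (and deliberately simplified) picture of the three ingredients described above, the sketch below runs off-policy value-iteration Q-learning on a fixed offline data set, adds a reciprocal barrier on the next state to a quadratic utility, and extrapolates the critic weights with a Nesterov-momentum step. The scalar dynamics, quadratic Q-features, and all numerical choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def step(x, u):
    # Toy scalar dynamics used only to generate the offline data set;
    # the learner never queries this model during training.
    return 0.9 * x + 0.1 * u

def barrier(x_next, x_max=1.0, eps=1e-3):
    # Reciprocal-type barrier that grows as the next state nears the bound |x| = x_max.
    return 1.0 / max(x_max**2 - x_next**2, eps) - 1.0 / x_max**2

def utility(x, u, x_next, beta=0.05):
    # Quadratic utility augmented with an adjustable barrier on the next state.
    return x**2 + 0.1 * u**2 + beta * barrier(x_next)

def phi(x, u):
    # Quadratic features for a linear-in-parameters Q-function.
    return np.array([x * x, x * u, u * u])

# Offline, constraint-satisfying transitions (x, u, x_next).
rng = np.random.default_rng(0)
data = [(x, u, step(x, u))
        for x, u in zip(rng.uniform(-0.8, 0.8, 400), rng.uniform(-1.0, 1.0, 400))]

u_grid = np.linspace(-1.0, 1.0, 41)
w, w_prev = np.zeros(3), np.zeros(3)

for k in range(60):
    # Nesterov-style look-ahead point built from the two latest critic iterates.
    y = w + 0.8 * (w - w_prev)
    feats, targets = [], []
    for x, u, xn in data:
        q_next = [phi(xn, a) @ y for a in u_grid]          # greedy policy improvement
        targets.append(utility(x, u, xn) + min(q_next))    # barrier-augmented VI target
        feats.append(phi(x, u))
    w_new, *_ = np.linalg.lstsq(np.array(feats), np.array(targets), rcond=None)
    w_prev, w = w, w_new

print("learned Q-function weights:", w)
```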
Affiliation(s)
- Mingming Zhao: School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang: School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao: School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
2. Ma H, Liu C, Li SE, Zheng S, Sun W, Chen J. Learn Zero-Constraint-Violation Safe Policy in Model-Free Constrained Reinforcement Learning. IEEE Trans Neural Netw Learn Syst 2025;36:2327-2341. [PMID: 38231811] [DOI: 10.1109/tnnls.2023.3348422]
Abstract
We focus on learning a zero-constraint-violation safe policy in model-free reinforcement learning (RL). Existing model-free RL studies mostly use a posterior penalty to penalize dangerous actions, which means they must experience the danger in order to learn from it; therefore, they cannot learn a zero-violation safe policy even after convergence. To handle this problem, we leverage safety-oriented energy functions to learn zero-constraint-violation safe policies and propose the safe set actor-critic (SSAC) algorithm. The energy function is designed to increase rapidly for potentially dangerous actions, which locates the safe set in the action space. Therefore, we can identify dangerous actions before taking them and achieve zero constraint violation. Our major contributions are twofold. First, we use data-driven methods to learn the energy function, which removes the requirement of known dynamics. Second, we formulate a constrained RL problem to solve for zero-violation policies. We prove theoretically that our Lagrangian-based constrained RL solutions converge to the constrained optimal zero-violation policies. The proposed algorithm is evaluated on complex simulation environments and in a hardware-in-the-loop (HIL) experiment with a real autonomous vehicle controller. Experimental results suggest that the converged policies in all environments achieve zero constraint violation and performance comparable to a model-based baseline.
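The Lagrangian mechanism sketched in the abstract can be illustrated loosely as follows (this is not SSAC itself; the energy function, toy dynamics, rollout horizon, and step sizes are all hypothetical): a dual-ascent update raises the multiplier whenever the safety-energy function is predicted to increase, so the policy is pushed away from dangerous actions before they are executed.

```python
import numpy as np

def energy(x, x_limit=1.0):
    # Hypothetical safety-energy (safety-index) function: positive at or beyond the limit.
    return abs(x) - x_limit

def dynamics(x, u):
    # Toy scalar transition standing in for sampled or learned dynamics.
    return 0.95 * x + 0.1 * u

def policy(x, theta):
    return -theta * x  # linear state feedback acting as the "actor"

def lagrangian(theta, x0, lam, horizon=15):
    # Finite-horizon task cost plus multiplier-weighted energy increases along the rollout.
    x, cost, violation = x0, 0.0, 0.0
    for _ in range(horizon):
        u = policy(x, theta)
        xn = dynamics(x, u)
        cost += x**2 + 0.1 * u**2
        violation += max(energy(xn) - max(energy(x), 0.0), 0.0)
        x = xn
    return cost + lam * violation, violation

theta, lam = 0.1, 0.0                 # policy parameter and Lagrange multiplier
alpha_theta, alpha_lam = 0.02, 0.1    # primal and dual step sizes
rng = np.random.default_rng(1)

for _ in range(300):
    x0 = rng.uniform(-1.2, 1.2)
    # Primal step: finite-difference gradient of the Lagrangian w.r.t. theta.
    lp, _ = lagrangian(theta + 1e-4, x0, lam)
    lm, _ = lagrangian(theta - 1e-4, x0, lam)
    theta -= alpha_theta * (lp - lm) / 2e-4
    # Dual ascent: increase the multiplier whenever the energy constraint is violated.
    _, violation = lagrangian(theta, x0, lam)
    lam = max(0.0, lam + alpha_lam * violation)

print("policy gain:", round(theta, 3), "multiplier:", round(lam, 3))
```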
3. Zhang C, Lin S, Wang H, Chen Z, Wang S, Kan Z. Data-Driven Safe Policy Optimization for Black-Box Dynamical Systems With Temporal Logic Specifications. IEEE Trans Neural Netw Learn Syst 2025;36:3870-3877. [PMID: 38109255] [DOI: 10.1109/tnnls.2023.3339885]
Abstract
Learning-based policy optimization methods have shown great potential for building general-purpose control systems. However, existing methods still struggle to achieve complex task objectives while ensuring policy safety during the learning and execution phases for black-box systems. To address these challenges, we develop data-driven safe policy optimization (D2SPO), a novel reinforcement learning (RL)-based policy improvement method that jointly learns a control barrier function (CBF) for system safety and a linear temporal logic (LTL)-guided RL algorithm for complex task objectives. Unlike many existing works that assume known system dynamics, D2SPO learns a provably safe CBF for black-box dynamical systems by carefully constructing the data sets and redesigning the loss functions, and the CBF continuously evolves for improved system safety as RL interacts with the environment. To deal with complex task objectives, we take advantage of the capability of LTL to represent task progress and develop an LTL-guided RL policy for efficient completion of various tasks with LTL objectives. Extensive numerical and experimental studies demonstrate that D2SPO outperforms most state-of-the-art (SOTA) baselines and achieves a safety rate of over 95% and task completion rates of nearly 100%. The experiment video is available at https://youtu.be/2RgaH-zcmkY.
4. Du B, Xie W, Li Y, Yang Q, Zhang W, Negenborn RR, Pang Y, Chen H. Safe Adaptive Policy Transfer Reinforcement Learning for Distributed Multiagent Control. IEEE Trans Neural Netw Learn Syst 2025;36:1939-1946. [PMID: 37917524] [DOI: 10.1109/tnnls.2023.3326867]
Abstract
Multiagent reinforcement learning (RL) training is usually difficult and time-consuming due to mutual interference among agents. Safety concerns make an already difficult training process even harder. This study proposes a safe adaptive policy transfer RL approach for multiagent cooperative control. Specifically, a pioneer and follower off-policy policy transfer learning (PFOPT) method is presented to help follower agents acquire knowledge and experience from a single well-trained pioneer agent. Notably, the designed approach can transfer both the policy representation and sample experience provided by the pioneer policy in the off-policy learning. More importantly, the proposed method can adaptively adjust the learning weight of prior experience and exploration according to the Wasserstein distance between the policy probability distributions of the pioneer and the follower. Case studies show that the distributed agents trained by the proposed method can complete a collaborative task and acquire the maximum rewards while minimizing the violation of constraints. Moreover, the proposed method can also achieve satisfactory performance in terms of learning speed and success rate.
5. Guo Z, Zhou Q, Ren H, Ma H, Li H. ADP-based fault-tolerant consensus control for multiagent systems with irregular state constraints. Neural Netw 2024;180:106737. [PMID: 39316952] [DOI: 10.1016/j.neunet.2024.106737]
Abstract
This paper investigates the consensus control issue for nonlinear multiagent systems (MASs) subject to irregular state constraints and actuator faults using an adaptive dynamic programming (ADP) algorithm. Unlike the regular state constraints considered in previous studies, this paper addresses irregular state constraints that may be asymmetric, time-varying, and may emerge or disappear during operation. By developing a system transformation method based on one-to-one state mapping, equivalent unconstrained MASs can be obtained. Subsequently, a finite-time distributed observer is designed to estimate the state information of the leader, and the consensus control problem is transformed into a tracking control problem for each agent, ensuring that actuator faults of any agent cannot affect its neighboring agents. Then, a critic-only ADP-based fault-tolerant control strategy, which consists of the optimal control policy for the nominal system and online fault compensation for time-varying additive faults, is proposed to achieve optimal tracking control. To enhance the learning efficiency of the critic neural networks (NNs), an improved weight learning law utilizing stored historical data is employed, ensuring the convergence of the critic NN weights towards their ideal values under a finite excitation condition. Finally, a practical example of multiple manipulator systems is presented to demonstrate the effectiveness of the developed control method.
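A common form of such a one-to-one state mapping, shown here only as a generic illustration (the paper's exact transformation may differ), sends a state confined to possibly asymmetric, time-varying bounds to an unconstrained variable:

$$
s_i(k)=\ln\frac{x_i(k)-\underline{b}_i(k)}{\overline{b}_i(k)-x_i(k)},\qquad
x_i(k)=\frac{\overline{b}_i(k)\,e^{s_i(k)}+\underline{b}_i(k)}{e^{s_i(k)}+1},
$$

where $\underline{b}_i(k)<x_i(k)<\overline{b}_i(k)$ are the lower and upper constraint boundaries. Since $s_i$ ranges over all of $\mathbb{R}$, the dynamics rewritten in the transformed coordinates can be treated as an unconstrained MAS, which is the step the ADP design then builds on.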
Affiliation(s)
- Zijie Guo: School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China.
- Qi Zhou: School of Automation, Guangdong-Hong Kong Joint Laboratory for Intelligent Decision and Cooperative Control, and Guangdong Province Key Laboratory of Intelligent Decision and Cooperative Control, Guangdong University of Technology, Guangzhou 510006, Guangdong, China.
- Hongru Ren: School of Automation, Guangdong-Hong Kong Joint Laboratory for Intelligent Decision and Cooperative Control, and Guangdong Province Key Laboratory of Intelligent Decision and Cooperative Control, Guangdong University of Technology, Guangzhou 510006, Guangdong, China.
- Hui Ma: School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510006, Guangdong, China.
- Hongyi Li: College of Electronic and Information Engineering and Chongqing Key Laboratory of Generic Technology and System of Service Robots, Southwest University, Chongqing 400715, China.
6. Zhang D, Wang Y, Meng L, Yan J, Qin C. Adaptive critic design for safety-optimal FTC of unknown nonlinear systems with asymmetric constrained-input. ISA Trans 2024;155:309-318. [PMID: 39306561] [DOI: 10.1016/j.isatra.2024.09.018]
Abstract
Safe fault-tolerant control is a key technology for improving the reliability of complex nonlinear systems with limited inputs, and it remains difficult to solve. This paper therefore presents a novel safety-optimal FTC (fault-tolerant control) approach for a class of completely unknown nonlinear systems with actuator faults and asymmetric input constraints, which keeps the system operating within a safe range while achieving optimal performance. First, a CBF (control barrier function) is incorporated into the cost function to penalize unsafe behaviors, and the intractable safety-optimal FTC problem is then translated into a differential ZSG (zero-sum game) problem by treating the control input and the actuator fault as two opposing players. Second, a neural-network-based identifier is employed to reconstruct the system dynamics from data, the asymmetric input constraints are handled by the introduced non-quadratic cost function, and an adaptive critic scheme is designed to reduce the computational expense. Finally, the theoretical stability analysis demonstrates that all signals in the closed-loop system are UUB (uniformly ultimately bounded), and the proposed method's effectiveness is verified in simulation experiments on a single-link robotic arm model with actuator failure. The results show that the algorithm fulfills the safety-optimal requirements of fault-tolerant control for faulty systems with asymmetric input constraints.
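For orientation, generic forms of the two cost ingredients mentioned above, a CBF penalty and a non-quadratic, input-saturating utility, are (the paper's exact definitions may differ):

$$
J(x_0)=\int_{0}^{\infty}\Big[Q(x)+W(u)+\kappa\,B(x)\Big]dt,\qquad
W(u)=2\int_{0}^{u}\big(\bar{\lambda}\tanh^{-1}(v/\bar{\lambda})\big)^{\top}R\,dv,
$$

where $B(x)$ is a control barrier function that grows near the unsafe set, $\kappa>0$ weights safety against optimality, and $\bar{\lambda}$ encodes the input bounds (for asymmetric bounds the integrand is shifted and rescaled). The associated optimal control $u^{*}=-\bar{\lambda}\tanh\big(\tfrac{1}{2\bar{\lambda}}R^{-1}g^{\top}(x)\nabla V^{*}(x)\big)$ then stays within the actuator limits by construction.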
Affiliation(s)
- Dehua Zhang: School of Artificial Intelligence, Henan University, Zhengzhou 450046, China.
- Yuchen Wang: School of Artificial Intelligence, Henan University, Zhengzhou 450046, China.
- Lei Meng: School of Artificial Intelligence, Henan University, Zhengzhou 450046, China.
- Jiayuan Yan: School of Artificial Intelligence, Henan University, Zhengzhou 450046, China.
- Chunbin Qin: School of Artificial Intelligence, Henan University, Zhengzhou 450046, China.
7. Guo Z, Li H, Ma H, Meng W. Distributed Optimal Attitude Synchronization Control of Multiple QUAVs via Adaptive Dynamic Programming. IEEE Trans Neural Netw Learn Syst 2024;35:8053-8063. [PMID: 36446013] [DOI: 10.1109/tnnls.2022.3224029]
Abstract
This article proposes a distributed optimal attitude synchronization control strategy for multiple quadrotor unmanned aerial vehicles (QUAVs) through the adaptive dynamic programming (ADP) algorithm. The attitude systems of QUAVs are modeled as affine nominal systems subject to parameter uncertainties and external disturbances. Considering attitude constraints in complex flying environments, a one-to-one mapping technique is utilized to transform the constrained systems into equivalent unconstrained systems. An improved nonquadratic cost function is constructed for each QUAV, which reflects the requirements of robustness and the constraints of control input simultaneously. To overcome the issue that the persistence of excitation (PE) condition is difficult to meet, a novel tuning rule of critic neural network (NN) weights is developed via the concurrent learning (CL) technique. In terms of the Lyapunov stability theorem, the stability of the closed-loop system and the convergence of critic NN weights are proved. Finally, simulation results on multiple QUAVs show the effectiveness of the proposed control strategy.
8. Jiang H, Zhou B, Duan GR. Modified λ-Policy Iteration Based Adaptive Dynamic Programming for Unknown Discrete-Time Linear Systems. IEEE Trans Neural Netw Learn Syst 2024;35:3291-3301. [PMID: 37027626] [DOI: 10.1109/tnnls.2023.3244934]
Abstract
In this article, the λ-policy iteration (λ-PI) method for the optimal control problem of discrete-time linear systems is reconsidered and restated from a novel perspective. First, the traditional λ-PI method is recalled, and some new properties of the traditional λ-PI are proposed. Based on these new properties, a modified λ-PI algorithm is introduced and its convergence is proven. Compared with the existing results, the initial condition is further relaxed. A data-driven implementation is then constructed, together with a new matrix rank condition for verifying its feasibility. A simulation example verifies the effectiveness of the proposed method.
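The recursion behind λ-PI can be made concrete with a small model-based LQR example (textbook form with arbitrary matrices and λ; the paper's modified algorithm, relaxed initial condition, and data-driven implementation are not reproduced here). Policy evaluation solves V_{k+1} = (1-λ) T_π V_k + λ T_π V_{k+1} by an inner fixed-point iteration, and policy improvement is the usual greedy update.

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q, R, lam = np.eye(2), np.array([[1.0]]), 0.5

def backup(K, P_mix):
    # One Bellman backup under the fixed policy u = Kx, evaluated at the mixed value matrix.
    Ac = A + B @ K
    return Q + K.T @ R @ K + Ac.T @ P_mix @ Ac

P = np.zeros((2, 2))
for _ in range(30):
    # Greedy policy improvement with respect to the current value matrix P.
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Lambda-weighted policy evaluation: solve P_new = (1-lam)*T(P) + lam*T(P_new)
    # by an inner fixed-point iteration (T is the backup under the frozen policy K).
    P_new = P.copy()
    for _ in range(100):
        P_new = backup(K, (1 - lam) * P + lam * P_new)
    P = P_new

# Residual of the discrete-time algebraic Riccati equation at the final value matrix.
res = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A) - P
print("P =\n", P, "\nDARE residual norm:", np.linalg.norm(res))
```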
9. Zhu P, Jin S, Bu X, Hou Z. Improved Model-Free Adaptive Control for MIMO Nonlinear Systems With Event-Triggered Transmission Scheme and Quantization. IEEE Trans Cybern 2023;53:5867-5880. [PMID: 36170394] [DOI: 10.1109/tcyb.2022.3203036]
Abstract
In this article, an improved model-free adaptive control (iMFAC) is proposed for discrete-time multi-input multi-output (MIMO) nonlinear systems with an event-triggered transmission scheme and quantization (ETQ). First, an event-triggered scheme is designed, and the structure of the uniform quantizer with an encoding-decoding mechanism is given. Using the concept of partial-form dynamic linearization based on event triggering and quantization (PFDL-ETQ), a linearized data model of the MIMO nonlinear system is constructed. Then, an improved model-free adaptive controller with the ETQ process is designed. With this design, the pseudo-partitioned Jacobian matrix (PPJM) estimates and the control inputs are updated only when the trigger conditions are met, which reduces the network transmission burden and saves computing resources. Theoretical analysis shows that the proposed iMFAC with the ETQ process achieves bounded convergence of the tracking error. Finally, a numerical simulation and a simulation of a biaxial gantry motor contour tracking control system illustrate the feasibility of the proposed iMFAC method with the ETQ process.
10. Wang D, Ren J, Ha M, Qiao J. System Stability of Learning-Based Linear Optimal Control With General Discounted Value Iteration. IEEE Trans Neural Netw Learn Syst 2023;34:6504-6514. [PMID: 34986105] [DOI: 10.1109/tnnls.2021.3137524]
Abstract
For discounted optimal regulation design, the stability of the controlled system is affected by the discount factor. If an inappropriate discount factor is employed, the optimal control policy might be unstabilizing. Therefore, in this article, the effect of the discount factor on the stabilization of control strategies is discussed. We develop the system stability criterion and the selection rules of the discount factor with respect to the linear quadratic regulator problem under the general discounted value iteration algorithm. Based on the monotonicity of the value function sequence, the method to judge the stability of the controlled system is established during the iteration process. In addition, once some stability conditions are satisfied at a certain iteration step, all control policies after this iteration step are stabilizing. Furthermore, combined with the undiscounted optimal control problem, the practical rule of how to select an appropriate discount factor is constructed. Finally, several simulation examples with physical backgrounds are conducted to demonstrate the present theoretical results.
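A small numerical check of the phenomenon described above, with an arbitrary open-loop unstable plant: discounted value iteration is run for two discount factors and the spectral radius of the resulting closed loop is inspected; a discount factor that is too small can leave the converged policy non-stabilizing even though the iteration itself converges.

```python
import numpy as np

A = np.array([[1.2, 0.5],
              [0.0, 1.1]])   # open-loop unstable plant (illustrative)
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

def discounted_value_iteration(gamma, iters=500):
    P = np.zeros((2, 2))
    for _ in range(iters):
        # Greedy gain for the one-step problem x'Qx + u'Ru + gamma * V(Ax + Bu).
        K = -gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
        P = Q + K.T @ R @ K + gamma * (A + B @ K).T @ P @ (A + B @ K)
    return P, K

for gamma in (0.3, 0.95):
    P, K = discounted_value_iteration(gamma)
    rho = max(abs(np.linalg.eigvals(A + B @ K)))
    tag = "stabilizing" if rho < 1 else "NOT stabilizing"
    print(f"gamma = {gamma}: spectral radius of A + BK = {rho:.3f} ({tag})")
```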
11. Qin C, Wu Y, Zhang J, Zhu T. Reinforcement Learning-Based Decentralized Safety Control for Constrained Interconnected Nonlinear Safety-Critical Systems. Entropy (Basel) 2023;25:1158. [PMID: 37628188] [PMCID: PMC10453656] [DOI: 10.3390/e25081158]
Abstract
This paper addresses the problem of decentralized safety control (DSC) of constrained interconnected nonlinear safety-critical systems under reinforcement learning strategies, where asymmetric input constraints and security constraints are considered. To begin with, improved performance functions associated with the actuator estimates for each auxiliary subsystem are constructed. Then, the decentralized control problem with security constraints and asymmetric input constraints is transformed into an equivalent decentralized control problem with asymmetric input constraints using the barrier function. This approach ensures that safety-critical systems operate and learn optimal DSC policies within their safe global domains. Furthermore, the optimal control strategy is shown to ensure that the entire system is uniformly ultimately bounded (UUB). In addition, based on Lyapunov theory, all signals in the closed-loop auxiliary subsystems are shown to be uniformly ultimately bounded, and the effectiveness of the designed method is verified by a practical simulation.
Affiliation(s)
- Chunbin Qin: School of Artificial Intelligence, Henan University, Zhengzhou 450046, China.
- Yinliang Wu: School of Artificial Intelligence, Henan University, Zhengzhou 450046, China.
- Jishi Zhang: School of Software, Henan University, Kaifeng 475000, China.
- Tianzeng Zhu: School of Artificial Intelligence, Henan University, Zhengzhou 450046, China.
12. Qin C, Jiang K, Zhang J, Zhu T. Critic Learning-Based Safe Optimal Control for Nonlinear Systems with Asymmetric Input Constraints and Unmatched Disturbances. Entropy (Basel) 2023;25:1101. [PMID: 37510048] [PMCID: PMC10378920] [DOI: 10.3390/e25071101]
Abstract
In this paper, the safe optimal control of continuous-time (CT) nonlinear safety-critical systems with asymmetric input constraints and unmatched disturbances is investigated based on adaptive dynamic programming (ADP). Initially, a new non-quadratic form function is implemented to effectively handle the asymmetric input constraints. Subsequently, the safe optimal control problem is transformed into a two-player zero-sum game (ZSG) problem to suppress the influence of unmatched disturbances, and a new Hamilton-Jacobi-Isaacs (HJI) equation is introduced by integrating a control barrier function (CBF) into the cost function to penalize unsafe behavior. Moreover, a damping factor is embedded in the CBF to balance safety and optimality. To obtain a safe optimal controller, only one critic neural network (CNN) is utilized to tackle the complex HJI equation, leading to a decreased computational load compared with the conventional actor-critic structure. The system state and the parameters of the CNN are then shown to be uniformly ultimately bounded (UUB) by the Lyapunov stability method. Lastly, two examples are presented to confirm the efficacy of the presented approach.
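A rough, generic form of such a barrier-augmented zero-sum cost and the associated HJI equation is shown below for orientation (the way the damping factor enters the CBF in the paper may differ from the simple weighting used here):

$$
J(x_0)=\int_{0}^{\infty}\Big[Q(x)+W(u)-\gamma^{2}\|d\|^{2}+\kappa\,B(x)\Big]dt,
$$
$$
0=\min_{u}\max_{d}\Big[Q(x)+W(u)-\gamma^{2}\|d\|^{2}+\kappa\,B(x)+\nabla V^{\top}\big(f(x)+g(x)u+h(x)d\big)\Big],
$$

where $W(u)$ is the non-quadratic term handling the asymmetric input constraints, $B(x)$ is the control barrier function penalizing unsafe states, $\kappa$ is the damping-type weight trading safety against optimality, and the disturbance $d$ enters through $h(x)\neq g(x)$, which is what makes it unmatched. A single critic network then approximates $V$ and its gradient to solve this equation.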
Affiliation(s)
- Chunbin Qin: School of Artificial Intelligence, Henan University, Zhengzhou 450000, China.
- Kaijun Jiang: School of Artificial Intelligence, Henan University, Zhengzhou 450000, China.
- Jishi Zhang: School of Software, Henan University, Kaifeng 475000, China.
- Tianzeng Zhu: School of Artificial Intelligence, Henan University, Zhengzhou 450000, China.
13. Safe reinforcement learning for discrete-time fully cooperative games with partial state and control constraints using control barrier functions. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2022.10.058]
14. Yang Y, Modares H, Vamvoudakis KG, He W, Xu CZ, Wunsch DC. Hamiltonian-Driven Adaptive Dynamic Programming With Approximation Errors. IEEE Trans Cybern 2022;52:13762-13773. [PMID: 34495864] [DOI: 10.1109/tcyb.2021.3108034]
Abstract
In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control problem in continuous time for nonlinear systems. First, a novel function, "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that takes into consideration the approximation errors during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient to guarantee closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.
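In the standard affine-system notation (shown only for orientation; the paper's notation may differ), the min-Hamiltonian is

$$
H(x,u,\nabla V)=r(x,u)+\nabla V^{\top}\big(f(x)+g(x)u\big),\qquad
H_{\min}(x,\nabla V)=\min_{u}H(x,u,\nabla V),
$$

so the HJB equation reads $H_{\min}(x,\nabla V^{*})=0$, policy evaluation solves $H(x,u_k,\nabla V_{k+1})=0$ for $V_{k+1}$, and policy improvement takes $u_{k+1}=\arg\min_{u}H(x,u,\nabla V_{k+1})$; the iterative ADP algorithm above additionally accounts for the approximation error incurred in the evaluation step.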
15. Safe Reinforcement Learning for Affine Nonlinear Systems with State Constraints and Input Saturation Using Control Barrier Functions. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.006]
16. Ran M, Li J, Xie L. Reinforcement-Learning-Based Disturbance Rejection Control for Uncertain Nonlinear Systems. IEEE Trans Cybern 2022;52:9621-9633. [PMID: 33729973] [DOI: 10.1109/tcyb.2021.3060736]
Abstract
This article investigates the reinforcement-learning (RL)-based disturbance rejection control for uncertain nonlinear systems having nonsimple nominal models. An extended state observer (ESO) is first designed to estimate the system state and the total uncertainty, which represents the perturbation to the nominal system dynamics. Based on the output of the observer, the control compensates for the total uncertainty in real time, and simultaneously, online approximates the optimal policy for the compensated system using a simulation of experience-based RL technique. Rigorous theoretical analysis is given to show the practical convergence of the system state to the origin and the developed policy to the ideal optimal policy. It is worth mentioning that the widely used restrictive persistence of excitation (PE) condition is not required in the established framework. Simulation results are presented to illustrate the effectiveness of the proposed method.
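The observer half of this scheme can be sketched as follows (the RL approximation of the optimal policy for the compensated system is omitted; the plant, observer bandwidth, and feedback gains are illustrative assumptions): a linear extended state observer treats the lumped uncertainty as an extra state, estimates it from the measured output, and the controller cancels the estimate in real time.

```python
import numpy as np

b0, dt = 1.0, 0.001
wo = 40.0                                   # observer bandwidth
l1, l2, l3 = 3 * wo, 3 * wo**2, wo**3       # bandwidth-parameterized ESO gains

def plant(x, u, t):
    # True second-order plant with an unknown nonlinearity and an external disturbance;
    # everything except b0*u is the "total uncertainty" the ESO must estimate.
    p, v = x
    total_uncertainty = -0.5 * v - 2.0 * np.sin(p) + 0.5 * np.sin(2 * np.pi * t)
    return np.array([v, total_uncertainty + b0 * u])

x = np.array([1.0, 0.0])    # true state: [position, velocity]
z = np.zeros(3)             # ESO state: [position est., velocity est., uncertainty est.]
kp, kd = 9.0, 6.0           # nominal feedback gains for the compensated double integrator

for k in range(5000):
    t = k * dt
    y = x[0]                                        # measured output
    u = (-kp * z[0] - kd * z[1] - z[2]) / b0        # feedback plus real-time compensation
    e = y - z[0]
    z_dot = np.array([z[1] + l1 * e,                # linear ESO: observer copy of the
                      z[2] + b0 * u + l2 * e,       # integrator chain driven by the
                      l3 * e])                      # output estimation error
    z = z + dt * z_dot
    x = x + dt * plant(x, u, t)                     # Euler step of the true plant

print("final |position|:", abs(x[0]), " estimated total uncertainty:", z[2])
```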
17. Jiang R, Wang Z, He B, Zhou Y, Li G, Zhu Z. A data-efficient goal-directed deep reinforcement learning method for robot visuomotor skill. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.023]
18. Yang Y, Zhu H, Zhang Q, Zhao B, Li Z, Wunsch DC. Sparse online kernelized actor-critic learning in reproducing kernel Hilbert space. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-10045-9]
19. Sliding mode-based online fault compensation control for modular reconfigurable robots through adaptive dynamic programming. Complex Intell Syst 2021. [DOI: 10.1007/s40747-021-00364-3]
Abstract
In this paper, a sliding mode (SM)-based online fault compensation control scheme is investigated for modular reconfigurable robots (MRRs) with actuator failures via adaptive dynamic programming. It consists of an SM-based iterative controller, an adaptive robust term and an online fault compensator. For fault-free MRR systems, the SM surface-based Hamilton–Jacobi–Bellman equation is solved by an online policy iteration algorithm. The adaptive robust term is added to guarantee the reachability condition of the SM surface. For faulty MRR systems, the actuator failure is compensated online to avoid a fault detection and isolation mechanism. The closed-loop MRR system is guaranteed to be asymptotically stable under the developed fault compensation control scheme. Simulation results verify the effectiveness of the presented fault compensation control approach.
20. Yang Y, Liu Z, Xiong H, Yin Y. Adaptive singularity-free controller design of constrained nonlinear systems with prescribed performance. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.07.029]