1. Xue S, Zhang W, Luo B, Liu D. Integral Reinforcement Learning-Based Dynamic Event-Triggered Nonzero-Sum Games of USVs. IEEE TRANSACTIONS ON CYBERNETICS 2025; 55:1706-1716. [PMID: 40031610 DOI: 10.1109/tcyb.2025.3533139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
In this article, an integral reinforcement learning (IRL) method is developed for dynamic event-triggered nonzero-sum (NZS) games to achieve the Nash equilibrium of unmanned surface vehicles (USVs) with state and input constraints. Initially, a mapping function is designed to map the state and control of the USV into a safe environment. Subsequently, IRL-based coupled Hamilton-Jacobi equations, which avoid dependence on system dynamics, are derived to solve the Nash equilibrium. To conserve computational resources and reduce network transmission burdens, a static event-triggered control is initially designed, followed by the development of a more flexible dynamic form. Finally, a critic neural network is designed for each player to approximate its value function and control policy. Rigorous proofs are provided for the uniform ultimate boundedness of the state and the weight estimation errors. The effectiveness of the present method is demonstrated through simulation experiments.
2. Xin P, Wang D, Liu A, Qiao J. Neural critic learning with accelerated value iteration for nonlinear model predictive control. Neural Netw 2024; 176:106364. [PMID: 38754288 DOI: 10.1016/j.neunet.2024.106364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 01/27/2024] [Accepted: 04/30/2024] [Indexed: 05/18/2024]
Abstract
In practical industrial processes, the receding optimization solution of nonlinear model predictive control (NMPC) remains a difficult problem. Based on adaptive dynamic programming, the accelerated value iteration predictive control (AVI-PC) algorithm is developed in this paper. By integrating iterative learning with the receding horizon mechanism of NMPC, a novel receding optimization solution pattern is exploited to resolve the optimal control law in each prediction horizon. In addition, the basic architecture and the specific form of the AVI-PC algorithm are presented, including the relationship among the iterative learning process, the prediction process, and the control process. On this basis, the convergence and admissibility conditions are established, and the relevant properties are comprehensively analyzed when the accelerated factor satisfies the established conditions. Furthermore, the accelerated value iterative function is approximated through a single critic network constructed by utilizing the multiple linear regression method. Finally, extensive simulation experiments are conducted from various perspectives to verify the effectiveness and progressiveness of the AVI-PC algorithm.
Affiliation(s)
- Peng Xin
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ao Liu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
3. Zhang H, Zhao X, Wang H, Zong G, Xu N. Hierarchical Sliding-Mode Surface-Based Adaptive Actor-Critic Optimal Control for Switched Nonlinear Systems With Unknown Perturbation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:1559-1571. [PMID: 35834452 DOI: 10.1109/tnnls.2022.3183991] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 06/15/2023]
Abstract
This article studies the hierarchical sliding-mode surface (HSMS)-based adaptive optimal control problem for a class of switched continuous-time (CT) nonlinear systems with unknown perturbation under an actor-critic (AC) neural network (NN) architecture. First, a novel perturbation observer with a nested parameter adaptive law is designed to estimate the unknown perturbation. Then, by constructing a special cost function related to the HSMS, the original control issue is further converted into the problem of finding a series of optimal control policies. The solution to the HJB equation is identified by the HSMS-based AC NNs, where the actor and critic updating laws are developed to implement the reinforcement learning (RL) strategy simultaneously. The critic update law is designed via the gradient descent approach and the principle of standardization, such that the persistence of excitation (PE) condition is no longer needed. Based on the Lyapunov stability theory, all the signals of the closed-loop switched nonlinear systems are rigorously proved to be bounded in the sense of uniform ultimate boundedness (UUB). Finally, the simulation results are presented to verify the validity of the proposed adaptive optimal control scheme.
4. Zhao B, Zhang Y, Liu D. Adaptive Dynamic Programming-Based Cooperative Motion/Force Control for Modular Reconfigurable Manipulators: A Joint Task Assignment Approach. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:10944-10954. [PMID: 35544490 DOI: 10.1109/tnnls.2022.3171828] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
This article develops a cooperative motion/force control (CMFC) scheme based on adaptive dynamic programming (ADP) for modular reconfigurable manipulators (MRMs) with the joint task assignment approach. By separating terms depending on local variables only, the dynamic model of the entire MRM system can be regarded as a set of joint modules interconnected by coupling torque. In addition, the Jacobian matrix, which reflects the interaction force of the MRM end-effector, can be mapped into each joint. Using this approach, both the motion and force tasks on the end-effector of the entire MRM system can be assigned to each joint module cooperatively. Then, by substituting the actual states of coupled joint modules with their desired ones, the norm-boundedness assumption on the interconnection of the joint modules can be relaxed. By using the measured input-output data of each joint module, a neural network (NN)-based robust decentralized observer, which guarantees the observation error to be asymptotically stable, is established. An improved local value function is constructed for each joint module to reflect the interconnection. Then, the local Hamilton-Jacobi-Bellman equation is solved by constructing a local critic NN with a nested learning structure. Thereafter, the ADP-based CMFC is obtained with the assistance of force feedback compensation. Based on the Lyapunov stability analysis, the closed-loop MRM system is guaranteed to be uniformly ultimately bounded under the present ADP-based CMFC scheme. The simulation on a two-degree-of-freedom MRM system demonstrates the effectiveness of the present control approach.
5. Zhu L, Guo P, Wei Q. Synergetic learning for unknown nonlinear H∞ control using neural networks. Neural Netw 2023; 168:287-299. [PMID: 37774514 DOI: 10.1016/j.neunet.2023.09.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 08/24/2023] [Accepted: 09/15/2023] [Indexed: 10/01/2023]
Abstract
The well-known H∞ control design gives robustness to a controller by rejecting perturbations from the external environment, which is difficult to do for completely unknown affine nonlinear systems. Accordingly, the immediate objective of this paper is to develop an on-line real-time synergetic learning algorithm, so that a data-driven H∞ controller can be obtained. By converting the H∞ control problem into a two-player zero-sum game, a model-free Hamilton-Jacobi-Isaacs equation (MF-HJIE) is first derived using off-policy reinforcement learning, followed by a proof of equivalence between the MF-HJIE and the conventional HJIE. Next, by applying the temporal difference to the MF-HJIE, a synergetic evolutionary rule with experience replay is designed to learn the optimal value function, the optimal control, and the worst perturbation, which can be performed on-line and in real-time along the system state trajectory. It is proven that the synergetic learning system constructed by the system plant and the evolutionary rule is uniformly ultimately bounded. Finally, simulation results on an F16 aircraft system and a nonlinear system support the tractability of the proposed method.
Affiliation(s)
- Liao Zhu
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai, 519087, Guangdong, China; School of Systems Science, Beijing Normal University, Beijing, 100875, China.
- Ping Guo
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai, 519087, Guangdong, China; School of Systems Science, Beijing Normal University, Beijing, 100875, China.
- Qinglai Wei
- The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Systems Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China.
6. Wang Z, Lee J, Wei Q, Zhang A. Event-Triggered Near-Optimal Tracking Control based on Adaptive Dynamic Programming for Discrete-Time Systems. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.03.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
7. Lin M, Zhao B, Liu D. Policy gradient adaptive dynamic programming for nonlinear discrete-time zero-sum games with unknown dynamics. Soft Comput 2023. [DOI: 10.1007/s00500-023-07817-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
8. Xia H, Zhao B, Guo P. Synergetic learning structure-based neuro-optimal fault tolerant control for unknown nonlinear systems. Neural Netw 2022; 155:204-214. [DOI: 10.1016/j.neunet.2022.08.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 06/02/2022] [Accepted: 08/08/2022] [Indexed: 10/31/2022]
9. Xue S, Luo B, Liu D, Gao Y. Neural network-based event-triggered integral reinforcement learning for constrained H∞ tracking control with experience replay. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
10. Zhang Y, Li S, Weng J. Learning and Near-Optimal Control of Underactuated Surface Vessels With Periodic Disturbances. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:7453-7463. [PMID: 33400666 DOI: 10.1109/tcyb.2020.3041368] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
In this article, we propose a novel learning and near-optimal control approach for underactuated surface vessels (USVs) with unknown mismatched periodic external disturbances and unknown hydrodynamic parameters. Given prior knowledge of the periods of the disturbances, an analytical near-optimal control law is derived through the approximation of the integral-type quadratic performance index with respect to the tracking error, where the equivalent unknown parameters are generated online by an auxiliary system that can learn the dynamics of the controlled system. It is proved that the state differences between the auxiliary system and the corresponding controlled USV are globally asymptotically convergent to zero. Besides, the approach theoretically guarantees asymptotic optimality of the performance index. The efficacy of the method is demonstrated via simulations based on the real parameters of a USV.
11. Zhang H, Wang H, Niu B, Zhang L, Ahmad AM. Sliding-mode surface-based adaptive actor-critic optimal control for switched nonlinear systems with average dwell time. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.08.062] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Indexed: 11/25/2022]
12. Zhang S, Zhao B, Liu D, Zhang Y. Observer-based event-triggered control for zero-sum games of input constrained multi-player nonlinear systems. Neural Netw 2021; 144:101-112. [PMID: 34478940 DOI: 10.1016/j.neunet.2021.08.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/15/2021] [Revised: 07/18/2021] [Accepted: 08/09/2021] [Indexed: 11/18/2022]
Abstract
In this paper, an event-triggered control (ETC) method is investigated to solve zero-sum game (ZSG) problems of unknown multi-player continuous-time nonlinear systems with input constraints by using adaptive dynamic programming (ADP). To relax the requirement of system dynamics, a neural network (NN) observer is constructed to identify the dynamics of the multi-player system via the input and output data. Then, the event-triggered Hamilton-Jacobi-Isaacs (HJI) equation of the ZSG can be solved by constructing a critic NN, and the approximated optimal control law and the worst disturbance law can be obtained directly. A triggering scheme which determines the updating time instants of the control law and the disturbance law is developed. Thus, the proposed ADP-based ETC method can not only reduce the computational burden, but also save communication resources and bandwidth. Furthermore, we prove that the signals of the closed-loop system and the approximation errors of the critic NN weights are uniformly ultimately bounded by using Lyapunov's direct method, and the Zeno behavior is excluded. Finally, two simulation examples are provided to demonstrate the effectiveness of the proposed ETC scheme.
Affiliation(s)
- Shunchao Zhang
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
- Bo Zhao
- School of Systems Science, Beijing Normal University, Beijing 100875, China.
- Derong Liu
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
- Yongwei Zhang
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
13. Sliding mode-based online fault compensation control for modular reconfigurable robots through adaptive dynamic programming. COMPLEX INTELL SYST 2021. [DOI: 10.1007/s40747-021-00364-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
Abstract
In this paper, a sliding mode (SM)-based online fault compensation control scheme is investigated for modular reconfigurable robots (MRRs) with actuator failures via adaptive dynamic programming. It consists of an SM-based iterative controller, an adaptive robust term and an online fault compensator. For fault-free MRR systems, the SM surface-based Hamilton–Jacobi–Bellman equation is solved by an online policy iteration algorithm. The adaptive robust term is added to guarantee the reachable condition of the SM surface. For faulty MRR systems, the actuator failure is compensated online to avoid the fault detection and isolation mechanism. The closed-loop MRR system is guaranteed to be asymptotically stable under the developed fault compensation control scheme. Simulation results verify the effectiveness of the present fault compensation control approach.
14. Ma B, Li Y. Compensator-critic structure-based event-triggered decentralized tracking control of modular robot manipulators: theory and experimental verification. COMPLEX INTELL SYST 2021. [DOI: 10.1007/s40747-021-00359-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
Abstract
This paper presents a novel compensator-critic structure-based event-triggered decentralized tracking control of modular robot manipulators (MRMs). On the basis of subsystem dynamics under the joint torque feedback (JTF) technique, the proposed tracking error fusion function, which includes position error and velocity error, is utilized to construct the performance index function. By analyzing the dynamic uncertainties, a local dynamic information-based robust controller is designed to compensate for the model uncertainty. Based on the adaptive dynamic programming (ADP) algorithm and the event-triggered mechanism, the decentralized tracking control is obtained by solving the event-triggered Hamilton–Jacobi–Bellman equation (HJBE) with the critic neural network (NN). The tracking error of the closed-loop manipulator system is proved to be uniformly ultimately bounded (UUB) using the Lyapunov stability theorem. Finally, experimental results illustrate the effectiveness of the developed control method.