1
Yang X, Wang D. Reinforcement Learning for Robust Dynamic Event-Driven Constrained Control. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:6067-6079. [PMID: 38700967 DOI: 10.1109/tnnls.2024.3394251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
We consider a robust dynamic event-driven control (EDC) problem of nonlinear systems having both unmatched perturbations and unknown styles of constraints. Specifically, the constraints imposed on the nonlinear systems' input could be symmetric or asymmetric. Initially, to tackle such constraints, we construct a novel nonquadratic cost function for the constrained auxiliary system. Then, we propose a dynamic event-triggering mechanism that relies simultaneously on a time-based variable and the system states to cut down the computational load. Meanwhile, we show that the robust dynamic EDC of the original nonlinear constrained systems can be obtained by solving the event-driven optimal control problem of the constrained auxiliary system. After that, we develop the corresponding event-driven Hamilton-Jacobi-Bellman equation and solve it through a single critic neural network (CNN) in the reinforcement learning framework. To relax the persistence of excitation condition in tuning the CNN's weights, we incorporate experience replay into the gradient descent method. With the aid of Lyapunov's approach, we prove that the closed-loop auxiliary system and the weight estimation error are uniformly ultimately bounded. Finally, two examples, including a nonlinear plant and the pendulum system, are utilized to validate the theoretical claims.
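To make the idea of a dynamic event-triggering mechanism concrete, the following is a minimal sketch, not the authors' rule. It assumes a generic textbook-style dynamic trigger in which an internal variable `eta` is driven by a time-based differential equation and acts as a buffer on top of a static state-dependent condition. The scalar plant, the feedback gain `k`, and all parameters (`sigma`, `beta`, `theta`, `eta0`) are hypothetical choices for illustration only.

```python
def simulate_dynamic_etm(a=1.0, b=1.0, k=3.0, sigma=0.05, beta=1.0,
                         theta=1.0, eta0=1.0, x0=2.0, dt=1e-3, t_end=10.0):
    """Euler simulation of x' = a*x + b*u with u = -k * x_hat.

    x_hat is the last sampled state; it is refreshed only at trigger instants.
    All names and values are illustrative assumptions, not taken from the paper.
    """
    x, x_hat, eta = x0, x0, eta0
    steps = int(round(t_end / dt))
    events = 0
    for _ in range(steps):
        gap = x_hat - x                       # sampling error since the last event
        margin = sigma * x * x - gap * gap    # static-trigger margin
        # Dynamic rule: fire only when the buffer eta can no longer absorb a negative margin.
        if eta + theta * margin <= 0.0:
            x_hat = x                         # sample the state, which resets the gap to zero
            margin = sigma * x * x
            events += 1
        u = -k * x_hat                        # control is held constant between events
        x += dt * (a * x + b * u)
        eta += dt * (-beta * eta + margin)    # time-based internal variable
    return x, events, steps


if __name__ == "__main__":
    x_final, events, steps = simulate_dynamic_etm()
    print(f"final state {x_final:.4f}, {events} control updates in {steps} steps")
```

In this sketch the controller is updated only at trigger instants, so the number of control updates is far smaller than the number of integration steps, which is the computational saving the abstract refers to.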
2
Liu N, Zhang K, Xie X, Yue D. UKF-Based Optimal Tracking Control for Uncertain Dynamic Systems With Asymmetric Input Constraints. IEEE TRANSACTIONS ON CYBERNETICS 2024; 54:7224-7235. [PMID: 39401122 DOI: 10.1109/tcyb.2024.3471987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
To enhance system robustness in the face of uncertainty and achieve adaptive optimization of control strategies, a novel algorithm based on the unscented Kalman filter (UKF) is developed. This algorithm addresses the finite-horizon optimal tracking control problem (FHOTCP) for nonlinear discrete-time (DT) systems with uncertainty and asymmetric input constraints. An augmented system is constructed with asymmetric control constraints being considered. The augmented problem is addressed with a DT Hamilton-Jacobi-Bellman equation (DTHJBE). By analyzing convergence with regard to the cost function and control law, the UKF-based iterative adaptive dynamic programming (ADP) algorithm is proposed. This algorithm approximates the solution of the DTHJBE, ensuring that the cost function converges to its optimal value within a bounded range. To execute the UKF-based iterative ADP algorithm, the actor-estimator-critic framework is built, in which the estimator refers to system state estimation through the application of UKF. Ultimately, simulation examples are presented to show the performance of the proposed method.
3
Guo Z, Zhou Q, Ren H, Ma H, Li H. ADP-based fault-tolerant consensus control for multiagent systems with irregular state constraints. Neural Netw 2024; 180:106737. [PMID: 39316952 DOI: 10.1016/j.neunet.2024.106737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 08/03/2024] [Accepted: 09/11/2024] [Indexed: 09/26/2024]
Abstract
This paper investigates the consensus control issue for nonlinear multiagent systems (MASs) subject to irregular state constraints and actuator faults using an adaptive dynamic programming (ADP) algorithm. Unlike the regular state constraints considered in previous studies, this paper addresses irregular state constraints that may be asymmetric and time-varying and that can emerge or disappear during operation. By developing a system transformation method based on one-to-one state mapping, equivalent unconstrained MASs can be obtained. Subsequently, a finite-time distributed observer is designed to estimate the state information of the leader, and the consensus control problem is transformed into a tracking control problem for each agent to ensure that actuator faults of any agent cannot affect its neighboring agents. Then, a critic-only ADP-based fault-tolerant control strategy, which consists of the optimal control policy for the nominal system and online fault compensation for time-varying additive faults, is proposed to achieve optimal tracking control. To enhance the learning efficiency of critic neural networks (NNs), an improved weight learning law utilizing stored historical data is employed, ensuring the convergence of critic NN weights towards ideal values under a finite excitation condition. Finally, a practical example of multiple manipulator systems is presented to demonstrate the effectiveness of the developed control method.
Affiliation(s)
- Zijie Guo, School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, 510665, Guangdong, China.
- Qi Zhou, School of Automation, Guangdong-Hong Kong Joint Laboratory for Intelligent Decision and Cooperative Control, and Guangdong Province Key Laboratory of Intelligent Decision and Cooperative Control, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China.
- Hongru Ren, School of Automation, Guangdong-Hong Kong Joint Laboratory for Intelligent Decision and Cooperative Control, and Guangdong Province Key Laboratory of Intelligent Decision and Cooperative Control, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China.
- Hui Ma, School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China.
- Hongyi Li, College of Electronic and Information Engineering and Chongqing Key Laboratory of Generic Technology and System of Service Robots, Southwest University, Chongqing, 400715, Chongqing, China.
4
Liu L, Song R. Adaptive sampling artificial-actual control for non-zero-sum games of constrained systems. Neural Netw 2024; 178:106413. [PMID: 38850637 DOI: 10.1016/j.neunet.2024.106413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/21/2024] [Accepted: 05/27/2024] [Indexed: 06/10/2024]
Abstract
Considering physical constraints encountered by actuators, this paper addresses the non-zero-sum game of continuous nonlinear systems with symmetric and asymmetric input constraints through aperiodic sampling artificial-actual control. Initially, the artificial system built by the improved Elman dynamic neural networks (EDNNs) has artificial-actual interaction with the physical system, which provides a new perspective for predicting the system state. By constantly learning and adjusting parameters, EDNNs can gradually approximate the dynamic behavior of the real system to achieve more effective control. Aiming at accommodating diverse input constraints, the non-quadratic value function constructed from a smoothly bounded function is devised. Then, the polynomial parameterized adaptive dynamic programming (ADP) is employed to approximate the solution of the coupled Hamilton-Jacobi equation (HJE), deriving optimal control laws for two players. To improve the efficiency of data communication, three adaptive sampling mechanisms including event-triggered mechanism (ETM) with relative threshold, dynamic ETM (DETM) and self-triggered mechanism (STM) are introduced in turn during the iterative learning process of control sequences. DETM further extends sampling intervals by incorporating internal dynamic variables, while STM determines the next trigger time through soft calculation without hardware monitoring. All three trigger modes can ensure the system stability while avoiding the Zeno phenomenon, and relevant proofs are given. Finally, the simulation validates the effectiveness of the designed algorithm and highlights the unique characteristics of each trigger mode.
Affiliation(s)
- Lu Liu, Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Ruizhuo Song, Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
5
Li M, Wang D, Ren J, Qiao J. Advanced optimal tracking integrating a neural critic technique for asymmetric constrained zero-sum games. Neural Netw 2024; 177:106388. [PMID: 38776760 DOI: 10.1016/j.neunet.2024.106388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/14/2024] [Accepted: 05/12/2024] [Indexed: 05/25/2024]
Abstract
This paper investigates the optimal tracking issue for continuous-time (CT) nonlinear asymmetric constrained zero-sum games (ZSGs) by exploiting the neural critic technique. Initially, an improved algorithm is constructed to tackle the tracking control problem of nonlinear CT multiplayer ZSGs. Also, we give a novel nonquadratic function to settle the asymmetric constraints. One thing worth noting is that the method used in this paper to solve asymmetric constraints eliminates the strict restriction on the control matrix compared to the previous ones. Further, the optimal controls, the worst disturbances, and the tracking Hamilton-Jacobi-Isaacs equation are derived. Next, a single critic neural network is built to estimate the optimal cost function, thus obtaining the approximations of the optimal controls and the worst disturbances. The critic network weight is updated by the normalized steepest descent algorithm. Additionally, based on the Lyapunov method, the stability of the tracking error and the weight estimation error of the critic network is analyzed. In the end, two examples are offered to validate the theoretical results.
Affiliation(s)
- Menghua Li, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Jin Ren, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
6
Guo Z, Li H, Ma H, Meng W. Distributed Optimal Attitude Synchronization Control of Multiple QUAVs via Adaptive Dynamic Programming. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:8053-8063. [PMID: 36446013 DOI: 10.1109/tnnls.2022.3224029] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
This article proposes a distributed optimal attitude synchronization control strategy for multiple quadrotor unmanned aerial vehicles (QUAVs) through the adaptive dynamic programming (ADP) algorithm. The attitude systems of QUAVs are modeled as affine nominal systems subject to parameter uncertainties and external disturbances. Considering attitude constraints in complex flying environments, a one-to-one mapping technique is utilized to transform the constrained systems into equivalent unconstrained systems. An improved nonquadratic cost function is constructed for each QUAV, which reflects the requirements of robustness and the constraints of control input simultaneously. To overcome the issue that the persistence of excitation (PE) condition is difficult to meet, a novel tuning rule of critic neural network (NN) weights is developed via the concurrent learning (CL) technique. In terms of the Lyapunov stability theorem, the stability of the closed-loop system and the convergence of critic NN weights are proved. Finally, simulation results on multiple QUAVs show the effectiveness of the proposed control strategy.
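As an illustration of what a one-to-one mapping between a constrained and an unconstrained variable can look like, here is a minimal sketch. It assumes a simple logit-type map on an interval `(lo, hi)`; the article's actual transformation is not given in the abstract and may differ (for instance, it may be designed to keep the origin fixed, which this map does not). The function names are hypothetical.

```python
import math


def to_unconstrained(x, lo, hi):
    """Map a state x in the open interval (lo, hi) to the whole real line."""
    if not lo < x < hi:
        raise ValueError("x must lie strictly inside (lo, hi)")
    return math.log((x - lo) / (hi - x))


def to_constrained(s, lo, hi):
    """Inverse map: any moderate real s lands back inside [lo, hi].

    Note: math.exp overflows for very large negative s (below about -709).
    """
    return lo + (hi - lo) / (1.0 + math.exp(-s))


if __name__ == "__main__":
    lo, hi = -0.5, 1.0                      # asymmetric bounds, chosen arbitrarily
    for x in (-0.4, 0.0, 0.9):
        s = to_unconstrained(x, lo, hi)
        print(x, "->", round(s, 4), "->", round(to_constrained(s, lo, hi), 4))
```

A controller designed in the transformed coordinate `s` can take any real value, while the recovered state stays within the bounds by construction.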
7
Qiao J, Li M, Wang D. Asymmetric Constrained Optimal Tracking Control With Critic Learning of Nonlinear Multiplayer Zero-Sum Games. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:5671-5683. [PMID: 36191112 DOI: 10.1109/tnnls.2022.3208611] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
By utilizing a neural-network-based adaptive critic mechanism, the optimal tracking control problem is investigated for nonlinear continuous-time (CT) multiplayer zero-sum games (ZSGs) with asymmetric constraints. Initially, we build an augmented system with the tracking error system and the reference system. Moreover, a novel nonquadratic function is introduced to address asymmetric constraints. Then, we derive the tracking Hamilton-Jacobi-Isaacs (HJI) equation of the constrained nonlinear multiplayer ZSG. However, it is extremely hard to get the analytical solution to the HJI equation. Hence, an adaptive critic mechanism based on neural networks is established to estimate the optimal cost function, so as to obtain the near-optimal control policy set and the near worst disturbance policy set. In the process of neural critic learning, we only utilize one critic neural network and develop a new weight updating rule. After that, by using the Lyapunov approach, the uniform ultimate boundedness stability of the tracking error in the augmented system and the weight estimation error of the critic network is verified. Finally, two simulation examples are provided to demonstrate the efficacy of the established mechanism.
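For readers unfamiliar with nonquadratic input penalties, the sketch below shows one widely used construction, a shifted inverse-hyperbolic-tangent integral, and the bounded control law it induces. This is an assumption for illustration: the abstract does not state the article's function, which is described as novel and may differ. The scalar input, the weight `r`, and the argument `grad_term` (standing in for the product of the input gain and the value-function gradient) are hypothetical.

```python
import math


def _center_and_radius(u_min, u_max):
    """Midpoint c and half-width lam of the admissible interval [u_min, u_max]."""
    return 0.5 * (u_max + u_min), 0.5 * (u_max - u_min)


def input_penalty(u, u_min, u_max, r=1.0):
    """Closed form of 2*lam*r * integral from c to u of atanh((v - c)/lam) dv.

    Zero at the midpoint c, positive elsewhere inside the bounds.
    """
    if not u_min < u < u_max:
        raise ValueError("u must lie strictly inside (u_min, u_max)")
    c, lam = _center_and_radius(u_min, u_max)
    z = (u - c) / lam
    return 2.0 * lam * r * (u - c) * math.atanh(z) + lam * lam * r * math.log(1.0 - z * z)


def bounded_control(grad_term, u_min, u_max, r=1.0):
    """Control law induced by the penalty above; its output never leaves [u_min, u_max]."""
    c, lam = _center_and_radius(u_min, u_max)
    return c - lam * math.tanh(grad_term / (2.0 * lam * r))


if __name__ == "__main__":
    u_min, u_max = -1.0, 3.0                # asymmetric bounds, chosen arbitrarily
    for g in (-10.0, 0.0, 0.5, 10.0):
        print(g, "->", round(bounded_control(g, u_min, u_max), 4))
```

Because `tanh` saturates at plus or minus one, the control stays within the asymmetric bounds no matter how large the gradient term becomes, which is why no explicit clipping is needed.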
8
Li B, Chen N, Luo B, Chen J, Yang C, Gui W. ADP-Based Event-Triggered Constrained Optimal Control on Spatiotemporal Process: Application to Temperature Field in Roller Kiln. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:3229-3241. [PMID: 37195852 DOI: 10.1109/tnnls.2023.3267516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2023]
Abstract
The precise control of the spatiotemporal process in a roller kiln is crucial in the production of Ni-Co-Mn layered cathode material of lithium-ion batteries. Since the product is extremely sensitive to temperature distribution, temperature field control is of great significance. In this article, an event-triggered optimal control (ETOC) method with input constraints for the temperature field is proposed, which plays an important role in reducing the communication and computation costs. A nonquadratic cost function is adopted to describe the system performance with input constraints. First, we present the problem description of the temperature field event-triggered control, where this field is described by a partial differential equation (PDE). Then, the event-triggered condition is designed according to the information of system states and control inputs. On this basis, a framework of the event-triggered adaptive dynamic programming (ETADP) method that is based on the model reduction technology is proposed for the PDE system. A critic neural network (NN) is used to approximate the optimal performance index, while an actor network is used to optimize the control strategy. Furthermore, an upper bound of the performance index and a lower bound of interexecution times, as well as the stabilities of the impulsive dynamic system and the closed-loop PDE system, are also proved. Simulation verification demonstrates the effectiveness of the proposed method.
9
Qin C, Jiang K, Zhang J, Zhu T. Critic Learning-Based Safe Optimal Control for Nonlinear Systems with Asymmetric Input Constraints and Unmatched Disturbances. ENTROPY (BASEL, SWITZERLAND) 2023; 25:1101. [PMID: 37510048 PMCID: PMC10378920 DOI: 10.3390/e25071101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 07/01/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023]
Abstract
This paper investigates a safe optimal control method, based on adaptive dynamic programming (ADP), for continuous-time (CT) nonlinear safety-critical systems with asymmetric input constraints and unmatched disturbances. Initially, a new non-quadratic form function is implemented to effectively handle the asymmetric input constraints. Subsequently, the safe optimal control problem is transformed into a two-player zero-sum game (ZSG) problem to suppress the influence of unmatched disturbances, and a new Hamilton-Jacobi-Isaacs (HJI) equation is introduced by integrating the control barrier function (CBF) with the cost function to penalize unsafe behavior. Moreover, a damping factor is embedded in the CBF to balance safety and optimality. To obtain a safe optimal controller, only one critic neural network (CNN) is utilized to tackle the complex HJI equation, leading to a decreased computational load in contrast to the utilization of the conventional actor-critic network. Then, the system state and the parameters of the CNN are shown to be uniformly ultimately bounded (UUB) by means of the Lyapunov stability method. Lastly, two examples are presented to confirm the efficacy of the presented approach.
Affiliation(s)
- Chunbin Qin, School of Artificial Intelligence, Henan University, Zhengzhou 450000, China.
- Kaijun Jiang, School of Artificial Intelligence, Henan University, Zhengzhou 450000, China.
- Jishi Zhang, School of Software, Henan University, Kaifeng 475000, China.
- Tianzeng Zhu, School of Artificial Intelligence, Henan University, Zhengzhou 450000, China.
10
Li M, Wang D, Zhao M, Qiao J. Event-triggered constrained neural critic control of nonlinear continuous-time multiplayer nonzero-sum games. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.02.081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023]
11
Lu J, Wei Q, Zhou T, Wang Z, Wang FY. Event-Triggered Near-Optimal Control for Unknown Discrete-Time Nonlinear Systems Using Parallel Control. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:1890-1904. [PMID: 35522632 DOI: 10.1109/tcyb.2022.3164977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
This article uses parallel control to investigate the problem of event-triggered near-optimal control (ETNOC) for unknown discrete-time (DT) nonlinear systems. First, to achieve parallel control, an augmented nonlinear system (ANS) with an augmented performance index (API) is proposed to introduce the control input into the feedback system. The control stability relationship between the ANS and the original system is analyzed, and it is shown that, by choosing a proper API, optimal control of the ANS with the API can be seen as near-optimal control of the original system with the original performance index (OPI). Second, based on parallel control, a novel event-triggered scheme is proposed, and then a novel ETNOC method is developed using the time-triggered optimal value function of the ANS with the API. The control stability is proved, and an upper bound, which is related to the design parameter, is provided for the actual performance index in advance. Then, to implement the developed ETNOC method for unknown DT nonlinear systems, a novel online learning algorithm is developed without reconstructing unknown systems, and neural network (NN) and adaptive dynamic programming (ADP) techniques are employed in the developed algorithm. The convergence of the signals in the closed-loop system (CLS) is shown using the Lyapunov approach, and the assumption of boundedness of input dynamics is not required. Finally, two simulations justify the theoretical conjectures.
12
Qin C, Wang J, Zhu H, Zhang J, Hu S, Zhang D. Neural network-based safe optimal robust control for affine nonlinear systems with unmatched disturbances. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.07.072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
13
Trigger-Based K-Band Microwave Ranging System Thermal Control with Model-Free Learning Process. ELECTRONICS 2022. [DOI: 10.3390/electronics11142173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Micron-level accuracy K-band microwave ranging in space relies on the stability of the on-board payload thermal control; however, large quantities of thermal sensors and heating devices around the deployed instruments consume the precious inner communication resources of the central computer. Another problem is that the payload thermal protection environment can deteriorate gradually over years of operation. In this paper, a new trigger-based thermal system controller design is proposed, with consideration of spaceborne communication burden reduction and actuator saturation, which guarantees stable temperature fluctuations of microwave payloads in space missions. The controller combines a nominal constant sampling PID inner loop and a trigger-based outer loop structure under constraints of heating device saturation. Moreover, an iterative model-free reinforcement learning process is adopted that can approximate the estimation of thermal dynamic modeling uncertainty online. Through extensive experiments in a laboratory environment, the performance of the proposed trigger thermal control is verified, with smaller temperature fluctuations compared to the nominal control and a clear gain in system communication efficiency. The online learning algorithm is also tested with deliberate thermal conditions that deviate from the original system; the results quickly converge to normal when the thermal disturbance is removed. Finally, the ranging accuracy is tested for the whole system, and a 25% (RMS) performance improvement, about 2.2 µm, can be realized by using a trigger-based control strategy compared to the nominal control method.
14
Yang X, Xu M, Wei Q. Dynamic Event-Sampled Control of Interconnected Nonlinear Systems Using Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; PP:923-937. [PMID: 35666792 DOI: 10.1109/tnnls.2022.3178017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
We develop a decentralized dynamic event-based control strategy for nonlinear systems subject to matched interconnections. To begin with, we introduce a dynamic event-based sampling mechanism, which relies on the system's states and the variables generated by time-based differential equations. Then, we prove that the decentralized event-based controller for the whole system is composed of all the optimal event-based control policies of nominal subsystems. To derive these optimal event-based control policies, we design a critic-only architecture to solve the related event-based Hamilton-Jacobi-Bellman equations in the reinforcement learning framework. The implementation of such an architecture uses only critic neural networks (NNs) with their weight vectors being updated through the gradient descent method together with concurrent learning. After that, we demonstrate that the asymptotic stability of closed-loop nominal subsystems and the uniformly ultimate boundedness stability of critic NNs' weight estimation errors are guaranteed by using Lyapunov's approach. Finally, we provide simulations of a matched nonlinear-interconnected plant to validate the present theoretical claims.
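The combination of gradient descent with stored data can be illustrated on a toy problem. The sketch below is an assumption-laden simplification, not the article's algorithm: it evaluates a fixed linear policy `u = -k*x` on the hypothetical scalar plant `x' = -x + u` with running cost `x**2 + u**2`, uses a one-term critic `V(x) = w * x**2`, and applies a normalized gradient step on the Bellman residual at the current state plus a small buffer of recorded states. For `k = 1` the exact weight works out to 0.5, which the iteration should approach.

```python
def bellman_residual(w, x, k):
    """Return (regressor, residual) of the continuous-time Bellman equation at state x."""
    x_dot = -(1.0 + k) * x                  # closed-loop dynamics under u = -k*x
    regressor = 2.0 * x * x_dot             # d(x**2)/dx times x_dot
    cost = x * x + (k * x) ** 2             # running cost x**2 + u**2
    return regressor, w * regressor + cost


def train_critic(k=1.0, alpha=0.5, replay_states=(0.5, 1.0, 1.5, 2.0),
                 iters=200, dt=0.01, x0=0.8):
    """Normalized gradient descent on the squared residual, with replayed samples."""
    w, x = 0.0, x0
    for _ in range(iters):
        grad = 0.0
        for xs in (x, *replay_states):      # current sample plus stored samples
            reg, res = bellman_residual(w, xs, k)
            grad += reg * res / (1.0 + reg * reg) ** 2
        w -= alpha * grad
        x += dt * (-(1.0 + k) * x)          # the live state decays toward zero
    return w


if __name__ == "__main__":
    print("learned critic weight:", train_critic())
```

The stored samples keep the regressor informative even as the live state decays toward zero, which is the intuition behind replacing a persistence of excitation requirement with a condition on recorded data.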
15
16
Liu K, Chen J. Robust adaptive neural network event-triggered compensation control for continuous stirred tank reactors with prescribed performance and actuator failures. Chem Eng Sci 2021. [DOI: 10.1016/j.ces.2021.116953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]