1. Yang X, Wang D. Reinforcement Learning for Robust Dynamic Event-Driven Constrained Control. IEEE Transactions on Neural Networks and Learning Systems 2025;36:6067-6079. PMID: 38700967; DOI: 10.1109/tnnls.2024.3394251.
Abstract
We consider a robust dynamic event-driven control (EDC) problem for nonlinear systems subject to both unmatched perturbations and input constraints of unknown form; that is, the constraints imposed on the systems' input may be symmetric or asymmetric. To handle such constraints, we first construct a novel nonquadratic cost function for the constrained auxiliary system. We then propose a dynamic event-triggering mechanism that relies on both a time-based variable and the system states to reduce the computational load. Meanwhile, we show that robust dynamic EDC of the original constrained nonlinear system can be achieved by solving the event-driven optimal control problem of the constrained auxiliary system. After that, we develop the corresponding event-driven Hamilton-Jacobi-Bellman equation and solve it with a single critic neural network (CNN) in the reinforcement learning framework. To relax the persistence of excitation condition in tuning the CNN's weights, we incorporate experience replay into the gradient descent method. With the aid of Lyapunov's approach, we prove that the closed-loop auxiliary system and the weight estimation error are uniformly ultimately bounded. Finally, two examples, a nonlinear plant and a pendulum system, are used to validate the theoretical claims.
2. An T, Dong B, Yan H, Liu L, Ma B. Dynamic Event-Triggered Strategy-Based Optimal Control of Modular Robot Manipulator: A Multiplayer Nonzero-Sum Game Perspective. IEEE Transactions on Cybernetics 2024;54:7514-7526. PMID: 39374285; DOI: 10.1109/tcyb.2024.3468875.
Abstract
Due to the limited computing and processing ability of modular robot manipulator (MRM) components such as sensors and controllers, event-triggered mechanisms are regarded as a crucial communication paradigm in resource-constrained applications. The dynamic event-triggered mechanism is emerging as a new technology owing to its higher resource utilization efficiency and more flexible system design requirements compared with the traditional event-triggered mechanism. Therefore, an optimal control scheme based on a dynamic event-triggered multiplayer nonzero-sum game is developed for MRM systems with uncertain disturbances. First, the dynamic model of the MRM is established using the joint torque feedback technique, and model uncertainty is estimated by a data-driven neural network identifier. In the framework of differential games, the tracking control problem of the MRM system is transformed into an optimal control problem for a multiplayer nonzero-sum game, with the control input of each joint module acting as a player. Then, the static event-triggered control problem of the MRM system is studied based on an adaptive dynamic programming algorithm. On this basis, an internal dynamic variable describing the previous state of the system is introduced, and the characteristics of the dynamic triggering rule and its relationship with the static rule are revealed theoretically. By designing an exponentially decaying signal, the minimum sampling interval of the system is kept strictly positive, so that Zeno behavior is excluded. Lyapunov theory proves that the system is asymptotically stable, and experimental results verify the validity of the proposed method.
3. Song R, Yang G, Lewis FL. Nearly Optimal Control for Mixed Zero-Sum Game Based on Off-Policy Integral Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2024;35:2793-2804. PMID: 35877793; DOI: 10.1109/tnnls.2022.3191847.
Abstract
In this article, we solve a class of mixed zero-sum games for nonlinear systems with unknown dynamics. A policy iteration algorithm that adopts integral reinforcement learning (IRL), and thus does not depend on system information, is proposed to obtain the optimal control of the competitor and the collaborators. An adaptive update law that combines a critic-actor structure with experience replay is proposed. The actor function not only approximates the optimal control of every player but also estimates an auxiliary control, which does not participate in the actual control process and exists only in theory. The parameters of the actor-critic structure are updated simultaneously. It is then proven that the parameter errors of the polynomial approximation are uniformly ultimately bounded. Finally, the effectiveness of the proposed algorithm is verified by two simulations.
4. Luo R, Peng Z, Hu J. Optimal Robust Control of Nonlinear Systems with Unknown Dynamics via NN Learning with Relaxed Excitation. Entropy (Basel, Switzerland) 2024;26:72. PMID: 38248197; PMCID: PMC11154462; DOI: 10.3390/e26010072.
Abstract
This paper presents an adaptive learning structure based on neural networks (NNs) to solve the optimal robust control problem for nonlinear continuous-time systems with unknown dynamics and disturbances. First, a system identifier is introduced to approximate the unknown system matrices and disturbances with the help of NNs and parameter estimation techniques. To obtain the optimal solution of the optimal robust control problem, a critic learning control structure is proposed to compute the approximate controller. Unlike existing identifier-critic NN learning control methods, novel adaptive tuning laws based on Kreisselmeier's regressor extension and mixing technique are designed to estimate the unknown parameters of the two NNs under relaxed persistence of excitation conditions. Furthermore, theoretical analysis is given to prove the significant relaxation of the proposed convergence conditions. Finally, the effectiveness of the proposed learning approach is demonstrated via a simulation study.
Affiliation(s)
- Rui Luo
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (R.L.); (J.H.)
- Zhinan Peng
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (R.L.); (J.H.)
- Institute of Electronic and Information Engineering, University of Electronic Science and Technology of China, Dongguan 523808, China
- Jiangping Hu
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (R.L.); (J.H.)
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou 313001, China
5. Wang S, Wen S, Yang Y, Shi K, Huang T. Suboptimal Leader-to-Coordination Control for Nonlinear Systems With Switching Topologies: A Learning-Based Method. IEEE Transactions on Neural Networks and Learning Systems 2023;34:10578-10588. PMID: 35486552; DOI: 10.1109/tnnls.2022.3169417.
Abstract
In cooperative control for multiagent systems (MASs), the key issues of distributed interaction, nonlinear characteristics, and optimization should be considered simultaneously, which remains theoretically intractable even today. Considering these factors, this article investigates leader-to-formation control and optimization for nonlinear MASs using a learning-based method. Under a time-varying switching topology, a fully distributed neural-network-based state observer is designed to reconstruct the dynamics and the state trajectory of the leader signal with arbitrary precision under a jointly connected topology assumption. Benefiting from the observers, formation control for MASs under switching topologies is transformed into tracking control for each subsystem with the continuous state generated by the observers. An augmented system with a discounted infinite-horizon LQR performance index is considered to optimize the control effect. Owing to the complexity of solving the Hamilton-Jacobi-Bellman equation, the optimal value function is approximated by a critic network via the integral reinforcement learning method without knowledge of the drift dynamics. Meanwhile, an actor network is also introduced to ensure stability. The tracking errors and the weight estimation matrices are proven to be uniformly ultimately bounded. Finally, two illustrative examples are given to show the effectiveness of this method.
6. Wu L, Li Z, Liu S, Li Z, Sun D. An improved compact-form antisaturation model-free adaptive control algorithm for a class of nonlinear systems with time delays. Sci Prog 2023;106:368504231210361. PMID: 37933475; PMCID: PMC10631356; DOI: 10.1177/00368504231210361.
Abstract
To address the time-delay and actuator saturation problems of nonlinear plants in industrial processes, an improved compact-form antisaturation model-free adaptive control (ICF-AS-MFAC) method is proposed in this work. The ICF-AS-MFAC scheme is based on the concept of the pseudo partial derivative (PPD) and adopts equivalent dynamic linearization technology. A tracking differentiator is then used to predict the future output of a time-delay system so that the system can be controlled effectively. Additionally, the concept of a saturation parameter is proposed, and the ICF-AS-MFAC controller is designed to ensure that the control system does not exhibit actuator saturation. The proposed algorithm is more flexible, yields faster output responses for time-delay systems, and solves the actuator saturation problem. The convergence and stability of the proposed method are rigorously proven mathematically. The effectiveness of the proposed method is verified by numerical simulations, and its applicability is verified by a series of experimental results on a double-tank system.
Affiliation(s)
- Lipu Wu
- School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
- Zhen Li
- School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
- Shida Liu
- School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
- Zhijun Li
- School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
- Dehui Sun
- School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
7. Lv Y, Na J, Zhao X, Huang Y, Ren X. Multi-H∞ Controls for Unknown Input-Interference Nonlinear System With Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2023;34:5601-5613. PMID: 34874874; DOI: 10.1109/tnnls.2021.3130092.
Abstract
This article studies multi-H∞ controls for input-interference nonlinear systems via an adaptive dynamic programming (ADP) method, which allows multiple inputs to have individual selfish strategy components to resist weighted interference. In this line, the ADP scheme is used to learn the Nash-optimal solutions of the input-interference nonlinear system such that multiple H∞ performance indices reach the defined Nash equilibrium. First, the input-interference nonlinear system is given and the Nash equilibrium is defined. An adaptive neural network (NN) observer is introduced to identify the input-interference nonlinear dynamics. Then, critic NNs are used to learn the multiple H∞ performance indices. A novel adaptive law is designed to update the critic NN weights by minimizing the Hamilton-Jacobi-Isaacs (HJI) equation, which can be used to directly and effectively calculate the multi-H∞ controls from input-output data, so that an actor structure is avoided. Moreover, the stability of the control system and the convergence of the updated parameters are proved. Finally, two numerical examples are simulated to verify the proposed ADP scheme for the input-interference nonlinear system.
8. Peng Z, Ji H, Zou C, Kuang Y, Cheng H, Shi K, Ghosh BK. Optimal H∞ tracking control of nonlinear systems with zero-equilibrium-free via novel adaptive critic designs. Neural Netw 2023;164:105-114. PMID: 37148606; DOI: 10.1016/j.neunet.2023.04.021.
Abstract
In this paper, a novel adaptive critic control method is designed to solve the optimal H∞ tracking control problem for continuous-time nonlinear systems with nonzero equilibrium, based on adaptive dynamic programming (ADP). To guarantee the finiteness of the cost function, traditional methods generally assume that the controlled system has a zero equilibrium point, which is not true of practical systems. To overcome such an obstacle and realize H∞ optimal tracking control, this paper proposes a novel cost function defined with respect to the disturbance, the tracking error, and the derivative of the tracking error. Based on the designed cost function, the H∞ control problem is formulated as a two-player zero-sum differential game, and a policy iteration (PI) algorithm is then proposed to solve the corresponding Hamilton-Jacobi-Isaacs (HJI) equation. To obtain the online solution to the HJI equation, a single-critic neural network structure based on the PI algorithm is established to learn the optimal control policy and the worst-case disturbance law. It is worth mentioning that the proposed adaptive critic control method simplifies the controller design process when the equilibrium of the system is not zero. Finally, simulations are conducted to evaluate the tracking performance of the proposed control methods.
Affiliation(s)
- Zhinan Peng
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Hanqi Ji
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Chaobin Zou
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Yiqun Kuang
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
- Hong Cheng
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Kaibo Shi
- School of Information Science and Engineering, Chengdu University, Chengdu, 610106, China
- Bijoy Kumar Ghosh
- Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, 79409-1042, USA
9. Wang T, Wang Y, Yang X, Yang J. Further Results on Optimal Tracking Control for Nonlinear Systems With Nonzero Equilibrium via Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2023;34:1900-1910. PMID: 34428163; DOI: 10.1109/tnnls.2021.3105646.
Abstract
This article develops a novel cost function (performance index function) to overcome the obstacles in solving the optimal tracking control problem for a class of nonlinear systems with known system dynamics via adaptive dynamic programming (ADP) technique. For the traditional optimal control problems, the assumption that the controlled system has zero equilibrium is generally required to guarantee the finiteness of an infinite horizon cost function and a unique solution. In order to solve the optimal tracking control problem of nonlinear systems with nonzero equilibrium, a specific cost function related to tracking errors and their derivatives is designed in this article, in which the aforementioned assumption and related obstacles are removed and the controller design process is simplified. Finally, comparative simulations are conducted on an inverted pendulum system to illustrate the effectiveness and advantages of the proposed optimal tracking control strategy.
10. Yang X, Zhou Y, Gao Z. Reinforcement learning for robust stabilization of nonlinear systems with asymmetric saturating actuators. Neural Netw 2023;158:132-141. PMID: 36455428; DOI: 10.1016/j.neunet.2022.11.012.
Abstract
We study the robust stabilization problem of a class of nonlinear systems with asymmetric saturating actuators and mismatched disturbances. Initially, we convert such a robust stabilization problem into a nonlinear-constrained optimal control problem by constructing a discounted cost function for the auxiliary system. Then, for the purpose of solving the nonlinear-constrained optimal control problem, we develop a simultaneous policy iteration (PI) in the reinforcement learning framework. The implementation of the simultaneous PI relies on an actor-critic architecture, which employs actor and critic neural networks (NNs) to separately approximate the control policy and the value function. To determine the actor and critic NNs' weights, we use the approach of weighted residuals together with the typical Monte-Carlo integration technique. Finally, we perform simulations of two nonlinear plants to validate the established theoretical claims.
Affiliation(s)
- Xiong Yang
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China.
- Yingjiang Zhou
- College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
- Zhongke Gao
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China.
11. Xia L, Li Q, Song R, Ge SS. Distributed optimized dynamic event-triggered control for unknown heterogeneous nonlinear MASs with input-constrained. Neural Netw 2022;154:1-12. PMID: 35839533; DOI: 10.1016/j.neunet.2022.06.033.
Abstract
A distributed optimized dynamic event-triggered controller is investigated for completely unknown heterogeneous nonlinear multi-agent systems (MASs) on a directed graph subject to input constraints. First, a distributed observer is designed to estimate the leader's information for each follower, and a network of augmented systems is constructed from the dynamics of the followers and the observers. An identifier with a compensator is designed to approximate the unknown augmented system (agent) with an arbitrarily small identification error. Then, considering that the input-constrained optimal controller, along with the Hamilton-Jacobi-Bellman (HJB) equation, is demanding to execute in systems with bottlenecks such as communication and computing burdens, a critic-actor-based optimized dynamic event-triggered controller, which tunes the parameters of the critic-actor neural networks (NNs) by the dynamic triggering mechanism, is leveraged to determine the rule of aperiodic sampling and maintain the desired synchronization service. In addition, the existence of a positive minimum inter-event time (MIET) between consecutive events is proved. Finally, applications to a non-identical nonlinear MAS and 2-DOF robots illustrate the validity of the proposed theoretical results.
Affiliation(s)
- Lina Xia
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing, China; The Department of Electrical and Computer Engineering, National University of Singapore, 117576, Singapore.
- Qing Li
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing, China.
- Ruizhuo Song
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing, China.
- Shuzhi Sam Ge
- The Department of Electrical and Computer Engineering, National University of Singapore, 117576, Singapore.
12. Qin C, Qiao X, Wang J, Zhang D. Robust Trajectory Tracking Control for Continuous-Time Nonlinear Systems with State Constraints and Uncertain Disturbances. Entropy 2022;24:816. PMID: 35741537; PMCID: PMC9222594; DOI: 10.3390/e24060816.
Abstract
In this paper, a robust trajectory tracking control method based on adaptive dynamic programming (ADP) is proposed for nonlinear systems with state constraints and uncertain disturbances. First, an augmented system is formed from the tracking error and the reference trajectory, and the tracking control problem with uncertain disturbances is cast as a robust control problem. In addition, considering the nominal version of the augmented system, the guaranteed-cost tracking control problem is transformed into an optimal control problem by introducing a discount factor into the nominal system. A new safe Hamilton-Jacobi-Bellman (HJB) equation is proposed by combining the cost function with a control barrier function (CBF), so that system states that violate the safety constraints are penalized. To solve the new safe HJB equation, a critic neural network (NN) is used to approximate its solution. According to Lyapunov stability theory, in the presence of state constraints and uncertain disturbances, the system states and the parameters of the critic neural network are guaranteed to be uniformly ultimately bounded (UUB). Finally, the feasibility of the proposed method is verified by a simulation example.
13. Fu Y, Hong C, Fu J, Chai T. Approximate Optimal Tracking Control of Nondifferentiable Signals for a Class of Continuous-Time Nonlinear Systems. IEEE Transactions on Cybernetics 2022;52:4441-4450. PMID: 33141675; DOI: 10.1109/tcyb.2020.3027344.
Abstract
In this article, for a class of continuous-time nonlinear nonaffine systems with unknown dynamics, a robust approximate optimal tracking controller (RAOTC) is proposed in the framework of adaptive dynamic programming (ADP). The distinguishing contribution of this article is that a new Lyapunov function is constructed whose time derivative along the solution of the closed-loop system can be computed without the derivative information of the tracking errors. Thus, the proposed method can make the system states follow nondifferentiable reference signals, which removes the common assumption in the literature that reference signals must be continuous for tracking control of continuous-time nonlinear systems. Theoretical analysis, simulation, and application results illustrate the effectiveness and superiority of the proposed method.
14. Guan C, Jiang Y. A tractor-trailer parking control scheme using adaptive dynamic programming. Complex Intell Syst 2022. DOI: 10.1007/s40747-021-00330-z.
Abstract
This paper studies the online learning control of a truck-trailer parking problem via adaptive dynamic programming (ADP). The contribution is twofold. First, a novel ADP method is developed for systems with parametric nonlinearities. It learns the optimal control policy of the linearized system at the origin, while the learning process utilizes online measurements of the full system and is robust with respect to nonlinear disturbances. Second, a control strategy is formulated for a commonly seen truck-trailer parallel parking problem, and the proposed ADP method is integrated into the strategy to provide online learning capabilities and to handle uncertainties. A numerical simulation is conducted to demonstrate the effectiveness of the proposed methodology.
15. Liu C, Zhang H, Luo Y, Su H. Dual Heuristic Programming for Optimal Control of Continuous-Time Nonlinear Systems Using Single Echo State Network. IEEE Transactions on Cybernetics 2022;52:1701-1712. PMID: 32396118; DOI: 10.1109/tcyb.2020.2984952.
Abstract
This article presents an improved online adaptive dynamic programming (ADP) algorithm to solve the optimal control problem of continuous-time nonlinear systems with an infinite-horizon cost. The Hamilton-Jacobi-Bellman (HJB) equation is iteratively approximated by a novel critic-only structure constructed from a single echo state network (ESN). Inspired by the dual heuristic programming (DHP) technique, the ESN is designed to approximate the costate function and then derive the optimal controller. Because the ESN is characterized by the echo state property (ESP), it is proved that the ESN can successfully approximate the solution to the HJB equation. Besides, to eliminate the requirement for an initial admissible control, a new weight tuning law is designed by adding an alternative condition. The stability of the closed-loop optimal control system and the convergence of the output weights of the ESN are guaranteed using the Lyapunov theorem in the sense of uniform ultimate boundedness (UUB). Two simulation examples, a linear system and a nonlinear system, are given to illustrate the feasibility and effectiveness of the proposed approach by comparison with a polynomial neural-network scheme.
16. Shen H, Liu X, Xia J, Chen X, Wang J. Finite-time energy-to-peak fuzzy filtering for persistent dwell-time switched nonlinear systems with unreliable links. Inf Sci (N Y) 2021. DOI: 10.1016/j.ins.2021.07.081.
17. Model-Free Optimal Consensus Control for Multi-agent Systems Based on DHP Algorithm. Neural Process Lett 2021. DOI: 10.1007/s11063-021-10641-4.
18. Liu C, Zhang H, Sun S, Ren H. Online H∞ control for continuous-time nonlinear large-scale systems via single echo state network. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.03.017.
19. Zero-sum game-based neuro-optimal control of modular robot manipulators with uncertain disturbance using critic only policy iteration. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.04.032.
20. Song R, Wei Q, Zhang H, Lewis FL. Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics. IEEE Transactions on Cybernetics 2021;51:2929-2943. PMID: 31902792; DOI: 10.1109/tcyb.2019.2957406.
Abstract
In this article, an off-policy reinforcement learning (RL) algorithm is established to solve discrete-time N-player nonzero-sum (NZS) games with completely unknown dynamics. The N coupled generalized algebraic Riccati equations (GARE) are derived, and a policy iteration (PI) algorithm is then used to obtain the N-tuple of iterative controls and iterative value functions. As the system dynamics is necessary in the PI algorithm, an off-policy RL method is developed for discrete-time N-player NZS games. The off-policy N-coupled Hamilton-Jacobi (HJ) equation is derived based on quadratic value functions. Using the Kronecker product, the N-coupled HJ equation is decomposed into an unknown-parameter part and a system-operation-data part, which makes solving the N-coupled HJ equation independent of the system dynamics. Least squares is used to calculate the iterative value functions and the N-tuple of iterative controls. The existence of a Nash equilibrium is proved. Simulation examples demonstrate the proposed method for discrete-time NZS games with unknown dynamics.
21. Mu C, Peng J, Tang Y. Learning-based control for discrete-time constrained nonzero-sum games. CAAI Transactions on Intelligence Technology 2021. DOI: 10.1049/cit2.12015.
Affiliation(s)
- Chaoxu Mu
- School of Electrical and Information Engineering Tianjin University Tianjin China
- Jiangwen Peng
- School of Electrical and Information Engineering Tianjin University Tianjin China
- Yufei Tang
- Department of Computer Electrical Engineering and Computer Science Florida Atlantic University USA
22. Yang X, Wei Q. Adaptive Critic Learning for Constrained Optimal Event-Triggered Control With Discounted Cost. IEEE Transactions on Neural Networks and Learning Systems 2021;32:91-104. PMID: 32167914; DOI: 10.1109/tnnls.2020.2976787.
Abstract
This article studies an optimal event-triggered control (ETC) problem of nonlinear continuous-time systems subject to asymmetric control constraints. The present nonlinear plant differs from many studied systems in that its equilibrium point is nonzero. First, we introduce a discounted cost for such a system in order to obtain the optimal ETC without making coordinate transformations. Then, we present an event-triggered Hamilton-Jacobi-Bellman equation (ET-HJBE) arising in the discounted-cost constrained optimal ETC problem. After that, we propose an event-triggering condition guaranteeing a positive lower bound for the minimal intersample time. To solve the ET-HJBE, we construct a critic network under the framework of adaptive critic learning. The critic network weight vector is tuned through a modified gradient descent method, which simultaneously uses historical and instantaneous state data. By employing the Lyapunov method, we prove that all signals in the closed-loop system are uniformly ultimately bounded. Finally, we provide simulations of a pendulum system and an oscillator system to validate the obtained optimal ETC strategy.
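The critic tuning rule described above, a gradient descent that mixes historical and instantaneous state data (concurrent learning), can be sketched as follows; the linear plant, quadratic cost, features, and step size are illustrative assumptions, with a policy-evaluation regression standing in for the event-triggered HJB residual.

```python
import numpy as np

# Sketch of a critic weight update that uses historical and instantaneous data
# together. The critic is linear in its weights, V(x) ~ w' phi(x) with
# phi(x) = kron(x, x); each step descends the squared Bellman residual over
# {recorded history} plus {one fresh sample}. Plant, cost, and step size are
# illustrative assumptions, not the cited paper's design.
rng = np.random.default_rng(1)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.5, 0.3]])
Q, R = np.eye(2), np.eye(1)
Acl = A - B @ K

def sample():
    x = rng.uniform(-1, 1, 2)
    u = -K @ x
    x1 = Acl @ x
    return np.kron(x, x) - np.kron(x1, x1), x @ Q @ x + u @ R @ u

history = [sample() for _ in range(15)]        # recorded historical data
w = np.zeros(4)
lr = 0.5
for _ in range(500):
    batch = history + [sample()]               # historical + instantaneous data
    for phi, r in batch:
        e = phi @ w - r                        # Bellman residual on this sample
        w -= lr * e * phi / (1.0 + phi @ phi)  # normalized gradient step

P_est = w.reshape(2, 2)
# Model-based reference: P solves P = Acl' P Acl + Q + K'RK.
M = Q + K.T @ R @ K
P_true = M.copy()
for _ in range(500):
    P_true = M + Acl.T @ P_true @ Acl
```

The history stack keeps the regressors persistently exciting even if the instantaneous data alone would not be, which is the motivation for mixing both data sources.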
|
23
|
Su H, Zhang H, Liang X, Liu C. Decentralized Event-Triggered Online Adaptive Control of Unknown Large-Scale Systems Over Wireless Communication Networks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:4907-4919. [PMID: 31940563 DOI: 10.1109/tnnls.2019.2959005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, a novel online decentralized event-triggered control scheme is proposed for a class of nonlinear interconnected large-scale systems subject to unknown internal system dynamics and interconnection terms. First, by designing a neural-network-based identifier, the unknown internal dynamics of the interconnected systems are reconstructed. Then, the adaptive critic design method is used to learn the approximate optimal control policies under the event-triggered mechanism. Specifically, the event-based control processes of different subsystems are independent, asynchronous, and decentralized; that is, the decentralized event-triggering conditions and the controllers rely only on the local state information of the corresponding subsystems, which avoids transmitting state information between subsystems over the wireless communication networks. Next, with the help of Lyapunov's theorem, the states of the developed closed-loop control system and the critic weight estimation errors are proved to be uniformly ultimately bounded. Finally, the effectiveness and applicability of the event-based control method are verified by an illustrative numerical example and a practical example.
|
24
|
Guo X, Yan W, Cui R. Reinforcement Learning-Based Nearly Optimal Control for Constrained-Input Partially Unknown Systems Using Differentiator. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:4713-4725. [PMID: 31880567 DOI: 10.1109/tnnls.2019.2957287] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, a synchronous reinforcement-learning-based algorithm is developed for input-constrained partially unknown systems. The proposed control also alleviates the need for an initial stabilizing control. A first-order robust exact differentiator is employed to approximate unknown drift dynamics. Critic, actor, and disturbance neural networks (NNs) are established to approximate the value function, the control policy, and the disturbance policy, respectively. The Hamilton-Jacobi-Isaacs equation is solved by applying the value function approximation technique. The stability of the closed-loop system can be ensured. The state and weight errors of the three NNs are all uniformly ultimately bounded. Finally, the simulation results are provided to verify the effectiveness of the proposed method.
|
25
|
Lan X, Liu Y, Zhao Z. Cooperative control for swarming systems based on reinforcement learning in unknown dynamic environment. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.038] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
26
|
Jiang H, Zhang H, Xie X. Critic-only adaptive dynamic programming algorithms' applications to the secure control of cyber-physical systems. ISA TRANSACTIONS 2020; 104:138-144. [PMID: 30853105 DOI: 10.1016/j.isatra.2019.02.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Revised: 01/22/2019] [Accepted: 02/14/2019] [Indexed: 06/09/2023]
Abstract
Industrial cyber-physical systems generally suffer from malicious attacks and unmatched perturbations, so security is a core research topic in the related fields. This paper proposes a novel intelligent secure control scheme that integrates optimal control theory, zero-sum game theory, reinforcement learning, and neural networks. First, the secure control problem of the compromised system is converted into a zero-sum game for a nominal auxiliary system, and then both policy-iteration-based and value-iteration-based adaptive dynamic programming methods are introduced to solve the Hamilton-Jacobi-Isaacs equations. The proposed secure control scheme can mitigate the effects of actuator attacks and unmatched perturbation, and it stabilizes the compromised cyber-physical system by tuning the system performance parameters, which is proved through Lyapunov stability theory. Finally, the proposed approach is applied to a Quanser helicopter to verify its effectiveness.
Affiliation(s)
- He Jiang
- College of Information Science and Engineering, Northeastern University, Box 134, 110819, Shenyang, PR China.
- Huaguang Zhang
- College of Information Science and Engineering, Northeastern University, Box 134, 110819, Shenyang, PR China.
- Xiangpeng Xie
- Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, 210003, Nanjing, PR China.
|
27
|
Event-driven H∞ control with critic learning for nonlinear systems. Neural Netw 2020; 132:30-42. [PMID: 32861146 DOI: 10.1016/j.neunet.2020.08.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2020] [Revised: 08/03/2020] [Accepted: 08/10/2020] [Indexed: 11/22/2022]
Abstract
In this paper, we study an event-driven H∞ control problem of continuous-time nonlinear systems. Initially, with the introduction of a discounted cost function, we convert the nonlinear H∞ control problem into an event-driven nonlinear two-player zero-sum game. Then, we develop an event-driven Hamilton-Jacobi-Isaacs equation (HJIE) related to the two-player zero-sum game. After that, we propose a novel event-triggering condition guaranteeing that Zeno behavior does not occur. The triggering threshold in the newly proposed event-triggering condition can be kept positive without requiring a carefully chosen prescribed level of disturbance attenuation. To solve the event-driven HJIE, we employ an adaptive critic architecture that contains a unique critic neural network (NN). The weight parameters used in the critic NN are tuned via the gradient descent method. We then carry out a stability analysis of the hybrid closed-loop system based on Lyapunov's direct approach. Finally, we provide two nonlinear plants, including a pendulum system, to validate the proposed event-driven H∞ control scheme.
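The role of a strictly positive triggering threshold can be illustrated with a toy event-triggered state-feedback simulation; the plant, gain, and threshold parameters below are illustrative assumptions rather than the paper's design.

```python
import numpy as np

# Toy event-triggered state feedback: u is held at u = -K x_hat between events,
# where x_hat is the state at the last triggering instant; an event fires when
# the gap e = x_hat - x violates ||e|| <= sigma * ||x|| + eps. The strictly
# positive floor eps keeps the threshold positive even near the origin, so
# events cannot accumulate (no Zeno behavior). All parameters are illustrative.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 1.0]])
sigma, eps, dt = 0.1, 1e-3, 1e-3

x = np.array([1.0, -0.5])
x_hat = x.copy()
events, steps = 0, 5000
for _ in range(steps):
    gap = x_hat - x
    if np.linalg.norm(gap) > sigma * np.linalg.norm(x) + eps:
        x_hat = x.copy()                       # transmit state, recompute control
        events += 1
    u = -K @ x_hat                             # zero-order-hold control between events
    x = x + dt * (A @ x + B @ u)               # forward-Euler integration

print(events, "events over", steps, "integration steps")
```

Far fewer control updates than integration steps are needed while the state still converges, which is the computational saving event-driven control aims for.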
|
28
|
Zhang K, Zhang H, Liang Y, Wen Y. A new robust output tracking control for discrete-time switched constrained-input systems with uncertainty via a critic-only iteration learning method. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2018.07.095] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
29
|
Zhang C, Gan M, Zhao J, Xue C. Data-Driven Suboptimal Scheduling of Switched Systems. SENSORS (BASEL, SWITZERLAND) 2020; 20:s20051287. [PMID: 32120901 PMCID: PMC7085537 DOI: 10.3390/s20051287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2020] [Revised: 02/24/2020] [Accepted: 02/25/2020] [Indexed: 06/10/2023]
Abstract
In this paper, a data-driven optimal scheduling approach is investigated for continuous-time switched systems with unknown subsystems and infinite-horizon cost functions. First, a policy iteration (PI) based algorithm is proposed to quickly approximate the optimal switching policy online for known switched systems. Second, a data-driven PI-based algorithm that learns online solely from system state data is proposed for switched systems with unknown subsystems. Approximation functions are introduced, and their weight vectors are obtained step by step from different data in the algorithm; the weight vectors are then employed to approximate the switching policy and the cost function. The convergence and performance are analyzed. Finally, simulation results on two examples validate the effectiveness of the proposed approaches.
|
30
|
Tan F. The Algorithms of Distributed Learning and Distributed Estimation about Intelligent Wireless Sensor Network. SENSORS 2020; 20:s20051302. [PMID: 32121025 PMCID: PMC7085642 DOI: 10.3390/s20051302] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Revised: 02/15/2020] [Accepted: 02/20/2020] [Indexed: 11/20/2022]
Abstract
An intelligent wireless sensor network is a distributed network system with a high degree of network awareness. Each intelligent node (agent) is connected to its neighborhood through the network topology; it can not only perceive the surrounding environment but also adjust its own behavior according to local perception information, so as to construct distributed learning algorithms. Accordingly, three basic intelligent network topologies, namely centralized, non-cooperative, and cooperative, are investigated intensively in this paper. The main contributions of the paper are twofold. First, based on algebraic graph theory, three basic theoretical frameworks for distributed learning and distributed parameter estimation under the cooperative strategy are surveyed: the incremental strategy, the consensus strategy, and the diffusion strategy. Second, based on classical adaptive learning algorithms and online updating laws, the implementation of distributed estimation algorithms and the latest research progress on the above three distributed strategies are reviewed.
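The diffusion strategy surveyed above can be sketched as adapt-then-combine diffusion LMS; the ring topology, step size, and noise-free data model below are illustrative assumptions.

```python
import numpy as np

# Adapt-then-combine (ATC) diffusion LMS sketch: each node first adapts its
# local estimate with its own streaming data (adaptation step), then averages
# the intermediate estimates of its neighborhood (combination step). The ring
# topology, step size mu, and noiseless data model w_true are illustrative.
rng = np.random.default_rng(2)
w_true = np.array([1.0, -2.0, 0.5])
n_nodes, mu = 4, 0.1
neighbors = {0: [3, 0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3, 0]}  # ring
W = [np.zeros(3) for _ in range(n_nodes)]

for _ in range(600):
    # Adaptation: local LMS update at every node from its own measurement.
    psi = []
    for k in range(n_nodes):
        u = rng.standard_normal(3)          # local regressor
        d = u @ w_true                      # local (noise-free) measurement
        psi.append(W[k] + mu * u * (d - u @ W[k]))
    # Combination: average the neighborhood's intermediate estimates.
    W = [np.mean([psi[j] for j in neighbors[k]], axis=0)
         for k in range(n_nodes)]
```

Every node converges to the common parameter vector even though each one sees only its own data stream, which is the cooperative gain the diffusion strategy provides.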
Affiliation(s)
- Fuxiao Tan
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave, Shanghai 201306, China
|
31
|
Su H, Zhang H, Sun S, Cai Y. Integral reinforcement learning-based online adaptive event-triggered control for non-zero-sum games of partially unknown nonlinear systems. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.088] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
32
|
An Analysis of IRL-Based Optimal Tracking Control of Unknown Nonlinear Systems with Constrained Input. Neural Process Lett 2019. [DOI: 10.1007/s11063-019-10029-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
33
|
Online event-triggered adaptive critic design for non-zero-sum games of partially unknown networked systems. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.07.029] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
34
|
Zhang Q, Zhao D. Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:2874-2885. [PMID: 29994780 DOI: 10.1109/tcyb.2018.2830820] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper is concerned with the nonlinear optimization problem of nonzero-sum (NZS) games with unknown drift dynamics. A data-based integral reinforcement learning (IRL) method is proposed to approximate the Nash equilibrium of NZS games iteratively. Furthermore, we prove that the data-based IRL method is equivalent to the model-based policy iteration algorithm, which guarantees the convergence of the proposed method. For implementation, a single-critic neural network structure for the NZS games is given. To enhance the applicability of the data-based IRL method, we design the updating laws of the critic weights based on offline and online iterative learning, respectively. Note that the experience replay technique is introduced in the online iterative learning, which can improve the convergence rate of the critic weights during the learning process. The uniform ultimate boundedness of the critic weights is guaranteed using the Lyapunov method. Finally, numerical results demonstrate the effectiveness of the data-based IRL algorithm for nonlinear NZS games with unknown drift dynamics.
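A minimal policy-evaluation sketch of the integral reinforcement learning step is given below, assuming a linear plant and quadratic value function so the result can be checked in closed form; the paper's setting is nonlinear NZS games, and this single-policy simplification is an assumption.

```python
import numpy as np

# Integral reinforcement learning (policy evaluation) sketch: for a fixed
# policy u = -K x, the identity V(x(t)) = integral_t^{t+T} r dtau + V(x(t+T))
# is solved for V(x) = x' P x by least squares over measured trajectory
# segments, so the drift dynamics never enter the regression. The linear plant
# below is an illustrative assumption used so the result is checkable.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.5, 0.5]])
Q, R = np.eye(2), np.eye(1)
Acl = A - B @ K
cost = lambda x: x @ Q @ x + (K @ x) @ R @ (K @ x)

def segment(x0, T=0.2, dt=1e-3):
    """Integrate one interval with RK4; return (x_T, integral of stage cost)."""
    f = lambda x: Acl @ x
    x, J = x0.copy(), 0.0
    for _ in range(int(round(T / dt))):
        J += 0.5 * dt * cost(x)            # trapezoid rule, left endpoint
        k1 = f(x); k2 = f(x + 0.5*dt*k1); k3 = f(x + 0.5*dt*k2); k4 = f(x + dt*k3)
        x = x + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
        J += 0.5 * dt * cost(x)            # trapezoid rule, right endpoint
    return x, J

rng = np.random.default_rng(3)
rows, rhs = [], []
for _ in range(25):
    x0 = rng.uniform(-1, 1, 2)
    xT, J = segment(x0)
    rows.append(np.kron(x0, x0) - np.kron(xT, xT))
    rhs.append(J)
theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
P_irl = theta.reshape(2, 2)

# Closed-form check: Acl' P + P Acl + Q + K' R K = 0 (a Lyapunov equation),
# vectorized with row-major flattening.
M = Q + K.T @ R @ K
n = 2
S = np.kron(Acl.T, np.eye(n)) + np.kron(np.eye(n), Acl.T)
P_true = np.linalg.solve(S, -M.flatten()).reshape(n, n)
```

Replaying stored trajectory segments through the same regression is what the experience-replay variant of this update amounts to.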
|
35
|
Nguyen T, Mukhopadhyay S, Babbar-Sebens M. Why the ‘selfish’ optimizing agents could solve the decentralized reinforcement learning problems. AI COMMUN 2019. [DOI: 10.3233/aic-180596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Thanh Nguyen
- Department of Computer and Information Science, Indiana University Purdue University Indianapolis, 723 W Michigan St SL 280, Indianapolis, Indiana 46202, United States
- Snehasis Mukhopadhyay
- Department of Computer and Information Science, Indiana University Purdue University Indianapolis, 723 W Michigan St SL 280, Indianapolis, Indiana 46202, United States
- Meghna Babbar-Sebens
- Department of Water Resources Engineering, Oregon State University, 1691 SW Campus Way, Owen Hall 211, Corvallis, Oregon, 97331, United States
|
36
|
Song R, Zhu L. Stable value iteration for two-player zero-sum game of discrete-time nonlinear systems based on adaptive dynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.03.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
37
|
Qu Q, Zhang H, Luo C, Yu R. Robust control design for multi-player nonlinear systems with input disturbances via adaptive dynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.11.054] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
38
|
Liu YJ, Li S, Tong S, Chen CLP. Adaptive Reinforcement Learning Control Based on Neural Approximation for Nonlinear Discrete-Time Systems With Unknown Nonaffine Dead-Zone Input. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:295-305. [PMID: 29994726 DOI: 10.1109/tnnls.2018.2844165] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In this paper, an optimal control algorithm is designed for uncertain nonlinear discrete-time systems that are in nonaffine form and subject to an unknown dead-zone. The main contributions of this paper are that an optimal control algorithm is framed for the first time for nonlinear systems with a nonaffine dead-zone, and that the adaptive parameter law for the dead-zone is calculated using gradient rules. The mean value theorem is employed to deal with the nonaffine dead-zone input, and the implicit function theorem, combined with reinforcement learning, is appropriately introduced to find an unknown ideal controller, which is approximated using the action network. Other neural networks are taken as the critic networks to approximate the strategic utility functions. Based on Lyapunov stability analysis, the optimal control laws guarantee that all the signals in the closed-loop system are bounded and that the tracking errors converge to a small compact set. Finally, two simulation examples demonstrate the effectiveness of the designed algorithm.
|
39
|
Liu C, Zhang H, Xiao G, Sun S. Integral reinforcement learning based decentralized optimal tracking control of unknown nonlinear large-scale interconnected systems with constrained-input. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.09.011] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
40
|
Liu F, Xiao W, Chen S, Jiang C. Adaptive Dynamic Programming-Based Multi-Sensor Scheduling for Collaborative Target Tracking in Energy Harvesting Wireless Sensor Networks. SENSORS 2018; 18:s18124090. [PMID: 30469527 PMCID: PMC6308500 DOI: 10.3390/s18124090] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/20/2018] [Revised: 11/18/2018] [Accepted: 11/19/2018] [Indexed: 11/16/2022]
Abstract
Collaborative target tracking is one of the most important applications of wireless sensor networks (WSNs), in which the network must rely on sensor scheduling to balance tracking accuracy and energy consumption, owing to the limited network resources for sensing, communication, and computation. With the recent development of energy-acquisition technologies, building WSNs on energy harvesting has become a feasible way to overcome the battery-energy limitation of WSNs; theoretically, the lifetime of the network could be extended indefinitely. However, energy-harvesting WSNs pose new technical challenges for collaborative target tracking: how to schedule sensors over an infinite horizon under the sensors' limited energy-harvesting capabilities. In this paper, we propose a novel adaptive dynamic programming (ADP)-based multi-sensor scheduling algorithm (ADP-MSS) for collaborative target tracking in energy-harvesting WSNs. ADP-MSS can schedule multiple sensors at each time step over an infinite horizon to achieve high tracking accuracy, based on the extended Kalman filter (EKF) for target state prediction and estimation. Theoretical analysis shows the optimality of ADP-MSS, and simulation results demonstrate its superior tracking accuracy compared with an ADP-based single-sensor scheduling scheme and a simulated-annealing-based multi-sensor scheduling scheme.
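The EKF at the core of such a scheduling loop can be sketched for a single range-bearing sensor tracking a constant-velocity target; the motion model, noise levels, and geometry are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

# Extended Kalman filter sketch for tracking a constant-velocity target with a
# single range-bearing sensor at the origin. State s = [px, vx, py, vy]; the
# measurement h(s) = [range, bearing] is nonlinear, so its Jacobian H is
# re-linearized at each update. Motion and noise parameters are illustrative.
dt = 0.5
F = np.array([[1, dt, 0, 0], [0, 1, 0, 0], [0, 0, 1, dt], [0, 0, 0, 1]], float)
Qn = 1e-4 * np.eye(4)                       # process noise covariance
Rn = np.diag([0.05**2, 0.01**2])            # range / bearing noise covariance

def h(s):
    px, py = s[0], s[2]
    return np.array([np.hypot(px, py), np.arctan2(py, px)])

def H_jac(s):
    px, py = s[0], s[2]
    r2 = px * px + py * py
    r = np.sqrt(r2)
    return np.array([[px / r, 0, py / r, 0], [-py / r2, 0, px / r2, 0]])

rng = np.random.default_rng(4)
s_true = np.array([5.0, 0.4, 3.0, 0.2])     # true target state
s, P = np.array([4.0, 0.0, 4.0, 0.0]), np.eye(4)
for _ in range(40):
    s_true = F @ s_true                                   # target moves
    z = h(s_true) + rng.multivariate_normal([0, 0], Rn)   # noisy measurement
    s, P = F @ s, F @ P @ F.T + Qn                        # EKF predict
    Hk = H_jac(s)
    Sk = Hk @ P @ Hk.T + Rn
    Kk = P @ Hk.T @ np.linalg.inv(Sk)                     # Kalman gain
    s = s + Kk @ (z - h(s))                               # EKF update
    P = (np.eye(4) - Kk @ Hk) @ P

pos_err = np.hypot(s[0] - s_true[0], s[2] - s_true[2])
```

A scheduler in the spirit of ADP-MSS would evaluate candidate sensor subsets by the trace of the predicted covariance P before committing energy to a measurement.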
Affiliation(s)
- Fen Liu
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China.
- Wendong Xiao
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China.
- Shuai Chen
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China.
- Chengpeng Jiang
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China.
|
41
|
Jiang H, Zhang H, Han J, Zhang K. Iterative adaptive dynamic programming methods with neural network implementation for multi-player zero-sum games. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.04.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
42
|
Sun M, Wu T, Chen L, Zhang G. Neural AILC for Error Tracking Against Arbitrary Initial Shifts. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2705-2716. [PMID: 28534792 DOI: 10.1109/tnnls.2017.2698507] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
This paper is concerned with adaptive iterative learning control using neural networks for systems performing repetitive tasks over a finite time interval. Two standing issues of such iterative learning control processes are addressed: one is the initial condition problem, and the other relates to the approximation error. Instead of state tracking, an error-tracking approach is proposed to tackle the problem arising from arbitrary initial shifts. The desired error trajectory is prespecified at the design stage, suited to different tracking tasks. The initial value of the desired error trajectory for each cycle is required to be the same as that of the actual error trajectory; this is a requirement only on the desired error trajectory and poses no requirement on the initial value of the actual error trajectory. It is shown that the actual error trajectory is adjustable and able to converge to a prespecified neighborhood of the origin, while all variables of the closed-loop system are uniformly bounded. The robustness improvement in the case of nonzero approximation error is made possible by the use of a dead-zone-modified Lyapunov functional. The resulting estimate of the bound on the approximation error avoids deterioration in tracking performance. The effectiveness of the designed learning controller is validated through an illustrative example.
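The iteration-domain learning update can be illustrated with a classical P-type ILC law on a scalar plant under zero initial shifts; this is a simplification of the paper's error-tracking scheme, and the plant and learning gain below are assumptions.

```python
import numpy as np

# P-type iterative learning control sketch: over repeated trials of the same
# finite-horizon task, the input is updated from the previous trial's tracking
# error, u_{j+1}(t) = u_j(t) + gamma * e_j(t+1) (here the arrays are indexed so
# that u[t] directly produces y[t]). This is the classical zero-initial-shift
# setting, a simplification of the cited paper's error-tracking scheme; the
# scalar plant and gain gamma are illustrative.
N, gamma = 20, 0.8
a, b = 0.5, 1.0                                  # x_{t+1} = a x_t + b u_t, y = x
y_d = np.sin(np.linspace(0, np.pi, N + 1))[1:]   # desired output on t = 1..N

u = np.zeros(N)
errs = []
for _ in range(100):                             # trials (learning iterations)
    x, y = 0.0, np.zeros(N)
    for t in range(N):                           # run one trial
        x = a * x + b * u[t]
        y[t] = x
    e = y_d - y
    errs.append(np.linalg.norm(e))
    u = u + gamma * e                            # learning update in iteration domain
```

With |1 - gamma * b| < 1 the tracking error contracts from trial to trial, which is the standard convergence condition for P-type ILC.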
|
43
|
Nguyen T, Mukhopadhyay S. Two-phase selective decentralization to improve reinforcement learning systems with MDP. AI COMMUN 2018. [DOI: 10.3233/aic-180766] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Thanh Nguyen
- Department of Computer and Information Science, Indiana University Purdue University Indianapolis, 723 W Michigan St SL 280, Indianapolis, Indiana 46202, United States
- Snehasis Mukhopadhyay
- Department of Computer and Information Science, Indiana University Purdue University Indianapolis, 723 W Michigan St SL 280, Indianapolis, Indiana 46202, United States
|
44
|
Zhang H, Qu Q, Xiao G, Cui Y. Optimal Guaranteed Cost Sliding Mode Control for Constrained-Input Nonlinear Systems With Matched and Unmatched Disturbances. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2112-2126. [PMID: 29771665 DOI: 10.1109/tnnls.2018.2791419] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Based on integral sliding mode and approximate dynamic programming (ADP) theory, a novel optimal guaranteed cost sliding mode control is designed for constrained-input nonlinear systems with matched and unmatched disturbances. When the system moves on the sliding surface, the optimal guaranteed cost control problem of sliding mode dynamics is transformed into the optimal control problem of a reformulated auxiliary system with a modified cost function. The ADP algorithm based on single critic neural network (NN) is applied to obtain the approximate optimal control law for the auxiliary system. Lyapunov techniques are used to demonstrate the convergence of the NN weight errors. In addition, the derived approximate optimal control is verified to guarantee the sliding mode dynamics system to be stable in the sense of uniform ultimate boundedness. Some simulation results are presented to verify the feasibility of the proposed control scheme.
|
45
|
Xiao G, Zhang H, Zhang K, Wen Y. Value iteration based integral reinforcement learning approach for H∞ controller design of continuous-time nonlinear systems. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.01.029] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
46
|
Liu L, Wang Z, Zhang H. Neural-Network-Based Robust Optimal Tracking Control for MIMO Discrete-Time Systems With Unknown Uncertainty Using Adaptive Critic Design. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:1239-1251. [PMID: 28362616 DOI: 10.1109/tnnls.2017.2660070] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
This paper is concerned with a robust optimal tracking control strategy for a class of nonlinear multi-input multi-output discrete-time systems with unknown uncertainty via an adaptive critic design (ACD) scheme. The main purpose is to establish an adaptive actor-critic control method such that the cost incurred in dealing with the uncertainty is minimized and the closed-loop system is stable. Based on the neural network approximator, an action network is applied to generate the optimal control signal and a critic network is used to approximate the cost function, respectively. In contrast to previous methods, the main features of this paper are: 1) the ACD scheme is integrated into the controllers to cope with the uncertainty and 2) a novel cost function, which is not in quadratic form, is proposed so that the total cost in the design procedure is reduced. It is proved that the optimal control signals and the tracking errors are uniformly ultimately bounded even when the uncertainty exists. Finally, a numerical simulation is presented to show the effectiveness of the proposed approach.
|
47
|
Zhang H, Cui X, Luo Y, Jiang H. Finite-Horizon $H_{\infty }$ Tracking Control for Unknown Nonlinear Systems With Saturating Actuators. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:1200-1212. [PMID: 28362620 DOI: 10.1109/tnnls.2017.2669099] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
In this paper, a neural network (NN)-based online model-free integral reinforcement learning algorithm is developed to solve the finite-horizon optimal tracking control problem for completely unknown nonlinear continuous-time systems with disturbance and saturating actuators (constrained control input). An augmented system is constructed with the tracking error system and the command generator system. A time-varying Hamilton-Jacobi-Isaacs (HJI) equation is formulated for the augmented problem, which is extremely difficult or impossible to solve due to its time-dependent property and nonlinearity. Then, an actor-critic-disturbance NN structure-based scheme is proposed to learn the time-varying solution to the HJI equation in real time without using the knowledge of system dynamics. Since the solution to the HJI equation is time-dependent, the form of NNs representation with constant weights and time-dependent activation functions is considered. Furthermore, an extra error is incorporated in order to satisfy the terminal constraints in the weight update law. Convergence and stability proofs are given based on the Lyapunov theory for nonautonomous systems. Two simulation examples are provided to demonstrate the effectiveness of the designed algorithm.
|
48
|
Wei Q, Liu D, Lin Q, Song R. Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:957-969. [PMID: 28141530 DOI: 10.1109/tnnls.2016.2638863] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
In this paper, a novel adaptive dynamic programming (ADP) algorithm, called "iterative zero-sum ADP algorithm," is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee the upper and lower iterative value functions to converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, it is emphasized that both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, where the existence criteria of the saddle-point equilibrium are not required. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, where the upper and lower performance index functions are proved to be not equivalent. Finally, simulation results and comparisons are shown to illustrate the performance of the present method.
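The upper and lower iterations can be sketched on a small discounted zero-sum Markov game with pure strategies, where the min-max (upper) and max-min (lower) iterates bracket each other; the game data below are illustrative, not from the paper.

```python
import numpy as np

# Upper/lower value iteration sketch on a tiny discounted zero-sum Markov game
# with pure strategies: V_up applies min over the minimizer of max over the
# maximizer, V_lo swaps the order, and V_lo <= V_up holds at every iteration.
# When a pure-strategy saddle point exists the two limits coincide; otherwise
# they bracket the game value. The two-state game data are illustrative.
gamma = 0.9
nS, nU, nV = 2, 2, 2
rng = np.random.default_rng(5)
Rw = rng.uniform(-1, 1, size=(nS, nU, nV))          # reward r(s, u, v)
Pt = rng.uniform(size=(nS, nU, nV, nS))
Pt /= Pt.sum(axis=-1, keepdims=True)                # transition kernel

V_up, V_lo = np.zeros(nS), np.zeros(nS)
for _ in range(200):
    Qu = Rw + gamma * Pt @ V_up                     # Q(s, u, v) for the upper iterate
    V_up = Qu.max(axis=1).min(axis=1)               # min over v of max over u
    Ql = Rw + gamma * Pt @ V_lo
    V_lo = Ql.min(axis=2).max(axis=1)               # max over u of min over v
```

Both iterations are gamma-contractions, so each converges regardless of initialization, mirroring the arbitrary positive semidefinite initial functions permitted above.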
|
49
|
Wei Q, Li B, Song R. Discrete-Time Stable Generalized Self-Learning Optimal Control With Approximation Errors. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:1226-1238. [PMID: 28362617 DOI: 10.1109/tnnls.2017.2661865] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
In this paper, a generalized policy iteration (GPI) algorithm with approximation errors is developed for solving infinite-horizon optimal control problems for nonlinear systems. The developed stable GPI algorithm provides a general structure for discrete-time iterative adaptive dynamic programming algorithms, by which most discrete-time reinforcement learning algorithms can be described using the GPI structure. Approximation errors are explicitly considered in the GPI algorithm for the first time. The properties of the stable GPI algorithm with approximation errors are analyzed. The admissibility of the approximate iterative control law can be guaranteed if the approximation errors satisfy the admissibility criteria. The convergence of the developed algorithm is established, which shows that the iterative value function converges to a finite neighborhood of the optimal performance index function if the approximation errors satisfy the convergence criterion. Finally, numerical examples and comparisons are presented.
|
50
|
Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances. Neural Netw 2018; 99:19-30. [DOI: 10.1016/j.neunet.2017.11.022] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2017] [Revised: 10/16/2017] [Accepted: 11/28/2017] [Indexed: 11/19/2022]
|