1. Li Y, Zhang H, Wang Z, Huang C, Yan H. Decentralized Control for Large-Scale Systems With Actuator Faults and External Disturbances: A Data-Driven Method. IEEE Transactions on Neural Networks and Learning Systems 2024;35:10882-10893. PMID: 37027591. DOI: 10.1109/tnnls.2023.3245102.
Abstract
This article investigates optimal control for a class of large-scale systems using a data-driven method. Existing control methods for large-scale systems in this context consider disturbances, actuator faults, and uncertainties separately. In this article, we build on such methods by proposing an architecture that accommodates all of these effects simultaneously, and an optimization index is designed for the control problem. This broadens the class of large-scale systems amenable to optimal control. We first establish a min-max optimization index based on zero-sum differential game theory. Then, by integrating the Nash equilibrium solutions of all the isolated subsystems, the decentralized zero-sum differential game strategy is obtained to stabilize the large-scale system. Meanwhile, by designing adaptive parameters, the impact of actuator failure on system performance is eliminated. Afterward, an adaptive dynamic programming (ADP) method is utilized to learn the solution of the Hamilton-Jacobi-Isaacs (HJI) equation without requiring prior knowledge of the system dynamics. A rigorous stability analysis shows that the proposed controller asymptotically stabilizes the large-scale system. Finally, a multipower system example is adopted to illustrate the effectiveness of the proposed protocols.
2. Wang Y, Wang D, Zhao M, Liu N, Qiao J. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate. Neural Netw 2024;175:106274. PMID: 38583264. DOI: 10.1016/j.neunet.2024.106274.
Abstract
In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem, which can accelerate the convergence of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under some conditions. Moreover, by employing neural networks, the model-free tracking control problem for zero-sum games can be addressed. Second, two practical algorithms are designed to guarantee convergence with accelerated learning. In one algorithm, an adjustable acceleration phase is added to the Q-learning iteration, which can be adaptively terminated with a convergence guarantee. In the other algorithm, a novel acceleration function is developed that adjusts the relaxation factor to ensure convergence. Finally, through a simulation example with a practical physical background, the excellent performance of the developed algorithms is demonstrated with neural networks.
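As a hedged illustration of the adjustable-rate idea (not the paper's neural, model-free algorithm), relaxed Q-iteration on a toy finite zero-sum Markov game can be sketched; the dynamics, costs, and names below are made up for the example:

```python
import numpy as np

def game_value(Q_s):
    # One-shot matrix game at a state: the controller (rows) minimizes
    # while the disturbance (columns) maximizes the cost.
    return Q_s.max(axis=1).min()

def relaxed_q_iteration(omega, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Zero-sum Q-iteration with relaxation factor omega on a random toy game."""
    rng = np.random.default_rng(0)
    n_s, n_u, n_w = 3, 2, 2
    cost = rng.uniform(0.0, 1.0, size=(n_s, n_u, n_w))       # stage cost
    P = rng.uniform(size=(n_s, n_u, n_w, n_s))
    P /= P.sum(axis=-1, keepdims=True)                       # transition kernel
    Q = np.zeros((n_s, n_u, n_w))
    for k in range(1, max_iter + 1):
        V = np.array([game_value(Q[s]) for s in range(n_s)])
        TQ = cost + gamma * np.einsum('suwt,t->suw', P, V)   # Bellman backup
        Q_next = (1 - omega) * Q + omega * TQ                # relaxed update
        if np.abs(Q_next - Q).max() < tol:
            return Q_next, k
        Q = Q_next
    return Q, max_iter
```

Here omega = 1 recovers the standard backup, while smaller omega damps the update and slows convergence; the paper's accelerated variants adapt this factor online during learning.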
Affiliation(s)
- Yuan Wang, Ding Wang, Mingming Zhao, Nan Liu, Junfei Qiao: Faculty of Information Technology; Beijing Key Laboratory of Computational Intelligence and Intelligent System; Beijing Institute of Artificial Intelligence; and Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
3. Zhao B, Zhang Y, Liu D. Adaptive Dynamic Programming-Based Cooperative Motion/Force Control for Modular Reconfigurable Manipulators: A Joint Task Assignment Approach. IEEE Transactions on Neural Networks and Learning Systems 2023;34:10944-10954. PMID: 35544490. DOI: 10.1109/tnnls.2022.3171828.
Abstract
This article develops a cooperative motion/force control (CMFC) scheme based on adaptive dynamic programming (ADP) for modular reconfigurable manipulators (MRMs) with a joint task assignment approach. By separating terms that depend on local variables only, the dynamic model of the entire MRM system can be regarded as a set of joint modules interconnected by coupling torque. In addition, the Jacobian matrix, which reflects the interaction force of the MRM end-effector, can be mapped into each joint. Using this approach, both the motion and force tasks on the end-effector of the entire MRM system can be assigned to each joint module cooperatively. Then, by substituting the actual states of coupled joint modules with their desired ones, the norm-boundedness assumption on the interconnection of joint modules can be relaxed. Using the measured input-output data of each joint module, a neural network (NN)-based robust decentralized observer, which guarantees that the observation error is asymptotically stable, is established. An improved local value function is constructed for each joint module to reflect the interconnection, and the local Hamilton-Jacobi-Bellman equation is solved by constructing a local critic NN with a nested learning structure. Hereafter, the ADP-based CMFC is obtained with the assistance of force feedback compensation. Based on Lyapunov stability analysis, the closed-loop MRM system is guaranteed to be uniformly ultimately bounded under the present ADP-based CMFC scheme. The simulation on a two-degree-of-freedom MRM system demonstrates the effectiveness of the present control approach.
4. Han HG, Chen C, Sun HY, Qiao JF. Multiobjective Integrated Optimal Control for Nonlinear Systems. IEEE Transactions on Cybernetics 2023;53:7712-7722. PMID: 36129866. DOI: 10.1109/tcyb.2022.3204030.
Abstract
The multiobjective optimal control method optimizes the performance indexes of nonlinear systems to obtain setpoints and designs a controller to track those setpoints. However, a stepwise optimal control method that analyzes the optimization process independently may produce infeasible and difficult-to-track setpoints, which reduces the operation and control performance of the systems. To solve this problem, a multiobjective integrated optimal control (MIOC) strategy is proposed for nonlinear systems in this article. The main contributions of MIOC are threefold. First, in the framework of multiobjective model predictive control, an integrated control structure with a comprehensive cost function and a collaborative optimization algorithm is designed to achieve coordinated optimal control. Second, to handle the temporal mismatch between setpoints and control laws caused by the nature of tracking control, different prediction horizons are designed for the comprehensive cost function, and the collaborative optimization algorithm solves for setpoints and control laws jointly to enhance the operation and control performance of nonlinear systems. Third, the stability and control performance analysis of MIOC is provided. Finally, the proposed MIOC method is applied to a nonlinear system to demonstrate its effectiveness.
5. Lu K, Liu Z, Yu H, Chen CLP, Zhang Y. Decentralized Adaptive Neural Inverse Optimal Control of Nonlinear Interconnected Systems. IEEE Transactions on Neural Networks and Learning Systems 2023;34:8840-8851. PMID: 35275825. DOI: 10.1109/tnnls.2022.3153360.
Abstract
Existing methods for decentralized optimal control of continuous-time nonlinear interconnected systems require complicated and time-consuming iterations to find the solution of Hamilton-Jacobi-Bellman (HJB) equations. To overcome this limitation, a decentralized adaptive neural inverse approach is proposed in this article, which ensures the optimized performance while avoiding solving HJB equations. Specifically, a new criterion of inverse optimal practical stabilization is proposed, based on which a new direct adaptive neural strategy and a modified tuning functions method are used to design a decentralized inverse optimal controller. It is proven that all the closed-loop signals are bounded and that inverse optimality with respect to the cost functional is achieved. Illustrative examples validate the performance of the presented methods.
6. Ha M, Wang D, Liu D. A Novel Value Iteration Scheme With Adjustable Convergence Rate. IEEE Transactions on Neural Networks and Learning Systems 2023;34:7430-7442. PMID: 35089866. DOI: 10.1109/tnnls.2022.3143527.
Abstract
In this article, a novel value iteration (VI) scheme is developed with convergence and stability discussions. A relaxation factor is introduced to adjust the convergence rate of the value function sequence, and the convergence conditions with respect to the relaxation factor are given. The stability of the closed-loop system using the control policies generated by the present VI algorithm is investigated. Moreover, an integrated VI approach is developed to accelerate and guarantee convergence by combining the advantages of the present and traditional value iterations. A relaxation function is also designed so that the developed scheme adaptively possesses a fast convergence property. Finally, the theoretical results and the effectiveness of the present algorithm are validated by numerical examples.
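In symbols, the relaxed update described in the abstract can be sketched as follows; the notation is mine (λ the relaxation factor, U the stage utility, F the system function), and the paper's exact conditions on λ should be consulted:

```latex
% Traditional value iteration:
% V_{k+1}(x) = \min_{u}\{ U(x,u) + V_k(F(x,u)) \}
% Relaxed value iteration with relaxation factor \lambda > 0:
V_{k+1}(x) \;=\; (1-\lambda)\,V_k(x) \;+\; \lambda \min_{u}\bigl\{ U(x,u) + V_k\bigl(F(x,u)\bigr) \bigr\}
```

Setting λ = 1 recovers the traditional scheme, λ ∈ (0, 1) damps the update, and larger λ can accelerate convergence under the conditions established in the article.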
7. Hu X, Zhang H, Ma D, Wang R, Wang T, Xie X. Real-Time Leak Location of Long-Distance Pipeline Using Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2023;34:7004-7013. PMID: 34971544. DOI: 10.1109/tnnls.2021.3136939.
Abstract
In traditional leak location methods, the position of the leak point is located through the time difference between the pressure change points at the two ends of the pipeline, and inaccurate estimation of these pressure change points leads to wrong leak location results. To address this, adaptive dynamic programming is applied to the pipeline leak location problem in this article. First, a pipeline model is proposed to describe the pressure change along the pipeline, which is utilized to reflect the iterative behavior of the logarithmic form of the pressure change. Then, under the Bellman optimality principle, a value iteration (VI) scheme is proposed to provide the optimal sequence of the nominal parameter and obtain the pipeline leak point. Furthermore, neural networks are built as the VI scheme structure to ensure the iterative performance of the proposed method. By transforming the task into a dynamic optimization problem, the proposed method uses the estimated logarithmic form of the pressure changes at both ends of the pipeline to locate the leak point, which avoids the wrong results caused by unclear pressure change points. Thus, it can be applied to real-time leak location of long-distance pipelines. Finally, experimental cases are given to illustrate the effectiveness of the proposed method.
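For contrast, the traditional time-difference (negative-pressure-wave) method that the abstract improves upon can be sketched as follows; the variable names and constant-wave-speed model are illustrative assumptions, not the paper's:

```python
def leak_position(L, a, dt):
    """Locate a leak from negative-pressure-wave arrival times.

    L  : pipeline length (m)
    a  : pressure-wave propagation speed in the fluid (m/s)
    dt : arrival-time difference t_upstream - t_downstream (s)

    A wave from a leak at distance x from the upstream end arrives
    upstream at x/a and downstream at (L - x)/a, so dt = (2x - L)/a.
    """
    x = (L + a * dt) / 2.0
    if not 0.0 <= x <= L:
        raise ValueError("time difference inconsistent with pipeline length")
    return x
```

The method stands or falls with the estimated arrival times dt; a mis-detected pressure change point shifts x directly, which is exactly the failure mode the abstract's VI formulation is designed to avoid.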
8. Lin D, Xue S, Liu D, Liang M, Wang Y. Adaptive dynamic programming-based hierarchical decision-making of non-affine systems. Neural Netw 2023;167:331-341. PMID: 37673023. DOI: 10.1016/j.neunet.2023.07.044.
Abstract
In this paper, the multiplayer hierarchical decision-making problem for non-affine systems is solved by adaptive dynamic programming. First, the control dynamics are obtained according to dynamic feedback theory and combined with the original system dynamics to construct an affine augmented system; thus, the non-affine multiplayer system is transformed into a general affine form. The hierarchical decision problem is then modeled as a Stackelberg game, in which the leader makes its decision based on the information of all followers, whereas the followers do not know each other's information and obtain their optimal control strategies based only on the leader's decision. Next, the augmented system is reconstructed by a neural network (NN) using input-output data, and a single critic NN is used to approximate the value function to obtain the optimal control strategy for each player. An extra term added to the weight update law removes the need for an initial admissible control law. According to Lyapunov theory, both the state of the system and the NN weight errors are uniformly ultimately bounded. Finally, the feasibility and validity of the algorithm are confirmed by simulation.
Affiliation(s)
- Danyu Lin, Mingming Liang, Yonghua Wang: School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
- Shan Xue: School of Information and Communication Engineering, Hainan University, Haikou 570100, China.
- Derong Liu: School of System Design and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen 518055, China; Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, IL 60607, USA.
9. Liang M, Wang Y, Liu D. An Efficient Impulsive Adaptive Dynamic Programming Algorithm for Stochastic Systems. IEEE Transactions on Cybernetics 2023;53:5545-5559. PMID: 35380980. DOI: 10.1109/tcyb.2022.3158898.
Abstract
In this study, a novel general impulsive transition matrix is defined, which reveals the transition dynamics and probability distribution evolution patterns for all system states between two impulsive "events," rather than between two regular time indexes. Based on this general matrix, the policy iteration-based impulsive adaptive dynamic programming (IADP) algorithm, along with a more efficient variant (EIADP), is developed to solve the optimal impulsive control problems of discrete stochastic systems. By analyzing the monotonicity, stability, and convergence properties of the obtained iterative value functions and control laws, it is proved that the IADP and EIADP algorithms both converge to the optimal impulsive performance index function. By dividing the whole impulsive policy into smaller pieces, the proposed EIADP algorithm updates the iterative policies in a "piece-by-piece" manner according to the actual hardware constraints. This feature enables these ADP-based algorithms to be fully optimized to run on computing devices of all "sizes," including those with low memory space. A simulation experiment is conducted to validate the effectiveness of the present methods.
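A minimal sketch of what a transition matrix "between two impulsive events" could look like, assuming the inter-event evolution consists of m regular steps under a flow kernel followed by one jump kernel; this composite structure is my assumption for illustration, not the paper's exact definition:

```python
import numpy as np

def impulsive_transition(P_flow, P_jump, m):
    """Composite transition matrix between two consecutive impulsive events.

    P_flow : row-stochastic kernel for one regular time step
    P_jump : row-stochastic kernel applied at the impulse instant
    m      : number of regular steps between the two events

    A distribution row-vector pi evolves between events as pi @ T.
    """
    return np.linalg.matrix_power(P_flow, m) @ P_jump
```

Because a product of row-stochastic matrices is row-stochastic, the composite matrix is again a valid transition kernel, so distribution evolution can be indexed by events rather than by time steps.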
10. Li Z, Wang M, Ma G. Adaptive optimal trajectory tracking control of AUVs based on reinforcement learning. ISA Transactions 2023;137:122-132. PMID: 36522214. DOI: 10.1016/j.isatra.2022.12.003.
Abstract
In this paper, an adaptive model-free optimal reinforcement learning (RL) neural network (NN) control scheme based on filter error is proposed for the trajectory tracking control problem of an autonomous underwater vehicle (AUV) with input saturation. Generally, the optimal control is realized by solving the Hamilton-Jacobi-Bellman (HJB) equation. However, due to its inherent nonlinearity and complexity, the HJB equation of AUV dynamics is challenging to solve. To deal with this problem, an RL strategy based on an actor-critic framework is proposed to approximate the solution of the HJB equation, where actor and critic NNs are used to perform control behavior and evaluate control performance, respectively. In addition, for the AUV system with the second-order strict-feedback dynamic model, an optimal controller design method based on filter errors is proposed for the first time to simplify the controller design and accelerate the response speed of the system. Then, to remove the model dependence, an extended state observer (ESO) is designed to estimate the unknown nonlinear dynamics, and an adaptive law is designed to estimate the unknown model parameters. To deal with the input saturation, an auxiliary variable system is utilized in the control law. A strict Lyapunov analysis guarantees that all signals of the system are semi-globally uniformly ultimately bounded (SGUUB). Finally, the superiority of the proposed method is verified by comparative experiments.
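The extended state observer (ESO) mentioned in the abstract can be illustrated on a double integrator with an unknown constant disturbance; the bandwidth-parameterized gains (3ω, 3ω², ω³) are a standard textbook choice and not necessarily the paper's design:

```python
import numpy as np

def simulate_eso(d=2.0, omega=10.0, dt=1e-3, T=5.0):
    """Linear ESO on the plant x1' = x2, x2' = u + d, with output y = x1.

    The observer state z = (z1, z2, z3) estimates (x1, x2, d); the extended
    state z3 tracks the unknown lumped disturbance d.
    """
    x = np.zeros(2)          # true plant state
    z = np.zeros(3)          # observer state, extended with the disturbance
    u = 0.0                  # open-loop input; this run only estimates
    for _ in range(int(T / dt)):
        e = x[0] - z[0]      # output estimation error
        z = z + dt * np.array([z[1] + 3 * omega * e,
                               z[2] + u + 3 * omega**2 * e,
                               omega**3 * e])
        x = x + dt * np.array([x[1], u + d])   # forward-Euler plant step
    return z
```

After the transient, z3 converges to the constant disturbance d, which a tracking controller can then feed forward to cancel the unknown dynamics.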
Affiliation(s)
- Zhifu Li, Ming Wang, Ge Ma: School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou 510006, China.
11. Wang W, Li Y. Distributed Fuzzy Optimal Consensus Control of State-Constrained Nonlinear Strict-Feedback Systems. IEEE Transactions on Cybernetics 2023;53:2914-2929. PMID: 35077380. DOI: 10.1109/tcyb.2021.3140104.
Abstract
This article investigates the distributed fuzzy optimal consensus control problem for state-constrained nonlinear strict-feedback systems under an identifier-actor-critic architecture. First, a fuzzy identifier is designed to approximate each agent's unknown nonlinear dynamics. Then, by defining multiple barrier-type local optimal performance indexes for each agent, the optimal virtual and actual control laws are obtained, where two fuzzy-logic systems working as the actor network and critic network are used to execute control behavior and evaluate control performance, respectively. It is proved that the proposed control protocol can drive all agents to reach consensus without violating state constraints, and make the local performance indexes reach the Nash equilibrium simultaneously. Simulation studies are given to verify the effectiveness of the developed fuzzy optimal consensus control approach.
12. Zhao Y, Niu B, Zong G, Xu N, Ahmad A. Event-triggered optimal decentralized control for stochastic interconnected nonlinear systems via adaptive dynamic programming. Neurocomputing 2023. DOI: 10.1016/j.neucom.2023.03.024.
13. Li K, Li Y. Adaptive NN Optimal Consensus Fault-Tolerant Control for Stochastic Nonlinear Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2023;34:947-957. PMID: 34432637. DOI: 10.1109/tnnls.2021.3104839.
Abstract
This article investigates the problem of adaptive neural network (NN) optimal consensus tracking control for nonlinear multiagent systems (MASs) with stochastic disturbances and actuator bias faults. In the control design, an NN is adopted to approximate the unknown nonlinear dynamics, and a state identifier is constructed. A fault estimator is designed to solve the problem raised by time-varying actuator bias faults. By utilizing adaptive dynamic programming (ADP) in an identifier-critic-actor construction, an adaptive NN optimal consensus fault-tolerant control algorithm is presented. It is proven that all signals of the controlled system are uniformly ultimately bounded (UUB) in probability and that the states of the follower agents remain in consensus with the leader's state. Finally, simulation results are given to illustrate the effectiveness of the developed optimal consensus control scheme and theorem.
14. Lin M, Zhao B, Liu D. Policy gradient adaptive dynamic programming for nonlinear discrete-time zero-sum games with unknown dynamics. Soft Comput 2023. DOI: 10.1007/s00500-023-07817-6.
15. Sun M, Yang Z, Dai X, Nian X, Xiong H, Wang H. An Adaptive Updating Method of Target Network Based on Moment Estimates for Deep Reinforcement Learning. Neural Process Lett 2022. DOI: 10.1007/s11063-022-11096-x.
16. Ha M, Wang D, Liu D. Offline and Online Adaptive Critic Control Designs With Stability Guarantee Through Value Iteration. IEEE Transactions on Cybernetics 2022;52:13262-13274. PMID: 34516384. DOI: 10.1109/tcyb.2021.3107801.
Abstract
This article is concerned with the stability of the closed-loop system using various control policies generated by value iteration (VI). Some stability properties, involving admissibility criteria, the attraction domain, and so forth, are investigated. An offline integrated VI scheme with a stability guarantee is developed by combining the advantages of VI and policy iteration, which makes it convenient to obtain admissible control policies. Also, based on the concept of the attraction domain, an online adaptive dynamic programming (ADP) algorithm using immature control policies is developed, and it is ensured that the state trajectory under the online algorithm converges to the origin. For linear systems in particular, the online ADP algorithm with a general scheme possesses an enhanced stability property: the theoretical results reveal that the stability of the linear system can be guaranteed even if the control policy sequence includes finitely many unstable elements. Numerical results verify the effectiveness of the present algorithms.
17. Yang X, Zeng Z, Gao Z. Decentralized Neurocontroller Design With Critic Learning for Nonlinear-Interconnected Systems. IEEE Transactions on Cybernetics 2022;52:11672-11685. PMID: 34191739. DOI: 10.1109/tcyb.2021.3085883.
Abstract
We consider the decentralized control problem of a class of continuous-time nonlinear systems with mismatched interconnections. Initially, with discounted cost functions introduced for auxiliary subsystems, we convert the decentralized control problem into a set of optimal control problems. To derive solutions to these optimal control problems, we first present the related Hamilton-Jacobi-Bellman equations (HJBEs) and then develop a novel critic learning method to solve them. To implement the newly developed critic learning approach, we use only critic neural networks (NNs) and tune their weight vectors via a combination of a modified gradient descent method and concurrent learning. By using the present critic learning method, we not only remove the restriction of an initial admissible control but also relax the persistence-of-excitation condition. After that, we employ Lyapunov's direct method to demonstrate that the critic NNs' weight estimation errors and the states of the closed-loop auxiliary systems are stable in the sense of uniform ultimate boundedness. Finally, we separately provide a nonlinear interconnected plant and an unstable interconnected power system to validate the present critic learning approach.
18. Wu Q, Zhao B, Liu D, Polycarpou MM. Event-triggered adaptive dynamic programming for decentralized tracking control of input constrained unknown nonlinear interconnected systems. Neural Netw 2022;157:336-349. DOI: 10.1016/j.neunet.2022.10.025.
19. Xue S, Luo B, Liu D, Gao Y. Neural network-based event-triggered integral reinforcement learning for constrained H∞ tracking control with experience replay. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.09.119.
20. Event-triggered integral reinforcement learning for nonzero-sum games with asymmetric input saturation. Neural Netw 2022;152:212-223. DOI: 10.1016/j.neunet.2022.04.013.
21. Huo X, Karimi HR, Zhao X, Wang B, Zong G. Adaptive-Critic Design for Decentralized Event-Triggered Control of Constrained Nonlinear Interconnected Systems Within an Identifier-Critic Framework. IEEE Transactions on Cybernetics 2022;52:7478-7491. PMID: 33400659. DOI: 10.1109/tcyb.2020.3037321.
Abstract
This article studies the decentralized event-triggered control problem for a class of constrained nonlinear interconnected systems. By assigning a specific cost function to each constrained auxiliary subsystem, the original control problem is equivalently transformed into finding a series of optimal control policies that update in an aperiodic manner; these optimal event-triggered control laws together constitute the desired decentralized controller. It is strictly proven that the system under consideration is stable in the sense of uniform ultimate boundedness, provided the solutions of the event-triggered Hamilton-Jacobi-Bellman equations are obtained. Different from traditional adaptive critic designs, we present an identifier-critic network architecture to relax the restrictions posed on the system dynamics, and the actor network commonly used to approximate the optimal control law is circumvented. The weights in the critic network are tuned on the basis of the gradient descent approach as well as historical data, so that the persistence-of-excitation condition is no longer needed. The validity of our control scheme is demonstrated through a simulation example.
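A toy static event-triggering rule (far simpler than the HJB-based, identifier-critic scheme of the paper) illustrates how aperiodic control updates are generated; the linear plant, gains, and threshold σ below are hypothetical:

```python
import numpy as np

def event_triggered_run(sigma=0.2, dt=1e-3, T=5.0):
    """State feedback with a static event-triggering rule.

    The controller holds the last transmitted state x_hat; an event fires
    only when the sampling error passes a state-dependent threshold:
        ||x_hat - x|| >= sigma * ||x||.
    """
    A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # open-loop plant
    B = np.array([0.0, 1.0])
    K = np.array([-1.0, -1.0])                 # stabilizing feedback gain
    x = np.array([1.0, -0.5])
    x_hat = x.copy()
    events = 0
    steps = int(T / dt)
    for _ in range(steps):
        if np.linalg.norm(x_hat - x) >= sigma * np.linalg.norm(x):
            x_hat = x.copy()                   # transmit: update held state
            events += 1
        u = K @ x_hat
        x = x + dt * (A @ x + B * u)           # forward-Euler plant step
    return events, steps, np.linalg.norm(x)
```

The controller only receives the state at event instants, so the number of transmissions stays far below the number of simulation steps while the state still decays toward the origin.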
22. Li Y, Liu Y, Tong S. Observer-Based Neuro-Adaptive Optimized Control of Strict-Feedback Nonlinear Systems With State Constraints. IEEE Transactions on Neural Networks and Learning Systems 2022;33:3131-3145. PMID: 33497342. DOI: 10.1109/tnnls.2021.3051030.
Abstract
This article proposes an adaptive neural network (NN) output feedback optimized control design for a class of strict-feedback nonlinear systems that contain unknown internal dynamics and states that are immeasurable and constrained within predefined compact sets. NNs are used to approximate the unknown internal dynamics, and an adaptive NN state observer is developed to estimate the immeasurable states. By constructing barrier-type optimal cost functions for the subsystems and employing the observer and the actor-critic architecture, the virtual and actual optimal controllers are developed under the framework of the backstepping technique. In addition to ensuring the boundedness of all closed-loop signals, the proposed strategy guarantees that the system states are confined within preselected compact sets at all times. This is achieved by means of barrier Lyapunov functions, which have been successfully applied to various kinds of nonlinear systems such as strict-feedback and pure-feedback dynamics. Besides, the developed optimal controller requires fewer conditions on the system dynamics than some existing optimal control approaches. The effectiveness of the proposed optimal control approach is validated by both numerical and practical examples.
Collapse
|
23
|
Yang X, Xu M, Wei Q. Dynamic Event-Sampled Control of Interconnected Nonlinear Systems Using Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; PP:923-937. [PMID: 35666792 DOI: 10.1109/tnnls.2022.3178017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
We develop a decentralized dynamic event-based control strategy for nonlinear systems subject to matched interconnections. To begin with, we introduce a dynamic event-based sampling mechanism, which relies on the system's states and the variables generated by time-based differential equations. Then, we prove that the decentralized event-based controller for the whole system is composed of all the optimal event-based control policies of the nominal subsystems. To derive these optimal event-based control policies, we design a critic-only architecture to solve the related event-based Hamilton-Jacobi-Bellman equations in the reinforcement learning framework. The implementation of such an architecture uses only critic neural networks (NNs), with their weight vectors updated through the gradient descent method together with concurrent learning. After that, we demonstrate that the asymptotic stability of the closed-loop nominal subsystems and the uniform ultimate boundedness of the critic NNs' weight estimation errors are guaranteed by using Lyapunov's approach. Finally, we provide simulations of a matched nonlinear-interconnected plant to validate the present theoretical claims.
Collapse
|
24
|
Luo Y, Yu X, Yang D, Zhou B. A survey of intelligent transmission line inspection based on unmanned aerial vehicle. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10189-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]
|
25
|
Wang H, Li M. Model-Free Reinforcement Learning for Fully Cooperative Consensus Problem of Nonlinear Multiagent Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:1482-1491. [PMID: 33338022 DOI: 10.1109/tnnls.2020.3042508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
This article presents an off-policy model-free algorithm based on reinforcement learning (RL) to optimize the fully cooperative (FC) consensus problem of nonlinear continuous-time multiagent systems (MASs). First, the optimal FC consensus problem is transformed into solving the coupled Hamilton-Jacobi-Bellman (HJB) equation. Then, we propose a policy iteration (PI)-based algorithm, which is proved to be effective in solving the coupled HJB equation. To implement this scheme in a model-free way, a model-free Bellman equation is derived to find the optimal value function and the optimal control policy for each agent. Then, based on the least-squares approach, the tuning law for actor and critic weights is derived by substituting actor and critic neural networks into the model-free Bellman equation to approximate the target policies and the value function. Finally, we propose an off-policy model-free integral RL (IRL) algorithm, which can be used to optimize the FC consensus problem of the whole system in real time by using measured data. The effectiveness of the proposed algorithm is verified by simulation results.
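The evaluate/improve structure underlying such a policy-iteration scheme is easiest to see in the scalar linear-quadratic case, where each evaluation step solves a Lyapunov equation in closed form. This classical model-based iteration is only a sketch of the PI loop; the paper's algorithm is model-free and handles coupled multiagent HJB equations:

```python
# scalar plant x' = a x + b u with cost  integral of (q x^2 + r u^2) dt
a, b, q, r = 1.0, 1.0, 1.0, 1.0

K = 2.0                                   # initial stabilizing gain: a - b*K < 0
for _ in range(20):
    # policy evaluation: solve 2 (a - b K) P + q + r K^2 = 0 (scalar Lyapunov eq.)
    P = (q + r * K**2) / (-2.0 * (a - b * K))
    # policy improvement: K <- (1/r) b P
    K = b * P / r

# P converges to the algebraic Riccati solution, here 1 + sqrt(2)
```

Each improved gain stays stabilizing, so the evaluation step is always well posed — the property that Kleinman's iteration guarantees in the linear case.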
Collapse
|
26
|
Zamfirache IA, Precup RE, Roman RC, Petriu EM. Policy Iteration Reinforcement Learning-based control using a Grey Wolf Optimizer algorithm. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.11.051] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
27
|
Narayanan V, Modares H, Jagannathan S, Lewis FL. Event-Driven Off-Policy Reinforcement Learning for Control of Interconnected Systems. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:1936-1946. [PMID: 32639933 DOI: 10.1109/tcyb.2020.2991166] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
In this article, we introduce a novel approximate optimal decentralized control scheme for uncertain input-affine nonlinear-interconnected systems. In the proposed scheme, we design a controller and an event-triggering mechanism (ETM) at each subsystem to optimize a local performance index and reduce redundant control updates, respectively. To this end, we formulate a noncooperative dynamic game at every subsystem in which we collectively model the interconnection inputs and the event-triggering error as adversarial players that deteriorate the subsystem performance and model the control policy as the performance optimizer, competing against these adversarial players. To obtain a solution to this game, one has to solve the associated Hamilton-Jacobi-Isaac (HJI) equation, which does not have a closed-form solution even when the subsystem dynamics are accurately known. In this context, we introduce an event-driven off-policy integral reinforcement learning (OIRL) approach to learn an approximate solution to this HJI equation using artificial neural networks (NNs). We then use this NN approximated solution to design the control policy and event-triggering threshold at each subsystem. In the learning framework, we guarantee the Zeno-free behavior of the ETMs at each subsystem using the exploration policies. Finally, we derive sufficient conditions to guarantee uniform ultimate bounded regulation of the controlled system states and demonstrate the efficacy of the proposed framework with numerical examples.
Collapse
|
28
|
Zhang K, Su R, Zhang H, Tian Y. Adaptive Resilient Event-Triggered Control Design of Autonomous Vehicles With an Iterative Single Critic Learning Framework. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:5502-5511. [PMID: 33534717 DOI: 10.1109/tnnls.2021.3053269] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
This article investigates adaptive resilient event-triggered control for rear-wheel-drive autonomous (RWDA) vehicles based on an iterative single critic learning framework, which can effectively balance the frequency of, and changes in, the vehicle's control updates during the running process. According to the kinematic equation of RWDA vehicles and the desired trajectory, the tracking error system during the autonomous driving process is first built, where denial-of-service (DoS) attack signals are injected into the networked communication and transmission. Combining the event-triggered sampling mechanism and the iterative single critic learning framework, a new event-triggered condition is developed for the adaptive resilient control algorithm, and a novel utility function is designed for driving the autonomous vehicle such that the control input is guaranteed to remain within an applicable saturation bound. Finally, we apply the new adaptive resilient control scheme to a case of driving RWDA vehicles, and the simulation results illustrate its effectiveness and practicality.
Collapse
|
29
|
Abdul-Adheem WR, Alkhayyat A, Al Mhdawi AK, Bessis N, Ibraheem IK, Abdulkareem AI, Humaidi AJ, AL-Qassar AA. Anti-Disturbance Compensation-Based Nonlinear Control for a Class of MIMO Uncertain Nonlinear Systems. ENTROPY 2021; 23:e23111487. [PMID: 34828185 PMCID: PMC8625644 DOI: 10.3390/e23111487] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/09/2021] [Revised: 11/04/2021] [Accepted: 11/05/2021] [Indexed: 12/04/2022]
Abstract
Multi-Input-Multi-Output (MIMO) systems arise mainly in industrial applications and exhibit input couplings, state couplings, and uncertainties. The essential principle for dealing with such difficulties is to eliminate the input couplings, estimate the remaining disturbances in real time, and then eliminate those estimates from the input channels. These difficulties are resolved in this research paper, where a decentralized control scheme is suggested using an Improved Active Disturbance Rejection Control (IADRC) configuration. A theoretical analysis using a state-space eigenvalue test, followed by numerical simulations on a general uncertain nonlinear highly coupled MIMO system, validated the effectiveness of the proposed control scheme in controlling such MIMO systems. Time-domain comparisons with the Conventional Active Disturbance Rejection Control (CADRC)-based decentralized control scheme are also included.
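The core of any ADRC configuration, improved or conventional, is an extended state observer (ESO) that lumps the unknown dynamics and disturbance into an extra state and cancels the estimate from the input channel. A minimal linear ESO loop for a single-channel second-order plant might look as follows; the plant, disturbance, and bandwidths are illustrative assumptions, not the IADRC design of the paper:

```python
import math

# second-order plant  x1' = x2,  x2' = f(x,t) + u,  where f lumps the
# internal dynamics and external disturbance (unknown to the controller)
def f_total(x1, x2, t):
    return -x1 - 0.5 * x2 + 0.5 * math.sin(t)

dt, T = 1e-3, 5.0
w0, wc = 20.0, 4.0            # observer and controller bandwidths (tuning choices)
x1, x2 = 1.0, 0.0             # plant state
z1, z2, z3 = 0.0, 0.0, 0.0    # ESO state: estimates of x1, x2, and of f

t, u = 0.0, 0.0
for _ in range(int(T / dt)):
    # linear ESO with all poles at -w0 (gains 3*w0, 3*w0^2, w0^3)
    e = x1 - z1
    z1 += dt * (z2 + 3 * w0 * e)
    z2 += dt * (z3 + u + 3 * w0**2 * e)
    z3 += dt * (w0**3 * e)
    # state feedback on the estimates, cancelling the estimated disturbance
    u = -wc**2 * z1 - 2 * wc * z2 - z3
    # plant update (explicit Euler)
    dx2 = f_total(x1, x2, t) + u
    x1 += dt * x2
    x2 += dt * dx2
    t += dt
```

After the transient, the extra state z3 tracks the total disturbance f, and the compensated output settles near zero despite the sinusoidal disturbance.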
Collapse
Affiliation(s)
- Wameedh Riyadh Abdul-Adheem
- Department of Electrical Engineering, College of Engineering, University of Baghdad, Baghdad 10001, Iraq; (W.R.A.-A.); (I.K.I.)
| | - Ahmed Alkhayyat
- College of Technical Engineering, The Islamic University, Najaf 54001, Iraq;
| | - Ammar K. Al Mhdawi
- Department of Computer Science, Edge Hill University, Ormskirk L39 4QP, UK;
- Correspondence:
| | - Nik Bessis
- Department of Computer Science, Edge Hill University, Ormskirk L39 4QP, UK;
| | - Ibraheem Kasim Ibraheem
- Department of Electrical Engineering, College of Engineering, University of Baghdad, Baghdad 10001, Iraq; (W.R.A.-A.); (I.K.I.)
- Department of Computer Engineering Techniques, Al-Rasheed University College, Baghdad 10001, Iraq
| | - Ahmed Ibraheem Abdulkareem
- Control and Systems Engineering Department, University of Technology, Baghdad 10001, Iraq; (A.I.A.); (A.J.H.); (A.A.A.-Q.)
| | - Amjad J. Humaidi
- Control and Systems Engineering Department, University of Technology, Baghdad 10001, Iraq; (A.I.A.); (A.J.H.); (A.A.A.-Q.)
| | - Arif A. AL-Qassar
- Control and Systems Engineering Department, University of Technology, Baghdad 10001, Iraq; (A.I.A.); (A.J.H.); (A.A.A.-Q.)
| |
Collapse
|
30
|
Ha M, Wang D, Liu D. Neural-network-based discounted optimal control via an integrated value iteration with accuracy guarantee. Neural Netw 2021; 144:176-186. [PMID: 34500256 DOI: 10.1016/j.neunet.2021.08.025] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2021] [Revised: 08/19/2021] [Accepted: 08/19/2021] [Indexed: 10/20/2022]
Abstract
A data-based value iteration algorithm with a bidirectional approximation feature is developed for discounted optimal control. The unknown nonlinear system dynamics is first identified by establishing a model neural network. To improve the identification precision, biases are introduced to the model network. The model network with biases is trained by the gradient descent algorithm, where the weights and biases across all layers are updated. Uniform ultimate boundedness under a proper learning rate is analyzed by using the Lyapunov approach. Moreover, an integrated value iteration with the discounted cost is developed to fully guarantee the approximation accuracy of the optimal value function. Then, the effectiveness of the proposed algorithm is demonstrated by carrying out two simulation examples with physical backgrounds.
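For a scalar linear plant, discounted value iteration reduces to a one-line recursion on the quadratic kernel of the value function, which makes convergence to the discounted Riccati solution easy to check. The plant and cost weights below are illustrative, and the recursion starts from the zero value function as in standard VI:

```python
# scalar discrete-time plant x+ = a x + b u, cost  sum_k gamma^k (q x^2 + r u^2)
a, b, q, r, gamma = 1.0, 1.0, 1.0, 1.0, 0.95

# value iteration on the quadratic kernel: V_k(x) = P_k x^2, starting from V_0 = 0
P = 0.0
for _ in range(500):
    P = q + gamma * a**2 * P - (gamma * a * b * P)**2 / (r + gamma * b**2 * P)

# P now (approximately) solves the discounted algebraic Riccati equation;
# the greedy policy is u = -u_gain * x
u_gain = gamma * a * b * P / (r + gamma * b**2 * P)
```

Each pass is exactly the minimization over u of q x^2 + r u^2 + gamma P (a x + b u)^2, so the fixed point of the recursion is the discounted Riccati solution.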
Collapse
Affiliation(s)
- Mingming Ha
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China.
| | - Ding Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China.
| | - Derong Liu
- Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA.
| |
Collapse
|
31
|
Zhang S, Zhao B, Liu D, Zhang Y. Observer-based event-triggered control for zero-sum games of input constrained multi-player nonlinear systems. Neural Netw 2021; 144:101-112. [PMID: 34478940 DOI: 10.1016/j.neunet.2021.08.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2021] [Revised: 07/18/2021] [Accepted: 08/09/2021] [Indexed: 11/18/2022]
Abstract
In this paper, an event-triggered control (ETC) method is investigated to solve zero-sum game (ZSG) problems of unknown multi-player continuous-time nonlinear systems with input constraints by using adaptive dynamic programming (ADP). To relax the requirement of system dynamics, a neural network (NN) observer is constructed to identify the dynamics of the multi-player system via the input and output data. Then, the event-triggered Hamilton-Jacobi-Isaacs (HJI) equation of the ZSG can be solved by constructing a critic NN, and the approximated optimal control law and the worst disturbance law can be obtained directly. A triggering scheme which determines the updating time instants of the control law and the disturbance law is developed. Thus, the proposed ADP-based ETC method can not only reduce the computational burden but also save communication resources and bandwidth. Furthermore, we prove that the signals of the closed-loop system and the approximation errors of the critic NN weights are uniformly ultimately bounded by using Lyapunov's direct method, and that Zeno behavior is excluded. Finally, two simulation examples are provided to demonstrate the effectiveness of the proposed ETC scheme.
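The event-triggered updating pattern — hold the control between events and resample only when the sampling error exceeds a state-dependent threshold — can be sketched for a scalar plant. The plant, feedback gain, and 30% relative threshold below are illustrative assumptions, not the triggering law derived in the paper:

```python
# scalar unstable plant x' = x + u, stabilized by u = -2 x_s, where x_s is the
# state sampled at the last event; a new event fires only when the sampling
# error |x_s - x| exceeds 30% of |x| (illustrative relative-threshold rule)
dt, T = 1e-3, 5.0
x, x_s = 1.0, 1.0
events = 0
for _ in range(int(T / dt)):
    if abs(x_s - x) > 0.3 * abs(x):    # event-triggering condition
        x_s = x                        # sample the state, update the control
        events += 1
    u = -2.0 * x_s                     # control held constant between events
    x += dt * (x + u)                  # plant (explicit Euler step)
```

The state still decays exponentially, yet the control law is recomputed only a few dozen times over the 5000 simulation steps — the communication saving that motivates ETC.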
Collapse
Affiliation(s)
- Shunchao Zhang
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
| | - Bo Zhao
- School of Systems Science, Beijing Normal University, Beijing 100875, China.
| | - Derong Liu
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
| | - Yongwei Zhang
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
| |
Collapse
|
32
|
|
33
|
Liu C, Zhang H, Sun S, Ren H. Online H∞ control for continuous-time nonlinear large-scale systems via single echo state network. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
34
|
Ma B, Li Y, An T, Dong B. Compensator-critic structure-based neuro-optimal control of modular robot manipulators with uncertain environmental contacts using non-zero-sum games. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107100] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
35
|
Zhao B, Liu D, Alippi C. Sliding-Mode Surface-Based Approximate Optimal Control for Uncertain Nonlinear Systems With Asymptotically Stable Critic Structure. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:2858-2869. [PMID: 31945008 DOI: 10.1109/tcyb.2019.2962011] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
This article develops a novel sliding-mode surface (SMS)-based approximate optimal control scheme for a large class of nonlinear systems affected by unknown mismatched perturbations. An observer-based perturbation estimation procedure is employed to establish the online updated value function. The solution to the Hamilton-Jacobi-Bellman equation is approximated by an SMS-based critic neural network whose weight error dynamics are designed to be asymptotically stable by nested update laws. The sliding-mode control strategy is combined with the approximate optimal control design procedure to obtain a faster control action. Stability is proved based on Lyapunov's direct method. The simulation results show the effectiveness of the developed control scheme.
Collapse
|
36
|
Yang X, He H, Zhong X. Approximate Dynamic Programming for Nonlinear-Constrained Optimizations. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:2419-2432. [PMID: 31329149 DOI: 10.1109/tcyb.2019.2926248] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this paper, we study the constrained optimization problem of a class of uncertain nonlinear interconnected systems. First, we prove that the solution of the constrained optimization problem can be obtained through solving an array of optimal control problems of constrained auxiliary subsystems. Then, under the framework of approximate dynamic programming, we present a simultaneous policy iteration (SPI) algorithm to solve the Hamilton-Jacobi-Bellman equations corresponding to the constrained auxiliary subsystems. By building an equivalence relationship, we demonstrate the convergence of the SPI algorithm. Meanwhile, we implement the SPI algorithm via an actor-critic structure, where actor networks are used to approximate optimal control policies and critic networks are applied to estimate optimal value functions. By using the least squares method and the Monte Carlo integration technique together, we are able to determine the weight vectors of actor and critic networks. Finally, we validate the developed control method through the simulation of a nonlinear interconnected plant.
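The least-squares step used to determine critic weights from Monte Carlo sample points can be isolated as follows. The quadratic basis and the stand-in targets (which in the SPI algorithm would come from the Bellman condition at each sample) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(X):
    # quadratic critic basis evaluated at each row of X
    return np.column_stack([X[:, 0]**2, X[:, 0] * X[:, 1], X[:, 1]**2])

# Monte Carlo samples drawn over the region of interest
X = rng.uniform(-2, 2, size=(200, 2))

# stand-in target values for the critic at the sample points
W_true = np.array([0.5, -1.0, 2.0])
y = phi(X) @ W_true

# batch least-squares solve for the critic weight vector
W, *_ = np.linalg.lstsq(phi(X), y, rcond=None)
```

Because the objective is linear in the weights, a single normal-equation solve replaces iterative tuning — which is why least squares plus Monte Carlo integration suffices to fix both actor and critic networks per policy-iteration step.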
Collapse
|
37
|
Li H, Wu Y, Chen M. Adaptive Fault-Tolerant Tracking Control for Discrete-Time Multiagent Systems via Reinforcement Learning Algorithm. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:1163-1174. [PMID: 32386171 DOI: 10.1109/tcyb.2020.2982168] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
This article investigates the adaptive fault-tolerant tracking control problem for a class of discrete-time multiagent systems via a reinforcement learning algorithm. Action neural networks (NNs) are used to approximate unknown and desired control input signals, and critic NNs are employed to estimate the cost function in the design procedure. Furthermore, direct adaptive optimal controllers are designed by combining the backstepping technique with the reinforcement learning algorithm. Compared with existing reinforcement learning algorithms, the computational burden is effectively reduced by using fewer learning parameters. Adaptive auxiliary signals are established to compensate for the influence of dead zones and actuator faults on the control performance. Based on Lyapunov stability theory, it is proved that all signals of the closed-loop system are semiglobally uniformly ultimately bounded. Finally, some simulation results are presented to illustrate the effectiveness of the proposed approach.
Collapse
|
38
|
Zhao B, Luo F, Lin H, Liu D. Particle swarm optimized neural networks based local tracking control scheme of unknown nonlinear interconnected systems. Neural Netw 2020; 134:54-63. [PMID: 33285427 DOI: 10.1016/j.neunet.2020.09.020] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2020] [Revised: 09/07/2020] [Accepted: 09/28/2020] [Indexed: 11/28/2022]
Abstract
In this paper, a local tracking control (LTC) scheme is developed via particle swarm optimized neural networks (PSONN) for unknown nonlinear interconnected systems. With the local input-output data, a local neural network identifier is constructed to approximate the local input gain matrix and the mismatched interconnection, which are utilized to derive the LTC. To solve the local Hamilton-Jacobi-Bellman equation, a local critic NN is established to estimate the proper local value function, which reflects the mismatched interconnection. The weight vector of the local critic NN is trained online by particle swarm optimization, thus the success rate of system execution is increased. The stability of the closed-loop unknown nonlinear interconnected system is guaranteed to be uniformly ultimately bounded through Lyapunov's direct method. Simulation results of two examples demonstrate the effectiveness of the developed PSONN-based LTC scheme.
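Training a critic weight vector by particle swarm optimization rather than gradient descent amounts to minimizing the squared residual over a swarm of candidate weight vectors. A minimal PSO on an illustrative quadratic residual is shown below; the objective, swarm size, and coefficients are assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

# objective: squared critic residual as a function of the weight vector
# (here a simple quadratic bowl with a known minimizer, purely for illustration)
w_star = np.array([1.0, -2.0, 0.5])
def residual(w):
    return float(np.sum((w - w_star)**2))

n_particles, dim = 30, 3
pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest = pos.copy()
pbest_val = np.array([residual(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

w_in, c1, c2 = 0.7, 1.5, 1.5          # inertia and acceleration coefficients
for _ in range(300):
    r1 = rng.random((n_particles, dim))
    r2 = rng.random((n_particles, dim))
    vel = w_in * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    for i in range(n_particles):
        v = residual(pos[i])
        if v < pbest_val[i]:
            pbest_val[i], pbest[i] = v, pos[i].copy()
    gbest = pbest[np.argmin(pbest_val)].copy()
```

The gradient-free search is what lets the cited scheme tolerate a value-function residual that is awkward to differentiate, at the cost of more function evaluations per update.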
Collapse
Affiliation(s)
- Bo Zhao
- School of Systems Science, Beijing Normal University, Beijing 100875, China.
| | - Fangchao Luo
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
| | - Haowei Lin
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
| | - Derong Liu
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
| |
Collapse
|
39
|
Liu C, Jiang B, Patton RJ, Zhang K. Decentralized Output Sliding-Mode Fault-Tolerant Control for Heterogeneous Multiagent Systems. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:4934-4945. [PMID: 31059465 DOI: 10.1109/tcyb.2019.2912636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
This paper proposes a novel decentralized output sliding-mode fault-tolerant control (FTC) design for heterogeneous multiagent systems (MASs) with matched disturbances, unmatched nonlinear interactions, and actuator faults. The respective iteration and iteration-free algorithms in the sliding-mode FTC scheme are designed with adaptive upper bounding laws to automatically compensate the matched and unmatched components. Then, a continuous fault-tolerant protocol in the observer-based integral sliding-mode design is developed to guarantee the asymptotic stability of MASs and the ultimate boundedness of the estimation errors. Simulation results validate the efficiency of the proposed FTC algorithm.
Collapse
|
40
|
Su H, Zhang H, Liang X, Liu C. Decentralized Event-Triggered Online Adaptive Control of Unknown Large-Scale Systems Over Wireless Communication Networks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:4907-4919. [PMID: 31940563 DOI: 10.1109/tnnls.2019.2959005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, a novel online decentralized event-triggered control scheme is proposed for a class of nonlinear interconnected large-scale systems subject to unknown internal system dynamics and interconnected terms. First, by designing a neural network-based identifier, the unknown internal dynamics of the interconnected systems is reconstructed. Then, the adaptive critic design method is used to learn the approximate optimal control policies in the context of event-triggered mechanism. Specifically, the event-based control processes of different subsystems are independent, asynchronous, and decentralized. That is, the decentralized event-triggering conditions and the controllers only rely on the local state information of the corresponding subsystems, which avoids the transmissions of the state information between the subsystems over the wireless communication networks. Then, with the help of Lyapunov's theorem, the states of the developed closed-loop control system and the critic weight estimation errors are proved to be uniformly ultimately bounded. Finally, the effectiveness and applicability of the event-based control method are verified by an illustrative numerical example and a practical example.
Collapse
|
41
|
Xu Y, Jiang B, Yang H. Two-Level Game-Based Distributed Optimal Fault-Tolerant Control for Nonlinear Interconnected Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:4892-4906. [PMID: 31940562 DOI: 10.1109/tnnls.2019.2958948] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
This article addresses the distributed optimal fault-tolerant control (FTC) issue by using the two-level game approach for a class of nonlinear interconnected systems, in which each subsystem couples with its neighbors through not only the states but also the inputs. At the first level, the FTC problem for each subsystem is formulated as a zero-sum differential game, in which the controller and the fault are regarded as two players with opposite interests. At the second level, the whole interconnected system is formulated as a graphical game, in which each subsystem is a player to achieve the global Nash equilibrium for the overall system. The rigorous proof of the stability of the interconnected system is given by means of the cyclic-small-gain theorem, and the relationship between the local optimality and the global optimality is analyzed. Moreover, based on the adaptive dynamic programming (ADP) technology, a distributed optimal FTC learning scheme is proposed, in which a group of critic neural networks (NNs) are established to approximate the cost functions. Finally, an example is taken to illustrate the efficiency and applicability of the obtained theoretical results.
Collapse
|
42
|
Bai W, Li T, Tong S. NN Reinforcement Learning Adaptive Control for a Class of Nonstrict-Feedback Discrete-Time Systems. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:4573-4584. [PMID: 31995515 DOI: 10.1109/tcyb.2020.2963849] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
This article investigates an adaptive reinforcement learning (RL) optimal control design problem for a class of nonstrict-feedback discrete-time systems. Based on the approximation ability of neural networks (NNs) and the RL control design technique, an adaptive backstepping RL optimal controller and a minimal learning parameter (MLP) adaptive RL optimal controller are developed by establishing a novel strategic utility function and introducing external function terms. It is proved that the proposed adaptive RL optimal controllers can guarantee that all signals in the closed-loop systems are semiglobally uniformly ultimately bounded (SGUUB). The main feature is that the proposed schemes can solve optimal control problems that previous approaches could not handle. Furthermore, the proposed MLP adaptive optimal control scheme reduces the number of adaptive laws, and thus the computational complexity is decreased. Finally, the simulation results illustrate the validity of the proposed optimal control schemes.
Collapse
|
43
|
Wei Q, Song R, Liao Z, Li B, Lewis FL. Discrete-Time Impulsive Adaptive Dynamic Programming. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:4293-4306. [PMID: 30990209 DOI: 10.1109/tcyb.2019.2906694] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite-horizon discrete-time nonlinear systems. Considering the constraint on the impulsive interval, in each iteration, the iterative impulsive value function under each possible impulsive interval is obtained, and then the iterative value function and iterative control law are achieved. A new convergence analysis method is developed which proves that the iterative value function converges to the optimum as the iteration index tends to infinity. The properties of the iterative control law are analyzed, and the detailed implementation of the optimal impulsive control law is presented. Finally, two simulation examples with comparisons are given to show the effectiveness of the developed method.
Collapse
|
44
|
Zhao B, Liu D, Luo C. Reinforcement Learning-Based Optimal Stabilization for Unknown Nonlinear Systems Subject to Inputs With Uncertain Constraints. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:4330-4340. [PMID: 31899437 DOI: 10.1109/tnnls.2019.2954983] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
This article presents a novel reinforcement learning strategy that addresses an optimal stabilizing problem for unknown nonlinear systems subject to uncertain input constraints. The control algorithm is composed of two parts, i.e., online learning optimal control for the nominal system and feedforward neural networks (NNs) compensation for handling uncertain input constraints, which are considered as the saturation nonlinearities. Integrating the input-output data and recurrent NN, a Luenberger observer is established to approximate the unknown system dynamics. For nominal systems without input constraints, the online learning optimal control policy is derived by solving Hamilton-Jacobi-Bellman equation via a critic NN alone. By transforming the uncertain input constraints to saturation nonlinearities, the uncertain input constraints can be compensated by employing a feedforward NN compensator. The convergence of the closed-loop system is guaranteed to be uniformly ultimately bounded by using the Lyapunov stability analysis. Finally, the effectiveness of the developed stabilization scheme is illustrated by simulation studies.
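Treating an input constraint as a saturation nonlinearity typically leads to a bounded policy of the form u = -lam * tanh(R^-1 B' (dV/dx) / (2 lam)), obtained from a nonquadratic (integral-of-inverse-tanh) input penalty. A sketch of that policy is below, with the bound lam assumed known — whereas in the cited article the bound itself is uncertain and handled by a separate feedforward NN compensator:

```python
import numpy as np

# actuator bound |u| <= lam (assumed known for this illustration)
lam = 2.0

def constrained_policy(grad_V, B=1.0, R_inv=1.0):
    # bounded policy u = -lam * tanh(R^-1 * B * (dV/dx) / (2 lam)): the tanh
    # keeps the control strictly inside the saturation limits for any gradient
    return -lam * np.tanh(R_inv * B * grad_V / (2.0 * lam))

grads = np.linspace(-50.0, 50.0, 101)
u = constrained_policy(grads)
```

For small value-function gradients the policy reduces to the unconstrained form u = -(1/2) R^-1 B' dV/dx, so the constraint handling is transparent near the origin.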
Collapse
|
45
|
Decentralized composite suboptimal control for a class of two-time-scale interconnected networks with unknown slow dynamics. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.057] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
46
|
Feng T, Zhang J, Zhang H. Consensusability of discrete-time linear multi-agent systems with multiple inputs. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
47
|
Ma L, Shen Y, Li J. A neural-network-based hysteresis model for piezoelectric actuators. THE REVIEW OF SCIENTIFIC INSTRUMENTS 2020; 91:015002. [PMID: 32012591 DOI: 10.1063/1.5121471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/25/2019] [Accepted: 12/08/2019] [Indexed: 06/10/2023]
Abstract
In this paper, a new neural-network-based hysteresis model is presented. First, a variable-order hysteretic operator (VOHO) is proposed based on the characteristics of the motion point trajectory. Based on the VOHO, a basic hysteresis model (BHM) is constructed. Next, the input space is expanded from one dimension to two dimensions based on the BHM, so that neural networks can be used to approximate the mapping between the expanded input space and the output space. Finally, three experiments involving a piezoelectric actuator were conducted to validate the neural hysteresis model. The results of the experiments suggest that the proposed approach is effective.
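A generic building block for operator-based hysteresis models of this kind is the classical play (backlash) operator; the paper's variable-order operator is different, but the play operator shows how a rate-independent memory element is computed (the half-width r and input sweep are illustrative):

```python
def play_operator(u_seq, r, y0=0.0):
    # classical play (backlash) operator with half-width r: the output follows
    # the input only after the input has moved more than r away from the output
    y, out = y0, []
    for u in u_seq:
        y = max(u - r, min(u + r, y))
        out.append(y)
    return out

# a triangular input sweep exposes the characteristic hysteresis loop:
# the rising and falling branches differ by 2*r at the same input value
ups = [i / 10 for i in range(-10, 11)]
u_seq = ups + ups[::-1][1:]
y = play_operator(u_seq, r=0.3)
```

Feeding such operator outputs, together with the raw input, to a neural network is the usual way to turn the multivalued input-output map of hysteresis into a single-valued mapping a network can approximate.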
Collapse
Affiliation(s)
- Lianwei Ma, School of Information Science and Engineering, Ningbo Institute of Technology, Zhejiang University, Ningbo 315100, China
- Yu Shen, Department of Applied Physics, Zhejiang University of Science and Technology, Hangzhou 310023, China
- Jinrong Li, Department of Automation, Zhejiang University of Science and Technology, Hangzhou 310023, China
|
48
|
Mu C, Zhang Y. Learning-Based Robust Tracking Control of Quadrotor With Time-Varying and Coupling Uncertainties. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:259-273. [PMID: 30908267 DOI: 10.1109/tnnls.2019.2900510] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
In this paper, a learning-based robust tracking control scheme is proposed for a quadrotor unmanned aerial vehicle system. The quadrotor dynamics are modeled with time-varying and coupling uncertainties. By designing position and attitude tracking error subsystems, the robust tracking control strategy is constructed around the approximately optimal control of the associated nominal error subsystems. Furthermore, an improved weight-updating rule is adopted, and neural networks are applied in the learning-based control scheme to obtain the approximately optimal control laws of the nominal error subsystems. A stability analysis of the tracking error subsystems with time-varying and coupling uncertainties provides the theoretical guarantee for the learning-based robust tracking control scheme. Finally, considering the variable disturbances of the actual environment, three simulation cases based on linear and nonlinear quadrotor models are presented, with competitive results demonstrating the effectiveness of the proposed control scheme.
|
49
|
Moghadam R, Modares H. Resilient Autonomous Control of Distributed Multiagent Systems in Contested Environments. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:3957-3967. [PMID: 30130241 DOI: 10.1109/tcyb.2018.2856089] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
An autonomous and resilient controller is proposed for leader-follower multiagent systems under uncertainties and cyber-physical attacks. The leader is assumed nonautonomous with a nonzero control input, which allows the team behavior or mission to change in response to environmental changes. A resilient learning-based control protocol is presented to find optimal solutions to the synchronization problem in the presence of attacks and system dynamic uncertainties. An observer-based distributed H∞ controller is first designed to prevent the effects of attacks on sensors and actuators from propagating throughout the network, as well as to attenuate the effect of these attacks on the compromised agent itself. Nonhomogeneous game algebraic Riccati equations are derived to solve the H∞ optimal synchronization problem, and off-policy reinforcement learning (RL) is utilized to learn their solution without requiring any knowledge of the agents' dynamics. A trust-confidence-based distributed control protocol is then proposed to mitigate attacks that hijack an entire node and attacks on communication links. A confidence value is defined for each agent based solely on its local evidence. The proposed resilient RL algorithm uses the confidence value of each agent to indicate the trustworthiness of its own information and broadcasts it to the agent's neighbors, which weight the data they receive from it during and after learning. If an agent's confidence value is low, it employs a trust mechanism to identify compromised agents and removes the data received from them from the learning process. Simulation results are provided to show the effectiveness of the proposed approach.
|
50
|
Wen G, Ge SS, Chen CLP, Tu F, Wang S. Adaptive Tracking Control of Surface Vessel Using Optimized Backstepping Technique. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:3420-3431. [PMID: 29994688 DOI: 10.1109/tcyb.2018.2844177] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In this paper, a tracking control approach for surface vessels is developed based on a new control technique named optimized backstepping (OB), which treats optimization as a backstepping design principle. Since surface vessel systems are modeled by second-order dynamics in strict-feedback form, backstepping is an ideal technique for the tracking task. In the backstepping control of the surface vessel, the virtual and actual controls are designed to be the optimized solutions of the corresponding subsystems, so the overall control is optimized. In general, optimal control is designed based on the solution of the Hamilton-Jacobi-Bellman (HJB) equation; however, solving this equation is very difficult, or even impossible, owing to its inherent nonlinearity and complexity. To overcome this difficulty, the reinforcement learning (RL) strategy with an actor-critic architecture is usually considered, in which the critic and actor evaluate the control performance and execute the control behavior, respectively. By employing the actor-critic RL algorithm for both the virtual and actual controls of the vessel, it is proven that the desired optimizing and tracking performance can be achieved. Simulation results further demonstrate the effectiveness of the proposed surface vessel control.
|