1. Chai J, Zhu Y, Zhao D. NVIF: Neighboring Variational Information Flow for Cooperative Large-Scale Multiagent Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17829-17841. [PMID: 37672377] [DOI: 10.1109/tnnls.2023.3309608]
Abstract
Communication-based multiagent reinforcement learning (MARL) has shown promising results in promoting cooperation by enabling agents to exchange information. However, the existing methods have limitations in large-scale multiagent systems due to high information redundancy, and they tend to overlook the unstable training process caused by the online-trained communication protocol. In this work, we propose a novel method called neighboring variational information flow (NVIF), which enhances communication among neighboring agents by providing them with the maximum information set (MIS), containing more information than the existing methods. NVIF compresses the MIS into a compact latent state while adopting neighboring communication. To stabilize the overall training process, we introduce a two-stage training mechanism. We first pretrain the NVIF module using a randomly sampled offline dataset to create a task-agnostic and stable communication protocol, and then use the pretrained protocol to perform online policy training with RL algorithms. Our theoretical analysis indicates that NVIF-proximal policy optimization (PPO), which combines NVIF with PPO, has the potential to promote cooperation with agent-specific rewards. Experimental results demonstrate the superiority of our method in both heterogeneous and homogeneous settings. Additional experiments also demonstrate the potential of our method for multitask learning.
2. Liang Y, Zhang H, Zhang J, Ming Z. Event-Triggered Guarantee Cost Control for Partially Unknown Stochastic Systems via Explorized Integral Reinforcement Learning Strategy. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7830-7844. [PMID: 36395138] [DOI: 10.1109/tnnls.2022.3221105]
Abstract
In this article, an integral reinforcement learning (IRL)-based event-triggered guarantee cost control (GCC) approach is proposed for stochastic systems that are modulated by randomly time-varying parameters. First, with the aid of the RL algorithm, the optimal GCC (OGCC) problem is converted into an optimal zero-sum game by solving a modified Hamilton-Jacobi-Isaacs (HJI) equation of the auxiliary system. Moreover, in order to address the stochastic zero-sum game, we propose an on-policy IRL-based control approach that incorporates the multivariate probabilistic collocation method (MPCM), which can accurately predict the mean value of uncertain functions with randomly time-varying parameters. Furthermore, a novel GCC method, which combines the explorized IRL algorithm and the MPCM, is designed to relax the requirement of knowing the system dynamics for this class of stochastic systems. On this foundation, for the purpose of reducing computation cost and avoiding the waste of resources, we propose an event-triggered GCC approach that combines explorized IRL and the MPCM by utilizing critic-actor-disturbance neural networks (NNs). Meanwhile, the weight vectors of the three NNs are updated simultaneously and aperiodically according to the designed triggering condition. The ultimate boundedness (UB) properties of the controlled systems are proved by means of the Lyapunov theorem. Finally, the effectiveness of the developed GCC algorithms is illustrated via two simulation examples.
3. Du X, Zhan X, Wu J, Yan H. Effects of Two-Channel Noise and Packet Loss on Performance of Information Time Delay Systems. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:8549-8556. [PMID: 37015669] [DOI: 10.1109/tnnls.2022.3230648]
Abstract
The performance limitations of multiple-input multiple-output (MIMO) information time-delay systems (ITDSs) with packet loss, codec constraints, and white Gaussian noise (WGN) are investigated in this article. By using the spectral decomposition technique together with inner-outer and partial factorizations, an explicit expression for the performance limitation is obtained under a two-degree-of-freedom (2DOF) compensator. The theoretical analysis demonstrates that the system performance depends on the time delay and on the nonminimum-phase (NMP) zeros and unstable poles of the given plant, together with their directions. In addition, WGN, packet loss, and the codec also affect the performance. Finally, the theoretical results are verified by simulation examples, which show that the packet-loss rate and the encoding and decoding have the greater impact on system performance.
4. Cao L, Cheng Z, Liu Y, Li H. Event-Based Adaptive NN Fixed-Time Cooperative Formation for Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6467-6477. [PMID: 36215380] [DOI: 10.1109/tnnls.2022.3210269]
Abstract
This article focuses on the fixed-time formation control problem for nonlinear multiagent systems (MASs) with dynamic uncertainties and limited communication resources. Under the framework of the backstepping method, a time-varying formation function is introduced in the controller design. To attain the prescribed transient and steady-state performance of MASs, a fixed-time prescribed performance function (FTPPF) is designed, and the additional coordinate transformation usually needed to handle the zero-equilibrium-point problem is removed. To achieve better approximation performance, a neural network (NN)-based composite dynamic surface control (CDSC) strategy is proposed, where the CDSC scheme consists of prediction errors and serial-parallel estimation models. According to the signals generated by the estimation models, disturbance observers are established to overcome the difficulties arising from approximation errors and mismatched disturbances. Moreover, an improved dynamic event-triggered mechanism with varying threshold parameters is constructed to reduce the signal transmission frequency. Via Lyapunov stability theory, all the signals in the closed-loop system are shown to be semiglobally uniformly ultimately bounded. Finally, the simulation results verify the effectiveness of the developed CDSC strategy.
5. Li B, Chen N, Luo B, Chen J, Yang C, Gui W. ADP-Based Event-Triggered Constrained Optimal Control on Spatiotemporal Process: Application to Temperature Field in Roller Kiln. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:3229-3241. [PMID: 37195852] [DOI: 10.1109/tnnls.2023.3267516]
Abstract
The precise control of the spatiotemporal process in a roller kiln is crucial in the production of the Ni-Co-Mn layered cathode material of lithium-ion batteries. Since the product is extremely sensitive to temperature distribution, temperature field control is of great significance. In this article, an event-triggered optimal control (ETOC) method with input constraints is proposed for the temperature field, which plays an important role in reducing communication and computation costs. A nonquadratic cost function is adopted to describe the system performance under input constraints. First, we formulate the event-triggered control problem for the temperature field, which is described by a partial differential equation (PDE). Then, the event-triggered condition is designed according to the information of the system states and control inputs. On this basis, a framework of the event-triggered adaptive dynamic programming (ETADP) method, based on model reduction technology, is proposed for the PDE system. A critic network is used to approximate the optimal performance index by a neural network (NN), while an actor network is used to optimize the control strategy. Furthermore, an upper bound on the performance index and a lower bound on the interexecution times, as well as the stability of the impulsive dynamic system and of the closed-loop PDE system, are proved. Simulation verification demonstrates the effectiveness of the proposed method.
6. Wu L, Li Z, Liu S, Li Z, Sun D. An improved compact-form antisaturation model-free adaptive control algorithm for a class of nonlinear systems with time delays. Science Progress 2023; 106:368504231210361. [PMID: 37933475] [PMCID: PMC10631356] [DOI: 10.1177/00368504231210361]
Abstract
To solve the time-delay and actuator saturation problems of nonlinear plants in industrial processes, an improved compact-form antisaturation model-free adaptive control (ICF-AS-MFAC) method is proposed in this work. The ICF-AS-MFAC scheme is based on the concept of the pseudo partial derivative (PPD) and adopts equivalent dynamic linearization technology. A tracking differentiator is then used to predict the future output of a time-delay system so that the system can be controlled effectively. Additionally, the concept of a saturation parameter is proposed, and the ICF-AS-MFAC controller is designed to ensure that the control system does not exhibit actuator saturation. The proposed algorithm is more flexible, yields faster output responses for time-delay systems, and solves the actuator saturation problem. The convergence and stability of the proposed method are rigorously proven mathematically. The effectiveness of the proposed method is verified by numerical simulations, and its applicability is verified by a series of experimental results on a double-tank system.
Affiliation(s)
- Lipu Wu, Zhen Li, Shida Liu, Zhijun Li, and Dehui Sun: School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
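The compact-form dynamic linearization underpinning MFAC reduces to two coupled recursions: a projection-type estimate of the PPD and a control update driven by the one-step-ahead tracking error. The sketch below shows only that basic loop (the antisaturation mechanism and the time-delay tracking differentiator described in the abstract are omitted); the plant, gains, and reset rule are illustrative assumptions, not the paper's design.

```python
import numpy as np

def cfdl_mfac(y_ref, plant, N, eta=0.5, mu=1.0, rho=0.6, lam=2.0, eps=1e-5):
    """Basic compact-form dynamic-linearization MFAC.

    y_ref : reference trajectory of length >= N + 1
    plant : callable y(k+1) = plant(y(k), u(k)), treated as unknown
    """
    y, u = np.zeros(N + 1), np.zeros(N + 1)
    phi = np.full(N + 1, 0.5)                  # pseudo-partial-derivative (PPD) estimates
    for k in range(2, N):
        du, dy = u[k - 1] - u[k - 2], y[k] - y[k - 1]
        # projection-type PPD estimator
        phi[k] = phi[k - 1] + eta * du / (mu + du**2) * (dy - phi[k - 1] * du)
        # standard reset rule keeps the estimator well posed
        if abs(phi[k]) < eps or abs(du) < eps or np.sign(phi[k]) != np.sign(phi[0]):
            phi[k] = phi[0]
        # control update driven by the one-step-ahead tracking error
        u[k] = u[k - 1] + rho * phi[k] / (lam + phi[k]**2) * (y_ref[k + 1] - y[k])
        y[k + 1] = plant(y[k], u[k])
    return y, u

# toy usage on a hypothetical scalar nonlinear plant
plant = lambda y, u: y / (1.0 + y**2) + u
ref = 0.5 * np.sin(0.05 * np.arange(402))
y, u = cfdl_mfac(ref, plant, 400)
```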
7. Zhang H, Ming Z, Yan Y, Wang W. Data-Driven Finite-Horizon H∞ Tracking Control With Event-Triggered Mechanism for the Continuous-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:4687-4701. [PMID: 34633936] [DOI: 10.1109/tnnls.2021.3116464]
Abstract
In this article, a neural network (NN)-based adaptive dynamic programming (ADP) event-triggered control method is presented to obtain a near-optimal control policy for the model-free finite-horizon H∞ optimal tracking control problem with constrained control input. First, using available input-output data, a data-driven model is established by a recurrent NN (RNN) to reconstruct the unknown system. Then, an augmented system with an event-triggered mechanism is obtained from the tracking error system and a command generator, and a novel event-triggering condition without Zeno behavior is presented. On this basis, the relationship between the event-triggered Hamilton-Jacobi-Isaacs (HJI) equation and the time-triggered HJI equation is given in Theorem 3. Since the solution of the HJI equation is time-dependent for the augmented system, time-dependent activation functions for the NNs are considered. Moreover, an extra error term is incorporated to satisfy the terminal constraints of the cost function. This adaptive control pattern finds, in real time, approximations of the optimal value while also ensuring the uniform ultimate boundedness of the closed-loop system. Finally, the effectiveness of the proposed near-optimal control pattern is verified by two simulation examples.
8. Li M, Wang D, Zhao M, Qiao J. Event-triggered constrained neural critic control of nonlinear continuous-time multiplayer nonzero-sum games. Information Sciences 2023. [DOI: 10.1016/j.ins.2023.02.081]
9. Yang Y, Modares H, Vamvoudakis KG, He W, Xu CZ, Wunsch DC. Hamiltonian-Driven Adaptive Dynamic Programming With Approximation Errors. IEEE Transactions on Cybernetics 2022; 52:13762-13773. [PMID: 34495864] [DOI: 10.1109/tcyb.2021.3108034]
Abstract
In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control problem in continuous time for nonlinear systems. First, a novel function, "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that takes into consideration the approximation errors during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient to guarantee closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.
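On the linear-quadratic special case, the policy-evaluation/policy-improvement loop that the Hamiltonian-driven framework generalizes is Kleinman's classical algorithm: evaluate the current gain through a Lyapunov equation, then minimize the Hamiltonian to improve it. A minimal sketch of that special case (not the paper's min-Hamiltonian method or its approximation-error analysis; the plant and initial gain are illustrative):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def policy_iteration_lqr(A, B, Q, R, K0, iters=20):
    """Kleinman's policy iteration for continuous-time LQR.
    K0 must stabilize A - B @ K0; each evaluation solves a Lyapunov equation."""
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # policy evaluation:  Acl' P + P Acl + Q + K' R K = 0
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # policy improvement: u = -K x minimizes the Hamiltonian
        K = np.linalg.solve(R, B.T @ P)
    return P, K

A = np.array([[0.0, 1.0], [-1.0, 2.0]])    # illustrative open-loop-unstable plant
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K0 = np.array([[0.0, 5.0]])                # stabilizing initial gain
P, K = policy_iteration_lqr(A, B, Q, R, K0)
```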
10. Liu X, Xu B, Shou Y, Fan QY, Chen Y. Event-Triggered Adaptive Control of Uncertain Nonlinear Systems With Composite Condition. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6030-6037. [PMID: 33961566] [DOI: 10.1109/tnnls.2021.3072107]
Abstract
This article concentrates on event-based collaborative design for strict-feedback systems with uncertain nonlinearities. The controller is designed based on a neural network (NN) weight adaptive law, and both the controller and the NN weight adaptive law are updated only at the triggering instants determined by a novel composite triggering threshold. To reduce the conservativeness of the event condition, the state-model error is incorporated into the composite condition and the NN weight adaptive law. Under the proposed mechanism, the requirements on system information and the allowable range of the event-triggering error are relaxed, and the number of triggering instants is greatly reduced without deteriorating the system performance. Moreover, the stability of the closed-loop system is proved by the Lyapunov method over both the inter-event intervals and the sampling instants. Simulation results show the effectiveness of the proposed scheme.
11. Wang K, Mu C. Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system. ISA Transactions 2022; 129:295-308. [PMID: 35216805] [DOI: 10.1016/j.isatra.2022.02.007]
Abstract
In this paper, based on an actor-critic neural network structure and a reinforcement learning scheme, a novel asynchronous learning algorithm with event communication is developed to solve the Nash equilibrium of a multiplayer nonzero-sum differential game in an adaptive fashion. From the optimal control point of view, each player or local controller wants to minimize its individual infinite-horizon cost function by finding an optimal policy. In this learning framework, each player consists of one critic and one actor and implements distributed asynchronous policy iteration to optimize its decision-making process. In addition, the communication burden between the system and the players is effectively reduced by setting up a central event generator. The critic network executes fast updates by gradient-descent adaptation, while the actor network performs event-induced updates using gradient projection. Closed-loop asymptotic stability is ensured along with uniform ultimate convergence. The effectiveness of the proposed algorithm is then demonstrated on a four-player nonlinear system, revealing that it can significantly reduce the number of samples without impairing learning accuracy. Finally, by leveraging the nonzero-sum game idea, the proposed learning scheme is also applied to the lateral-directional stability of a linear aircraft system, and is further extended to a nonlinear vehicle system to achieve adaptive cruise control.
Affiliation(s)
- Ke Wang and Chaoxu Mu: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
12. Fan QY, Wang D, Xu B. H∞ Codesign for Uncertain Nonlinear Control Systems Based on Policy Iteration Method. IEEE Transactions on Cybernetics 2022; 52:10101-10110. [PMID: 33877997] [DOI: 10.1109/tcyb.2021.3065995]
Abstract
In this article, the problem of H∞ codesign for nonlinear control systems with unmatched uncertainties and adjustable parameters is investigated. The main purpose is to solve for the adjustable parameters and the H∞ controller simultaneously so that better robust control performance can be achieved. By introducing a bounded function and defining a special cost function, the problem of solving the Hamilton-Jacobi-Isaacs equation is transformed into an optimization problem with nonlinear inequality constraints. Based on the sum-of-squares technique, a novel policy iteration algorithm is proposed to solve the H∞ codesign problem, and a modified algorithm for optimizing the robust performance index is given. The convergence and the performance improvement of the new policy iteration algorithms are proved. Simulation results are presented to demonstrate the effectiveness of the proposed algorithms.
13. Yang X, Zhu Y, Dong N, Wei Q. Decentralized Event-Driven Constrained Control Using Adaptive Critic Designs. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5830-5844. [PMID: 33861716] [DOI: 10.1109/tnnls.2021.3071548]
Abstract
We study the decentralized event-driven control problem of nonlinear dynamical systems with mismatched interconnections and asymmetric input constraints. To begin with, by introducing a discounted cost function for each auxiliary subsystem, we transform the decentralized event-driven constrained control problem into a group of nonlinear H2-constrained optimal control problems. Then, we develop the event-driven Hamilton-Jacobi-Bellman equations (ED-HJBEs) arising in these H2-constrained optimal control problems and demonstrate that their solutions together keep the overall system stable in the sense of uniform ultimate boundedness (UUB). To solve the ED-HJBEs, we build a critic-only architecture under the framework of adaptive critic designs, which employs only critic neural networks and updates their weight vectors via the gradient descent method. After that, based on the Lyapunov approach, we prove the UUB stability of all signals in the closed-loop auxiliary subsystems. Finally, simulations on an illustrative nonlinear interconnected plant are provided to validate the present designs.
14. Li H, Chen Y, Zhang Q, Zhao D. BiFNet: Bidirectional Fusion Network for Road Segmentation. IEEE Transactions on Cybernetics 2022; 52:8617-8628. [PMID: 34469325] [DOI: 10.1109/tcyb.2021.3105488]
Abstract
Multisensor fusion-based road segmentation plays an important role in intelligent driving systems, since it provides the drivable area. Existing mainstream fusion methods mainly perform feature fusion in the image space, which compresses the perspective of the road and degrades performance on distant road regions. Considering that the bird's-eye view (BEV) of the LiDAR point cloud preserves the spatial structure of the horizontal plane, this article proposes a bidirectional fusion network (BiFNet) to fuse the camera image and the BEV of the point cloud. The network consists of two modules: 1) the dense space transformation (DST) module, which handles the mutual conversion between the camera image space and the BEV space, and 2) the context-based feature fusion module, which fuses information from the different sensors based on the scene content of the corresponding features. The method achieves competitive results on the KITTI dataset.
15. Ran M, Li J, Xie L. Reinforcement-Learning-Based Disturbance Rejection Control for Uncertain Nonlinear Systems. IEEE Transactions on Cybernetics 2022; 52:9621-9633. [PMID: 33729973] [DOI: 10.1109/tcyb.2021.3060736]
Abstract
This article investigates reinforcement-learning (RL)-based disturbance rejection control for uncertain nonlinear systems having nonsimple nominal models. An extended state observer (ESO) is first designed to estimate the system state and the total uncertainty, which represents the perturbation to the nominal system dynamics. Based on the output of the observer, the control compensates for the total uncertainty in real time and simultaneously approximates, online, the optimal policy for the compensated system using a simulation-of-experience RL technique. Rigorous theoretical analysis shows the practical convergence of the system state to the origin and of the developed policy to the ideal optimal policy. Notably, the widely used and restrictive persistence of excitation (PE) condition is not required in the established framework. Simulation results are presented to illustrate the effectiveness of the proposed method.
16. Zhao Q, Si J, Sun J. Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: From Time-Driven to Event-Driven. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4139-4144. [PMID: 33534714] [DOI: 10.1109/tnnls.2021.3053037]
Abstract
In this work, time-driven learning refers to a machine learning method that updates the parameters of a prediction model continuously as new data arrive. Among existing approximate dynamic programming (ADP) and reinforcement learning (RL) algorithms, direct heuristic dynamic programming (dHDP) has been shown to be an effective tool in solving several complex learning control problems: it continuously updates the control policy and the critic as the system states evolve. It is therefore desirable to prevent the time-driven dHDP from updating in response to insignificant system events such as noise. Toward this goal, we propose a new event-driven dHDP. By constructing a Lyapunov function candidate, we prove the uniform ultimate boundedness (UUB) of the system states and of the weights in the critic and control policy networks, and consequently show that the approximate control and cost-to-go function approach Bellman optimality within a finite bound. We also illustrate how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
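Stripped of the dHDP specifics, the time-driven/event-driven distinction amounts to gating both the learning updates and the control recomputation with a state-gap trigger. A generic sketch of that gate (the trigger form, threshold, and callables are illustrative assumptions, not the paper's equations):

```python
import numpy as np

def event_driven_loop(step, update_networks, policy, x0, T, sigma=0.1):
    """Generic event-driven learning loop: the networks are updated and the
    control recomputed only when the gap between the current state and the
    last-transmitted state exceeds a state-dependent threshold."""
    x, x_event = x0.copy(), x0.copy()
    u, n_events = policy(x_event), 0
    for _ in range(T):
        gap = np.linalg.norm(x - x_event)
        if gap**2 > sigma * np.linalg.norm(x)**2:   # event: condition violated
            x_event = x.copy()
            update_networks(x_event)                # learn only at event instants
            u = policy(x_event)                     # zero-order hold in between
            n_events += 1
        x = step(x, u)
    return x, n_events
```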
17. Zhao Q, Sun J, Wang G, Chen J. Event-Triggered ADP for Nonzero-Sum Games of Unknown Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1905-1913. [PMID: 33882002] [DOI: 10.1109/tnnls.2021.3071545]
Abstract
For nonzero-sum (NZS) games of nonlinear systems, reinforcement learning (RL) and adaptive dynamic programming (ADP) have shown their capability of iteratively approximating the desired performance index and the optimal input policy. In this article, an event-triggered ADP method is proposed for NZS games of continuous-time nonlinear systems with completely unknown dynamics. To approximate the Nash equilibrium solution, critic neural networks and actor neural networks are utilized to estimate the value functions and the control policies, respectively. Compared with the traditional time-triggered mechanism, the proposed algorithm updates the neural network weights, as well as the inputs of the players, only when a state-based event-triggered condition is violated. It is shown that system stability and weight convergence are still guaranteed under mild assumptions, while the occupation of communication and computation resources is considerably reduced. Meanwhile, the infamous Zeno behavior is excluded by proving the existence of a minimum inter-event time (MIET), which ensures the feasibility of the closed-loop event-triggered continuous-time system. Finally, a numerical example is simulated to illustrate the effectiveness of the proposed approach.
18. Mu C, Wang K, Qiu T. Dynamic Event-Triggering Neural Learning Control for Partially Unknown Nonlinear Systems. IEEE Transactions on Cybernetics 2022; 52:2200-2213. [PMID: 32697728] [DOI: 10.1109/tcyb.2020.3004493]
Abstract
This article presents an event-sampled integral reinforcement learning algorithm for partially unknown nonlinear systems using a novel dynamic event-triggering strategy. This is a novel attempt to introduce the dynamic triggering into the adaptive learning process. The core of this algorithm is the policy iteration technique, which is implemented by two neural networks. A critic network is periodically tuned using the integral reinforcement signal, and an actor network adopts the event-based communication to update the control policy only at triggering instants. For overcoming the deficiency of static triggering, a dynamic triggering rule is proposed to determine the occurrence of events, in which an internal dynamic variable characterized by a first-order filter is defined. Theoretical results indicate that the impulsive system driven by events is asymptotically stable, the network weight is convergent, and the Zeno behavior is successfully avoided. Finally, three examples are provided to demonstrate that the proposed dynamic triggering algorithm can reduce samples and transmissions even more, with guaranteed learning performance.
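The "internal dynamic variable characterized by a first-order filter" plays the role of a reservoir that delays events relative to a static rule, in the spirit of Girard's dynamic triggering. A discretized sketch of one such rule follows; all gains are illustrative, and the paper's exact rule may differ.

```python
import numpy as np

def dynamic_trigger_step(x, x_event, eta, dt, sigma=0.2, lam=1.0, theta=5.0):
    """One step of a dynamic event-triggering rule.

    A static rule fires when  ||e||^2 > sigma * ||x||^2.  The internal
    variable eta filters the static margin, so isolated threshold crossings
    need not trigger an event immediately."""
    e2 = np.linalg.norm(x - x_event)**2
    margin = sigma * np.linalg.norm(x)**2 - e2
    fire = eta + theta * margin < 0.0                  # dynamic condition
    # first-order filter: eta_dot = -lam * eta + margin (clipped at zero
    # as a guard against discretization overshoot)
    eta = max(eta + dt * (-lam * eta + margin), 0.0)
    return fire, eta
```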
20. Disturbance-Improved Model-Free Adaptive Prediction Control for Discrete-Time Nonlinear Systems with Time Delay. Symmetry (Basel) 2021. [DOI: 10.3390/sym13112128]
Abstract
This study proposes a Disturbance-Improved Model-Free Adaptive Prediction Control (DMFAPC) algorithm for a discrete-time nonlinear system with time delay and disturbance, and shows that the algorithm has good robustness. On the one hand, a Smith predictor is used to predict the output at a future time to eliminate the time delay in the system; on the other hand, an attenuation factor is introduced at the input to effectively eliminate the measurement disturbance. The proposed algorithm is a data-driven control algorithm that requires no model information about the controlled system, only its input and output data. The convergence of the DMFAPC is analyzed, and simulation results confirm its effectiveness.
21. Zhao F, Gao W, Jiang ZP, Liu T. Event-Triggered Adaptive Optimal Control With Output Feedback: An Adaptive Dynamic Programming Approach. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:5208-5221. [PMID: 33035169] [DOI: 10.1109/tnnls.2020.3027301]
Abstract
This article presents an event-triggered output-feedback adaptive optimal control method for continuous-time linear systems. First, it is shown that the unmeasurable states can be reconstructed from the measured input and output data. An event-based feedback strategy is then proposed to reduce the number of controller updates and save communication resources. The discrete-time algebraic Riccati equation is solved iteratively through event-triggered adaptive dynamic programming, based on both policy iteration (PI) and value iteration (VI). The convergence of the proposed algorithm and the stability of the closed-loop system are established using Lyapunov techniques. Two numerical examples are employed to verify the effectiveness of the design methodology.
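Both iterations mentioned above target the discrete-time algebraic Riccati equation in the linear case. The model-based versions are easy to state, and the sketch below shows only those bare iterations under known A and B (not the paper's output-feedback, data-driven, or event-triggered machinery): value iteration needs no stabilizing start, while policy iteration does.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def vi_dare(A, B, Q, R, iters=200):
    """Value iteration: P <- A'P(A - BG) + Q with G = (R + B'PB)^{-1} B'PA, from P = 0."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = A.T @ P @ (A - B @ G) + Q
    return P

def pi_dare(A, B, Q, R, K0, iters=20):
    """Policy iteration: K0 must make A - B @ K0 Schur stable."""
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # policy evaluation:  P = Acl' P Acl + Q + K' R K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # policy improvement
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K
```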
22. Yang X, He H. Event-Driven H∞-Constrained Control Using Adaptive Critic Learning. IEEE Transactions on Cybernetics 2021; 51:4860-4872. [PMID: 32112694] [DOI: 10.1109/tcyb.2020.2972748]
Abstract
This article considers an event-driven H∞ control problem for continuous-time nonlinear systems with asymmetric input constraints. Initially, the H∞-constrained control problem is converted into a two-person zero-sum game with a discounted nonquadratic cost function. Then, we present the event-driven Hamilton-Jacobi-Isaacs equation (HJIE) associated with this game. Meanwhile, we develop a novel event-triggering condition that excludes Zeno behavior. The present condition differs from the existing literature in that it keeps the triggering threshold non-negative without requiring a properly selected prescribed level of disturbance attenuation. After that, under the framework of adaptive critic learning, we use a single critic network to solve the event-driven HJIE and tune its weight parameters by using historical and instantaneous state data simultaneously. Based on the Lyapunov approach, we demonstrate that uniform ultimate boundedness of all the signals in the closed-loop system is guaranteed. Finally, simulations of a nonlinear plant are presented to validate the developed event-driven H∞ control strategy.
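For reference, the conversion described above has the following generic shape, with α > 0 the discount factor, γ the attenuation level, and U(u) a nonquadratic penalty encoding the asymmetric input constraints (a sketch in standard notation, not the paper's exact cost):

```latex
J(x_0; u, w) = \int_0^{\infty} e^{-\alpha t}\left[\, h^{\top}(x)\,h(x) + U(u) - \gamma^{2} w^{\top} w \,\right]\mathrm{d}t,
\qquad
V^{*}(x_0) = \min_{u}\,\max_{w}\, J(x_0; u, w)
```

The event-driven HJIE is then the stationarity condition of V*, with the control held between triggering instants at its value from the last sampled state.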
23. Online event-based adaptive critic design with experience replay to solve partially unknown multi-player nonzero-sum games. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.087]
24. Yang Y, Fan X, Xu C, Wu J, Sun B. State consensus cooperative control for a class of nonlinear multi-agent systems with output constraints via ADP approach. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.046]
25. Wang N, Gao Y, Zhao H, Ahn CK. Reinforcement Learning-Based Optimal Tracking Control of an Unknown Unmanned Surface Vehicle. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3034-3045. [PMID: 32745008] [DOI: 10.1109/tnnls.2020.3009214]
Abstract
In this article, a novel reinforcement learning-based optimal tracking control (RLOTC) scheme is established for an unmanned surface vehicle (USV) in the presence of complex unknowns, including dead-zone input nonlinearities, system dynamics, and disturbances. To be specific, dead-zone nonlinearities are decoupled to be input-dependent sloped controls and unknown biases that are encapsulated into lumped unknowns within tracking error dynamics. Neural network (NN) approximators are further deployed to adaptively identify complex unknowns and facilitate a Hamilton-Jacobi-Bellman (HJB) equation that formulates optimal tracking. In order to derive a practically optimal solution, an actor-critic reinforcement learning framework is built by employing adaptive NN identifiers to recursively approximate the total optimal policy and cost function. Eventually, theoretical analysis shows that the entire RLOTC scheme can render tracking errors that converge to an arbitrarily small neighborhood of the origin, subject to optimal cost. Simulation results and comprehensive comparisons on a prototype USV demonstrate remarkable effectiveness and superiority.
26. Ma B, Li Y, An T, Dong B. Compensator-critic structure-based neuro-optimal control of modular robot manipulators with uncertain environmental contacts using non-zero-sum games. Knowledge-Based Systems 2021. [DOI: 10.1016/j.knosys.2021.107100]
27. Xue S, Luo B, Liu D. Event-Triggered Adaptive Dynamic Programming for Unmatched Uncertain Nonlinear Continuous-Time Systems. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:2939-2951. [PMID: 32721899] [DOI: 10.1109/tnnls.2020.3009015]
Abstract
In this article, an event-triggered adaptive dynamic programming (ADP) method is proposed to solve the robust control problem of unmatched uncertain systems. First, the robust control problem with unmatched uncertainties is transformed into the optimal control design for an auxiliary system. Subsequently, to reduce controller executions and save computational and communication resources, an event-triggering mechanism is introduced. By using a critic neural network (NN) to approximate the value function, a novel concurrent learning rule is developed to learn the NN weights, which avoids the requirement of an initial admissible control and the persistence of excitation condition. Moreover, it is proven that the developed event-triggered ADP controller guarantees the robustness of the uncertain system and the uniform ultimate boundedness of the NN weight estimation error. Finally, using an F-16 aircraft and an inverted pendulum with unmatched uncertainties as examples, simulation results show the effectiveness of the developed event-triggered ADP method.
28. Zhao B, Liu D, Alippi C. Sliding-Mode Surface-Based Approximate Optimal Control for Uncertain Nonlinear Systems With Asymptotically Stable Critic Structure. IEEE Transactions on Cybernetics 2021; 51:2858-2869. [PMID: 31945008] [DOI: 10.1109/tcyb.2019.2962011]
Abstract
This article develops a novel sliding-mode surface (SMS)-based approximate optimal control scheme for a large class of nonlinear systems affected by unknown mismatched perturbations. An observer-based perturbation estimation procedure is employed to establish the online-updated value function. The solution to the Hamilton-Jacobi-Bellman equation is approximated by an SMS-based critic neural network whose weight-error dynamics are designed to be asymptotically stable by nested update laws. The sliding-mode control strategy is combined with the approximate optimal control design procedure to obtain a faster control action. Stability is proved based on Lyapunov's direct method, and simulation results show the effectiveness of the developed control scheme.
29. Zhang S, Zhao B, Zhang Y. Event-triggered control for input constrained non-affine nonlinear systems based on neuro-dynamic programming. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.01.116]
30. Sliding mode-based online fault compensation control for modular reconfigurable robots through adaptive dynamic programming. Complex & Intelligent Systems 2021. [DOI: 10.1007/s40747-021-00364-3]
Abstract
In this paper, a sliding mode (SM)-based online fault compensation control scheme is investigated for modular reconfigurable robots (MRRs) with actuator failures via adaptive dynamic programming. It consists of an SM-based iterative controller, an adaptive robust term, and an online fault compensator. For fault-free MRR systems, the SM surface-based Hamilton-Jacobi-Bellman equation is solved by an online policy iteration algorithm, and the adaptive robust term is added to guarantee the reachability condition of the SM surface. For faulty MRR systems, the actuator failure is compensated online, avoiding the need for a fault detection and isolation mechanism. The closed-loop MRR system is guaranteed to be asymptotically stable under the developed fault compensation control scheme. Simulation results verify the effectiveness of the presented approach.
31. Ma B, Li Y. Compensator-critic structure-based event-triggered decentralized tracking control of modular robot manipulators: theory and experimental verification. Complex & Intelligent Systems 2021. [DOI: 10.1007/s40747-021-00359-0]
Abstract
This paper presents a novel compensator-critic structure-based event-triggered decentralized tracking control of modular robot manipulators (MRMs). On the basis of the subsystem dynamics under the joint torque feedback (JTF) technique, a tracking error fusion function, which includes both position and velocity errors, is utilized to construct the performance index function. By analyzing the dynamic uncertainties, a local dynamic information-based robust controller is designed to compensate for the model uncertainty. Based on the adaptive dynamic programming (ADP) algorithm and the event-triggered mechanism, the decentralized tracking control is obtained by solving the event-triggered Hamilton-Jacobi-Bellman equation (HJBE) with a critic neural network (NN). The tracking error of the closed-loop manipulator system is proved to be uniformly ultimately bounded (UUB) using the Lyapunov stability theorem. Finally, experimental results illustrate the effectiveness of the developed control method.
32. Wang L, Chen CLP. Reduced-Order Observer-Based Dynamic Event-Triggered Adaptive NN Control for Stochastic Nonlinear Systems Subject to Unknown Input Saturation. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:1678-1690. [PMID: 32452775] [DOI: 10.1109/tnnls.2020.2986281]
Abstract
In this article, a dynamic event-triggered control scheme is presented for a class of stochastic nonlinear systems with unknown input saturation and partially unmeasured states. First, a dynamic event-triggered mechanism (DEM) is designed to reduce unnecessary transmissions from the controller to the actuator and thereby achieve better resource efficiency. Unlike most existing event-triggered mechanisms, in which the threshold parameters are fixed, the threshold parameter in the developed event-triggered condition is dynamically adjusted according to a dynamic rule. Second, an improved neural network that accounts for the reconstruction error is introduced to approximate the unknown nonlinear terms in the considered systems. Third, an auxiliary system of the same order as the considered system is constructed to deal with the influence of asymmetric input saturation, which is distinct from most existing methods for nonlinear systems with input saturation. Since part of the state is unavailable, a reduced-order observer is presented to estimate the unmeasured states. Furthermore, it is theoretically proven that the obtained control scheme achieves the desired objectives. Finally, a one-link manipulator system and a three-degree-of-freedom ship maneuvering system are presented to illustrate the effectiveness of the proposed control method.
33. Sun J, Long T. Event-triggered distributed zero-sum differential game for nonlinear multi-agent systems using adaptive dynamic programming. ISA Transactions 2021; 110:39-52. [PMID: 33127079] [DOI: 10.1016/j.isatra.2020.10.043]
Abstract
In this paper, to reduce the computational and communication burden, the event-triggered distributed zero-sum differential game problem for multi-agent systems is investigated. First, based on the minimax principle, an adaptive event-triggered distributed iterative differential game strategy is derived, with an adaptive triggering condition for updating the control scheme aperiodically. To implement this strategy, the solution of the coupled Hamilton-Jacobi-Isaacs (HJI) equation is approximated by constructing a critic neural network (NN). To further relax the restrictive persistence of excitation (PE) condition, a novel PE-free updating law is designed using the experience replay method. The distributed event-triggered nonlinear system is then expressed as an impulsive dynamical system, and the stability analysis shows that the developed strategy ensures the uniform ultimate boundedness (UUB) of all closed-loop signals. Moreover, the minimal intersample time is proved to be lower bounded, which avoids the infamous Zeno behavior. Finally, the simulation results show that the number of controller updates is reduced considerably, saving computational and communication resources.
Affiliation(s)
- Jingliang Sun and Teng Long: School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China; Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education, Beijing 100081, China
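The "PE-free updating law" reuses recorded data alongside the instantaneous sample, in the spirit of concurrent learning. Below is a generic sketch of such a critic update for a linearly parameterized value function V(x) ≈ W·φ(x) under integral RL; the names and the residual form are illustrative assumptions, not the paper's law.

```python
import numpy as np

def replay_critic_update(W, dphi, rho, buffer, alpha=0.05):
    """One experience-replay update of critic weights W.

    Each sample is (dphi, rho) with dphi = phi(x_t) - phi(x_{t-T}) and
    rho = accumulated cost over [t - T, t], so the Bellman residual under
    the current weights is  delta = W @ dphi + rho.  Replaying stored
    samples substitutes for the persistence-of-excitation requirement."""
    for dp, r in [(dphi, rho)] + list(buffer):
        delta = W @ dp + r                                   # residual with *current* W
        W = W - alpha * delta * dp / (1.0 + dp @ dp)**2      # normalized gradient step
    return W
```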
34. Zhang Z, Ong YS, Wang D, Xue B. A Collaborative Multiagent Reinforcement Learning Method Based on Policy Gradient Potential. IEEE Transactions on Cybernetics 2021; 51:1015-1027. [PMID: 31443061] [DOI: 10.1109/tcyb.2019.2932203]
Abstract
Gradient-based methods have been extensively used in today's multiagent reinforcement learning (MARL). In a gradient-based MARL algorithm, each agent updates its parameterized strategy in the direction of the gradient of some performance index. However, studies on the convergence of existing gradient-based MARL algorithms for identical-interest games are quite few. In this article, we propose a policy gradient potential (PGP) algorithm that takes the PGP, rather than the gradient itself, as the source of information guiding the strategy update, in order to learn the optimal joint strategy with maximal global reward. Since the payoff matrix and the joint strategy are often unavailable to learning agents in reality, we consider the probability of obtaining the maximal reward as the performance index. Theoretical analysis of the PGP algorithm on a continuous model of an identical-interest repeated game shows that if the component action of every optimal joint action is unique, the critical points corresponding to all optimal joint actions are asymptotically stable. The PGP algorithm is experimentally compared against other MARL algorithms on two commonly used collaborative tasks, the robots-leaving-a-room task and the distributed sensor network task, as well as on a real-world minefield navigation problem where only local state and local reward information are available. The results show that the PGP algorithm outperforms the other algorithms in terms of cumulative reward and the number of time steps used in an episode.
35. Yang X, Wei Q. Adaptive Critic Learning for Constrained Optimal Event-Triggered Control With Discounted Cost. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:91-104. [PMID: 32167914] [DOI: 10.1109/tnnls.2020.2976787]
Abstract
This article studies an optimal event-triggered control (ETC) problem of nonlinear continuous-time systems subject to asymmetric control constraints. The present nonlinear plant differs from many studied systems in that its equilibrium point is nonzero. First, we introduce a discounted cost for such a system in order to obtain the optimal ETC without making coordinate transformations. Then, we present an event-triggered Hamilton-Jacobi-Bellman equation (ET-HJBE) arising in the discounted-cost constrained optimal ETC problem. After that, we propose an event-triggering condition guaranteeing a positive lower bound for the minimal intersample time. To solve the ET-HJBE, we construct a critic network under the framework of adaptive critic learning. The critic network weight vector is tuned through a modified gradient descent method, which simultaneously uses historical and instantaneous state data. By employing the Lyapunov method, we prove that the uniform ultimate boundedness of all signals in the closed-loop system is guaranteed. Finally, we provide simulations of a pendulum system and an oscillator system to validate the obtained optimal ETC strategy.
36. Yang Y, Vamvoudakis KG, Modares H, Yin Y, Wunsch DC. Safe Intermittent Reinforcement Learning With Static and Dynamic Event Generators. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5441-5455. [PMID: 32054590] [DOI: 10.1109/tnnls.2020.2967871]
Abstract
In this article, we present an intermittent framework for safe reinforcement learning (RL) algorithms. First, we develop a barrier function-based system transformation that imposes the state constraints while converting the original problem into an unconstrained optimization problem. Second, based on the derived optimal policies, two types of intermittent feedback RL algorithms are presented, namely a static one and a dynamic one. We finally leverage an actor/critic structure to solve the problem online while guaranteeing optimality, stability, and safety. Simulation results show the efficacy of the proposed approach.
37. Song R, Liu L. Event-triggered constrained robust control for partly-unknown nonlinear systems via ADP. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.012]
38. Event-driven H∞ control with critic learning for nonlinear systems. Neural Networks 2020; 132:30-42. [PMID: 32861146] [DOI: 10.1016/j.neunet.2020.08.004]
Abstract
In this paper, we study an event-driven H∞ control problem for continuous-time nonlinear systems. Initially, with the introduction of a discounted cost function, we convert the nonlinear H∞ control problem into an event-driven nonlinear two-player zero-sum game. Then, we develop the event-driven Hamilton-Jacobi-Isaacs equation (HJIE) related to this game, and propose a novel event-triggering condition that guarantees the absence of Zeno behavior. The triggering threshold in the proposed condition can be kept positive without requiring a properly chosen prescribed level of disturbance attenuation. To solve the event-driven HJIE, we employ an adaptive critic architecture containing a single critic neural network (NN), whose weight parameters are tuned via the gradient descent method. After that, we carry out the stability analysis of the hybrid closed-loop system based on Lyapunov's direct approach. Finally, we provide two nonlinear plants, including a pendulum system, to validate the proposed event-driven H∞ control scheme.
39. Adaptive resilient control of a class of nonlinear systems based on event-triggered mechanism. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.061]
40. Dastres H, Rezaie B, Baigzadehnoe B. Neural-network-based adaptive backstepping control for a class of unknown nonlinear time-delay systems with unknown input saturation. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.070]
41. Zhao B, Shi G, Wang D. Asymptotically stable critic designs for approximate optimal stabilization of nonlinear systems subject to mismatched external disturbances. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2018.08.092]
42. Li H, Zhang Q, Zhao D. Deep Reinforcement Learning-Based Automatic Exploration for Navigation in Unknown Environment. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2064-2076. [PMID: 31398138] [DOI: 10.1109/tnnls.2019.2927869]
Abstract
This paper investigates the automatic exploration problem in unknown environments, which is key to applying robotic systems to practical tasks. Solving this problem by stacking hand-crafted decision rules cannot cover diverse environments and sensor properties, whereas learning-based control methods adapt to these scenarios; they are, however, hampered by low learning efficiency and awkward transferability from simulation to reality. In this paper, we construct a general exploration framework by decomposing the exploration process into decision, planning, and mapping modules, which increases the modularity of the robotic system. Based on this framework, we propose a deep reinforcement learning-based decision algorithm that uses a deep neural network to learn an exploration strategy from the partial map. The results show that the proposed algorithm has better learning efficiency and adaptability to unknown environments. In addition, we conduct experiments on a physical robot, and the results suggest that the learned policy transfers well from simulation to the real robot.
43. Wang S, Yu H, Yu J, Na J, Ren X. Neural-Network-Based Adaptive Funnel Control for Servo Mechanisms With Unknown Dead-Zone. IEEE Transactions on Cybernetics 2020; 50:1383-1394. [PMID: 30387759] [DOI: 10.1109/tcyb.2018.2875134]
Abstract
This paper proposes an adaptive funnel control (FC) scheme for servo mechanisms with an unknown dead-zone. To improve the transient and steady-state performance, a modified funnel variable, which relaxes the limitation of the original FC (e.g., systems with relative degree 1 or 2), is developed using the tracking error to replace the scaling factor. Then, by applying the error transformation method, the original error is transformed into a new error variable which is used in the controller design. By using an improved funnel function in a dynamic surface control procedure, an adaptive funnel controller is proposed to guarantee that the output error remains within a predefined funnel boundary. A novel command filter technique is introduced by using the Levant differentiator to eliminate the "explosion of complexity" problem in the conventional backstepping procedure. Neural networks are used to approximate the unknown dead-zone and unknown nonlinear functions. Comparative experiments on a turntable servo mechanism confirm the effectiveness of the devised control method.
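The funnel idea can be summarized in a few lines: a shrinking boundary F(t) encloses the tracking error, and a transformed error that blows up near the boundary is what the controller actually regulates. A generic sketch follows; the boundary shape and parameters are illustrative, not the paper's modified funnel variable.

```python
import numpy as np

def funnel_transform(e, t, F_inf=0.05, F0=1.0, a=2.0):
    """Generic funnel/prescribed-performance transformation.

    F(t) decays from F0 to F_inf; the transformed error z grows without
    bound as |e| approaches F(t), so any controller keeping z bounded
    confines the raw error e inside the funnel (valid while |e| < F(t))."""
    F = (F0 - F_inf) * np.exp(-a * t) + F_inf   # shrinking funnel boundary
    return e / (F - abs(e)), F                  # transformed error and boundary
```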
44. Luo B, Yang Y, Liu D, Wu HN. Event-Triggered Optimal Control With Performance Guarantees Using Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:76-88. [PMID: 30892242] [DOI: 10.1109/tnnls.2019.2899594]
Abstract
This paper studies the problem of event-triggered optimal control (ETOC) for continuous-time nonlinear systems and proposes a novel event-triggering condition that enables ETOC methods to be designed directly from the solution of the Hamilton-Jacobi-Bellman (HJB) equation. We provide formal performance guarantees by proving a predetermined upper bound on the cost, and we also prove the existence of a lower bound on the interexecution times. For implementation purposes, an adaptive dynamic programming (ADP) method is developed to realize the ETOC, using a critic neural network (NN) to approximate the value function of the HJB equation. Subsequently, we prove that semiglobal uniform ultimate boundedness is guaranteed for the states and the NN weight errors under the ADP-based ETOC. Simulation results demonstrate the effectiveness of the developed method.
45. Event-triggered H∞ optimal control for continuous-time nonlinear systems using neurodynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.06.090]
46. Yang X, He H. Adaptive Critic Designs for Event-Triggered Robust Control of Nonlinear Systems With Unknown Dynamics. IEEE Transactions on Cybernetics 2019; 49:2255-2267. [PMID: 29993650] [DOI: 10.1109/tcyb.2018.2823199]
Abstract
This paper develops a novel event-triggered robust control strategy for continuous-time nonlinear systems with unknown dynamics. To begin with, the event-triggered robust nonlinear control problem is transformed into an event-triggered nonlinear optimal control problem by introducing an infinite-horizon integral cost for the nominal system. Then, a recurrent neural network (RNN) and adaptive critic designs (ACDs) are employed to solve the derived optimal control problem: the RNN reconstructs the system dynamics from collected system data, and, once the dynamics are acquired, a unique critic network obtains the approximate solution of the event-triggered Hamilton-Jacobi-Bellman equation within the framework of ACDs. The critic network is updated using historical and instantaneous state data simultaneously, an update law whose advantage is that it relaxes the persistence of excitation condition. Meanwhile, under a newly developed event-triggering condition, the proposed tuning rule not only guarantees that the critic network weights converge to their optimal values but also ensures that the nominal system states are uniformly ultimately bounded. Moreover, using the Lyapunov method, it is proved that the derived optimal event-triggered control (ETC) guarantees the uniform ultimate boundedness of all signals in the original system. Finally, a nonlinear oscillator and an unstable power system are provided to validate the developed robust ETC scheme.
47. Yang D, Li T, Zhang H, Xie X. Event-trigger-based robust control for nonlinear constrained-input systems using reinforcement learning method. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.02.034]
48. Shao K, Zhu Y, Zhao D. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning. IEEE Transactions on Emerging Topics in Computational Intelligence 2019. [DOI: 10.1109/tetci.2018.2823329]
49. Wang D, Liu D. Learning and Guaranteed Cost Control With Event-Based Adaptive Critic Implementation. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:6004-6014. [PMID: 29993846] [DOI: 10.1109/tnnls.2018.2817256]
Abstract
This paper focuses on the event-triggered guaranteed cost control design of nonlinear systems via a self-learning technique. In brief, an event-based guaranteed cost control strategy is developed for nonlinear systems subject to matched uncertainties, thereby balancing guaranteed-cost performance against limited communication resources. The original control design is transformed into an optimal control problem with an event-based mechanism, and the relationship of the guaranteed cost performance to that of the time-based formulation is discussed. A critic neural network is constructed to implement the event-based optimal control design with a stability guarantee. Simulation experiments verify the theoretical results in detail.
50. Sun M, Wu T, Chen L, Zhang G. Neural AILC for Error Tracking Against Arbitrary Initial Shifts. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2705-2716. [PMID: 28534792] [DOI: 10.1109/tnnls.2017.2698507]
Abstract
This paper is concerned with adaptive iterative learning control using neural networks for systems performing repetitive tasks over a finite time interval. Two standing issues of such iterative learning control processes are addressed: the initial condition problem and the problem related to the approximation error. Instead of state tracking, an error-tracking approach is proposed to tackle the problem arising from arbitrary initial shifts. The desired error trajectory is prespecified at the design stage and can be suited to different tracking tasks. The initial value of the desired error trajectory for each cycle is required to be the same as that of the actual error trajectory; this is a requirement only on the desired error trajectory and places no restriction on the initial value of the actual error trajectory. It is shown that the actual error trajectory is adjustable and converges to a prespecified neighborhood of the origin, while all variables of the closed-loop system remain uniformly bounded. Robustness in the case of nonzero approximation error is improved through a deadzone-modified Lyapunov functional, and the resulting estimate of the approximation-error bound avoids deterioration in tracking performance. The effectiveness of the designed learning controller is validated through an illustrative example.