1. Xue S, Zhang W, Luo B, Liu D. Integral Reinforcement Learning-Based Dynamic Event-Triggered Nonzero-Sum Games of USVs. IEEE Transactions on Cybernetics 2025; 55:1706-1716. [PMID: 40031610] [DOI: 10.1109/tcyb.2025.3533139]
Abstract
In this article, an integral reinforcement learning (IRL) method is developed for dynamic event-triggered nonzero-sum (NZS) games to achieve the Nash equilibrium of unmanned surface vehicles (USVs) with state and input constraints. Initially, a mapping function is designed to map the state and control of the USV into a safe environment. Subsequently, IRL-based coupled Hamilton-Jacobi equations, which avoid dependence on system dynamics, are derived to solve the Nash equilibrium. To conserve computational resources and reduce network transmission burdens, a static event-triggered control is initially designed, followed by the development of a more flexible dynamic form. Finally, a critic neural network is designed for each player to approximate its value function and control policy. Rigorous proofs are provided for the uniform ultimate boundedness of the state and the weight estimation errors. The effectiveness of the present method is demonstrated through simulation experiments.
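The abstract does not reproduce the game formulation, so the following is a generic sketch of nonzero-sum value functions and a dynamic event-triggering rule of the kind described; the symbols (V_i, Q_i, R_ij, e, η, σ, θ, λ) are illustrative and not taken from the paper:

$$
V_i(x(t)) = \int_t^{\infty} \Big( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Big)\, d\tau, \qquad i = 1,\dots,N,
$$

$$
t_{k+1} = \inf\big\{ t > t_k : \eta(t) + \theta\big(\sigma \|x(t)\|^2 - \|e(t)\|^2\big) \le 0 \big\}, \qquad \dot{\eta} = -\lambda \eta + \sigma \|x\|^2 - \|e\|^2,
$$

where e(t) is the gap between the last transmitted state and the current state; freezing η ≡ 0 recovers the static rule that the abstract describes as the starting point.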

2. Song R, Yang G, Lewis FL. Nearly Optimal Control for Mixed Zero-Sum Game Based on Off-Policy Integral Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:2793-2804. [PMID: 35877793] [DOI: 10.1109/tnnls.2022.3191847]
Abstract
In this article, we solve a class of mixed zero-sum games for nonlinear systems with unknown dynamics. A policy iteration algorithm that adopts integral reinforcement learning (IRL), and therefore does not depend on system information, is proposed to obtain the optimal controls of the competitor and the collaborators. An adaptive update law that combines a critic-actor structure with experience replay is proposed. The actor function not only approximates the optimal control of every player but also estimates an auxiliary control, which does not participate in the actual control process and exists only in theory. The parameters of the actor-critic structure are updated simultaneously. It is then proven that the parameter errors of the polynomial approximation are uniformly ultimately bounded. Finally, the effectiveness of the proposed algorithm is verified by two simulations.
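For reference, the core integral reinforcement learning relation that removes the need for knowledge of the drift dynamics can be sketched, in generic notation not taken from the paper, as the interval Bellman equation

$$
V(x(t)) = \int_{t}^{t+T} r\big(x(\tau), u(\tau)\big)\, d\tau + V\big(x(t+T)\big),
$$

which is evaluated along measured trajectories over a reinforcement interval T, so the system drift never appears explicitly in the policy-evaluation step.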

3. Wang Z, Chen C, Dong D. A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning. IEEE Transactions on Cybernetics 2023; 53:7509-7520. [PMID: 35580095] [DOI: 10.1109/tcyb.2022.3170485]
Abstract
While reinforcement learning (RL) algorithms are achieving state-of-the-art performance in various challenging tasks, they can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information. In this article, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from being perturbed. We use a Dirichlet process mixture to model the nonstationary task distribution, which captures task relatedness by estimating the likelihood of task-to-cluster assignments and clusters the task models in a latent space. We formulate the prior distribution of the mixture as a Chinese restaurant process (CRP) that instantiates new mixture components as needed. The update and expansion of the mixture are governed by the Bayesian nonparametric framework with an expectation maximization (EM) procedure, which dynamically adapts the model complexity without explicit task boundaries or heuristics. Moreover, we use the domain randomization technique to train robust prior parameters for the initialization of each task model in the mixture; thus, the resulting model can better generalize and adapt to unseen tasks. With extensive experiments conducted on robot navigation and locomotion domains, we show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
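As a minimal illustration of the Chinese restaurant process prior mentioned above (a sketch with hypothetical names, not code from the paper), the probability of assigning an incoming task to an existing cluster grows with the cluster's size, while a new mixture component is opened with probability proportional to the concentration parameter:

```python
import numpy as np

def crp_assignment_probs(cluster_sizes, alpha):
    """Chinese restaurant process prior (illustrative sketch).

    cluster_sizes: number of tasks already assigned to each existing cluster.
    alpha: concentration parameter controlling how readily new clusters open.
    Returns a probability vector whose last entry is the probability of
    instantiating a brand-new mixture component.
    """
    sizes = np.asarray(cluster_sizes, dtype=float)
    n = sizes.sum()
    return np.append(sizes, alpha) / (n + alpha)

# Example: three existing clusters of sizes 4, 2 and 1, with alpha = 1.0
print(crp_assignment_probs([4, 2, 1], alpha=1.0))  # -> [0.5, 0.25, 0.125, 0.125]
```

In the paper this prior is combined with task-likelihood terms inside an EM procedure; the sketch shows only the prior term that governs expansion.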

4. Liu G, Sun Q, Wang R, Hu X. Nonzero-Sum Game-Based Voltage Recovery Consensus Optimal Control for Nonlinear Microgrids System. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:8617-8629. [PMID: 35275823] [DOI: 10.1109/tnnls.2022.3151650]
Abstract
Most existing microgrid (MG) models are nonlinear, which can cause the controller to oscillate, resulting in excessive line loss, and also makes controller design for the MG system difficult. Therefore, this article studies the distributed voltage recovery consensus optimal control problem for a nonlinear MG system with N distributed generations (DGs) while providing stringent real power sharing. First, based on the distributed cooperative control concept of multiagent systems and critic neural networks (NNs), a novel distributed secondary voltage recovery consensus optimal control protocol is constructed by applying the backstepping technique and a nonzero-sum (NZS) differential game strategy to realize voltage recovery in islanded MGs. Meanwhile, a model identifier based on a three-layer NN is established to reconstruct the unknown NZS game systems. Then, a critic NN weight adaptive tuning law is proposed to ensure the convergence of the cost functions and the stability of the closed-loop system. Furthermore, according to Lyapunov stability theory, it is proven that all signals in the closed-loop system are uniformly ultimately bounded and that the voltage recovery synchronization error converges to an arbitrarily small neighborhood of the origin. Finally, simulation results in MATLAB illustrate the validity of the proposed control strategy.

5. Singh R, Bhushan B. Reinforcement Learning-Based Model-Free Controller for Feedback Stabilization of Robotic Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7059-7073. [PMID: 35015649] [DOI: 10.1109/tnnls.2021.3137548]
Abstract
This article presents a reinforcement learning (RL) algorithm for achieving model-free control of robotic applications. The RL functions are adapted with the least-square temporal difference (LSTD) learning algorithms to develop a model-free state feedback controller by establishing linear quadratic regulator (LQR) as a baseline controller. The classical least-square policy iteration technique is adapted to establish the boundary conditions for complexities incurred by the learning algorithm. Furthermore, the use of exact and approximate policy iterations estimates the parameters of the learning functions for a feedback policy. To assess the operation of the proposed controller, the trajectory tracking and balancing control problems of unmanned helicopters and balancer robotic applications are solved for real-time experiment. The results showed the robustness of the proposed approach in achieving trajectory tracking and balancing control.
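The abstract refers to least-squares temporal-difference (LSTD) learning around an LQR baseline; below is a minimal model-free sketch of the policy-evaluation step for a quadratic Q-function (the function names, the discrete-time setting, and the quadratic feature basis are assumptions for illustration, not the authors' code):

```python
import numpy as np

def quad_basis(z):
    """Quadratic feature basis: upper-triangular entries of z z^T."""
    outer = np.outer(z, z)
    return outer[np.triu_indices(len(z))]

def lstd_q(states, actions, costs, next_states, K, gamma=0.99):
    """LSTD evaluation of Q(x, u) = phi([x; u])^T theta under the policy u = -K x.

    Solves  sum_k phi_k (phi_k - gamma * phi'_k)^T theta = sum_k phi_k c_k
    from sampled transitions, without any model of the plant.
    """
    A_mat, b_vec = 0.0, 0.0
    for x, u, c, x_next in zip(states, actions, costs, next_states):
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])   # on-policy next action
        phi, phi_next = quad_basis(z), quad_basis(z_next)
        A_mat = A_mat + np.outer(phi, phi - gamma * phi_next)
        b_vec = b_vec + phi * c
    theta, *_ = np.linalg.lstsq(A_mat, b_vec, rcond=None)
    return theta
```

Given the estimated quadratic Q-function, policy improvement recovers an updated linear gain by minimizing Q(x, u) over u, which closes the policy-iteration loop the abstract mentions.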

6. Qian YY, Liu M, Wan Y, Lewis FL, Davoudi A. Distributed Adaptive Nash Equilibrium Solution for Differential Graphical Games. IEEE Transactions on Cybernetics 2023; 53:2275-2287. [PMID: 34623292] [DOI: 10.1109/tcyb.2021.3114749]
Abstract
This article investigates differential graphical games for linear multiagent systems with a leader on fixed communication graphs. The objective is to make each agent synchronize to the leader and, meanwhile, optimize a performance index, which depends on the control policies of its own and its neighbors. To this end, a distributed adaptive Nash equilibrium solution is proposed for the differential graphical games. This solution, in contrast to the existing ones, is not only Nash but also fully distributed in the sense that each agent only uses local information of its own and its immediate neighbors without using any global information of the communication graph. Moreover, the asymptotic stability and global Nash equilibrium properties are analyzed for the proposed distributed adaptive Nash equilibrium solution. As an illustrative example, the differential graphical game solution is applied to the microgrid secondary control problem to achieve fully distributed voltage synchronization with optimized performance.
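A generic form of the per-agent performance index in differential graphical games (standard in this literature; the weights and notation here are illustrative rather than quoted from the paper) is

$$
J_i = \int_0^{\infty} \Big( \delta_i^{\top} Q_{ii}\, \delta_i + u_i^{\top} R_{ii}\, u_i + \sum_{j \in \mathcal{N}_i} u_j^{\top} R_{ij}\, u_j \Big)\, dt, \qquad \delta_i = \sum_{j \in \mathcal{N}_i} a_{ij}\,(x_i - x_j) + g_i\,(x_i - x_0),
$$

so each agent's cost depends only on its local neighborhood synchronization error δ_i and on the controls of its immediate neighbors, which is what makes a fully distributed Nash solution meaningful.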

7. Sun J, Dai J, Zhang H, Yu S, Xu S, Wang J. Neural-Network-Based Immune Optimization Regulation Using Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2023; 53:1944-1953. [PMID: 35767503] [DOI: 10.1109/tcyb.2022.3179302]
Abstract
This article investigates an optimal regulation scheme between tumor and immune cells based on the adaptive dynamic programming (ADP) approach. The therapeutic goal is to inhibit the growth of tumor cells to an allowable injury degree while maximizing the number of immune cells. A reliable controller is derived through the ADP approach to make the cell populations reach the specified ideal states. First, the main objective is to weaken the negative effects caused by chemotherapy and immunotherapy, which means that minimal doses of chemotherapeutic and immunotherapeutic drugs are used in the treatment process. Second, according to the nonlinear dynamical mathematical model of tumor cells, chemotherapeutic and immunotherapeutic drugs act as powerful regulatory measures in a closed-loop control scheme. Finally, the states of the system and the critic weight errors are proved to be uniformly ultimately bounded with the appropriate optimal control strategy, and simulation results demonstrate the effectiveness of the cybernetics methodology.
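The closed-loop drug-dosing problem described above fits the standard control-affine ADP formulation; the following is a generic sketch (the two-input structure, the weights Q and R, and the target state x_d are illustrative, not the paper's specific tumor-immune model):

$$
\dot{x} = f(x) + g_1(x)\, u_{\mathrm{chemo}} + g_2(x)\, u_{\mathrm{immuno}}, \qquad J = \int_0^{\infty} \Big( (x - x_d)^{\top} Q\, (x - x_d) + u^{\top} R\, u \Big)\, dt,
$$

where penalizing u through R encodes the goal of keeping drug doses minimal while driving the cell populations toward the desired state x_d.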

8. Yang Y, Modares H, Vamvoudakis KG, He W, Xu CZ, Wunsch DC. Hamiltonian-Driven Adaptive Dynamic Programming With Approximation Errors. IEEE Transactions on Cybernetics 2022; 52:13762-13773. [PMID: 34495864] [DOI: 10.1109/tcyb.2021.3108034]
Abstract
In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control problem in continuous time for nonlinear systems. First, a novel function, "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that takes into consideration the approximation errors during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient to guarantee closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.
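For orientation, the classical Hamiltonian and the HJB condition that the min-Hamiltonian formulation reorganizes can be written in generic control-affine notation (not the paper's exact symbols) as

$$
H\big(x, u, \nabla V\big) = r(x, u) + \nabla V(x)^{\top}\big( f(x) + g(x)\, u \big), \qquad \min_{u} H\big(x, u, \nabla V^{*}\big) = 0,
$$

and for r(x, u) = Q(x) + uᵀR u the minimizer is u* = −(1/2) R⁻¹ g(x)ᵀ ∇V*(x); the article's contribution is to run policy iteration on this object while explicitly accounting for approximation errors in the policy-evaluation step.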

9. Wang K, Mu C. Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system. ISA Transactions 2022; 129:295-308. [PMID: 35216805] [DOI: 10.1016/j.isatra.2022.02.007]
Abstract
In this paper, based on an actor-critic neural network structure and a reinforcement learning scheme, a novel asynchronous learning algorithm with event communication is developed to solve the Nash equilibrium of a multiplayer nonzero-sum differential game in an adaptive fashion. From an optimal control point of view, each player or local controller wants to minimize its individual infinite-horizon cost function by finding an optimal policy. In this learning framework, each player consists of one critic and one actor, and implements distributed asynchronous policy iteration to optimize its decision-making process. In addition, the communication burden between the system and the players is effectively reduced by setting up a central event generator. The critic network executes fast updates by gradient-descent adaptation, while the actor network gives event-induced updates using gradient projection. Closed-loop asymptotic stability is ensured along with uniform ultimate convergence. The effectiveness of the proposed algorithm is then substantiated on a four-player nonlinear system, revealing that it can significantly reduce sampling numbers without impairing learning accuracy. Finally, by leveraging the nonzero-sum game idea, the proposed learning scheme is also applied to solve the lateral-directional stability of a linear aircraft system, and is further extended to a nonlinear vehicle system for achieving adaptive cruise control.
Affiliations: Ke Wang and Chaoxu Mu, School of Electrical and Information Engineering, Tianjin University, Tianjin, China.

10. Mu C, Wang K, Ni Z. Adaptive Learning and Sampled-Control for Nonlinear Game Systems Using Dynamic Event-Triggering Strategy. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4437-4450. [PMID: 33621182] [DOI: 10.1109/tnnls.2021.3057438]
Abstract
Static event-triggering-based control problems have been investigated when implementing adaptive dynamic programming algorithms. The related triggering rules are only current state-dependent without considering previous values. This motivates our improvements. This article aims to provide an explicit formulation for dynamic event-triggering that guarantees asymptotic stability of the event-sampled nonzero-sum differential game system and desirable approximation of critic neural networks. This article first deduces the static triggering rule by processing the coupling terms of Hamilton-Jacobi equations, and then, Zeno-free behavior is realized by devising an exponential term. Subsequently, a novel dynamic-triggering rule is devised into the adaptive learning stage by defining a dynamic variable, which is mathematically characterized by a first-order filter. Moreover, mathematical proofs illustrate the system stability and the weight convergence. Theoretical analysis reveals the characteristics of dynamic rule and its relations with the static rules. Finally, a numerical example is presented to substantiate the established claims. The comparative simulation results confirm that both static and dynamic strategies can reduce the communication that arises in the control loops, while the latter undertakes less communication burden due to fewer triggered events.
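A minimal discrete-time sketch of the static-versus-dynamic triggering logic described above is given below; the names and gains are hypothetical, and the paper's exact rule additionally includes an exponential Zeno-avoidance term that is not reproduced here:

```python
def dynamic_trigger_step(eta, x_norm2, e_norm2, sigma, theta, lam, dt):
    """One evaluation of a generic dynamic event-trigger (illustrative sketch).

    eta: internal dynamic variable, a first-order filter of the static margin.
    x_norm2, e_norm2: ||x||^2 and ||event gap e||^2 at the current time.
    Returns (fire, eta_next); fire=True means transmit and update the control.
    """
    margin = sigma * x_norm2 - e_norm2           # static rule: fire when margin < 0
    fire = (eta + theta * margin) <= 0.0         # dynamic rule spends the budget eta first
    eta_next = eta + dt * (-lam * eta + margin)  # first-order filter driven by the margin
    return fire, max(eta_next, 0.0)
```

Because the nonnegative variable eta accumulates slack whenever the static condition holds with margin, the dynamic rule fires no more often than the static one, which matches the communication comparison reported in the simulations.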

11. Wei Q, Ma H, Chen C, Dong D. Deep Reinforcement Learning With Quantum-Inspired Experience Replay. IEEE Transactions on Cybernetics 2022; 52:9326-9338. [PMID: 33600343] [DOI: 10.1109/tcyb.2021.3053414]
Abstract
In this article, a novel training paradigm inspired by quantum computation is proposed for deep reinforcement learning (DRL) with experience replay. In contrast to the traditional experience replay mechanism in DRL, the proposed DRL with quantum-inspired experience replay (DRL-QER) adaptively chooses experiences from the replay buffer according to the complexity and the replayed times of each experience (also called transition), to achieve a balance between exploration and exploitation. In DRL-QER, transitions are first formulated in quantum representations and then the preparation operation and depreciation operation are performed on the transitions. In this process, the preparation operation reflects the relationship between the temporal-difference errors (TD-errors) and the importance of the experiences, while the depreciation operation is taken into account to ensure the diversity of the transitions. The experimental results on Atari 2600 games show that DRL-QER outperforms state-of-the-art algorithms, such as DRL-PER and DCRL on most of these games with improved training efficiency and is also applicable to such memory-based DRL approaches as double network and dueling network.
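For context, the conventional proportional prioritized replay that DRL-QER is compared against samples transitions with probability proportional to their TD error; the sketch below shows that standard baseline technique, not the paper's quantum-inspired preparation and depreciation operations:

```python
import numpy as np

def sample_replay_indices(td_errors, batch_size, alpha=0.6, eps=1e-6):
    """Proportional prioritized experience replay (baseline, not DRL-QER).

    Transition i is drawn with probability p_i^alpha / sum_j p_j^alpha,
    where p_i = |TD-error_i| + eps keeps every transition sampleable.
    """
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idx = np.random.choice(len(priorities), size=batch_size, p=probs)
    return idx, probs
```

DRL-QER replaces this fixed rule with quantum-representation operations that also account for how many times a transition has already been replayed, trading off exploration and exploitation adaptively.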

12. Event-triggered integral reinforcement learning for nonzero-sum games with asymmetric input saturation. Neural Netw 2022; 152:212-223. [DOI: 10.1016/j.neunet.2022.04.013]

13. Liu P, Zhang H, Sun J, Tan Z. Event-triggered adaptive integral reinforcement learning method for zero-sum differential games of nonlinear systems with incomplete known dynamics. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07010-0]

14. Robust Tracking Control for Non-Zero-Sum Games of Continuous-Time Uncertain Nonlinear Systems. Mathematics 2022. [DOI: 10.3390/math10111904]
Abstract
In this paper, a new adaptive critic design is proposed to approximate the online Nash equilibrium solution for the robust trajectory tracking control of non-zero-sum (NZS) games for continuous-time uncertain nonlinear systems. First, the augmented system was constructed by combining the tracking error and the reference trajectory. By modifying the cost function, the robust tracking control problem was transformed into an optimal tracking control problem. Based on adaptive dynamic programming (ADP), a single critic neural network (NN) was applied for each player to solve the coupled Hamilton–Jacobi–Bellman (HJB) equations approximately, and the obtained control laws were regarded as the feedback Nash equilibrium. Two additional terms were introduced in the weight update law of each critic NN, which strengthened the weight update process and eliminated the strict requirement for an initial stabilizing control policy. More importantly, through Lyapunov theory, the stability of the closed-loop system was guaranteed and the robust tracking performance was analyzed. Finally, the effectiveness of the proposed scheme was verified by two examples.
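The augmented construction mentioned above can be sketched in generic notation (the symbols e, x_d and h are illustrative): with tracking error e = x − x_d and reference dynamics ẋ_d = h(x_d), the augmented state z = [eᵀ, x_dᵀ]ᵀ evolves as

$$
\dot{z} = \begin{bmatrix} \dot{e} \\ \dot{x}_d \end{bmatrix} = \begin{bmatrix} f(e + x_d) + g(e + x_d)\, u - h(x_d) \\ h(x_d) \end{bmatrix},
$$

so the tracking problem becomes a regulation problem in z, and the modified cost turns the robust tracking design into an optimal control problem that the single critic NN per player can address.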

15. Off-policy algorithm based hierarchical optimal control for completely unknown dynamic systems. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.11.077]

16. Wei Q, Zhu L, Song R, Zhang P, Liu D, Xiao J. Model-Free Adaptive Optimal Control for Unknown Nonlinear Multiplayer Nonzero-Sum Game. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:879-892. [PMID: 33108297] [DOI: 10.1109/tnnls.2020.3030127]
Abstract
In this article, an online adaptive optimal control algorithm based on adaptive dynamic programming is developed to solve the multiplayer nonzero-sum game (MP-NZSG) for discrete-time unknown nonlinear systems. First, a model-free coupled globalized dual-heuristic dynamic programming (GDHP) structure is designed to solve the MP-NZSG problem, in which there is no model network or identifier. Second, in order to relax the requirement of systems dynamics, an online adaptive learning algorithm is developed to solve the Hamilton-Jacobi equation using the system states of two adjacent time steps. Third, a series of critic networks and action networks are used to approximate value functions and optimal policies for all players. All the neural network (NN) weights are updated online based on real-time system states. Fourth, the uniformly ultimate boundedness analysis of the NN approximation errors is proved based on the Lyapunov approach. Finally, simulation results are given to demonstrate the effectiveness of the developed scheme.
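The "two adjacent time steps" device referred to above corresponds to the discrete-time coupled Bellman (Hamilton-Jacobi) equations, sketched here in generic N-player notation rather than the paper's exact symbols:

$$
V_i(x_k) = r_i\big(x_k, u_{1,k}, \dots, u_{N,k}\big) + V_i(x_{k+1}), \qquad i = 1, \dots, N,
$$

which can be evaluated from measured state pairs (x_k, x_{k+1}) alone, so no model network or identifier is required.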

17. Online event-based adaptive critic design with experience replay to solve partially unknown multi-player nonzero-sum games. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.087]

18. Liu P, Zhang H, Ren H, Liu C. Online event-triggered adaptive critic design for multi-player zero-sum games of partially unknown nonlinear systems with input constraints. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.07.058]

19. Yang Y, Zhu H, Zhang Q, Zhao B, Li Z, Wunsch DC. Sparse online kernelized actor-critic learning in reproducing kernel Hilbert space. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-10045-9]

20. Wang N, Gao Y, Zhao H, Ahn CK. Reinforcement Learning-Based Optimal Tracking Control of an Unknown Unmanned Surface Vehicle. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3034-3045. [PMID: 32745008] [DOI: 10.1109/tnnls.2020.3009214]
Abstract
In this article, a novel reinforcement learning-based optimal tracking control (RLOTC) scheme is established for an unmanned surface vehicle (USV) in the presence of complex unknowns, including dead-zone input nonlinearities, system dynamics, and disturbances. To be specific, dead-zone nonlinearities are decoupled to be input-dependent sloped controls and unknown biases that are encapsulated into lumped unknowns within tracking error dynamics. Neural network (NN) approximators are further deployed to adaptively identify complex unknowns and facilitate a Hamilton-Jacobi-Bellman (HJB) equation that formulates optimal tracking. In order to derive a practically optimal solution, an actor-critic reinforcement learning framework is built by employing adaptive NN identifiers to recursively approximate the total optimal policy and cost function. Eventually, theoretical analysis shows that the entire RLOTC scheme can render tracking errors that converge to an arbitrarily small neighborhood of the origin, subject to optimal cost. Simulation results and comprehensive comparisons on a prototype USV demonstrate remarkable effectiveness and superiority.
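The dead-zone decoupling mentioned above follows the standard decomposition of a dead-zone actuator into a sloped linear term plus a bounded bias; the parameters m_r, m_l, b_r, b_l below are generic and not taken from the paper:

$$
D(u) = \begin{cases} m_r\,(u - b_r), & u \ge b_r, \\ 0, & b_l < u < b_r, \\ m_l\,(u - b_l), & u \le b_l, \end{cases} \qquad\Longrightarrow\qquad D(u) = m(u)\, u + d(u), \quad |d(u)| \le \bar{d},
$$

so the bounded bias d(u) can be lumped with the unknown dynamics and disturbances handled by the NN identifiers, while the sloped term keeps the input available for the HJB-based optimal tracking design.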

21. Online optimal learning algorithm for Stackelberg games with partially unknown dynamics and constrained inputs. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.021]

22. Yang X, He H. Decentralized Event-Triggered Control for a Class of Nonlinear-Interconnected Systems Using Reinforcement Learning. IEEE Transactions on Cybernetics 2021; 51:635-648. [PMID: 31670691] [DOI: 10.1109/tcyb.2019.2946122]
Abstract
In this article, we propose a novel decentralized event-triggered control (ETC) scheme for a class of continuous-time nonlinear systems with matched interconnections. The present interconnected systems differ from most of the existing interconnected plants in that their equilibrium points are no longer assumed to be zero. Initially, we establish a theorem to indicate that the decentralized ETC law for the overall system can be represented by an array of optimal ETC laws for nominal subsystems. Then, to obtain these optimal ETC laws, we develop a reinforcement learning (RL)-based method to solve the Hamilton-Jacobi-Bellman equations arising in the discounted-cost optimal ETC problems of the nominal subsystems. Meanwhile, we only use critic networks to implement the RL-based approach and tune the critic network weight vectors by using the gradient descent method and the concurrent learning technique together. With the proposed weight vectors tuning rule, we are able to not only relax the persistence of the excitation condition but also ensure the critic network weight vectors to be uniformly ultimately bounded. Moreover, by utilizing the Lyapunov method, we prove that the obtained decentralized ETC law can force the entire system to be stable in the sense of uniform ultimate boundedness. Finally, we validate the proposed decentralized ETC strategy through simulations of the nonlinear-interconnected systems derived from two inverted pendulums connected via a spring.
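The combined gradient-descent/concurrent-learning tuning rule described above can be sketched as follows; this is an illustrative implementation with hypothetical names, and the paper's rule additionally involves normalization and the event-triggered sampling instants, which are omitted here:

```python
import numpy as np

def critic_weight_step(W, sigma_now, r_now, memory, lr):
    """One tuning step for the critic weight vector W (illustrative sketch).

    The Bellman/HJB residual at a sample is delta = W @ sigma + r, where sigma
    is the critic regressor and r the instantaneous (discounted) cost there.
    Replaying a recorded stack of (sigma_j, r_j) pairs is the concurrent-learning
    part that relaxes the persistence-of-excitation requirement.
    """
    def residual_grad(sigma, r):
        delta = float(W @ sigma) + r
        return delta * sigma

    grad = residual_grad(sigma_now, r_now)
    for sigma_j, r_j in memory:                  # recorded historical data
        grad = grad + residual_grad(sigma_j, r_j)
    return W - lr * grad
```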

23. Zhang Y, Zhao B, Liu D. Event-triggered adaptive dynamic programming for multi-player zero-sum games with unknown dynamics. Soft Comput 2021. [DOI: 10.1007/s00500-020-05293-w]

24. Yang Y, Vamvoudakis KG, Modares H, Yin Y, Wunsch DC. Safe Intermittent Reinforcement Learning With Static and Dynamic Event Generators. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5441-5455. [PMID: 32054590] [DOI: 10.1109/tnnls.2020.2967871]
Abstract
In this article, we present an intermittent framework for safe reinforcement learning (RL) algorithms. First, we develop a barrier function-based system transformation to impose state constraints while converting the original problem to an unconstrained optimization problem. Second, based on optimal derived policies, two types of intermittent feedback RL algorithms are presented, namely, a static and a dynamic one. We finally leverage an actor/critic structure to solve the problem online while guaranteeing optimality, stability, and safety. Simulation results show the efficacy of the proposed approach.
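One commonly used barrier function for the state-constraint transformation mentioned above maps a constrained interval onto the whole real line; this particular form is a standard choice in the barrier-transformation literature and is shown only for illustration: for a state component x ∈ (a, A) with a < 0 < A,

$$
s = b(x; a, A) = \ln\!\left( \frac{A\,(a - x)}{a\,(A - x)} \right), \qquad x = b^{-1}(s; a, A) = \frac{a A\,(e^{s} - 1)}{a\, e^{s} - A},
$$

so s ranges over all of ℝ while x stays strictly inside (a, A); rewriting the dynamics in the s-coordinates turns the state-constrained problem into an unconstrained one that the intermittent actor/critic design can treat directly.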

25. Bai W, Li T, Tong S. NN Reinforcement Learning Adaptive Control for a Class of Nonstrict-Feedback Discrete-Time Systems. IEEE Transactions on Cybernetics 2020; 50:4573-4584. [PMID: 31995515] [DOI: 10.1109/tcyb.2020.2963849]
Abstract
This article investigates an adaptive reinforcement learning (RL) optimal control design problem for a class of nonstrict-feedback discrete-time systems. Based on the neural network (NN) approximating ability and the RL control design technique, an adaptive backstepping RL optimal controller and a minimal learning parameter (MLP) adaptive RL optimal controller are developed by establishing a novel strategic utility function and introducing external function terms. It is proved that the proposed adaptive RL optimal controllers can guarantee that all signals in the closed-loop systems are semiglobally uniformly ultimately bounded (SGUUB). The main feature is that the proposed schemes can solve optimal control problems that the previous literature cannot deal with. Furthermore, the proposed MLP adaptive optimal control scheme can reduce the number of adaptive laws, and thus the computational complexity is decreased. Finally, the simulation results illustrate the validity of the proposed optimal control schemes.

26. Neural networks-based optimal tracking control for nonzero-sum games of multi-player continuous-time nonlinear systems via reinforcement learning. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.083]

27. Mu C, Wang K, Zhang Q, Zhao D. Hierarchical optimal control for input-affine nonlinear systems through the formulation of Stackelberg game. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.12.078]

28. Wan Z, Jiang C, Fahad M, Ni Z, Guo Y, He H. Robot-Assisted Pedestrian Regulation Based on Deep Reinforcement Learning. IEEE Transactions on Cybernetics 2020; 50:1669-1682. [PMID: 30475740] [DOI: 10.1109/tcyb.2018.2878977]
Abstract
Pedestrian regulation can prevent crowd accidents and improve crowd safety in densely populated areas. Recent studies use mobile robots to regulate pedestrian flows for desired collective motion through the effect of passive human-robot interaction (HRI). This paper formulates a robot motion planning problem for the optimization of two merging pedestrian flows moving through a bottleneck exit. To address the challenge of feature representation of complex human motion dynamics under the effect of HRI, we propose using a deep neural network to model the mapping from the image input of pedestrian environments to the output of robot motion decisions. The robot motion planner is trained end-to-end using a deep reinforcement learning algorithm, which avoids hand-crafted feature detection and extraction, thus improving the learning capability for complex dynamic problems. Our proposed approach is validated in simulated experiments, and its performance is evaluated. The results demonstrate that the robot is able to find optimal motion decisions that maximize the pedestrian outflow in different flow conditions, and the pedestrian-accumulated outflow increases significantly compared to cases without robot regulation and with random robot motion.

29. Su H, Zhang H, Sun S, Cai Y. Integral reinforcement learning-based online adaptive event-triggered control for non-zero-sum games of partially unknown nonlinear systems. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.088]