1. Li G, Wang J, Liu F, Deng F. Target-Attackers-Defenders Linear-Quadratic Exponential Stochastic Differential Games With Distributed Control. IEEE Transactions on Cybernetics 2025;PP:574-587. PMID: 40030870; DOI: 10.1109/tcyb.2024.3508694.
Abstract
This article investigates stochastic differential games involving multiple attackers, defenders, and a single target, with their interactions defined by a distributed topology. By leveraging principles of topological graph theory, a distributed design strategy is developed that operates without requiring global information, thereby minimizing system coupling. Additionally, this study extends the analysis to incorporate stochastic elements into the target-attackers-defenders games, moving beyond the scope of deterministic differential games. Using the direct method of completing the square and the Radon-Nikodym derivative, we derive optimal distributed control strategies for two scenarios: one where the target follows a predefined trajectory and another where it has free maneuverability. In both scenarios, our research demonstrates the effectiveness of the designed control strategies in driving the system toward a Nash equilibrium. Notably, our algorithm eliminates the need to solve the coupled Hamilton-Jacobi equation, significantly reducing computational complexity. To validate the effectiveness of the proposed control strategies, numerical simulations are presented in this article.
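
For context, a generic linear-quadratic exponential (risk-sensitive) cost of the kind named in the title is sketched below; the player indexing, weighting matrices, and the distributed-topology coupling used in the paper are not reproduced here and should be read as placeholder assumptions.

```latex
% A generic risk-sensitive (LQ exponential) cost for player i over an Ito-driven state x(t);
% a sketch of the cost family, not the paper's exact functional.
J_i = \mathbb{E}\!\left[\exp\!\left(\frac{\theta_i}{2}\int_0^{T}
      \big(x^{\top} Q_i\, x + u_i^{\top} R_i\, u_i\big)\,\mathrm{d}t\right)\right],
\qquad Q_i \succeq 0,\quad R_i \succ 0,\quad \theta_i > 0 .
```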

2. Wei W, Wang J, Du J, Fang Z, Ren Y, Chen CLP. Differential Game-Based Deep Reinforcement Learning in Underwater Target Hunting Task. IEEE Transactions on Neural Networks and Learning Systems 2025;36:462-474. PMID: 37889822; DOI: 10.1109/tnnls.2023.3325580.
Abstract
The underwater target hunting task, which requires real-time trajectory scheduling and distributed coordination, is challenging because of turbulent ocean conditions and a dynamic adversarial environment. Despite existing research on game-based target hunting, few approaches have considered dynamic environmental factors such as sea currents, winds, and communication delays. In this article, we focus on a target hunting system consisting of multiple unmanned underwater vehicles (UUVs) and a highly maneuverable target, and we leverage differential game theory to analyze the adversarial behavior between the hunters and the escapee. The difficulty is that the UUVs must deploy an adaptive scheme that maintains coordination and prevents the target from escaping while avoiding collisions. We therefore construct the Hamiltonian function and apply Leibniz's formula to obtain feedback control policies, and we prove that the target hunting system is asymptotically stable in the mean and that the proposed control policies constitute a Nash equilibrium. Furthermore, we design a modified multiagent reinforcement learning (MARL) algorithm to facilitate the underwater target hunting task under the constraints of energetic flows and acoustic propagation delay. Simulation results show that the proposed scheme is superior to a typical MARL algorithm in terms of reward and success rate.
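
As an illustration of how a hunting objective of this kind might be encoded for MARL training, the sketch below assigns each UUV a reward for closing in on the target, a capture bonus, and a collision penalty; the function, its weights, and the safety radius are hypothetical and are not taken from the paper.

```python
import numpy as np

def hunting_reward(uuv_positions, target_position, capture_radius=5.0,
                   safe_distance=2.0, w_close=1.0, w_collision=10.0):
    """Illustrative per-UUV reward for a target-hunting task (not the paper's reward)."""
    uuv_positions = np.asarray(uuv_positions, dtype=float)   # shape (n_uuv, dim)
    target_position = np.asarray(target_position, dtype=float)

    # Reward each hunter for being close to the target (negative distance).
    dist_to_target = np.linalg.norm(uuv_positions - target_position, axis=1)
    rewards = -w_close * dist_to_target

    # Bonus for every hunter once all of them are inside the capture radius.
    if np.all(dist_to_target <= capture_radius):
        rewards += 100.0

    # Penalize pairs of hunters that come closer than the safety distance.
    for i in range(len(uuv_positions)):
        for j in range(i + 1, len(uuv_positions)):
            if np.linalg.norm(uuv_positions[i] - uuv_positions[j]) < safe_distance:
                rewards[i] -= w_collision
                rewards[j] -= w_collision
    return rewards

# Example: three hunters around a target in the horizontal plane.
print(hunting_reward([[0, 0], [4, 0], [0, 4]], [2, 2]))
```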

3. Ming Z, Zhang H, Wang Y, Dai J. Policy Iteration Q-Learning for Linear Itô Stochastic Systems With Markovian Jumps and its Application to Power Systems. IEEE Transactions on Cybernetics 2024;54:7804-7813. PMID: 38865225; DOI: 10.1109/tcyb.2024.3403680.
Abstract
This article addresses the optimal control problem for continuous-time linear Itô stochastic systems with Markovian jumps, using an online policy iteration (PI) approach grounded in Q-learning. Initially, a model-dependent offline algorithm, structured according to traditional optimal control strategies, is designed to solve the algebraic Riccati equation (ARE). Employing Lyapunov theory, we rigorously establish the convergence of the offline PI algorithm and the admissibility of the iterative control laws; this article represents the first attempt to tackle these technical challenges. Subsequently, to address the limitations inherent in the offline algorithm, we introduce a novel online Q-learning algorithm tailored for Itô stochastic systems with Markovian jumps. The proposed Q-learning algorithm obviates the need for transition probabilities and system matrices. We provide a thorough stability analysis of the closed-loop system. Finally, the effectiveness and applicability of the proposed algorithms are demonstrated through a simulation example, underpinned by the theorems established herein.
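
The model-dependent offline step described above is closely related to classical policy iteration on the ARE. The sketch below shows Kleinman-style policy iteration for a deterministic continuous-time LQR problem, that is, without the Itô diffusion and Markovian jump terms treated in the paper; it is meant only to convey the iterate-evaluate-improve structure, and the plant matrices are arbitrary choices.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Simple stabilizable pair (deterministic LQR; no diffusion or Markovian jumps).
A = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[0.0, 5.0]])          # initial stabilizing gain (assumed admissible)
for _ in range(20):
    Acl = A - B @ K
    # Policy evaluation: solve the Lyapunov equation Acl' P + P Acl + Q + K' R K = 0.
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # Policy improvement.
    K = np.linalg.solve(R, B.T @ P)

print("PI gain:  ", K)
print("ARE gain: ", np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R)))
```

Both printed gains should agree, since the iteration converges monotonically to the stabilizing ARE solution when the initial gain is admissible.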

4. Yan J, Cao W, Yang X, Chen C, Guan X. Communication-Efficient and Collision-Free Motion Planning of Underwater Vehicles via Integral Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2024;35:8306-8320. PMID: 37015364; DOI: 10.1109/tnnls.2022.3226776.
Abstract
Motion planning of underwater vehicles is regarded as a promising technique to compensate for the limited flexibility of underwater sensor networks (USNs). Nonetheless, the unique characteristics of the underwater channel and environment make this mission challenging. This article is concerned with communication-efficient and collision-free motion planning for underwater vehicles in fading channels and obstacle-laden environments. We first develop a model-based integral reinforcement learning (IRL) estimator to predict the stochastic signal-to-noise ratio (SNR). With the estimated SNR, an integrated optimization problem for the codesign of communication efficiency and motion planning is constructed, in which the underwater vehicle dynamics, communication capacity, collision avoidance, and position control are all considered. To tackle this problem, a model-free IRL algorithm is designed to drive the underwater vehicles to the desired positions while maximizing the communication capacity and avoiding collisions. It is worth mentioning that the proposed motion planning solution considers a realistic underwater communication channel as well as a realistic dynamic model for the underwater vehicles. Finally, simulation and experimental results are presented to verify the effectiveness of the proposed approach.
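
To give a concrete feel for the kind of underwater link budget such a codesign has to reason about, the snippet below evaluates a textbook sonar-equation SNR with Thorp absorption and practical spreading; the paper's actual fading-channel model and its IRL-based SNR estimator are not reproduced, and the source and noise levels used here are placeholder values.

```python
import math

def thorp_absorption_db_per_km(f_khz):
    """Thorp's empirical absorption coefficient (dB/km), frequency in kHz."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def snr_db(distance_m, f_khz, source_level_db=160.0, noise_level_db=50.0,
           spreading_factor=1.5):
    """Sonar-equation SNR = SL - TL - NL with spreading plus absorption loss (illustrative)."""
    transmission_loss = (spreading_factor * 10 * math.log10(distance_m)
                         + thorp_absorption_db_per_km(f_khz) * distance_m / 1000.0)
    return source_level_db - transmission_loss - noise_level_db

# Example: SNR at 1 km and 3 km range for a 20 kHz acoustic link.
print(snr_db(1000.0, 20.0), snr_db(3000.0, 20.0))
```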

5. Liang Y, Zhang H, Zhang J, Ming Z. Event-Triggered Guarantee Cost Control for Partially Unknown Stochastic Systems via Explorized Integral Reinforcement Learning Strategy. IEEE Transactions on Neural Networks and Learning Systems 2024;35:7830-7844. PMID: 36395138; DOI: 10.1109/tnnls.2022.3221105.
Abstract
In this article, an integral reinforcement learning (IRL)-based event-triggered guarantee cost control (GCC) approach is proposed for stochastic systems modulated by randomly time-varying parameters. First, with the aid of the RL algorithm, the optimal GCC (OGCC) problem is converted into an optimal zero-sum game by solving a modified Hamilton-Jacobi-Isaacs (HJI) equation of the auxiliary system. Moreover, to address the stochastic zero-sum game, we propose an on-policy IRL-based control approach incorporating the multivariate probabilistic collocation method (MPCM), which can accurately predict the mean value of uncertain functions with randomly time-varying parameters. Furthermore, a novel GCC method, which combines the explorized IRL algorithm and the MPCM, is designed to relax the requirement of fully known system dynamics for this class of stochastic systems. On this foundation, to reduce computation cost and avoid wasting resources, we propose an event-triggered GCC approach based on explorized IRL and the MPCM, utilizing critic-actor-disturbance neural networks (NNs). The weight vectors of the three NNs are updated simultaneously and aperiodically according to the designed triggering condition. The ultimate boundedness (UB) properties of the controlled systems are proved by means of the Lyapunov theorem. Finally, the effectiveness of the developed GCC algorithms is illustrated via two simulation examples.
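
For reference, the standard zero-sum HJI equation that such an approach revolves around is sketched below for a generic deterministic system of the form dx/dt = f(x) + g(x)u + k(x)d; the paper works with a modified HJI equation for an auxiliary system with randomly time-varying parameters, so its exact form differs, and Q, R, and gamma here are generic weights.

```latex
% Standard zero-sum HJI equation (a generic sketch, not the paper's modified version):
0 = \min_{u}\max_{d}\Big[\, x^{\top}Qx + u^{\top}Ru - \gamma^{2} d^{\top}d
      + \nabla V(x)^{\top}\big(f(x) + g(x)u + k(x)d\big) \Big],
\qquad
u^{*} = -\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V(x),
\quad
d^{*} = \tfrac{1}{2\gamma^{2}}k(x)^{\top}\nabla V(x).
```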

6. Qian YY, Liu M, Wan Y, Lewis FL, Davoudi A. Distributed Adaptive Nash Equilibrium Solution for Differential Graphical Games. IEEE Transactions on Cybernetics 2023;53:2275-2287. PMID: 34623292; DOI: 10.1109/tcyb.2021.3114749.
Abstract
This article investigates differential graphical games for linear multiagent systems with a leader on fixed communication graphs. The objective is to make each agent synchronize to the leader and, meanwhile, optimize a performance index, which depends on the control policies of its own and its neighbors. To this end, a distributed adaptive Nash equilibrium solution is proposed for the differential graphical games. This solution, in contrast to the existing ones, is not only Nash but also fully distributed in the sense that each agent only uses local information of its own and its immediate neighbors without using any global information of the communication graph. Moreover, the asymptotic stability and global Nash equilibrium properties are analyzed for the proposed distributed adaptive Nash equilibrium solution. As an illustrative example, the differential graphical game solution is applied to the microgrid secondary control problem to achieve fully distributed voltage synchronization with optimized performance.
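
A minimal sketch of the neighbor-only information pattern underlying differential graphical games is given below: each agent forms its neighborhood synchronization error from its own state, its neighbors' states, and (if pinned) the leader, and applies a gain to that error. The graph, gains, and single-integrator dynamics are illustrative choices, not the adaptive Nash equilibrium solution designed in the paper.

```python
import numpy as np

# Adjacency matrix for four followers on a path graph; only agent 0 is pinned to the leader.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
g = np.array([1.0, 0.0, 0.0, 0.0])           # pinning gains to the leader

def local_errors(x, x_leader):
    """Neighborhood synchronization errors, using only local/neighbor information."""
    delta = np.zeros_like(x)
    for i in range(len(x)):
        delta[i] = sum(A[i, j] * (x[j] - x[i]) for j in range(len(x)))
        delta[i] += g[i] * (x_leader - x[i])
    return delta

# Scalar single-integrator followers: u_i = c * delta_i drives x_i toward x_leader.
x, x_leader, c, dt = np.array([1.0, -2.0, 0.5, 3.0]), 0.0, 1.0, 0.01
for _ in range(5000):
    x = x + dt * c * local_errors(x, x_leader)
print(x)   # all entries approach the leader state 0.0
```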

7. Li H, Wu Y, Chen M, Lu R. Adaptive Multigradient Recursive Reinforcement Learning Event-Triggered Tracking Control for Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2023;34:144-156. PMID: 34197328; DOI: 10.1109/tnnls.2021.3090570.
Abstract
This article proposes a fault-tolerant adaptive multigradient recursive reinforcement learning (RL) event-triggered tracking control scheme for strict-feedback discrete-time multiagent systems. The multigradient recursive RL algorithm is used to avoid the local-optimum problem that may arise in gradient descent schemes. Different from existing event-triggered control results, a new lemma on the relative threshold event-triggered control strategy is proposed to handle the compensation error, which improves the utilization of communication resources and mitigates the negative impact on tracking accuracy and closed-loop stability. To overcome the difficulty caused by sensor faults, a distributed control method is introduced by adopting the adaptive compensation technique, which effectively decreases the number of online estimation parameters. Furthermore, by using the multigradient recursive RL algorithm with fewer learning parameters, the online estimation time can be effectively reduced. The stability of the closed-loop multiagent systems is proved by using the Lyapunov stability theorem, and it is verified that all signals are semiglobally uniformly ultimately bounded. Finally, two simulation examples are given to show the effectiveness of the presented control scheme.
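
The relative threshold triggering rule referred to above can be summarized, in a generic form, as transmitting a new control value only when the gap between the currently computed input and the last transmitted one exceeds a fraction of the current input plus a small constant. The sketch below shows this rule in isolation with hypothetical parameter values; the compensation-error lemma and the RL controller developed in the paper are not reproduced.

```python
def make_relative_threshold_trigger(delta=0.3, d1=0.05):
    """Generic relative-threshold event trigger: fire when |w - u_last| >= delta*|w| + d1.
    delta in (0, 1); w is the continuously computed input, u_last the last applied one."""
    assert 0.0 < delta < 1.0
    state = {"u_last": 0.0}

    def trigger(w):
        error = abs(w - state["u_last"])
        if error >= delta * abs(w) + d1:
            state["u_last"] = w          # event: transmit and hold the new input
            return True, state["u_last"]
        return False, state["u_last"]    # no event: keep applying the held input

    return trigger

# Example: feed a slowly varying computed input through the trigger.
trig = make_relative_threshold_trigger()
for k in range(10):
    w_k = 1.0 + 0.1 * k                  # hypothetical computed control sequence
    fired, applied = trig(w_k)
    print(k, fired, round(applied, 2))
```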

8. Wang K, Mu C. Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system. ISA Transactions 2022;129:295-308. PMID: 35216805; DOI: 10.1016/j.isatra.2022.02.007.
Abstract
In this paper, based on an actor-critic neural network structure and a reinforcement learning scheme, a novel asynchronous learning algorithm with event communication is developed to solve the Nash equilibrium of a multiplayer nonzero-sum differential game in an adaptive fashion. From the optimal control point of view, each player or local controller seeks to minimize its individual infinite-horizon cost function by finding an optimal policy. In this learning framework, each player consists of one critic and one actor, and implements distributed asynchronous policy iteration to optimize its decision-making process. In addition, the communication burden between the system and the players is effectively reduced by setting up a central event generator. The critic network executes fast updates via gradient-descent adaptation, while the actor network performs event-induced updates using gradient projection. Closed-loop asymptotic stability is ensured along with uniform ultimate convergence. The effectiveness of the proposed algorithm is then substantiated on a four-player nonlinear system, revealing that it can significantly reduce the number of samples without impairing learning accuracy. Finally, by leveraging the nonzero-sum game formulation, the proposed learning scheme is also applied to the lateral-directional stability of a linear aircraft system, and is further extended to a nonlinear vehicle system for adaptive cruise control.
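
The following is a structural sketch of that asynchronous update pattern (fast per-step critic updates, slower event-induced actor updates with a projection) on a toy single-player scalar linear-quadratic problem. The parameterizations, step sizes, and event rule are illustrative assumptions and do not reproduce the paper's continuous-time learning laws or its multiplayer setting.

```python
import numpy as np

# Toy scalar plant dx = (a*x + b*u) dt with stage cost q*x^2 + r*u^2 (illustrative only).
a, b, q, r, dt = -1.0, 1.0, 1.0, 1.0, 0.05
w, k_held = 0.0, 0.5            # critic weight (V ~ w*x^2) and event-held actor gain (u = -k*x)
alpha_c, event_threshold, events = 2.0, 0.02, 0
rng = np.random.default_rng(0)
x = 2.0

for step in range(3000):
    if step % 150 == 0:                          # re-excite the state so learning keeps going
        x = rng.uniform(-2.0, 2.0)
    u = -k_held * x                              # actor output is held between events
    x_next = x + dt * (a * x + b * u)
    stage_cost = (q * x**2 + r * u**2) * dt
    # Critic: fast per-step gradient descent on the squared Bellman residual.
    residual = stage_cost + w * x_next**2 - w * x**2
    w -= alpha_c * residual * (x_next**2 - x**2)
    # Actor: slow, event-induced update toward the critic-greedy gain, with a simple projection.
    k_greedy = b * w / r                         # u* = -(b/(2r)) dV/dx = -(b*w/r) x for V = w*x^2
    if abs(k_greedy - k_held) > event_threshold:
        k_held = float(np.clip(k_greedy, 0.0, 10.0))
        events += 1
    x = x_next

# Should settle near the continuous-time LQR solution (P = sqrt(2) - 1, about 0.414 here),
# up to discretization error.
print(f"critic weight w = {w:.3f}, actor gain k = {k_held:.3f}, actor events = {events}")
```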

Affiliations: Ke Wang and Chaoxu Mu, School of Electrical and Information Engineering, Tianjin University, Tianjin, China.

9. Yang X, Xu M, Wei Q. Dynamic Event-Sampled Control of Interconnected Nonlinear Systems Using Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2022;PP:923-937. PMID: 35666792; DOI: 10.1109/tnnls.2022.3178017.
Abstract
We develop a decentralized dynamic event-based control strategy for nonlinear systems subject to matched interconnections. To begin with, we introduce a dynamic event-based sampling mechanism, which relies on the system's states and on variables generated by time-based differential equations. Then, we prove that the decentralized event-based controller for the whole system is composed of the optimal event-based control policies of the nominal subsystems. To derive these optimal event-based control policies, we design a critic-only architecture to solve the related event-based Hamilton-Jacobi-Bellman equations in the reinforcement learning framework. This architecture uses only critic neural networks (NNs), whose weight vectors are updated through the gradient descent method together with concurrent learning. After that, we demonstrate that the asymptotic stability of the closed-loop nominal subsystems and the uniform ultimate boundedness of the critic NNs' weight estimation errors are guaranteed by using Lyapunov's approach. Finally, we provide simulations of a matched nonlinear-interconnected plant to validate the theoretical claims.
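
The "dynamic" qualifier refers to a triggering rule driven by an auxiliary internal variable rather than a fixed static threshold. The snippet below sketches a standard dynamic triggering mechanism of this type for a simple stable scalar plant with sampled-and-held feedback; the paper's mechanism, which also involves the learned control policies, is not reproduced, and all gains here are illustrative.

```python
# Stable scalar plant x_dot = a*x + b*u with sampled-and-held feedback u = -K * x(t_k).
a, b, K, dt = -0.5, 1.0, 1.0, 0.001
sigma, lam, theta = 0.5, 1.0, 1.0      # triggering parameters (illustrative)

x, x_held, eta, events = 1.0, 1.0, 0.5, 0
for step in range(int(10.0 / dt)):
    e = x_held - x                     # gap between last sampled state and current state
    s = sigma * x**2 - e**2            # static part of the trigger condition
    eta += dt * (-lam * eta + s)       # auxiliary dynamic variable: eta_dot = -lam*eta + s
    if eta + theta * s < 0:            # dynamic event condition violated: sample the state now
        x_held = x
        eta = max(eta, 0.0)            # guard against small negative drift from discretization
        events += 1
    x += dt * (a * x + b * (-K * x_held))

print(f"events over 10 s: {events}, final state: {x:.4f}")
```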

10. Zhou Z, Xu H. Decentralized optimal large scale multi-player pursuit-evasion strategies: A mean field game approach with reinforcement learning. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.01.141.