1. Shen Z, Dong T, Huang T. Asynchronous iterative Q-learning based tracking control for nonlinear discrete-time multi-agent systems. Neural Netw 2024; 180:106667. PMID: 39216294. DOI: 10.1016/j.neunet.2024.106667.
Abstract
This paper addresses the tracking control problem of nonlinear discrete-time multi-agent systems (MASs). First, a local neighborhood error system (LNES) is constructed. Then, a novel tracking algorithm based on asynchronous iterative Q-learning (AIQL) is developed, which transforms the tracking problem into the optimal regulation of the LNES. The AIQL-based algorithm maintains two Q-values, Q_i^A and Q_i^B, for each agent i, where Q_i^A is used to improve the control policy and Q_i^B is used to evaluate its value. Moreover, the convergence of the LNES is analyzed: it is shown that the state of the LNES converges to zero, thereby solving the tracking problem. A neural network-based actor-critic framework is used to implement AIQL, in which the critic consists of two neural networks approximating Q_i^A and Q_i^B, respectively. Finally, simulation results verify the performance of the developed algorithm, showing that the AIQL-based tracking algorithm achieves a lower cost value and faster convergence than the IQL-based tracking algorithm.
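As a rough illustrative sketch of the two-Q idea described above (not the authors' implementation; the single scalar error, its dynamics, the quadratic features, action grid, and learning rates are all assumptions), one Q-approximator can drive policy improvement while a second, asynchronously refreshed copy is used to evaluate the Bellman target:

```python
import numpy as np

# Minimal sketch of an asynchronous two-Q update on a local neighborhood error.
# Everything below (dynamics, cost weights, features) is assumed for illustration.

rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 1.0, 21)      # discretized control set
w_A = np.zeros(6)                          # weights of Q_i^A (policy improvement)
w_B = np.zeros(6)                          # weights of Q_i^B (policy evaluation)
gamma, lr, eps = 0.95, 0.05, 0.1

def features(e, u):
    # quadratic features of the local neighborhood error e and control u
    return np.array([e * e, e * u, u * u, e, u, 1.0])

def q(w, e, u):
    return float(w @ features(e, u))

def greedy(w, e):
    return actions[int(np.argmin([q(w, e, u) for u in actions]))]

e = 1.0                                    # initial local neighborhood error
for k in range(2000):
    u = rng.choice(actions) if rng.random() < eps else greedy(w_A, e)
    cost = e * e + 0.1 * u * u             # stage cost on the error system
    e_next = 0.8 * e + 0.5 * u             # assumed error dynamics (illustrative)
    # Bellman target: evaluate with Q_i^B the action proposed by Q_i^A
    target = cost + gamma * q(w_B, e_next, greedy(w_A, e_next))
    w_A += lr * (target - q(w_A, e, u)) * features(e, u)
    if k % 50 == 0:                        # asynchronous refresh of the evaluation copy
        w_B = w_A.copy()
    e = e_next
```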
Affiliation(s)
- Ziwen Shen
- College of Electronics and Information Engineering, Southwest University, Chongqing, 400715, PR China
- Tao Dong
- College of Electronics and Information Engineering, Southwest University, Chongqing, 400715, PR China.
- Tingwen Huang
- Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518055, PR China
2. Wang J, Wang W, Liang X. Finite-horizon optimal secure tracking control under denial-of-service attacks. ISA Transactions 2024; 149:44-53. PMID: 38692974. DOI: 10.1016/j.isatra.2024.04.025.
Abstract
The finite-horizon optimal secure tracking control (FHOSTC) problem for cyber-physical systems under actuator denial-of-service (DoS) attacks is addressed in this paper. A model-free method based on the Q-function is designed to achieve FHOSTC without system model information. First, an augmented time-varying Riccati equation (TVRE) is derived by combining the system and the reference system into a unified augmented system. Then, a lower bound on the probability of malicious DoS attacks that guarantees the solvability of the TVRE is provided. Third, a time-varying Q-function (TVQF) is devised, and a TVQF-based method is proposed to solve the TVRE without knowledge of the augmented system dynamics. The developed method works backward in time and uses least squares. Finally, simulation studies are conducted to validate the performance and features of the developed method.
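In a generic finite-horizon LQ setting, a time-varying Q-function of the kind described above is typically quadratic in the augmented state z_k and input u_k (a standard construction; the paper's exact TVQF and its attack-dependent terms may differ):

```latex
% Generic quadratic time-varying Q-function for a finite-horizon LQ problem
Q_k(z_k, u_k) =
\begin{bmatrix} z_k \\ u_k \end{bmatrix}^{\top}
H_k
\begin{bmatrix} z_k \\ u_k \end{bmatrix},
\qquad
H_k =
\begin{bmatrix}
Q + A^{\top} P_{k+1} A & A^{\top} P_{k+1} B \\
B^{\top} P_{k+1} A     & R + B^{\top} P_{k+1} B
\end{bmatrix}.
```

The stage-k gain is then K_k = (H_k^{uu})^{-1} H_k^{uz}, applied as u_k = -K_k z_k, and a backward-in-time least-squares procedure estimates H_k from data at each stage, starting from the terminal weight P_N.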
Affiliation(s)
- Jian Wang
- Key Laboratory of Marine Intelligent Equipment and System Ministry of Education, Shanghai Jiao Tong University, Shanghai, 200240, PR China
- Wei Wang
- School of Information Engineering, Zhongnan University of Economics and Law, Wuhan 430073, PR China; School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, PR China.
- Xiaofeng Liang
- Key Laboratory of Marine Intelligent Equipment and System Ministry of Education, Shanghai Jiao Tong University, Shanghai, 200240, PR China
3. Wang J, Wu J, Shen H, Cao J, Rutkowski L. Fuzzy H∞ Control of Discrete-Time Nonlinear Markov Jump Systems via a Novel Hybrid Reinforcement Q-Learning Method. IEEE Transactions on Cybernetics 2023; 53:7380-7391. PMID: 36417712. DOI: 10.1109/tcyb.2022.3220537.
Abstract
In this article, a novel hybrid reinforcement Q-learning control method is proposed to solve the adaptive fuzzy H∞ control problem of discrete-time nonlinear Markov jump systems based on the Takagi-Sugeno fuzzy model. First, the core problem of adaptive fuzzy H∞ control is converted into solving a fuzzy game coupled algebraic Riccati equation, which can hardly be solved directly by analytical methods. To address this, an offline parallel hybrid learning algorithm is first designed, in which the system dynamics must be known a priori. Furthermore, an online parallel Q-learning hybrid learning algorithm is developed. The main characteristics of the proposed online hybrid learning algorithm are threefold: 1) knowledge of the system dynamics is not required during the learning process; 2) compared with the policy iteration method, the restriction of an initial stabilizing control policy is removed; and 3) compared with the value iteration method, a faster convergence rate is obtained. Finally, a tunnel diode circuit system model is provided to validate the effectiveness of the presented learning algorithm.
4. Perrusquía A, Guo W. Reward inference of discrete-time expert's controllers: A complementary learning approach. Inf Sci (N Y) 2023. DOI: 10.1016/j.ins.2023.02.079.
5. Perrusquía A, Guo W. A Closed-Loop Output Error Approach for Physics-Informed Trajectory Inference Using Online Data. IEEE Transactions on Cybernetics 2023; 53:1379-1391. PMID: 36129867. DOI: 10.1109/tcyb.2022.3202864.
Abstract
While autonomous systems can be used for a variety of beneficial applications, they can also be used with malicious intent, and it is essential to disrupt such systems before they act. An accurate trajectory inference algorithm is therefore required for monitoring purposes, allowing appropriate countermeasures to be taken. This article presents a closed-loop output error approach for trajectory inference of a class of linear systems. The approach combines the main advantages of state estimation and parameter identification algorithms in a complementary fashion, using online data and an estimated model constructed from the state and parameter estimates, which informs about the physics of the system, to infer the noise-free trajectory being followed. Both the exact model matching and the estimation error cases are analyzed. A composite update rule based on a least-squares rule is also proposed to improve robustness and parameter and state convergence. The stability and convergence of the proposed approaches are assessed via Lyapunov stability theory under a persistent excitation condition. Simulation studies are carried out to validate the proposed approaches.
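As a loose sketch of the complementary "identify the parameters, then estimate the noise-free trajectory with the identified model" idea (not the authors' algorithm; the scalar plant, gains, and the recursive least-squares formulation below are assumptions), one can run online identification alongside an output-error-corrected model:

```python
import numpy as np

# Sketch: joint online parameter identification (RLS) and model-based trajectory
# estimation for a scalar plant y_{k+1} = a y_k + b u_k + noise. All values assumed.

rng = np.random.default_rng(1)
a_true, b_true = 0.9, 0.5
theta = np.zeros((2, 1))                  # RLS estimate of [a, b]
P = np.eye(2) * 100.0                     # RLS covariance
y, y_hat = 1.0, 0.0                       # measured output and noise-free estimate
l_gain = 0.3                              # output-error correction gain (assumed)

for k in range(300):
    u = np.sin(0.1 * k)                   # persistently exciting input
    y_next = a_true * y + b_true * u + 0.05 * rng.standard_normal()

    # --- parameter identification: recursive least squares on measured data ---
    phi = np.array([[y], [u]])
    e_id = y_next - float(theta.T @ phi)
    K = P @ phi / (1.0 + float(phi.T @ P @ phi))
    theta = theta + K * e_id
    P = P - K @ phi.T @ P

    # --- trajectory estimation: identified model plus closed-loop output error ---
    a_hat, b_hat = float(theta[0]), float(theta[1])
    y_hat = a_hat * y_hat + b_hat * u + l_gain * (y - y_hat)

    y = y_next
```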
6. Wang D, Ren J, Ha M. Discounted linear Q-learning control with novel tracking cost and its stability. Inf Sci (N Y) 2023. DOI: 10.1016/j.ins.2023.01.030.
7. Zhang D, Ye Z, Feng G, Li H. Intelligent Event-Based Fuzzy Dynamic Positioning Control of Nonlinear Unmanned Marine Vehicles Under DoS Attack. IEEE Transactions on Cybernetics 2022; 52:13486-13499. PMID: 34860659. DOI: 10.1109/tcyb.2021.3128170.
Abstract
This article addresses the dynamic positioning control problem of a nonlinear unmanned marine vehicle (UMV) system subject to network communication constraints and denial-of-service (DoS) attacks, where the dynamics of the UMV are described by a Takagi-Sugeno (T-S) fuzzy system (TSFS). To save limited communication resources, a new intelligent event-triggering mechanism is proposed, in which the event-triggering threshold is optimized by a Q-learning algorithm. A switched system approach is then proposed to deal with aperiodic DoS attacks occurring in the communication channels. With a proper piecewise Lyapunov function, sufficient conditions for global exponential stability (GES) of the closed-loop nonlinear UMV system are derived, and the corresponding observer and controller gains are designed by solving a set of matrix inequalities. A benchmark nonlinear UMV system is adopted as a simulation example, and the results validate the effectiveness of the proposed control method.
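A minimal sketch of the "Q-learning-tuned event-triggering threshold" ingredient described above (heavily simplified; the scalar plant, reward weights, discretizations, and the relative-threshold trigger rule are assumptions, whereas the paper designs this for a T-S fuzzy UMV model with observer-based control):

```python
import numpy as np

# Sketch: relative-threshold event trigger whose threshold is picked by tabular
# Q-learning, trading regulation error against transmission count. Values assumed.

rng = np.random.default_rng(2)
A, B, K = 0.95, 0.5, 0.6                      # scalar plant and controller gain
sigmas = np.array([0.0, 0.1, 0.3, 0.5])       # candidate triggering thresholds
Q = np.zeros((4, len(sigmas)))                # tabular Q over discretized |x| bins
gamma, lr, eps = 0.9, 0.1, 0.1

def bin_state(x):
    return int(np.clip(abs(x) // 0.5, 0, 3))

x, x_sent = 1.5, 1.5
for k in range(3000):
    s = bin_state(x)
    a = rng.integers(len(sigmas)) if rng.random() < eps else int(np.argmax(Q[s]))
    sigma = sigmas[a]
    triggered = abs(x - x_sent) > sigma * abs(x)   # relative-threshold event trigger
    if triggered:
        x_sent = x                                  # transmit the current state
    u = -K * x_sent                                 # controller uses last transmitted state
    x_next = A * x + B * u + 0.01 * rng.standard_normal()
    reward = -(x_next ** 2) - 0.05 * float(triggered)  # penalize error and transmissions
    s_next = bin_state(x_next)
    Q[s, a] += lr * (reward + gamma * Q[s_next].max() - Q[s, a])
    x = x_next
```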
8. Rizvi SAA, Pertzborn AJ, Lin Z. Reinforcement Learning Based Optimal Tracking Control Under Unmeasurable Disturbances With Application to HVAC Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:7523-7533. PMID: 34129505. PMCID: PMC9703879. DOI: 10.1109/tnnls.2021.3085358.
Abstract
This paper presents the design of an optimal controller for solving tracking problems subject to unmeasurable disturbances and unknown system dynamics using reinforcement learning (RL). Many existing RL control methods take the disturbance into account by directly measuring it and manipulating it for exploration during the learning process, thereby preventing any disturbance-induced bias in the control estimates. However, in most practical scenarios, the disturbance is neither measurable nor manipulable. The main contribution of this article is the introduction of a combination of a bias compensation mechanism and integral action in the Q-learning framework, which removes the need to measure or manipulate the disturbance while preventing disturbance-induced bias in the optimal control estimates. A bias-compensated Q-learning scheme is presented that learns the disturbance-induced bias terms separately from the optimal control parameters and ensures convergence of the control parameters to the optimal solution even in the presence of unmeasurable disturbances. Both state feedback and output feedback algorithms are developed based on policy iteration (PI) and value iteration (VI) that guarantee convergence of the tracking error to zero. The feasibility of the design is validated on a practical optimal control application: a heating, ventilating, and air conditioning (HVAC) zone controller.
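A hedged sketch of how such bias compensation can enter a quadratic Q-function (a generic construction; the paper's exact parameterization and integral-action augmentation may differ): with an unmeasured, slowly varying disturbance, the Q-function of the augmented state z_k and input u_k picks up linear and constant bias terms,

```latex
% Illustrative bias-compensated quadratic Q-function (generic form)
Q(z_k, u_k) =
\begin{bmatrix} z_k \\ u_k \end{bmatrix}^{\top}
H
\begin{bmatrix} z_k \\ u_k \end{bmatrix}
+ b^{\top} \begin{bmatrix} z_k \\ u_k \end{bmatrix}
+ c .
```

The quadratic kernel H carries the optimal control information, while b and c absorb the disturbance-induced bias; estimating (H, b, c) jointly by least squares and computing the gain only from H, u_k = -(H_{uu})^{-1} H_{uz} z_k, keeps the gain estimate unbiased.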
9. Perrusquía A. Human-behavior learning: A new complementary learning perspective for optimal decision making controllers. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.03.036.
10. Solution of the linear quadratic regulator problem of black box linear systems using reinforcement learning. Inf Sci (N Y) 2022. DOI: 10.1016/j.ins.2022.03.004.
11. Long M, Su H, Zeng Z. Output-Feedback Global Consensus of Discrete-Time Multiagent Systems Subject to Input Saturation via Q-Learning Method. IEEE Transactions on Cybernetics 2022; 52:1661-1670. PMID: 32396125. DOI: 10.1109/tcyb.2020.2987385.
Abstract
This article proposes a Q-learning (QL)-based algorithm for global consensus of saturated discrete-time multiagent systems (DTMASs) via output feedback. According to low-gain feedback (LGF) theory, the control inputs of saturated DTMASs can avoid saturation by utilizing control policies with LGF matrices, which in most previous works were computed from the modified algebraic Riccati equation (MARE) and required knowledge of the system dynamics. In this article, we first find a lower bound on the real part of the nonzero eigenvalues of the Laplacian matrices of directed network topologies. Then, we define a test control input and propose a Q-function to derive a QL Bellman equation, which forms the core of the QL algorithm. Subsequently, different from previous works, the output-feedback gain (OFG) matrix is obtained through a limited number of iterations of the QL algorithm, without requiring knowledge of the agent dynamics or the network topologies of the saturated DTMASs. Furthermore, the saturated DTMASs achieve global consensus rather than the semiglobal consensus of previous results. Finally, the effectiveness of the QL algorithm is confirmed via two simulations.
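For orientation, a QL Bellman equation of the kind referred to above has the following standard discrete-time form (a generic statement; the paper's version is tailored to the saturated, output-feedback multiagent setting):

```latex
% Standard discrete-time Q-learning Bellman equation (generic form)
Q^{*}(x_k, u_k) = r(x_k, u_k) + \gamma \min_{u} Q^{*}(x_{k+1}, u),
\qquad
u_k^{*} = \arg\min_{u} Q^{*}(x_k, u).
```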
12. Integral reinforcement learning-based optimal output feedback control for linear continuous-time systems with input delay. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.06.073.
13. Luo B, Yang Y, Liu D. Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems. IEEE Transactions on Cybernetics 2021; 51:3630-3640. PMID: 32092032. DOI: 10.1109/tcyb.2020.2970969.
Abstract
In this article, the data-based two-player zero-sum game problem is considered for linear discrete-time systems. This problem theoretically depends on solving the discrete-time game algebraic Riccati equation (DTGARE), which requires complete knowledge of the system dynamics. To avoid solving the DTGARE, the Q-function is introduced and a data-based policy iteration Q-learning (PIQL) algorithm is developed to learn the optimal Q-function from data collected from the real system. By writing the Q-function in quadratic form, it is proved that the PIQL algorithm is equivalent to the Newton iteration method in a Banach space via the Fréchet derivative, so the convergence of the PIQL algorithm is guaranteed by Kantorovich's theorem. For the realization of the PIQL algorithm, an off-policy learning scheme is proposed that uses real data rather than a system model. Finally, the efficiency of the developed data-based PIQL method is validated through simulation studies.
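As a hedged sketch of the quadratic Q-function structure such zero-sum Q-learning methods typically use (a standard construction in the literature; the paper's derivation may differ in detail), with control input u_k and opponent/disturbance input w_k:

```latex
% Quadratic Q-function of a two-player zero-sum LQ game (generic form)
Q(x_k, u_k, w_k) =
\begin{bmatrix} x_k \\ u_k \\ w_k \end{bmatrix}^{\top}
\begin{bmatrix}
H_{xx} & H_{xu} & H_{xw} \\
H_{ux} & H_{uu} & H_{uw} \\
H_{wx} & H_{wu} & H_{ww}
\end{bmatrix}
\begin{bmatrix} x_k \\ u_k \\ w_k \end{bmatrix}.
```

Once the kernel H has been identified from data, the saddle-point policies u_k = -K x_k and w_k = -L x_k follow from its blocks, e.g. K = (H_{uu} - H_{uw} H_{ww}^{-1} H_{wu})^{-1} (H_{ux} - H_{uw} H_{ww}^{-1} H_{wx}), with L obtained symmetrically.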
14. Calafiore GC, Possieri C. Output Feedback Q-Learning for Linear-Quadratic Discrete-Time Finite-Horizon Control Problems. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3274-3281. PMID: 32745011. DOI: 10.1109/tnnls.2020.3010304.
Abstract
An algorithm is proposed to determine output feedback policies that solve finite-horizon linear-quadratic (LQ) optimal control problems without requiring knowledge of the system dynamical matrices. To this end, the Q-factors arising from finite-horizon LQ problems are first characterized in the state feedback case. It is then shown how they can be parameterized as functions of the input-output vectors. A procedure is then proposed for estimating these functions from input/output data and using the estimates to compute the optimal control from the measured inputs and outputs.
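A minimal sketch of how one stage of a finite-horizon quadratic Q-factor can be estimated from data by least squares and used to extract a gain (the plant, weights, and the state-feedback setting are assumptions for illustration; the paper parameterizes the Q-factors with input/output vectors instead of full state measurements):

```python
import numpy as np

# Sketch: least-squares estimation of one stage of a quadratic Q-factor, then
# extraction of the stage feedback gain. All numerical values are assumed.

rng = np.random.default_rng(3)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Qw, Rw = np.eye(2), np.array([[1.0]])
P_next = np.eye(2)                       # cost-to-go matrix of the next stage (assumed)

def svec(z):
    # unique monomials of the quadratic form z' H z (upper triangle, symmetric H)
    idx = np.triu_indices(len(z))
    scale = np.where(idx[0] == idx[1], 1.0, 2.0)
    return scale * np.outer(z, z)[idx]

Phi, targets = [], []
for _ in range(40):                      # exploratory data tuples (x, u, x_next)
    x = rng.standard_normal((2, 1))
    u = rng.standard_normal((1, 1))
    x_next = A @ x + B @ u
    targets.append(float(x.T @ Qw @ x + u.T @ Rw @ u + x_next.T @ P_next @ x_next))
    Phi.append(svec(np.vstack([x, u]).ravel()))

h, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
H = np.zeros((3, 3))
H[np.triu_indices(3)] = h
H = (H + H.T) - np.diag(np.diag(H))      # rebuild the symmetric Q-factor kernel
K = np.linalg.solve(H[2:, 2:], H[2:, :2])  # stage gain: u = -K x
print("estimated stage gain K:", K)
```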
15. Intelligent adaptive optimal control using incremental model-based global dual heuristic programming subject to partial observability. Appl Soft Comput 2021. DOI: 10.1016/j.asoc.2021.107153.
16. Adaptive output-feedback optimal control for continuous-time linear systems based on adaptive dynamic programming approach. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.01.070.