1. Zhu L, Wei Q, Guo P. Synergetic Learning Neuro-Control for Unknown Affine Nonlinear Systems With Asymptotic Stability Guarantees. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3479-3489. [PMID: 38190681] [DOI: 10.1109/tnnls.2023.3347663]
Abstract
For completely unknown affine nonlinear systems, a synergetic learning algorithm (SLA) is developed in this article to learn an optimal control. Unlike the conventional Hamilton-Jacobi-Bellman equation (HJBE), which requires the system dynamics, a model-free HJBE (MF-HJBE) is deduced by means of off-policy reinforcement learning (RL). Specifically, the equivalence between the HJBE and the MF-HJBE is first established from the perspective of the uniqueness of the solution of the HJBE. Furthermore, it is proven that once a solution of the MF-HJBE exists, its corresponding control input renders the system asymptotically stable and optimizes the cost function. To solve the MF-HJBE, the two agents composing the synergetic learning (SL) system, the critic agent and the actor agent, evolve in real time using only the system state data. By building an experience replay (ER)-based learning rule, it is proven that when the critic agent evolves toward the optimal cost function, the actor agent not only evolves toward the optimal control but also guarantees the asymptotic stability of the system. Finally, simulations of the F-16 aircraft system and the Van der Pol oscillator are conducted, and the results support the feasibility of the developed SLA.
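For orientation, the conventional HJB equation referred to above has, for an affine system $\dot{x} = f(x) + g(x)u$ with cost integrand $Q(x) + u^{\top}Ru$, the textbook form given below; this is a generic statement, not the model-free HJBE derived in the paper, which removes the explicit dependence on $f$ and $g$:

$$0 = \min_{u}\Big[\,Q(x) + u^{\top}Ru + \big(\nabla V^{*}(x)\big)^{\top}\big(f(x) + g(x)u\big)\Big], \qquad u^{*}(x) = -\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V^{*}(x).$$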
2. Liang Y, Zhang H, Zhang J, Ming Z. Event-Triggered Guarantee Cost Control for Partially Unknown Stochastic Systems via Explorized Integral Reinforcement Learning Strategy. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7830-7844. [PMID: 36395138] [DOI: 10.1109/tnnls.2022.3221105]
Abstract
In this article, an integral reinforcement learning (IRL)-based event-triggered guarantee cost control (GCC) approach is proposed for stochastic systems that are modulated by randomly time-varying parameters. First, with the aid of the RL algorithm, the optimal GCC (OGCC) problem is converted into an optimal zero-sum game by solving a modified Hamilton-Jacobi-Isaacs (HJI) equation of the auxiliary system. Moreover, to address the stochastic zero-sum game, we propose an on-policy IRL-based control approach that incorporates the multivariate probabilistic collocation method (MPCM), which can accurately predict the mean value of uncertain functions with randomly time-varying parameters. Furthermore, a novel GCC method, which combines the explorized IRL algorithm and the MPCM, is designed to relax the requirement of knowing the system dynamics for this class of stochastic systems. On this foundation, to reduce computation cost and avoid the waste of resources, we propose an event-triggered GCC approach based on explorized IRL and the MPCM by utilizing critic-actor-disturbance neural networks (NNs). Meanwhile, the weight vectors of the three NNs are updated simultaneously and aperiodically according to the designed triggering condition. The ultimate boundedness (UB) properties of the controlled systems are proved by means of the Lyapunov theorem. Finally, the effectiveness of the developed GCC algorithms is illustrated via two simulation examples.
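As background, a zero-sum game between a control $u$ and a disturbance (second player) $w$ is typically associated with a Hamilton-Jacobi-Isaacs equation of the generic form below; the modified HJI equation of the auxiliary system in the paper is a stochastic variant of this idea, not this exact equation:

$$0 = \min_{u}\max_{w}\Big[\,Q(x) + u^{\top}Ru - \gamma^{2}w^{\top}w + \big(\nabla V^{*}(x)\big)^{\top}\big(f(x) + g(x)u + k(x)w\big)\Big].$$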
3. Wang S, Wen S, Yang Y, Shi K, Huang T. Suboptimal Leader-to-Coordination Control for Nonlinear Systems With Switching Topologies: A Learning-Based Method. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:10578-10588. [PMID: 35486552] [DOI: 10.1109/tnnls.2022.3169417]
Abstract
In cooperative control for multiagent systems (MASs), the key issues of distributed interaction, nonlinear characteristics, and optimization should be considered simultaneously, a task that remains theoretically intractable even to this day. Considering these factors, this article investigates leader-to-formation control and optimization for nonlinear MASs using a learning-based method. Under time-varying switching topology, a fully distributed state observer based on neural networks is designed to reconstruct the dynamics and the state trajectory of the leader signal with arbitrary precision under a jointly connected topology assumption. Benefiting from the observers, formation of MASs under switching topologies is transformed into tracking control for each subsystem with the continuous state generated by the observers. An augmented system with a discounted infinite-horizon LQR performance index is considered to optimize the control effect. Due to the complexity of solving the Hamilton-Jacobi-Bellman equation, the optimal value function is approximated by a critic network via the integral reinforcement learning method without knowledge of the drift dynamics. Meanwhile, an actor network is also presented to ensure stability. The tracking errors and estimated weight matrices are proven to be uniformly ultimately bounded. Finally, two illustrative examples are given to show the effectiveness of this method.
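For reference, a discounted infinite-horizon LQR performance index of the kind mentioned above typically reads as follows (generic form; the weights $Q$, $R$ and discount $\gamma > 0$ are placeholders, not the paper's specific choices):

$$J\big(x(t)\big) = \int_{t}^{\infty} e^{-\gamma(\tau - t)}\big(x^{\top}Qx + u^{\top}Ru\big)\,d\tau.$$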
4. Online Adaptive Optimal Control Algorithm Based on Synchronous Integral Reinforcement Learning with Explorations. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.055]
5. Perrusquía A. Human-behavior learning: A new complementary learning perspective for optimal decision making controllers. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.03.036]
6. Wang H, Li M. Model-Free Reinforcement Learning for Fully Cooperative Consensus Problem of Nonlinear Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1482-1491. [PMID: 33338022] [DOI: 10.1109/tnnls.2020.3042508]
Abstract
This article presents an off-policy model-free algorithm based on reinforcement learning (RL) to optimize the fully cooperative (FC) consensus problem of nonlinear continuous-time multiagent systems (MASs). First, the optimal FC consensus problem is transformed into solving the coupled Hamilton-Jacobi-Bellman (HJB) equation. Then, we propose a policy iteration (PI)-based algorithm, which is further proved to be effective in solving the coupled HJB equation. To implement this scheme in a model-free way, a model-free Bellman equation is derived to find the optimal value function and the optimal control policy for each agent. Then, based on the least-squares approach, the tuning law for the actor and critic weights is derived by inserting actor and critic neural networks into the model-free Bellman equation to approximate the target policies and the value function. Finally, we propose an off-policy model-free integral RL (IRL) algorithm, which can be used to optimize the FC consensus problem of the whole system in real time by using measured data. The effectiveness of the proposed algorithm is verified by simulation results.
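A model-free Bellman equation of the kind described above is usually obtained in off-policy integral RL by rewriting the dynamics around the evaluated policy $u_i$; a generic single-agent version (not necessarily the coupled multiagent form derived in the paper) is

$$V_i\big(x(t+T)\big) - V_i\big(x(t)\big) = \int_{t}^{t+T}\Big[-Q(x) - u_i^{\top}Ru_i - 2\,u_{i+1}^{\top}R\big(u - u_i\big)\Big]\,d\tau,$$

where $u$ is the behavior input that generated the data and $u_{i+1} = -\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V_i(x)$ is the improved policy, both unknowns being solved for jointly from measured data.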
7. Xu Z, Shen T, Cheng D. Model-Free Reinforcement Learning by Embedding an Auxiliary System for Optimal Control of Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1520-1534. [PMID: 33347416] [DOI: 10.1109/tnnls.2020.3042589]
Abstract
In this article, a novel integral reinforcement learning (IRL) algorithm is proposed to solve the optimal control problem for continuous-time nonlinear systems with unknown dynamics. The main challenge in learning is how to reject the oscillation caused by the externally added probing noise. This article addresses the issue by embedding an auxiliary trajectory that is designed as an exciting signal to learn the optimal solution. First, the auxiliary trajectory is used to decompose the state trajectory of the controlled system. Then, by using the decoupled trajectories, a model-free policy iteration (PI) algorithm is developed, where the policy evaluation step and the policy improvement step alternate until convergence to the optimal solution. It is noted that an appropriate external input is introduced at the policy improvement step to eliminate the requirement of the input-to-state dynamics. Finally, the algorithm is implemented on an actor-critic structure. The output weights of the critic neural network (NN) and the actor NN are updated sequentially by least-squares methods. The convergence of the algorithm and the stability of the closed-loop system are guaranteed. Two examples are given to show the effectiveness of the proposed algorithm.
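To make the least-squares critic update concrete, the following is a minimal, generic Python (NumPy) sketch of how an interval Bellman residual of the IRL type, $w^{\top}\big(\phi(x_t) - \phi(x_{t+T})\big) = \int_t^{t+T} r\,d\tau$, can be solved for critic weights in batch form. The basis, data, and reward below are illustrative placeholders, not the cited paper's actual algorithm, auxiliary system, or plant.

import numpy as np

def phi(x):
    # Quadratic polynomial basis for a two-dimensional state: [x1^2, x1*x2, x2^2].
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def critic_least_squares(states_t, states_tT, integrated_rewards):
    # Solve w from w^T (phi(x_t) - phi(x_{t+T})) = integrated reward, in batch form.
    A = np.array([phi(a) - phi(b) for a, b in zip(states_t, states_tT)])
    b = np.asarray(integrated_rewards)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Synthetic placeholder data (not from any real system), only to exercise the solver.
rng = np.random.default_rng(0)
xs_t = rng.normal(size=(50, 2))            # states at the start of each interval
xs_tT = 0.9 * xs_t                         # states one interval later (assumed contraction)
r_int = 0.1 * np.sum(xs_t ** 2, axis=1)    # assumed integrated stage cost over each interval
print(critic_least_squares(xs_t, xs_tT, r_int))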
8. Mohammadi M, Arefi MM, Vafamand N, Kaynak O. Control of an AUV with completely unknown dynamics and multi-asymmetric input constraints via off-policy reinforcement learning. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06476-8]
9. Integral reinforcement learning-based optimal output feedback control for linear continuous-time systems with input delay. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.06.073]
10. Xing X, Chang DE. The Adaptive Dynamic Programming Toolbox. Sensors 2021; 21:5609. [PMID: 34451052] [PMCID: PMC8402451] [DOI: 10.3390/s21165609]
Abstract
The paper develops the adaptive dynamic programming toolbox (ADPT), a MATLAB-based software package that numerically solves optimal control problems for continuous-time control-affine systems. The ADPT produces approximate optimal feedback controls by employing the adaptive dynamic programming technique and solving the Hamilton-Jacobi-Bellman equation approximately. A novel implementation method is derived to optimize the memory consumption of the ADPT throughout its execution. The ADPT supports two working modes: a model-based mode and a model-free mode. In the former, the ADPT computes optimal feedback controls given the system dynamics. In the latter, optimal feedback controls are generated from measurements of system trajectories, without requiring knowledge of the system model. Multiple setting options are provided so that various customized circumstances can be accommodated. Compared to other popular software toolboxes for optimal control, the ADPT features computational precision and time efficiency, which is illustrated with its application to a highly nonlinear satellite attitude control problem.
11. Perrusquía A, Yu W. Identification and optimal control of nonlinear systems using recurrent neural networks and reinforcement learning: An overview. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.01.096]
12. Wei Q, Liao Z, Yang Z, Li B, Liu D. Continuous-Time Time-Varying Policy Iteration. IEEE Transactions on Cybernetics 2020; 50:4958-4971. [PMID: 31329153] [DOI: 10.1109/tcyb.2019.2926631]
Abstract
A novel policy iteration algorithm, called the continuous-time time-varying (CTTV) policy iteration algorithm, is presented in this paper to obtain the optimal control laws for infinite-horizon CTTV nonlinear systems. The adaptive dynamic programming (ADP) technique is utilized to obtain the iterative control laws that optimize the performance index function. The properties of the CTTV policy iteration algorithm are analyzed: the monotonicity, convergence, and optimality of the iterative value function are established, and the iterative value function is proven to converge monotonically to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Furthermore, the iterative control law is guaranteed to be admissible, thereby stabilizing the nonlinear systems. In the implementation of the presented CTTV policy iteration algorithm, the approximate iterative control laws and iterative value function are obtained by neural networks. Finally, numerical results are given to verify the effectiveness of the presented method.
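For comparison with the time-varying case treated here, classical policy iteration for a time-invariant affine system $\dot{x} = f(x) + g(x)u$ with cost integrand $Q(x) + u^{\top}Ru$ alternates the following two steps (generic textbook form, not the CTTV algorithm itself):

policy evaluation: solve $0 = Q(x) + u_i^{\top}Ru_i + \big(\nabla V_i(x)\big)^{\top}\big(f(x) + g(x)u_i\big)$ for $V_i$;
policy improvement: $u_{i+1}(x) = -\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V_i(x)$.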
13. Zheng Z, Ruan L, Zhu M, Guo X. Reinforcement learning control for underactuated surface vessel with output error constraints and uncertainties. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.021]
14. Optimal Output Feedback Control of Nonlinear Partially-Unknown Constrained-Input Systems Using Integral Reinforcement Learning. Neural Process Lett 2019. [DOI: 10.1007/s11063-019-10072-2]
15. Wang D, Qiao J. Approximate neural optimal control with reinforcement learning for a torsional pendulum device. Neural Netw 2019; 117:1-7. [DOI: 10.1016/j.neunet.2019.04.026]
16. Yang X, He H. Adaptive Critic Designs for Event-Triggered Robust Control of Nonlinear Systems With Unknown Dynamics. IEEE Transactions on Cybernetics 2019; 49:2255-2267. [PMID: 29993650] [DOI: 10.1109/tcyb.2018.2823199]
Abstract
This paper develops a novel event-triggered robust control strategy for continuous-time nonlinear systems with unknown dynamics. To begin with, the event-triggered robust nonlinear control problem is transformed into an event-triggered nonlinear optimal control problem by introducing an infinite-horizon integral cost for the nominal system. Then, a recurrent neural network (RNN) and adaptive critic designs (ACDs) are employed to solve the derived event-triggered nonlinear optimal control problem. The RNN is applied to reconstruct the system dynamics based on collected system data. After acquiring the knowledge of the system dynamics, a unique critic network is proposed to obtain the approximate solution of the event-triggered Hamilton-Jacobi-Bellman equation within the framework of ACDs. The critic network is updated by simultaneously using historical and instantaneous state data. An advantage of the present critic network update law is that it relaxes the persistence of excitation condition. Meanwhile, under a newly developed event-triggering condition, the proposed critic network tuning rule not only guarantees that the critic network weights converge to their optimal values but also ensures that the nominal system states are uniformly ultimately bounded. Moreover, by using the Lyapunov method, it is proved that the derived optimal event-triggered control (ETC) guarantees uniform ultimate boundedness of all the signals in the original system. Finally, a nonlinear oscillator and an unstable power system are provided to validate the developed robust ETC scheme.
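Event-triggered schemes of this kind generally hold the control at the last sampled state, $u(t) = \mu\big(x(t_j)\big)$ for $t \in [t_j, t_{j+1})$, and pick the next sampling instant when the gap $e_j(t) = x(t_j) - x(t)$ exceeds a state-dependent threshold, e.g.

$$t_{j+1} = \inf\{\, t > t_j : \|e_j(t)\| > e_T\big(x(t)\big) \,\};$$

this is only a generic template, not the specific triggering condition derived in the paper.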
17. Luo B, Yang Y, Liu D. Adaptive Q-Learning for Data-Based Optimal Output Regulation With Experience Replay. IEEE Transactions on Cybernetics 2018; 48:3337-3348. [PMID: 29994038] [DOI: 10.1109/tcyb.2018.2821369]
Abstract
In this paper, the data-based optimal output regulation problem of discrete-time systems is investigated. An off-policy adaptive Q-learning (QL) method is developed by using real system data without requiring knowledge of the system dynamics or the mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter in the policy evaluation is used to achieve a tradeoff between the current and future Q-functions. The convergence of the adaptive QL algorithm is proved and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, an actor-critic neural network (NN) structure is developed. The least-squares scheme and the batch gradient descent method are developed to update the critic and actor NN weights, respectively. The experience replay technique is employed in the learning process, which leads to simple and convenient implementation of the adaptive QL method. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
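As background, the Q-function used in such discrete-time data-based designs satisfies the generic Bellman optimality relation below, with stage utility $U$; the adaptive-parameter variant in the paper modifies the evaluation step rather than this basic relation:

$$Q^{*}(x_k, u_k) = U(x_k, u_k) + \min_{u} Q^{*}(x_{k+1}, u), \qquad u^{*}(x_k) = \arg\min_{u} Q^{*}(x_k, u).$$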
18. Narayanan V, Jagannathan S. Event-Triggered Distributed Control of Nonlinear Interconnected Systems Using Online Reinforcement Learning With Exploration. IEEE Transactions on Cybernetics 2018; 48:2510-2519. [PMID: 28885167] [DOI: 10.1109/tcyb.2017.2741342]
Abstract
In this paper, a distributed control scheme for an interconnected system composed of uncertain input-affine nonlinear subsystems with event-triggered state feedback is presented by using a novel hybrid-learning-scheme-based approximate dynamic programming approach with online exploration. First, an approximate solution to the Hamilton-Jacobi-Bellman equation is generated with event-sampled neural network (NN) approximation and, subsequently, a near-optimal control policy for each subsystem is derived. Artificial NNs are utilized as function approximators to develop a suite of identifiers and learn the dynamics of each subsystem. The NN weight tuning rules for the identifier and the event-triggering condition are derived using Lyapunov stability theory. Taking into account the effects of NN approximation of the system dynamics and bootstrapping, a novel NN weight update is presented to approximate the optimal value function. Finally, a novel strategy to incorporate exploration into the online control framework, using the identifiers, is introduced to reduce the overall cost at the expense of additional computations during the initial online learning phase. System states and the NN weight estimation errors are regulated, and locally uniformly ultimately bounded results are achieved. The analytical results are substantiated using simulation studies.
19. Yang X, He H. Adaptive critic designs for optimal control of uncertain nonlinear systems with unmatched interconnections. Neural Netw 2018; 105:142-153. [PMID: 29843095] [DOI: 10.1016/j.neunet.2018.05.005]
Abstract
In this paper, we develop a novel optimal control strategy for a class of uncertain nonlinear systems with unmatched interconnections. To begin with, we present a stabilizing feedback controller for the interconnected nonlinear systems by modifying an array of optimal control laws of auxiliary subsystems. We also prove that this feedback controller ensures that a specified cost function achieves optimality. Then, under the framework of adaptive critic designs, we use critic networks to solve the Hamilton-Jacobi-Bellman equations associated with the auxiliary subsystem optimal control laws. The critic network weights are tuned through the gradient descent method combined with an additional stabilizing term. With the newly established weight tuning rules, the initial admissible control condition is no longer needed. In addition, we demonstrate that all signals in the closed-loop auxiliary subsystems are stable in the sense of uniform ultimate boundedness by using classic Lyapunov techniques. Finally, we provide an interconnected nonlinear plant to validate the present control scheme.
Affiliation(s)
- Xiong Yang: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA.
- Haibo He: Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA.
20. Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances. Neural Netw 2018; 99:19-30. [DOI: 10.1016/j.neunet.2017.11.022]
21. Wang D, He H, Liu D. Adaptive Critic Nonlinear Robust Control: A Survey. IEEE Transactions on Cybernetics 2017; 47:3429-3451. [PMID: 28682269] [DOI: 10.1109/tcyb.2017.2712188]
Abstract
Adaptive dynamic programming (ADP) and reinforcement learning are closely related when performing intelligent optimization. They are both regarded as promising methods involving the important components of evaluation and improvement, against the background of information technology such as artificial intelligence, big data, and deep learning. Although great progress has been achieved and surveyed in addressing nonlinear optimal control problems, the research on the robustness of ADP-based control strategies under uncertain environments has not been fully summarized. Hence, this survey reviews the recent main results of adaptive-critic-based robust control design for continuous-time nonlinear systems. The ADP-based nonlinear optimal regulation is reviewed, followed by robust stabilization of nonlinear systems with matched uncertainties, guaranteed cost control design of unmatched plants, and decentralized stabilization of interconnected systems. Additionally, further comprehensive discussions are presented, including event-based robust control design, improvement of the critic learning rule, nonlinear H∞ control design, and several notes on future perspectives. By applying the ADP-based optimal and robust control methods to a practical power system and an overhead crane plant, two typical examples are provided to verify the effectiveness of the theoretical results. Overall, this survey is beneficial for promoting the development of adaptive critic control methods with robustness guarantees and the construction of higher-level intelligent systems.
22. Luo B, Liu D, Wu HN, Wang D, Lewis FL. Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control. IEEE Transactions on Cybernetics 2017; 47:3341-3354. [PMID: 27893404] [DOI: 10.1109/tcyb.2016.2623859]
Abstract
The model-free optimal control problem of general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal control method. By using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, the adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, where the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
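The policy-gradient flavor of such Q-function-based ADP can be summarized by a generic improvement step of the form below (illustrative, with step size $\alpha$; not necessarily the exact update of the PGADP algorithm):

$$u_{j+1}(x) = u_j(x) - \alpha \,\frac{\partial Q_j(x, u)}{\partial u}\bigg|_{u = u_j(x)},$$

which descends the current Q-function rather than performing the exact minimization $u_{j+1}(x) = \arg\min_{u} Q_j(x,u)$.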
23. Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.076]
24. Yang X, Liu D, Luo B, Li C. Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.07.051]
25. Modares H, Lewis FL. Optimal Output-Feedback Control of Unknown Continuous-Time Linear Systems Using Off-policy Reinforcement Learning. IEEE Transactions on Cybernetics 2016; 46:2401-2410. [PMID: 28113995] [DOI: 10.1109/tcyb.2015.2477810]
Abstract
A model-free off-policy reinforcement learning algorithm is developed to learn the optimal output-feedback (OPFB) solution for linear continuous-time systems. The proposed algorithm has the important feature of being applicable to the design of optimal OPFB controllers for both regulation and tracking problems. To provide a unified framework for both optimal regulation and tracking, a discounted performance function is employed and a discounted algebraic Riccati equation (ARE) is derived which gives the solution to the problem. Conditions on the existence of a solution to the discounted ARE are provided, and an upper bound for the discount factor is found to assure the stability of the optimal control solution. To develop an optimal OPFB controller, it is first shown that the system state can be constructed using some limited observations of the system output over a period of the system's history. A Bellman equation is then developed to evaluate a control policy and find an improved policy simultaneously using only limited observations of the system output. Then, using this Bellman equation, a model-free off-policy RL-based OPFB controller is developed without requiring knowledge of the system state or the system dynamics. It is shown that the proposed OPFB method is more powerful than static OPFB, as it is equivalent to a state-feedback control policy. The proposed method is successfully used to solve a regulation and a tracking problem.
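For the linear-quadratic case described here, with dynamics $\dot{x} = Ax + Bu$ and discounted cost $\int_t^{\infty} e^{-\gamma(\tau-t)}\big(x^{\top}Qx + u^{\top}Ru\big)\,d\tau$, the discounted ARE mentioned above commonly takes the form (standard statement, given for orientation only):

$$A^{\top}P + PA - \gamma P + Q - PBR^{-1}B^{\top}P = 0, \qquad u^{*} = -R^{-1}B^{\top}Px.$$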
26. Wang D, Li C, Liu D, Mu C. Data-based robust optimal control of continuous-time affine nonlinear systems with matched uncertainties. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.05.034]
27. Luo B, Liu D, Huang T, Wang D. Model-Free Optimal Tracking Control via Critic-Only Q-Learning. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:2134-2144. [PMID: 27416608] [DOI: 10.1109/tnnls.2016.2585520]
Abstract
Model-free control is an important and promising topic in the control field, which has attracted extensive attention in the past few years. In this paper, we aim to solve the model-free optimal tracking control problem of nonaffine nonlinear discrete-time systems. A critic-only Q-learning (CoQL) method is developed, which learns the optimal tracking control from real system data and thus avoids solving the tracking Hamilton-Jacobi-Bellman equation. First, the Q-learning algorithm is proposed based on the augmented system, and its convergence is established. Using only one neural network for approximating the Q-function, the CoQL method is developed to implement the Q-learning algorithm. Furthermore, the convergence of the CoQL method is proved with consideration of the neural network approximation error. With the convergent Q-function obtained from the CoQL method, the adaptive optimal tracking control is designed based on the gradient descent scheme. Finally, the effectiveness of the developed CoQL method is demonstrated through simulation studies. The developed CoQL method learns with off-policy data and is implemented with a critic-only structure, so it is easy to realize and overcomes the inadequate exploration problem.
28. Zhao D, Zhang Q, Wang D, Zhu Y. Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics. IEEE Transactions on Cybernetics 2016; 46:854-865. [PMID: 26529794] [DOI: 10.1109/tcyb.2015.2488680]
Abstract
In this paper, an approximate online equilibrium solution is developed for N-player nonzero-sum (NZS) game systems with completely unknown dynamics. First, a model identifier based on a three-layer neural network (NN) is established to reconstruct the unknown NZS game system. Moreover, the identifier weight vector is updated based on the experience replay technique, which relaxes the traditional persistence of excitation condition to a simplified condition on the recorded data. Then, a single-network adaptive dynamic programming (ADP) algorithm with experience replay is proposed for each player to solve the coupled nonlinear Hamilton-Jacobi (HJ) equations, where only the critic NN weight vector needs to be tuned for each player. The feedback Nash equilibrium is provided by the solution of the coupled HJ equations. Based on the experience replay technique, a novel critic NN weight tuning law is proposed to guarantee the stability of the closed-loop system and the convergence of the value functions. Furthermore, a Lyapunov-based stability analysis shows that uniform ultimate boundedness of the closed-loop system is achieved. Finally, two simulation examples are given to verify the effectiveness of the proposed control scheme.
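The role of experience replay in such critic tuning laws can be illustrated with a minimal, generic Python (NumPy) sketch: recorded regressor/target pairs are replayed alongside the current sample so that the weight update stays excited even when the instantaneous data are not. The class name, normalization, learning rate, and synthetic data below are illustrative assumptions, not the update law of the cited paper.

import numpy as np
from collections import deque

class ReplayCritic:
    def __init__(self, n_features, buffer_size=100, lr=0.05):
        self.w = np.zeros(n_features)             # critic weight estimate
        self.buffer = deque(maxlen=buffer_size)   # recorded (regressor, target) pairs
        self.lr = lr

    def update(self, regressor, target):
        # One pass of normalized gradient steps on the Bellman residual,
        # over the current sample plus all stored samples.
        self.buffer.append((np.asarray(regressor), float(target)))
        for reg, tgt in self.buffer:
            error = self.w @ reg - tgt
            self.w -= self.lr * error * reg / (1.0 + reg @ reg)
        return self.w

# Synthetic demonstration: the critic should recover the placeholder weights.
critic = ReplayCritic(n_features=3)
rng = np.random.default_rng(1)
true_w = np.array([1.0, -0.5, 2.0])
for _ in range(200):
    reg = rng.normal(size=3)        # assumed Bellman-equation regressor
    critic.update(reg, reg @ true_w)
print(critic.w)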
29. Liu D, Yang X, Wang D, Wei Q. Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints. IEEE Transactions on Cybernetics 2015; 45:1372-1385. [PMID: 25872221] [DOI: 10.1109/tcyb.2015.2417170]
Abstract
The design of stabilizing controllers for uncertain nonlinear systems with control constraints is a challenging problem. The constrained input, coupled with the inability to accurately identify the uncertainties, motivates the design of stabilizing controllers based on reinforcement-learning (RL) methods. In this paper, a novel RL-based robust adaptive control algorithm is developed for a class of continuous-time uncertain nonlinear systems subject to input constraints. The robust control problem is converted into a constrained optimal control problem by appropriately selecting value functions for the nominal system. Distinct from the typical actor-critic dual networks employed in RL, only one critic neural network (NN) is constructed to derive the approximate optimal control. Meanwhile, unlike the initial stabilizing control often indispensable in RL, no special requirement is imposed on the initial control. By utilizing Lyapunov's direct method, the closed-loop optimal control system and the estimated weights of the critic NN are proved to be uniformly ultimately bounded. In addition, the derived approximate optimal control is verified to guarantee that the uncertain nonlinear system is stable in the sense of uniform ultimate boundedness. Two simulation examples are provided to illustrate the effectiveness and applicability of the present approach.
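Constrained-input optimal control designs of this type commonly encode an input bound $|u_i| \le \lambda$ through a nonquadratic integrand; a standard construction in this literature, given here for orientation (the paper's exact choice of value function may differ), is

$$U(u) = 2\int_{0}^{u} \lambda \tanh^{-1}(v/\lambda)^{\top} R \, dv,$$

which yields a minimizing control of the form $u^{*}(x) = -\lambda \tanh\!\big(\tfrac{1}{2\lambda}R^{-1}g(x)^{\top}\nabla V^{*}(x)\big)$ that respects the constraint by construction.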