1. Wang Z, Wang X, Pang N. Dynamic event-triggered controller design for nonlinear systems: Reinforcement learning strategy. Neural Netw 2023; 163:341-353. [PMID: 37099897] [DOI: 10.1016/j.neunet.2023.04.008]
Abstract
This investigation addresses the optimal control problem for discrete-time nonstrict-feedback nonlinear systems by invoking the reinforcement learning-based backstepping technique and neural networks. The dynamic-event-triggered control strategy introduced in this paper can reduce the communication frequency between the actuator and controller. Based on the reinforcement learning strategy, actor-critic neural networks are employed to implement the n-order backstepping framework. Then, a neural-network weight-updating algorithm is developed to minimize the computational burden and avoid the local-optimum problem. Furthermore, a novel dynamic-event-triggered strategy is introduced, which can remarkably outperform the previously studied static-event-triggered strategy. Moreover, combined with Lyapunov stability theory, all signals in the closed-loop system are strictly proven to be semiglobally uniformly ultimately bounded. Finally, the practicality of the proposed control algorithms is further illustrated by numerical simulation examples.
Affiliation(s)
- Zichen Wang
- College of Westa, Southwest University, Chongqing, 400715, China
- Xin Wang
- College of Electronic and Information Engineering, Southwest University, Chongqing, 400715, China
- Ning Pang
- College of Westa, Southwest University, Chongqing, 400715, China

2. Lauffenburger JC, Yom-Tov E, Keller PA, McDonnell ME, Bessette LG, Fontanet CP, Sears ES, Kim E, Hanken K, Buckley JJ, Barlev RA, Haff N, Choudhry NK. REinforcement learning to improve non-adherence for diabetes treatments by Optimising Response and Customising Engagement (REINFORCE): study protocol of a pragmatic randomised trial. BMJ Open 2021; 11:e052091. [PMID: 34862289] [PMCID: PMC8647547] [DOI: 10.1136/bmjopen-2021-052091]
Abstract
INTRODUCTION: Achieving optimal diabetes control requires several daily self-management behaviours, especially adherence to medication. Evidence supports the use of text messages to support adherence, but there remains much opportunity to improve their effectiveness. One key limitation is that message content has been generic. By contrast, reinforcement learning is a machine learning method that can be used to identify individuals' patterns of responsiveness by observing their response to cues and then optimising them accordingly. Despite its demonstrated benefits outside of healthcare, its application to tailoring communication for patients has received limited attention. The objective of this trial is to test the impact of a reinforcement learning-based text messaging programme on adherence to medication for patients with type 2 diabetes.

METHODS AND ANALYSIS: In the REinforcement learning to Improve Non-adherence For diabetes treatments by Optimising Response and Customising Engagement (REINFORCE) trial, we are randomising 60 patients with suboptimal diabetes control treated with oral diabetes medications to receive a reinforcement learning intervention or control. Subjects in both arms will receive electronic pill bottles to use, and those in the intervention arm will receive up to daily text messages. The messages will be individually adapted using a reinforcement learning prediction algorithm based on daily adherence measurements from the pill bottles. The trial's primary outcome is average adherence to medication over the 6-month follow-up period. Secondary outcomes include diabetes control, measured by glycated haemoglobin A1c, and self-reported adherence. In sum, the REINFORCE trial will evaluate the effect of personalising the framing of text messages for patients to support medication adherence and provide insight into how this could be adapted at scale to improve other self-management interventions.

ETHICS AND DISSEMINATION: This study was approved by the Mass General Brigham Institutional Review Board (IRB) (USA). Findings will be disseminated through peer-reviewed journals, ClinicalTrials.gov reporting and conferences.

TRIAL REGISTRATION NUMBER: NCT04473326 (ClinicalTrials.gov).
Affiliation(s)
- Julie C Lauffenburger
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Elad Yom-Tov
- Microsoft Research, Microsoft, Herzeliya, Israel
- Punam A Keller
- Tuck School of Business, Dartmouth College, Hanover, NH, USA
- Marie E McDonnell
- Endocrinology, Diabetes and Hypertension, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Lily G Bessette
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Constance P Fontanet
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Ellen S Sears
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Erin Kim
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Kaitlin Hanken
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- J Joseph Buckley
- Division of Sleep Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Renee A Barlev
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Nancy Haff
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Niteesh K Choudhry
- Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA

3. Elkenawy A, El-Nagar AM, El-Bardini M, El-Rabaie NM. Full-state neural network observer-based hybrid quantum diagonal recurrent neural network adaptive tracking control. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05685-x]

4. Ni X, Wen S, Wang H, Guo Z, Zhu S, Huang T. Observer-Based Quasi-Synchronization of Delayed Dynamical Networks With Parameter Mismatch Under Impulsive Effect. IEEE Trans Neural Netw Learn Syst 2021; 32:3046-3055. [PMID: 32745009] [DOI: 10.1109/tnnls.2020.3009271]
Abstract
This article focuses on the observer-based quasi-synchronization problem of delayed dynamical networks with parameter mismatch under impulsive effect. First, since the state of each node is unknown in practice, a state estimation strategy is proposed to estimate the state of each node, so as to design an appropriate synchronization controller. Then, the corresponding controller is constructed to synchronize the slave nodes with their leader node. In this article, we take the impulsive effect into consideration, which means that an impulsive signal will be applied to the system intermittently. Due to the existence of parameter mismatch and time-varying delay, by constructing an appropriate Lyapunov function, we eventually obtain a differential equation with constant and time-varying delay terms. Then, we analyze its trajectory by introducing the Cauchy matrix and prove its boundedness by contradiction. Finally, a numerical simulation is presented to illustrate the validity of the obtained results.

5. Calafiore GC, Possieri C. Output Feedback Q-Learning for Linear-Quadratic Discrete-Time Finite-Horizon Control Problems. IEEE Trans Neural Netw Learn Syst 2021; 32:3274-3281. [PMID: 32745011] [DOI: 10.1109/tnnls.2020.3010304]
Abstract
An algorithm is proposed to determine output feedback policies that solve finite-horizon linear-quadratic (LQ) optimal control problems without requiring knowledge of the system dynamical matrices. To reach this goal, the Q-factors arising from finite-horizon LQ problems are first characterized in the state feedback case. It is then shown how they can be parameterized as functions of the input-output vectors. A procedure is then proposed for estimating these functions from input/output data and using the estimates to compute the optimal control via the measured inputs and outputs.

6. Bai W, Li T, Tong S. NN Reinforcement Learning Adaptive Control for a Class of Nonstrict-Feedback Discrete-Time Systems. IEEE Trans Cybern 2020; 50:4573-4584. [PMID: 31995515] [DOI: 10.1109/tcyb.2020.2963849]
Abstract
This article investigates an adaptive reinforcement learning (RL) optimal control design problem for a class of nonstrict-feedback discrete-time systems. Based on the neural network (NN) approximating ability and the RL control design technique, an adaptive backstepping RL optimal controller and a minimal learning parameter (MLP) adaptive RL optimal controller are developed by establishing a novel strategic utility function and introducing external function terms. It is proved that the proposed adaptive RL optimal controllers can guarantee that all signals in the closed-loop systems are semiglobally uniformly ultimately bounded (SGUUB). The main feature is that the proposed schemes can solve optimal control problems that the previous literature could not handle. Furthermore, the proposed MLP adaptive optimal control scheme can reduce the number of adaptive laws, and thus the computational complexity is decreased. Finally, the simulation results illustrate the validity of the proposed optimal control schemes.

7. Xu W, Liu X, Wang H, Zhou Y. Event-based optimal output-feedback control of nonlinear discrete-time systems. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.05.098]

8. Zhao B, Liu D, Luo C. Reinforcement Learning-Based Optimal Stabilization for Unknown Nonlinear Systems Subject to Inputs With Uncertain Constraints. IEEE Trans Neural Netw Learn Syst 2020; 31:4330-4340. [PMID: 31899437] [DOI: 10.1109/tnnls.2019.2954983]
Abstract
This article presents a novel reinforcement learning strategy that addresses an optimal stabilization problem for unknown nonlinear systems subject to uncertain input constraints. The control algorithm is composed of two parts, i.e., online learning optimal control for the nominal system and feedforward neural network (NN) compensation for handling the uncertain input constraints, which are treated as saturation nonlinearities. Integrating the input-output data and a recurrent NN, a Luenberger observer is established to approximate the unknown system dynamics. For nominal systems without input constraints, the online learning optimal control policy is derived by solving the Hamilton-Jacobi-Bellman equation via a critic NN alone. By transforming the uncertain input constraints into saturation nonlinearities, they can be compensated by employing a feedforward NN compensator. The convergence of the closed-loop system is guaranteed to be uniformly ultimately bounded by using Lyapunov stability analysis. Finally, the effectiveness of the developed stabilization scheme is illustrated by simulation studies.

9. Huang M, Liu C, He X, Ma L, Lu Z, Su H. Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.061]

10. Guo X, Yan W, Cui R. Event-Triggered Reinforcement Learning-Based Adaptive Tracking Control for Completely Unknown Continuous-Time Nonlinear Systems. IEEE Trans Cybern 2020; 50:3231-3242. [PMID: 30946687] [DOI: 10.1109/tcyb.2019.2903108]
Abstract
In this paper, event-triggered reinforcement learning-based adaptive tracking control is developed for the continuous-time nonlinear system with unknown dynamics and external disturbances. The critic and action neural networks are designed to approximate an unknown long-term performance index and controller, respectively. The dead-zone event-triggered condition is developed to reduce communication and computational costs. Rigorous theoretical analysis is provided to show that the closed-loop system can be stabilized. The weight errors and the filtered tracking error are all uniformly ultimately bounded. Finally, to demonstrate the developed controller, the simulation results are provided using an autonomous underwater vehicle model.

11. Rizvi SAA, Lin Z. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem. IEEE Trans Neural Netw Learn Syst 2019; 30:1523-1536. [PMID: 30296242] [DOI: 10.1109/tnnls.2018.2870075]
Abstract
Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most of the existing RL and ADP methods make use of full-state feedback, a requirement that is often difficult to satisfy in practical applications. As a result, output feedback methods are more desirable as they relax this requirement. In this paper, we present a new output feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) control problem for discrete-time systems. The proposed scheme is completely online in nature and works without requiring the system dynamics information. More specifically, a new representation of the LQR Q-function is developed in terms of the input-output data. Based on this new Q-function representation, output feedback LQR controllers are designed. We present two output feedback iterative Q-learning algorithms based on the policy iteration and the value iteration methods. This scheme has the advantage that it does not incur any excitation noise bias, and therefore, the need of using discounted cost functions is circumvented, which in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out, which illustrates the proposed scheme.
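The policy-iteration flavour of Q-learning for the LQR described in this entry can be sketched in a state-feedback simplification (the paper's actual contribution is the output-feedback form, which replaces states with input-output histories). Everything below — the example system, the initial gain, and the sample counts — is illustrative and not taken from the paper; the model matrices are used only to generate data, never inside the learning update:

```python
import numpy as np

# Illustrative system (not from the paper): a discretized double integrator.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Qc, Rc = np.eye(2), np.array([[1.0]])   # stage cost x'Qx + u'Ru
n, m = 2, 1

def riccati_gain(A, B, Q, R, iters=500):
    """Model-based baseline: iterate the Riccati recursion for the LQR gain."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return K                             # optimal policy is u = -K x

def features(z):
    """Quadratic features z_i z_j (i <= j) parameterizing z' H z."""
    d = len(z)
    return np.array([z[i] * z[j] for i in range(d) for j in range(i, d)])

def theta_to_H(theta, d):
    """Recover the symmetric Q-function matrix H from the feature weights."""
    H, k = np.zeros((d, d)), 0
    for i in range(d):
        for j in range(i, d):
            if i == j:
                H[i, i] = theta[k]
            else:
                H[i, j] = H[j, i] = theta[k] / 2.0
            k += 1
    return H

def q_learning_lqr(A, B, Q, R, K0, pi_steps=10, samples=400, seed=0):
    """Policy-iteration Q-learning: the model is used only to generate data."""
    rng, K = np.random.default_rng(seed), K0
    for _ in range(pi_steps):
        Phi, cost = [], []
        for _ in range(samples):
            x = rng.standard_normal(n)
            u = rng.standard_normal(m)      # exploratory (exciting) input
            xp = A @ x + B @ u              # one-step transition (x, u, x+)
            z, zp = np.r_[x, u], np.r_[xp, -K @ xp]
            # Bellman equation for the policy's Q-function:
            #   z' H z - zp' H zp = x'Qx + u'Ru
            Phi.append(features(z) - features(zp))
            cost.append(x @ Q @ x + u @ R @ u)
        theta = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)[0]
        H = theta_to_H(theta, n + m)
        # Greedy policy improvement from the Q-function blocks: K = Huu^-1 Hux.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K

K0 = np.array([[1.0, 2.0]])                 # a stabilizing initial gain
K_rl = q_learning_lqr(A, B, Qc, Rc, K0)
```

Because the exploratory input enters the data directly rather than as additive noise on a fixed policy, the least-squares step incurs no excitation bias — the property the paper emphasizes — and the learned gain matches the Riccati solution.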

12. Zhu J, Zhu J, Wang Z, Guo S, Xu C. Hierarchical Decision and Control for Continuous Multitarget Problem: Policy Evaluation With Action Delay. IEEE Trans Neural Netw Learn Syst 2019; 30:464-473. [PMID: 29994732] [DOI: 10.1109/tnnls.2018.2844466]
Abstract
This paper proposes a hierarchical decision-making and control algorithm for the shepherd game, the seventh mission in the International Aerial Robotics Competition (IARC). In this game, the agent (a multirotor aerial robot) is required to contact targets (ground vehicles) sequentially and drive them to a certain boundary to earn points. During the 10-min game, the agent should be fully autonomous without any human interference. Owing to the lower-level controller and dynamics of the agent, each action takes a duration of time to accomplish. This action duration, denoted as an action delay in this paper, is nonconstant and is related to the final reward. Therefore, the challenge is to make the agent "aware of time" when applying a certain action. We solve this problem with two approaches: deep Q-networks and a lookup table. The action delay predictor in the decision level is fitted by a lower-level controller. Through simulations of the shepherd game, the effectiveness and efficiency of this approach are validated. This approach helped our team win first prize in IARC 2017 and holds the best record for this mission since it was released in 2013.

13. Adaptive neural network tracking control-based reinforcement learning for wheeled mobile robots with skidding and slipping. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.12.051]

14. Adaptive tracking control for a class of continuous-time uncertain nonlinear systems using the approximate solution of HJB equation. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.04.043]

15. Mu C, Ni Z, Sun C, He H. Data-Driven Tracking Control With Adaptive Dynamic Programming for a Class of Continuous-Time Nonlinear Systems. IEEE Trans Cybern 2017; 47:1460-1470. [PMID: 27116758] [DOI: 10.1109/tcyb.2016.2548941]
Abstract
A data-driven adaptive tracking control approach is proposed for a class of continuous-time nonlinear systems using a recently developed goal representation heuristic dynamic programming (GrHDP) architecture. The major focus of this paper is on designing a multivariable tracking scheme, including the filter-based action network (FAN) architecture, and on the stability analysis in a continuous-time setting. In this design, the FAN is used to observe the system function and then generates the corresponding control action together with the reference signals. The goal network provides an internal reward signal adaptively based on the current system states and the control action. This internal reward signal is assigned as the input for the critic network, which approximates the cost function over time. We demonstrate its improved tracking performance in comparison with the existing heuristic dynamic programming (HDP) approach under the same parameter and environment settings. Simulation results for multivariable tracking control on two examples are presented to show that the proposed scheme achieves better control in terms of learning speed and overall performance.

16. Mu C, Ni Z, Sun C, He H. Air-Breathing Hypersonic Vehicle Tracking Control Based on Adaptive Dynamic Programming. IEEE Trans Neural Netw Learn Syst 2017; 28:584-598. [PMID: 26863677] [DOI: 10.1109/tnnls.2016.2516948]
Abstract
In this paper, we propose a data-driven supplementary control approach with adaptive learning capability for air-breathing hypersonic vehicle tracking control based on action-dependent heuristic dynamic programming (ADHDP). The control action is generated by the combination of sliding mode control (SMC) and the ADHDP controller to track the desired velocity and the desired altitude. In particular, the ADHDP controller observes the differences between the actual velocity/altitude and the desired velocity/altitude, and then provides a supplementary control action accordingly. The ADHDP controller does not rely on an accurate mathematical model and is data driven. Meanwhile, it is capable of adjusting its parameters online over time under various working conditions, which is very suitable for hypersonic vehicle systems with parameter uncertainties and disturbances. We verify the adaptive supplementary control approach against traditional SMC in cruising flight, and provide three simulation studies to illustrate the improved performance of the proposed approach.

17. Gorban AN, Tyukin IY, Prokhorov DV, Sofeikov KI. Approximation with random bases: Pro et Contra. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.09.021]

18. Luo B, Wu HN, Huang T, Liu D. Reinforcement learning solution for HJB equation arising in constrained optimal control problem. Neural Netw 2015; 71:150-158. [DOI: 10.1016/j.neunet.2015.08.007]

19. Esfandiari K, Abdollahi F, Talebi HA. Adaptive control of uncertain nonaffine nonlinear systems with input saturation using neural networks. IEEE Trans Neural Netw Learn Syst 2015; 26:2311-2322. [PMID: 25532213] [DOI: 10.1109/tnnls.2014.2378991]
Abstract
This paper presents a tracking control methodology for a class of uncertain nonlinear systems subject to an input saturation constraint and external disturbances. Unlike most previous approaches to saturated systems, which assumed affine nonlinear systems, in this paper, the tracking control problem is solved for uncertain nonaffine nonlinear systems with input saturation. To deal with the saturation constraint, an auxiliary system is constructed and a modified tracking error is defined. Then, by employing the implicit function theorem, the mean value theorem, and the modified tracking error, updating rules are derived based on the well-known back-propagation (BP) algorithm, which has been proven to be the most relevant updating rule for control problems. However, most previous approaches based on the BP algorithm suffer from a lack of stability analysis. By injecting a damping term into the standard BP algorithm, uniform ultimate boundedness of all signals of the closed-loop system is ensured via Lyapunov's direct method. Furthermore, the presented approach employs nonlinear-in-parameter neural networks. Hence, the proposed scheme is applicable to systems with higher degrees of nonlinearity. Using a high-gain observer to reconstruct the states of the system, an output feedback controller is also presented. Finally, simulation results for a Duffing-Holmes chaotic system, a generalized pendulum-type system, and a numerical system are presented to demonstrate the effectiveness of the suggested state and output feedback control schemes.

20. Zhong X, He H, Zhang H, Wang Z. A neural network based online learning and control approach for Markov jump systems. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.01.060]

21. Liu YJ, Tang L, Tong S, Chen CLP, Li DJ. Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems. IEEE Trans Neural Netw Learn Syst 2015; 26:165-176. [PMID: 25438326] [DOI: 10.1109/tnnls.2014.2360724]
Abstract
Based on the neural network (NN) approximator, an online reinforcement learning algorithm is proposed for a class of affine multiple-input and multiple-output (MIMO) nonlinear discrete-time systems with unknown functions and disturbances. In the design procedure, two networks are provided: one is an action network to generate an optimal control signal, and the other is a critic network to approximate the cost function. An optimal control signal and adaptation laws can be generated based on the two NNs. In previous approaches, the weights of the critic and action networks are updated based on the gradient descent rule, and the estimates of the optimal weight vectors are directly adjusted in the design. Consequently, compared with the existing results, the main contributions of this paper are: 1) only two parameters need to be adjusted, and thus the number of adaptation laws is smaller than in previous results; and 2) the updated parameters do not depend on the number of subsystems for MIMO systems, and the tuning rules are replaced by adjusting the norms of the optimal weight vectors in both the action and critic networks. It is proven that the tracking errors, the adaptation laws, and the control inputs are uniformly bounded using the Lyapunov analysis method. Simulation examples are employed to illustrate the effectiveness of the proposed algorithm.

22. Yang X, Liu D, Wang D, Wei Q. Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning. Neural Netw 2014; 55:30-41. [DOI: 10.1016/j.neunet.2014.03.008]

23. Masaud K, Macnab C. Preventing bursting in adaptive control using an introspective neural network algorithm. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.01.002]

24.

25. Ni Z, He H, Wen J. Adaptive learning in tracking control based on the dual critic network design. IEEE Trans Neural Netw Learn Syst 2013; 24:913-928. [PMID: 24808473] [DOI: 10.1109/tnnls.2013.2247627]
Abstract
In this paper, we present a new adaptive dynamic programming approach that integrates a reference network providing an internal goal representation to help the system's learning and optimization. Specifically, we build the reference network on top of the critic network to form a dual critic network design that contains the detailed internal goal representation to help approximate the value function. This internal goal signal, working as the reinforcement signal for the critic network in our design, is adaptively generated by the reference network and can also be adjusted automatically. In this way, we provide an alternative to crafting the reinforcement signal manually from prior knowledge. In this paper, we adopt the online action-dependent heuristic dynamic programming (ADHDP) design and provide the detailed design of the dual critic network structure. A detailed Lyapunov stability analysis for our proposed approach is presented to support the proposed structure from a theoretical point of view. Furthermore, we also develop a virtual reality platform to demonstrate real-time simulation of our approach under different disturbance situations. The overall adaptive learning performance has been tested on two tracking control benchmarks with a tracking filter. For comparative studies, we also present the tracking performance with the typical ADHDP, and the simulation results justify the improved performance of our approach.

26. Yang Q, Jagannathan S. Reinforcement Learning Controller Design for Affine Nonlinear Discrete-Time Systems Using Online Approximators. IEEE Trans Syst Man Cybern B Cybern 2012; 42:377-390. [DOI: 10.1109/tsmcb.2011.2166384]

27. Yang L, Si J, Tsakalis K, Rodriguez A. Direct Heuristic Dynamic Programming for Nonlinear Tracking Control With Filtered Tracking Error. IEEE Trans Syst Man Cybern B Cybern 2009; 39:1617-1622. [DOI: 10.1109/tsmcb.2009.2021950]

28. Shih P, Kaul B, Jagannathan S, Drallmeier J. Reinforcement-Learning-Based Output-Feedback Control of Nonstrict Nonlinear Discrete-Time Systems With Application to Engine Emission Control. IEEE Trans Syst Man Cybern B Cybern 2009; 39:1162-1179. [DOI: 10.1109/tsmcb.2009.2013272]
|
29
|
Zhang Y, Liang X, Yang P, Chen Z, Yuan Z. Modeling and Control of Nonlinear Discrete-time Systems Based on Compound Neural Networks. Chin J Chem Eng 2009. [DOI: 10.1016/s1004-9541(08)60230-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
30
|
|
31
|
Shih P, Kaul BC, Jagannathan S, Drallmeier JA. Reinforcement-learning-based dual-control methodology for complex nonlinear discrete-time systems with application to spark engine EGR operation. IEEE Trans Neural Netw 2008; 19:1369-88. [PMID: 18701368 DOI: 10.1109/tnn.2008.2000452] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
A novel reinforcement-learning-based dual-control adaptive neural network (NN) methodology is developed to deliver a desired tracking performance for a class of complex feedback nonlinear discrete-time systems, consisting of a second-order nonlinear discrete-time system in nonstrict-feedback form and an affine nonlinear discrete-time system, in the presence of bounded and unknown disturbances. For example, the exhaust gas recirculation (EGR) operation of a spark ignition (SI) engine can be modeled by such a complex nonlinear discrete-time system. A dual-controller approach is undertaken: a primary adaptive critic NN controller is designed for the nonstrict-feedback nonlinear discrete-time system, and a secondary one for the affine nonlinear discrete-time system; together, the controllers deliver the desired performance. The primary adaptive critic NN controller includes an NN observer for estimating the states and output, an NN critic, and two action NNs for generating the virtual and actual control inputs for the nonstrict-feedback nonlinear discrete-time system, whereas an additional critic NN and an action NN are included for the affine nonlinear discrete-time system under the assumption of state availability. All NN weights adapt online toward minimization of a certain performance index, using a gradient-descent-based rule. Using Lyapunov theory, the uniform ultimate boundedness (UUB) of the closed-loop tracking error, weight estimates, and observer estimates is shown. The adaptive critic NN controller's performance is evaluated on an SI engine operating with high EGR levels, where the controller objective is to reduce cyclic dispersion in heat release while minimizing fuel intake. Simulation and experimental results indicate that engine-out emissions drop significantly at 20% EGR owing to the reduction in heat-release dispersion, thus verifying the dual-control approach.
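The dual-controller wiring can be sketched as follows, with the NN approximators replaced by known closed-form terms so the structure is visible: the primary loop issues a virtual command v, and the secondary controller realises v on the affine subsystem. The dynamics f(z), g(z) and the proportional primary law are stand-in assumptions, not the SI-engine EGR model or the paper's adaptive critic design.

```python
import math

def f(z):
    # stand-in drift term of the affine subsystem
    return 0.5 * math.sin(z)

def g(z):
    # stand-in control gain; kept strictly positive so the control direction is known
    return 1.0 + 0.1 * math.cos(z)

def affine_subsystem(z, u):
    # one step of z(k+1) = f(z(k)) + g(z(k)) * u(k)
    return f(z) + g(z) * u

def primary_controller(x, x_ref, k1=0.8):
    # virtual control v: in the paper an adaptive critic NN; here a simple
    # proportional pull of the nonstrict-feedback state toward the reference
    return x_ref - k1 * (x - x_ref)

def secondary_controller(z, v):
    # actual control u chosen so the affine subsystem tracks the virtual
    # command exactly: z(k+1) = v
    return (v - f(z)) / g(z)
```

With known f and g the secondary loop inverts the affine dynamics exactly; the NN versions in the paper learn these terms online instead.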
Collapse
Affiliation(s)
- Peter Shih
- Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO 65409, USA.
Collapse
|
32
|
Al-Tamimi A, Lewis F, Abu-Khalaf M. Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof. IEEE Trans Syst Man Cybern B Cybern 2008; 38:943-9. [DOI: 10.1109/tsmcb.2008.926614] [Citation(s) in RCA: 702] [Impact Index Per Article: 43.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
33
|
Wang L, Wan C. Comments on "The Extreme Learning Machine". IEEE Trans Neural Netw 2008; 19:1494-5; author reply 1495-6. [DOI: 10.1109/tnn.2008.2002273] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
34
|
Lewis FL, Huang J, Parisini T, Prokhorov DV, Wunsch DC. Special issue on neural networks for feedback control systems. IEEE Trans Neural Netw 2007; 18:969-72. [PMID: 17668654 DOI: 10.1109/tnn.2007.902966] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
35
|
He P, Jagannathan S. Reinforcement Learning Neural-Network-Based Controller for Nonlinear Discrete-Time Systems With Input Constraints. IEEE Trans Syst Man Cybern B Cybern 2007; 37:425-36. [PMID: 17416169 DOI: 10.1109/tsmcb.2006.883869] [Citation(s) in RCA: 177] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
A novel adaptive-critic-based neural network (NN) controller in discrete time is designed to deliver a desired tracking performance for a class of nonlinear systems in the presence of actuator constraints. The actuator constraints are treated in the controller design as a saturation nonlinearity. The adaptive critic NN controller architecture, based on state feedback, includes two NNs: the critic NN approximates the "strategic" utility function, whereas the action NN is employed to minimize both the strategic utility function and the unknown nonlinear dynamic estimation errors. The critic and action NN weight updates are derived by minimizing certain quadratic performance indexes. Using the Lyapunov approach and the novel weight updates, the uniform ultimate boundedness of the closed-loop tracking error and weight estimates is shown in the presence of NN approximation errors and bounded unknown disturbances. The proposed NN controller works in the presence of multiple nonlinearities, unlike other schemes that normally approximate a single nonlinearity. Moreover, the adaptive critic NN controller does not require an explicit offline training phase, and the NN weights can be initialized at zero or randomly. Simulation results justify the theoretical analysis.
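The saturation treatment and the two-network layout can be sketched as follows: the action network's raw output passes through a smooth saturation so the constraint is part of the design, and the action update descends the critic's utility estimate. The feature maps, gains, and simplified updates are illustrative assumptions rather than the paper's exact weight-tuning laws.

```python
import numpy as np

rng = np.random.default_rng(1)
U_MAX = 1.0   # actuator limit (illustrative)

def saturate(u_raw):
    # actuator constraint modelled as a smooth saturation nonlinearity
    return U_MAX * np.tanh(u_raw / U_MAX)

def features(x):
    # simple basis standing in for the action NN's hidden layer
    return np.array([x, np.sin(x), np.cos(x), 1.0])

w_action = rng.normal(0.0, 0.2, 4)   # action weights (linear-in-features for brevity)
w_critic = rng.normal(0.0, 0.2, 4)   # critic weights over (x, u) features [x, u, x*u, 1]

def control(x):
    return saturate(float(w_action @ features(x)))

def action_update(x, lr=0.05):
    # descend the critic's utility estimate with respect to the action weights
    z = features(x)
    u = control(x)
    dq_du = w_critic[1] + w_critic[2] * x     # dQ/du for the bilinear critic features
    du_duraw = 1.0 - (u / U_MAX) ** 2         # derivative of the tanh saturation
    w_action[:] -= lr * dq_du * du_duraw * z
```

Because the saturation is differentiable, its derivative simply scales the action-weight gradient, which is how the constraint enters the update rather than being clipped after the fact.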
Collapse
Affiliation(s)
- Pingan He
- Department of Electrical and Computer Engineering, University of Missouri-Rolla, Rolla, MO 65409, USA.
Collapse
|