1. Wang Z, Wang X, Pang N. Dynamic event-triggered controller design for nonlinear systems: Reinforcement learning strategy. Neural Netw 2023; 163:341-353. [PMID: 37099897] [DOI: 10.1016/j.neunet.2023.04.008]
Abstract
This paper addresses the optimal control problem for discrete-time nonstrict-feedback nonlinear systems by invoking the reinforcement learning-based backstepping technique and neural networks. The dynamic event-triggered control strategy introduced in this paper reduces the communication frequency between the actuator and the controller. Based on the reinforcement learning strategy, actor-critic neural networks are employed to implement the n-order backstepping framework. A neural network weight-update algorithm is then developed to minimize the computational burden and avoid the local optimal problem. Furthermore, a novel dynamic event-triggered strategy is introduced that remarkably outperforms the previously studied static event-triggered strategy. Moreover, combined with Lyapunov stability theory, all signals in the closed-loop system are strictly proven to be semiglobally uniformly ultimately bounded. Finally, the practicality of the proposed control algorithms is further demonstrated by numerical simulation examples.
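As a rough, generic illustration (a common construction in this literature, not necessarily the exact one used in the cited paper), the critic and actor at backstepping step i are linear-in-parameter NN approximators of the strategic utility function and the (virtual) control law:

\[
\hat{Q}_i(z_i) = \hat{W}_{c,i}^{\top}\,\Phi_{c}(z_i), \qquad
\hat{\alpha}_i(z_i) = \hat{W}_{a,i}^{\top}\,\Phi_{a}(z_i),
\]

where z_i is the step-i error variable, Phi_c and Phi_a are basis-function vectors (e.g., RBFs), and the weight vectors are tuned online by the weight-update algorithm mentioned above.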
Affiliation(s)
- Zichen Wang: College of Westa, Southwest University, Chongqing 400715, China
- Xin Wang: College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
- Ning Pang: College of Westa, Southwest University, Chongqing 400715, China
2. Wang Z, Wang X. Fault-tolerant control for nonlinear systems with a dead zone: Reinforcement learning approach. Mathematical Biosciences and Engineering (MBE) 2023; 20:6334-6357. [PMID: 37161110] [DOI: 10.3934/mbe.2023274]
Abstract
This paper focuses on the adaptive reinforcement learning-based optimal control problem for standard nonstrict-feedback nonlinear systems with an actuator fault and an unknown dead zone. To simultaneously reduce the computational complexity and eliminate the local optimal problem, a novel neural network weight-update algorithm is presented to replace the classic gradient descent method. By utilizing the backstepping technique, an actor-critic-based reinforcement learning control strategy is developed for high-order nonstrict-feedback nonlinear systems. In addition, two auxiliary parameters are introduced to deal with the input dead zone and the actuator fault, respectively. All signals in the system are proven to be semiglobally uniformly ultimately bounded by Lyapunov analysis. Finally, simulation results are presented to illustrate the effectiveness of the proposed approach.
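For background, a standard dead-zone model commonly used in this literature (the cited paper may adopt a variant) maps the commanded input v to the actual input u = D(v) with

\[
D(v) =
\begin{cases}
m_r\,(v - b_r), & v \ge b_r,\\
0, & b_l < v < b_r,\\
m_l\,(v - b_l), & v \le b_l,
\end{cases}
\]

where the slopes m_r, m_l > 0 and the breakpoints b_r > 0 > b_l are unknown; auxiliary parameters such as those mentioned above are typically used to bound the resulting input mismatch.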
Affiliation(s)
- Zichen Wang: College of Westa, Southwest University, Chongqing 400715, China
- Xin Wang: College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
3. Li K, Li Y. Adaptive NN Optimal Consensus Fault-Tolerant Control for Stochastic Nonlinear Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:947-957. [PMID: 34432637] [DOI: 10.1109/tnnls.2021.3104839]
Abstract
This article investigates the problem of adaptive neural network (NN) optimal consensus tracking control for nonlinear multiagent systems (MASs) with stochastic disturbances and actuator bias faults. In the control design, an NN is adopted to approximate the unknown nonlinear dynamics, and a state identifier is constructed. A fault estimator is designed to address the problem raised by time-varying actuator bias faults. By utilizing adaptive dynamic programming (ADP) in an identifier-critic-actor construction, an adaptive NN optimal consensus fault-tolerant control algorithm is presented. It is proven that all signals of the controlled system are uniformly ultimately bounded (UUB) in probability and that the states of the follower agents remain in consensus with the leader's state. Finally, simulation results are given to illustrate the effectiveness of the developed optimal consensus control scheme and theorem.
4. Li H, Wu Y, Chen M, Lu R. Adaptive Multigradient Recursive Reinforcement Learning Event-Triggered Tracking Control for Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:144-156. [PMID: 34197328] [DOI: 10.1109/tnnls.2021.3090570]
Abstract
This article proposes a fault-tolerant adaptive multigradient recursive reinforcement learning (RL) event-triggered tracking control scheme for strict-feedback discrete-time multiagent systems. The multigradient recursive RL algorithm is used to avoid the local optimal problem that may arise in the gradient descent scheme. Different from existing event-triggered control results, a new lemma on the relative-threshold event-triggered control strategy is proposed to handle the compensation error, which improves the utilization of communication resources and weakens the negative impact on tracking accuracy and closed-loop system stability. To overcome the difficulty caused by sensor faults, a distributed control method is introduced by adopting an adaptive compensation technique, which effectively decreases the number of online estimated parameters. Furthermore, by using the multigradient recursive RL algorithm with fewer learning parameters, the online estimation time can be effectively reduced. The stability of the closed-loop multiagent systems is proved by using the Lyapunov stability theorem, and it is verified that all signals are semiglobally uniformly ultimately bounded. Finally, two simulation examples are given to show the availability of the presented control scheme.
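As a hedged sketch only (a relative-threshold triggering rule that is common in the event-triggered literature; the precise lemma in the cited paper may differ), the applied control is held between events, u(k) = w(k_j) for k_j <= k < k_{j+1}, and the next triggering instant is

\[
k_{j+1} = \inf\{\, k > k_j : |e(k)| \ge \delta\,|u(k)| + m_1 \,\}, \qquad e(k) = w(k) - u(k),
\]

with design constants 0 < \delta < 1 and m_1 > 0; the compensation error referred to above comes from the mismatch e(k) that the controller must absorb between triggering instants.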
5. Bai W, Li T, Long Y, Chen CLP. Event-Triggered Multigradient Recursive Reinforcement Learning Tracking Control for Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:366-379. [PMID: 34270435] [DOI: 10.1109/tnnls.2021.3094901]
Abstract
In this article, the event-triggered multigradient recursive reinforcement learning tracking control problem is investigated for nonlinear multiagent systems (MASs). Attention is focused on the distributed reinforcement learning approach for MASs. The critic neural network (NN) is applied to estimate the long-term strategic utility function, and the actor NN is designed to approximate the uncertain dynamics in the MASs. The multigradient recursive (MGR) strategy is tailored to learn the weight vector in the NN, which eliminates the local optimal problem inherent in the gradient descent method and decreases the dependence on the initial value. Furthermore, the reinforcement learning and event-triggered mechanisms improve energy conservation in MASs by decreasing the amplitude of the controller signal and the controller update frequency, respectively. It is proved that all signals in the MASs are semiglobally uniformly ultimately bounded (SGUUB) according to Lyapunov theory. Simulation results are given to demonstrate the effectiveness of the proposed strategy.
6. Xu W, Liu X, Wang H, Zhou Y. Event-Triggered Adaptive NN Tracking Control for MIMO Nonlinear Discrete-Time Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:7414-7424. [PMID: 34129504] [DOI: 10.1109/tnnls.2021.3084965]
Abstract
This article concentrates on the design of a novel event-based adaptive neural network (NN) control algorithm for a class of multiple-input-multiple-output (MIMO) nonlinear discrete-time systems. A controller is designed through a novel recursive design procedure, under which dependence on virtual controls is avoided and only the system states are needed. The number of event-triggered conditions and the number of parameters updated online in each subsystem are both reduced to one, which largely reduces the computational burden and simplifies the realization of the algorithm. In this case, radial basis function NNs (RBFNNs) are employed to approximate the control input. The semiglobal uniform ultimate boundedness (SGUUB) of all the signals in the closed-loop system is guaranteed by the Lyapunov difference approach. The effectiveness of the proposed algorithm is validated by a simulation example.
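For reference, the RBF NN approximators referred to here are usually of the standard form (details in the paper may differ):

\[
\hat{f}(Z) = \hat{W}^{\top}\Phi(Z), \qquad
\Phi_i(Z) = \exp\!\left(-\frac{\|Z - c_i\|^2}{b_i^2}\right), \quad i = 1,\dots,l,
\]

with centers c_i, widths b_i, and an online-tuned weight vector \hat{W}; in designs of this kind, often only a scalar related to the norm of the ideal weight vector is estimated online, which is what keeps the number of adaptation parameters per subsystem so small.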
7. Han H, Zhang J, Yang H, Hou Y, Qiao J. Data-Driven Robust Optimal Control for Nonlinear System with Uncertain Disturbances. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.092]
8. Wei Q, Han L, Zhang T. Spiking Adaptive Dynamic Programming Based on Poisson Process for Discrete-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1846-1856. [PMID: 34143743] [DOI: 10.1109/tnnls.2021.3085781]
Abstract
In this article, a new iterative spiking adaptive dynamic programming (SADP) method based on the Poisson process is developed to solve optimal impulsive control problems. For a fixed time interval, combining the Poisson process and maximum likelihood estimation (MLE), the three-tuple of state, spiking interval, and probability of the Poisson distribution can be computed, and then the iterative value functions and iterative control laws can be obtained. A property analysis method is developed to show that the value functions converge to the optimal performance index function as the iteration index increases from zero to infinity. Finally, two simulation examples are given to verify the effectiveness of the developed algorithm.
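As a brief, standard aside (not specific to the cited derivation): if spike counts N_1, ..., N_n are observed over n sampling intervals of length \Delta, the maximum likelihood estimate of a homogeneous Poisson rate is

\[
\hat{\lambda} = \frac{1}{n\Delta}\sum_{i=1}^{n} N_i ,
\]

which is the kind of MLE computation referred to in the abstract.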
9. Fu H, Chen X, Wang W, Wu M. Observer-Based Adaptive Synchronization Control of Unknown Discrete-Time Nonlinear Heterogeneous Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:681-693. [PMID: 33079683] [DOI: 10.1109/tnnls.2020.3028569]
Abstract
This article is concerned with the optimal synchronization problem for discrete-time nonlinear heterogeneous multiagent systems (MASs) with an active leader. To overcome the difficulty of deriving optimal control protocols for these systems, we develop an observer-based adaptive synchronization control approach, including the designs of a distributed observer and a distributed model reference adaptive controller that require no prior knowledge of the agents' dynamics. To begin with, an adaptive neural network distributed observer is designed to estimate the state of the nonlinear active leader for each follower. Such an observer serves as a reference model in the distributed model reference adaptive control (MRAC). Then, a reinforcement learning-based distributed MRAC algorithm is presented to make every follower track the behavior of its corresponding reference model in real time. In this algorithm, a distributed actor-critic network is employed to approximate the optimal distributed control protocols and the cost function. Through convergence analysis, the overall observer estimation error, the model reference tracking error, and the weight estimation errors are proved to be uniformly ultimately bounded. The developed approach then achieves synchronization by synthesizing these results. The effectiveness of the developed approach is verified through a numerical example.
10. Zhang F, Wu W, Hu J, Wang C. Deterministic learning from neural control for a class of sampled-data nonlinear systems. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.02.034]
11.
Abstract
A general control system tracking-learning framework is proposed, by which an optimal learned tracking behavior called a ‘primitive’ is extrapolated to new unseen trajectories without requiring relearning. This is considered intelligent behavior and is strongly related to the neuro-motor cognitive control of biological (human-like) systems, which deliver suboptimal executions for tasks outside their current knowledge base by using previously memorized experience. However, biological systems do not solve explicit mathematical equations to perform learning and prediction tasks. This motivates the proposed hierarchical cognitive-like learning framework, based on state-of-the-art model-free control: (1) at the low level L1, an approximate iterative Value Iteration scheme is first employed to linearize the closed-loop system (CLS) behavior through linear reference model output tracking; (2) at the secondary level L2, an experiment-driven Iterative Learning Control (EDILC) scheme, applied to the CLS from the reference input to the controlled output, learns simple tracking tasks called ‘primitives’; and (3) the tertiary level L3 extrapolates the primitives’ optimal tracking behavior to new tracking tasks without trial-based relearning. The learning framework relies only on input-output system data to build a virtual state-space representation of the underlying controlled system, which is assumed to be observable. It has been shown to be effective by experimental validation on a representative, coupled, nonlinear, multivariable real-world system. Able to cope with new unseen scenarios in an optimal fashion, the hierarchical learning framework is an advance toward cognitive control systems.
12. Li H, Wu Y, Chen M. Adaptive Fault-Tolerant Tracking Control for Discrete-Time Multiagent Systems via Reinforcement Learning Algorithm. IEEE Transactions on Cybernetics 2021; 51:1163-1174. [PMID: 32386171] [DOI: 10.1109/tcyb.2020.2982168]
Abstract
This article investigates the adaptive fault-tolerant tracking control problem for a class of discrete-time multiagent systems via a reinforcement learning algorithm. Action neural networks (NNs) are used to approximate the unknown and desired control input signals, and critic NNs are employed to estimate the cost function in the design procedure. Furthermore, direct adaptive optimal controllers are designed by combining the backstepping technique with the reinforcement learning algorithm. Compared with existing reinforcement learning algorithms, the computational burden is effectively reduced by using a method with fewer learning parameters. Adaptive auxiliary signals are established to compensate for the influence of the dead zones and actuator faults on the control performance. Based on Lyapunov stability theory, it is proved that all signals of the closed-loop system are semiglobally uniformly ultimately bounded. Finally, some simulation results are presented to illustrate the effectiveness of the proposed approach.
13. Chakrabarty A, Jha DK, Buzzard GT, Wang Y, Vamvoudakis KG. Safe Approximate Dynamic Programming via Kernelized Lipschitz Estimation. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:405-419. [PMID: 32203039] [DOI: 10.1109/tnnls.2020.2978805]
Abstract
We develop a method for obtaining safe initial policies for reinforcement learning via approximate dynamic programming (ADP) techniques for uncertain systems evolving with discrete-time dynamics. We employ the kernelized Lipschitz estimation to learn multiplier matrices that are used in semidefinite programming frameworks for computing admissible initial control policies with provably high probability. Such admissible controllers enable safe initialization and constraint enforcement while providing exponential stability of the equilibrium of the closed-loop system.
14. Bai W, Li T, Tong S. NN Reinforcement Learning Adaptive Control for a Class of Nonstrict-Feedback Discrete-Time Systems. IEEE Transactions on Cybernetics 2020; 50:4573-4584. [PMID: 31995515] [DOI: 10.1109/tcyb.2020.2963849]
Abstract
This article investigates an adaptive reinforcement learning (RL) optimal control design problem for a class of nonstrict-feedback discrete-time systems. Based on the neural network (NN) approximation ability and the RL control design technique, an adaptive backstepping RL optimal controller and a minimal learning parameter (MLP) adaptive RL optimal controller are developed by establishing a novel strategic utility function and introducing external function terms. It is proved that the proposed adaptive RL optimal controllers can guarantee that all signals in the closed-loop systems are semiglobally uniformly ultimately bounded (SGUUB). The main feature is that the proposed schemes can solve optimal control problems that the previous literature could not deal with. Furthermore, the proposed MLP adaptive optimal control scheme reduces the number of adaptive laws, and thus the computational complexity is decreased. Finally, the simulation results illustrate the validity of the proposed optimal control schemes.
15. Xu W, Liu X, Wang H, Zhou Y. Event-based optimal output-feedback control of nonlinear discrete-time systems. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.05.098]
16. Morales L, Aguilar J, Rosales A, Chávez D, Leica P. Modeling and control of nonlinear systems using an Adaptive LAMDA approach. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106571]
17. Cao W, Yang Q. Online sequential extreme learning machine based adaptive control for wastewater treatment plant. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.05.109]
18. Huang M, Liu C, He X, Ma L, Lu Z, Su H. Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.061]
19. Bai W, Zhou Q, Li T, Li H. Adaptive Reinforcement Learning Neural Network Control for Uncertain Nonlinear System With Input Saturation. IEEE Transactions on Cybernetics 2020; 50:3433-3443. [PMID: 31251205] [DOI: 10.1109/tcyb.2019.2921057]
Abstract
In this paper, an adaptive neural network (NN) control problem is investigated for discrete-time nonlinear systems with input saturation. Radial-basis-function (RBF) NNs, including critic NNs and action NNs, are employed to approximate the utility functions and system uncertainties, respectively. In previous works, a gradient descent scheme is applied to update the weight vectors, which may lead to the local optimal problem. To circumvent this problem, a multigradient recursive (MGR) reinforcement learning scheme is proposed, which utilizes both the current gradient and the past gradients. As a consequence, the MGR scheme not only eliminates the local optimal problem but also guarantees a faster convergence rate than the gradient descent scheme. Moreover, the constraint of actuator input saturation is considered. The closed-loop system stability is established by using Lyapunov stability theory, and it is proved that all the signals in the closed-loop system are semiglobally uniformly ultimately bounded (SGUUB). Finally, the effectiveness of the proposed approach is further validated via some simulation results.
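As a purely illustrative sketch of the idea of combining the current gradient with stored past gradients (the function names are hypothetical, and the exact recursion, gains, and normalization in the cited paper may differ):

```python
import numpy as np
from collections import deque

def mgr_step(w, grad, grad_history, window=5, lr=0.05):
    """One multigradient-style update: descend along an average of the
    current gradient and up to window-1 previously stored gradients."""
    grad_history.append(grad)
    if len(grad_history) > window:
        grad_history.popleft()              # keep only the most recent gradients
    combined = np.mean(np.stack(grad_history), axis=0)
    return w - lr * combined

# Toy usage on the quadratic objective 0.5*||w - w_star||^2.
w_star = np.ones(4)
w = np.zeros(4)
history = deque()
for _ in range(200):
    w = mgr_step(w, w - w_star, history)    # gradient of the toy objective
```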
20. Guo X, Yan W, Cui R. Event-Triggered Reinforcement Learning-Based Adaptive Tracking Control for Completely Unknown Continuous-Time Nonlinear Systems. IEEE Transactions on Cybernetics 2020; 50:3231-3242. [PMID: 30946687] [DOI: 10.1109/tcyb.2019.2903108]
Abstract
In this paper, event-triggered reinforcement learning-based adaptive tracking control is developed for the continuous-time nonlinear system with unknown dynamics and external disturbances. The critic and action neural networks are designed to approximate an unknown long-term performance index and controller, respectively. The dead-zone event-triggered condition is developed to reduce communication and computational costs. Rigorous theoretical analysis is provided to show that the closed-loop system can be stabilized. The weight errors and the filtered tracking error are all uniformly ultimately bounded. Finally, to demonstrate the developed controller, the simulation results are provided using an autonomous underwater vehicle model.
21. Treesatayapun C. Knowledge-based reinforcement learning controller with fuzzy-rule network: experimental validation. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04509-x]
22. Zhao S, Liang H, Ahn CK, Du P. Observer-based adaptive neural optimal control for discrete-time systems in nonstrict-feedback form. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.03.029]
23. Li J, Chai T, Lewis FL, Ding Z, Jiang Y. Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:1308-1320. [PMID: 30273155] [DOI: 10.1109/tnnls.2018.2861945]
Abstract
In this paper, a novel off-policy interleaved Q-learning algorithm is presented for solving the optimal control problem of affine nonlinear discrete-time (DT) systems, using only measured data along the system trajectories. The affine nonlinear feature of the systems, unknown dynamics, and the off-policy learning approach pose tremendous challenges for approximating optimal controllers. To this end, the on-policy Q-learning method for optimal control of affine nonlinear DT systems is reviewed first, and its convergence is rigorously proven. The bias in the solution of the Q-function-based Bellman equation caused by adding probing noises to the systems to satisfy persistent excitation is also analyzed for the on-policy Q-learning approach. Then, a behavior control policy is introduced, followed by an off-policy Q-learning algorithm. Meanwhile, the convergence of the algorithm and the absence of bias in the solution of the optimal control problem when probing noise is added to the systems are investigated. Third, three neural networks are run by the interleaved Q-learning approach in the actor-critic framework. Thus, a novel off-policy interleaved Q-learning algorithm is derived, and its convergence is proven. Simulation results are given to verify the effectiveness of the proposed method.
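For orientation (standard discrete-time background, not a restatement of the paper's specific derivation), the optimal Q-function used in such schemes satisfies the Bellman optimality equation

\[
Q^{*}(x_k, u_k) = r(x_k, u_k) + \min_{u} Q^{*}(x_{k+1}, u),
\qquad u^{*}(x_k) = \arg\min_{u} Q^{*}(x_k, u),
\]

where r is the stage cost and x_{k+1} is generated by the (possibly unknown) dynamics; on-policy and off-policy variants differ in which policy generates the data used to solve this equation.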
24. Data-Driven Model-Free Tracking Reinforcement Learning Control with VRFT-based Adaptive Actor-Critic. Applied Sciences (Basel) 2019. [DOI: 10.3390/app9091807]
Abstract
This paper proposes a neural network (NN)-based control scheme in an Adaptive Actor-Critic (AAC) learning framework designed for output reference model tracking, as a representative deep-learning application. The control learning scheme is model-free with respect to the process model. AAC designs usually require an initial controller to start the learning process; however, systematic guidelines for choosing the initial controller are not offered in the literature, especially in a model-free manner. Virtual Reference Feedback Tuning (VRFT) is proposed for obtaining an initially stabilizing NN nonlinear state-feedback controller, designed from input-state-output data collected from the process in open-loop setting. The solution offers systematic design guidelines for initial controller design. The resulting suboptimal state-feedback controller is next improved under the AAC learning framework by online adaptation of a critic NN and a controller NN. The mixed VRFT-AAC approach is validated on a multi-input multi-output nonlinear constrained coupled vertical two-tank system. Discussions on the control system behavior are offered together with comparisons with similar approaches.
25. Liu YJ, Li S, Tong S, Chen CLP. Adaptive Reinforcement Learning Control Based on Neural Approximation for Nonlinear Discrete-Time Systems With Unknown Nonaffine Dead-Zone Input. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:295-305. [PMID: 29994726] [DOI: 10.1109/tnnls.2018.2844165]
Abstract
In this paper, an optimal control algorithm is designed for uncertain nonlinear discrete-time systems that are in nonaffine form and have an unknown dead zone. The main contributions of this paper are that an optimal control algorithm is framed, for the first time, for nonlinear systems with a nonaffine dead zone, and that the adaptive parameter law for the dead zone is calculated by using gradient rules. The mean value theorem is employed to deal with the nonaffine dead-zone input, and the implicit function theorem, combined with reinforcement learning, is appropriately introduced to find an unknown ideal controller that is approximated by using the action network. Other neural networks are taken as the critic networks to approximate the strategic utility functions. Based on Lyapunov stability analysis theory, the stability of the systems can be proved, i.e., the optimal control laws guarantee that all signals in the closed-loop system are bounded and that the tracking errors converge to a small compact set. Finally, two simulation examples demonstrate the effectiveness of the designed algorithm.
26. Luo B, Yang Y, Liu D. Adaptive Q-Learning for Data-Based Optimal Output Regulation With Experience Replay. IEEE Transactions on Cybernetics 2018; 48:3337-3348. [PMID: 29994038] [DOI: 10.1109/tcyb.2018.2821369]
Abstract
In this paper, the data-based optimal output regulation problem of discrete-time systems is investigated. An off-policy adaptive Q-learning (QL) method is developed by using real system data, without requiring knowledge of the system dynamics or the mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter in the policy evaluation is used to achieve a tradeoff between the current and future Q-functions. The convergence of the adaptive QL algorithm is proved, and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, an actor-critic neural network (NN) structure is developed. The least-squares scheme and the batch gradient descent method are developed to update the critic and actor NN weights, respectively. The experience replay technique is employed in the learning process, which leads to simple and convenient implementation of the adaptive QL method. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
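As an illustrative, generic sketch of experience-replay-based temporal-difference updating of a linear-in-parameters critic (all names are hypothetical and not taken from the cited paper, which uses least-squares and batch-gradient updates):

```python
import random
import numpy as np

class ReplayBuffer:
    """Minimal buffer storing (x, u, r, x_next, u_next) transitions."""
    def __init__(self, capacity=10000):
        self.data, self.capacity = [], capacity

    def push(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)                       # discard the oldest sample
        self.data.append(transition)

    def sample(self, batch_size=32):
        return random.sample(self.data, min(batch_size, len(self.data)))

def critic_replay_step(wc, phi, batch, gamma=0.95, lr=0.01):
    """Semi-gradient TD step on a replayed mini-batch for Q(x,u) = wc @ phi(x,u)."""
    update = np.zeros_like(wc)
    for x, u, r, xn, un in batch:
        td = r + gamma * wc @ phi(xn, un) - wc @ phi(x, u)
        update += td * phi(x, u)                   # next-state term treated as a fixed target
    return wc + lr * update / len(batch)
```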
27. Xu B, Yang D, Shi Z, Pan Y, Chen B, Sun F. Online Recorded Data-Based Composite Neural Control of Strict-Feedback Systems With Application to Hypersonic Flight Dynamics. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:3839-3849. [PMID: 28952951] [DOI: 10.1109/tnnls.2017.2743784]
Abstract
This paper investigates online recorded data-based composite neural control of uncertain strict-feedback systems using the backstepping framework. In each step of the virtual control design, a neural network (NN) is employed for uncertainty approximation. In previous works, most designs aim directly at system stability and ignore how the NN actually works as an approximator. In this paper, to enhance the learning ability, a novel prediction error signal is constructed to provide additional correction information for the NN weight update using online recorded data. In this way, the neural approximation precision is greatly improved, and the convergence speed can be faster. Furthermore, a sliding mode differentiator is employed to approximate the derivative of the virtual control signal, so that the complex analysis of the backstepping design can be avoided. The closed-loop stability is rigorously established, and the boundedness of the tracking error is guaranteed. In simulations of hypersonic flight dynamics, the proposed approach exhibits better tracking performance.
28. Liang Y, Zhang H, Xiao G, Jiang H. Reinforcement learning-based online adaptive controller design for a class of unknown nonlinear discrete-time systems with time delays. Neural Comput Appl 2018. [DOI: 10.1007/s00521-018-3537-7]
29. Hu Y, Si B. A Reinforcement Learning Neural Network for Robotic Manipulator Control. Neural Comput 2018; 30:1983-2004. [DOI: 10.1162/neco_a_01079]
Abstract
We propose a neural network model for reinforcement learning to control a robotic manipulator with unknown parameters and dead zones. The model is composed of three networks. The state of the robotic manipulator is predicted by the state network of the model, the action policy is learned by the action network, and the performance index of the action policy is estimated by a critic network. The three networks work together to optimize the performance index based on the reinforcement learning control scheme. The convergence of the learning methods is analyzed. Application of the proposed model on a simulated two-link robotic manipulator demonstrates the effectiveness and the stability of the model.
Affiliation(s)
- Yazhou Hu: State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, P.R.C., and University of Chinese Academy of Sciences, Beijing 100049, P.R.C.
- Bailu Si: State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, P.R.C.
30. Scardapane S, Wang D, Uncini A. Bayesian Random Vector Functional-Link Networks for Robust Data Modeling. IEEE Transactions on Cybernetics 2018; 48:2049-2059. [PMID: 28749364] [DOI: 10.1109/tcyb.2017.2726143]
Abstract
Random vector functional-link (RVFL) networks are randomized multilayer perceptrons with a single hidden layer and a linear output layer, which can be trained by solving a linear modeling problem. In particular, they are generally trained using a closed-form solution of the (regularized) least-squares approach. This paper introduces several alternative strategies for performing full Bayesian inference (BI) of RVFL networks. Distinct from standard or classical approaches, the proposed Bayesian training algorithms allow one to derive an entire probability distribution over the optimal output weights of the network, instead of a single pointwise estimate according to some given criterion (e.g., least squares). This provides several known advantages, including the possibility of introducing additional prior knowledge into the training process, the availability of an uncertainty measure during the test phase, and the capability of automatically inferring hyperparameters from the given data. In this paper, two BI algorithms for regression are first proposed that, under some practical assumptions, can be implemented by a simple iterative process with closed-form computations. Simulation results show that one of the proposed algorithms, Bayesian RVFL, is able to outperform standard training algorithms for RVFL networks with a proper regularization factor selected carefully via a line-search procedure. A general strategy based on variational inference is also presented, with an application to data modeling problems with noisy outputs or outliers. As discussed in the paper, using recent advances in automatic differentiation, this strategy can be applied to a wide range of additional situations in an immediate fashion.
31. Fan B, Yang Q, Tang X, Sun Y. Robust ADP Design for Continuous-Time Nonlinear Systems With Output Constraints. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2127-2138. [PMID: 29771666] [DOI: 10.1109/tnnls.2018.2806347]
Abstract
In this paper, a novel robust adaptive dynamic programming (RADP)-based control strategy is presented for the optimal control of a class of output-constrained continuous-time unknown nonlinear systems. Our contribution includes a step forward beyond the usual optimal control result to show that the output of the plant is always within user-defined bounds. To achieve the new results, an error transformation technique is first established to generate an equivalent nonlinear system, whose asymptotic stability guarantees both the asymptotic stability and the satisfaction of the output restriction of the original system. Furthermore, RADP algorithms are developed to solve the transformed nonlinear optimal control problem with completely unknown dynamics as well as a robust design to guarantee the stability of the closed-loop systems in the presence of unavailable internal dynamic state. Via small-gain theorem, asymptotic stability of the original and transformed nonlinear system is theoretically guaranteed. Finally, comparison results demonstrate the merits of the proposed control policy.
32. Luo B, Liu D, Wu HN. Adaptive Constrained Optimal Control Design for Data-Based Nonlinear Discrete-Time Systems With Critic-Only Structure. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2099-2111. [PMID: 28981435] [DOI: 10.1109/tnnls.2017.2751018]
Abstract
Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem for nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed by using value iteration-based Q-learning (VIQL) with a critic-only structure. Most existing constrained control methods require the use of a certain performance index and are suitable only for linear or affine nonlinear systems, which is restrictive in practice. To overcome this limitation, a system transformation is first introduced with a general performance index. Then, the constrained optimal control problem is converted into an unconstrained optimal control problem. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. The convergence results of the VIQL algorithm are established with an easy-to-realize initial condition. To implement the VIQL algorithm, the critic-only structure is developed, where only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on the gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is tested on three examples by computer simulation.
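For background (a generic value-iteration Q-learning recursion consistent with, but not quoted from, the cited paper), starting from an easy-to-realize initial Q_0, the iteration proceeds as

\[
Q_{i+1}(x_k, u_k) = r(x_k, u_k) + \min_{u} Q_{i}(x_{k+1}, u), \qquad i = 0, 1, 2, \dots,
\]

and, under the usual assumptions, Q_i converges to the optimal Q-function, from which the controller is then recovered by minimizing over u.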
33. Wang Z, Liu L, Wu Y, Zhang H. Optimal Fault-Tolerant Control for Discrete-Time Nonlinear Strict-Feedback Systems Based on Adaptive Critic Design. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2179-2191. [PMID: 29771670] [DOI: 10.1109/tnnls.2018.2810138]
Abstract
This paper investigates the problem of optimal fault-tolerant control (FTC) for a class of unknown nonlinear discrete-time systems with an actuator fault in the framework of adaptive critic design (ACD). A pivotal highlight is the adaptive auxiliary signal for the actuator fault, which is designed to offset the effect of the fault. The considered systems are in strict-feedback form and involve unknown nonlinear functions, which results in the causal problem. To solve this problem, the original nonlinear systems are transformed into a novel system by employing diffeomorphism theory. Besides, action neural networks (ANNs) are utilized to approximate a predefined unknown function in the backstepping design procedure. Combining the strategic utility function and the ACD technique, a reinforcement learning algorithm is proposed to set up an optimal FTC, in which critic neural networks (CNNs) provide an approximate structure of the cost function. In this case, the scheme not only guarantees the stability of the systems but also achieves optimal control performance. In the end, two simulation examples are used to show the effectiveness of the proposed optimal FTC strategy.
34. Fan QY, Yang GH, Ye D. Quantization-Based Adaptive Actor-Critic Tracking Control With Tracking Error Constraints. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:970-980. [PMID: 28166508] [DOI: 10.1109/tnnls.2017.2651104]
Abstract
In this paper, the problem of adaptive actor-critic (AC) tracking control is investigated for a class of continuous-time nonlinear systems with unknown nonlinearities and quantized inputs. Different from the existing results based on reinforcement learning, the tracking error constraints are considered and new critic functions are constructed to improve the performance further. To ensure that the tracking errors keep within the predefined time-varying boundaries, a tracking error transformation technique is used to constitute an augmented error system. Specific critic functions, rather than the long-term cost function, are introduced to supervise the tracking performance and tune the weights of the AC neural networks (NNs). A novel adaptive controller with a special structure is designed to reduce the effect of the NN reconstruction errors, input quantization, and disturbances. Based on the Lyapunov stability theory, the boundedness of the closed-loop signals and the desired tracking performance can be guaranteed. Finally, simulations on two connected inverted pendulums are given to illustrate the effectiveness of the proposed method.
35. Adaptive neural network tracking control-based reinforcement learning for wheeled mobile robots with skidding and slipping. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.12.051]
36. Luo B, Liu D, Wu HN, Wang D, Lewis FL. Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control. IEEE Transactions on Cybernetics 2017; 47:3341-3354. [PMID: 27893404] [DOI: 10.1109/tcyb.2016.2623859]
Abstract
The model-free optimal control problem of general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal control method. By using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, the adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, showing that the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
37. Luo B, Liu D, Huang T, Yang X, Ma H. Multi-step heuristic dynamic programming for optimal control of nonlinear discrete-time systems. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.05.005]
38. Chen CLP. Neural Approximation-Based Adaptive Control for a Class of Nonlinear Nonstrict Feedback Discrete-Time Systems. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:1531-1541. [PMID: 28113479] [DOI: 10.1109/tnnls.2016.2531089]
Abstract
In this paper, an adaptive control approach based on neural approximation is developed for a class of uncertain nonlinear discrete-time (DT) systems. The main characteristic of the considered systems is that they can be viewed as a class of multi-input multioutput systems in a nonstrict-feedback structure. A similar control problem for this class of systems has been addressed in the past, but it focused on continuous-time systems. Due to the complexity of the system structure, the controller design and stability analysis become more difficult. To stabilize this class of systems, a new recursive procedure is developed, and the effect caused by the noncausal problem in the nonstrict-feedback DT structure is resolved using a semirecurrent neural approximation. Based on the Lyapunov difference approach, it is proved that all signals of the closed-loop system are semiglobally uniformly ultimately bounded and that good tracking performance is guaranteed. The feasibility of the proposed controllers is validated by a simulation example.
39. Jon R, Wang Z, Luo C, Jong M. Adaptive robust speed control based on recurrent Elman neural network for sensorless PMSM servo drives. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.09.095]
40. Vamvoudakis KG, Miranda MF, Hespanha JP. Asymptotically Stable Adaptive-Optimal Control Algorithm With Saturating Actuators and Relaxed Persistence of Excitation. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:2386-2398. [PMID: 26513810] [DOI: 10.1109/tnnls.2015.2487972]
Abstract
This paper proposes a control algorithm based on adaptive dynamic programming to solve the infinite-horizon optimal control problem for known deterministic nonlinear systems with saturating actuators and nonquadratic cost functionals. The algorithm is based on an actor-critic framework, where a critic neural network (NN) is used to learn the optimal cost, and an actor NN is used to learn the optimal control policy. The adaptive control nature of the algorithm requires a persistence of excitation condition to be validated a priori, but this can be relaxed by using previously stored data concurrently with current data in the update of the critic NN. A robustifying control term is added to the controller to eliminate the effect of residual errors, leading to asymptotic stability of the closed-loop system. Simulation results show the effectiveness of the proposed approach for a controlled Van der Pol oscillator and also for a power system plant.
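For context, the nonquadratic cost functionals referred to here commonly take the following form from the ADP literature on input saturation |u| <= \lambda (the cited paper may use a variant):

\[
V(x_0) = \int_{0}^{\infty} \Big( Q(x) + 2\int_{0}^{u} \lambda \tanh^{-1}(v/\lambda)^{\top} R \, dv \Big) dt,
\]

whose control-effort term penalizes inputs in a way that keeps the resulting optimal policy inside the saturation bounds.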
41. Liu YJ, Tong S. Optimal Control-Based Adaptive NN Design for a Class of Nonlinear Discrete-Time Block-Triangular Systems. IEEE Transactions on Cybernetics 2016; 46:2670-2680. [PMID: 26929080] [DOI: 10.1109/tcyb.2015.2494007]
Abstract
In this paper, we propose an optimal control scheme based on adaptive neural network design for a class of unknown nonlinear discrete-time systems. The controlled systems are in a block-triangular multi-input-multi-output pure-feedback structure, i.e., both state and input couplings and nonaffine functions are included in every equation of each subsystem. The design objective is to provide a control scheme that not only guarantees the stability of the systems but also achieves optimal control performance. The main contribution of this paper is that optimal performance is achieved for the first time for such a class of systems. Owing to the interactions among subsystems, constructing an optimal control signal is a difficult task. The design ideas are as follows: 1) the systems are transformed into an output predictor form; 2) for the output predictor, the ideal control signal and the strategic utility function can be approximated by using an action network and a critic network, respectively; and 3) an optimal control signal is constructed with weight update rules designed based on a gradient descent method. The stability of the systems can be proved based on the difference Lyapunov method. Finally, a numerical simulation is given to illustrate the performance of the proposed scheme.
42. Yang X, Liu D, Luo B, Li C. Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.07.051]
43. Yang X, Liu D, Wei Q, Wang D. Guaranteed cost neural tracking control for a class of uncertain nonlinear systems using adaptive dynamic programming. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.08.119]
44. Zhang Y, Wang S. MLP technique based reinforcement learning control of discrete pure-feedback systems. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.05.087]
45. Gao Y, Wang H, Liu YJ. Adaptive fuzzy control with minimal leaning parameters for electric induction motors. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.12.071]
46. Meng W, Yang Q, Sun Y. Adaptive neural control of nonlinear MIMO systems with time-varying output constraints. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:1074-1085. [PMID: 25051562] [DOI: 10.1109/tnnls.2014.2333878]
Abstract
In this paper, adaptive neural control is investigated for a class of unknown multiple-input multiple-output nonlinear systems with time-varying asymmetric output constraints. To ensure constraint satisfaction, we employ a system transformation technique to transform the original constrained (in the sense of the output restrictions) system into an equivalent unconstrained one, whose stability is sufficient to solve the output constraint problem. It is shown that output tracking is achieved without violation of the output constraint. More specifically, we can shape the system performance arbitrarily on transient and steady-state stages with the output evolving in predefined time-varying boundaries all the time. A single neural network, whose weights are tuned online, is used in our design to approximate the unknown functions in the system dynamics, while the singularity problem of the control coefficient matrix is avoided without assumption on the prior knowledge of control input's bound. All the signals in the closed-loop system are proved to be semiglobally uniformly ultimately bounded via Lyapunov synthesis. Finally, the merits of the proposed controller are verified in the simulation environment.
47. Luo B, Wu HN, Li HX. Adaptive optimal control of highly dissipative nonlinear spatially distributed processes with neuro-dynamic programming. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:684-696. [PMID: 25794375] [DOI: 10.1109/tnnls.2014.2320744]
Abstract
Highly dissipative nonlinear partial differential equations (PDEs) are widely employed to describe the dynamics of industrial spatially distributed processes (SDPs). In this paper, we consider the optimal control problem of general highly dissipative SDPs and propose an adaptive optimal control approach based on neuro-dynamic programming (NDP). Initially, Karhunen-Loève decomposition is employed to compute empirical eigenfunctions (EEFs) of the SDP based on the method of snapshots. These EEFs, together with a singular perturbation technique, are then used to obtain a finite-dimensional slow subsystem of ordinary differential equations that accurately describes the dominant dynamics of the PDE system. Subsequently, the optimal control problem is reformulated on the basis of the slow subsystem, which is further converted into the problem of solving a Hamilton-Jacobi-Bellman (HJB) equation. The HJB equation is a nonlinear PDE that is, in general, impossible to solve analytically. Thus, an adaptive optimal control method is developed via NDP that solves the HJB equation online, using a neural network (NN) to approximate the value function; an online NN weight tuning law is proposed that does not require an initial stabilizing control policy. Moreover, by accounting for the NN estimation error, we prove that the original closed-loop PDE system with the adaptive optimal control policy is semiglobally uniformly ultimately bounded. Finally, the developed method is tested on a nonlinear diffusion-convection-reaction process and applied to a temperature cooling fin of a high-speed aerospace vehicle, and the achieved results show its effectiveness.
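For orientation (the generic statement, not the reduced-order equation actually solved in the paper), the HJB equation mentioned here for dynamics \dot{x} = f(x) + g(x)u and cost integrand Q(x) + u^T R u reads

\[
0 = \min_{u}\Big[ Q(x) + u^{\top} R u + \nabla V^{*}(x)^{\top}\big(f(x) + g(x)u\big) \Big],
\qquad u^{*}(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{*}(x),
\]

which is what the NDP scheme approximates online with a value-function NN.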
48. Liu YJ, Tang L, Tong S, Chen CLP, Li DJ. Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:165-176. [PMID: 25438326] [DOI: 10.1109/tnnls.2014.2360724]
Abstract
Based on the neural network (NN) approximator, an online reinforcement learning algorithm is proposed for a class of affine multiple input and multiple output (MIMO) nonlinear discrete-time systems with unknown functions and disturbances. In the design procedure, two networks are provided where one is an action network to generate an optimal control signal and the other is a critic network to approximate the cost function. An optimal control signal and adaptation laws can be generated based on two NNs. In the previous approaches, the weights of critic and action networks are updated based on the gradient descent rule and the estimations of optimal weight vectors are directly adjusted in the design. Consequently, compared with the existing results, the main contributions of this paper are: 1) only two parameters are needed to be adjusted, and thus the number of the adaptation laws is smaller than the previous results and 2) the updating parameters do not depend on the number of the subsystems for MIMO systems and the tuning rules are replaced by adjusting the norms on optimal weight vectors in both action and critic networks. It is proven that the tracking errors, the adaptation laws, and the control inputs are uniformly bounded using Lyapunov analysis method. The simulation examples are employed to illustrate the effectiveness of the proposed algorithm.
49. Wei Q, Wang FY, Liu D, Yang X. Finite-approximation-error-based discrete-time iterative adaptive dynamic programming. IEEE Transactions on Cybernetics 2014; 44:2820-2833. [PMID: 25265640] [DOI: 10.1109/tcyb.2014.2354377]
Abstract
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm permits an arbitrary positive semi-definite function to initialize it, which overcomes the disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function in each iteration cannot accurately be obtained, for the first time a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.
50. Yang X, Liu D, Wang D, Wei Q. Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning. Neural Netw 2014; 55:30-41. [DOI: 10.1016/j.neunet.2014.03.008]