1. Safe reinforcement learning for discrete-time fully cooperative games with partial state and control constraints using control barrier functions. Neurocomputing 2023. DOI: 10.1016/j.neucom.2022.10.058

2. Safe Reinforcement Learning for Affine Nonlinear Systems with State Constraints and Input Saturation Using Control Barrier Functions. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.11.006

4. Ma H, Zhang Q. Threshold dynamics and optimal control on an age-structured SIRS epidemic model with vaccination. Mathematical Biosciences and Engineering (MBE) 2021; 18:9474-9495. PMID: 34814354. DOI: 10.3934/mbe.2021465
Abstract
We incorporate vaccination control into an age-structured susceptible-infective-recovered-susceptible (SIRS) model and study the global stability of the endemic equilibrium by an iterative method. The basic reproduction number $R_0$ is obtained. It is shown that if $R_0 < 1$, the disease-free equilibrium is globally asymptotically stable; if $R_0 > 1$, the disease-free and endemic equilibria coexist, and the endemic equilibrium is globally asymptotically stable. Additionally, the Hamilton-Jacobi-Bellman (HJB) equation is derived by employing Bellman's principle of optimality. By proving the existence of a viscosity solution of the HJB equation, we obtain the optimal vaccination control strategy. Finally, numerical simulations are performed to illustrate the analytical results.
Affiliation(s)
- Han Ma, School of Mathematics and Statistics, Ningxia University, Yinchuan, 750021, China
- Qimin Zhang, School of Mathematics and Statistics, Ningxia University, Yinchuan, 750021, China
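
For a feel of how $R_0$ separates the two regimes described above, here is a minimal sketch using a simplified, non-age-structured SIRS ODE system with a constant vaccination rate v. All rates and the model form are illustrative assumptions, not the paper's age-structured formulation.

```python
# A minimal sketch, NOT the paper's age-structured PDE model: an ODE SIRS
# system with a constant vaccination rate v, illustrating how R_0 decides
# whether the infection dies out or persists. All rates are hypothetical.
from scipy.integrate import solve_ivp

beta, gamma, mu, delta, v = 0.4, 0.1, 0.01, 0.05, 0.02  # assumed rates

def sirs(t, y):
    S, I, R = y
    dS = mu - beta * S * I - (mu + v) * S + delta * R  # births - infection - vaccination + waning immunity
    dI = beta * S * I - (gamma + mu) * I               # infection - recovery - death
    dR = gamma * I + v * S - (mu + delta) * R          # recovery + vaccination - waning - death
    return [dS, dI, dR]

# Reproduction number for THIS simplified model: beta * S_dfe / (gamma + mu),
# where S_dfe = (mu + delta) / (mu + delta + v) is the disease-free susceptible level.
R0 = beta * (mu + delta) / ((gamma + mu) * (mu + delta + v))
sol = solve_ivp(sirs, (0.0, 400.0), [0.9, 0.1, 0.0])
print(f"R0 = {R0:.2f}, infective fraction at t=400: {sol.y[1, -1]:.4f}")
```

Raising v lowers R0 in this toy model; pushing it below 1 drives the simulated infective fraction to zero, matching the stated stability dichotomy.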

5. A Learning Control Method of Automated Vehicle Platoon at Straight Path with DDPG-Based PID. Electronics 2021. DOI: 10.3390/electronics10212580
Abstract
Cooperative adaptive cruise control (CACC) is of great significance for the development of the connected and automated vehicle (CAV) industry. Tuning a traditional proportional-integral-derivative (PID) platoon controller is not only time-consuming and laborious, but the result also cannot adapt to different operating conditions. This paper proposes a learning control method for a vehicle platooning system using a deep deterministic policy gradient (DDPG)-based PID. The main contribution of this study is to automate the PID gain-tuning process by formulating it as a deep reinforcement learning (DRL) problem. The longitudinal control of the vehicle platoon is divided into upper and lower control structures. The upper-level controller, based on the DDPG algorithm, adjusts the PID controller parameters online. Through offline training and learning in the SUMO simulation environment, the PID controller can adapt to different road conditions and platoon acceleration and deceleration maneuvers. The lower-level controller actuates the gas and brake pedals to accurately track the desired acceleration and speed. Results on a hardware-in-the-loop (HIL) simulation platform show that the maximum speed error of the DDPG-based PID controller is 0.02-0.08 m/s lower than that of the conventional PID controller, a reduction of up to 5.48%. In addition, the maximum distance error of the DDPG-based PID controller is 0.77 m, which is 14.44% less than that of the conventional PID controller.
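
The gain-scheduling idea can be sketched as follows: a trained policy maps the measured platoon state to PID gains, and the PID produces the acceleration command. The stub actor, fixed gains, and measurements below are illustrative stand-ins; the paper's DDPG networks, SUMO training loop, and reward design are not reproduced.

```python
# Minimal sketch of the DDPG-tuned PID idea: an RL policy (a stub here, in
# place of the trained actor network) maps the platoon state to PID gains,
# and the PID computes the acceleration command. All names and values are
# illustrative.
import numpy as np

class PID:
    def __init__(self):
        self.kp = self.ki = self.kd = 0.0
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def actor(state: np.ndarray) -> np.ndarray:
    """Stand-in for the trained DDPG actor: state -> (kp, ki, kd)."""
    return np.array([1.2, 0.05, 0.3])  # a trained network would output these

pid = PID()
spacing_error, speed_error, dt = 1.5, 0.4, 0.1            # assumed measurements
pid.kp, pid.ki, pid.kd = actor(np.array([spacing_error, speed_error]))
accel_cmd = pid.step(spacing_error, dt)                   # handed to the low-level pedal controller
print(f"commanded acceleration: {accel_cmd:.2f} m/s^2")
```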

6. Finite-horizon robust formation-containment control of multi-agent networks with unknown dynamics. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.01.063

7. Learning Stable Robust Adaptive NARMA Controller for UAV and Its Application to Twin Rotor MIMO Systems. Neural Processing Letters 2020. DOI: 10.1007/s11063-020-10265-0

8. Dong L, Yan J, Yuan X, He H, Sun C. Functional Nonlinear Model Predictive Control Based on Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2019; 49:4206-4218. PMID: 30130246. DOI: 10.1109/tcyb.2018.2859801
Abstract
This paper presents a functional model predictive control (MPC) approach based on an adaptive dynamic programming (ADP) algorithm, able to handle control constraints and disturbances, for the optimal control of nonlinear discrete-time systems. In the proposed ADP-based nonlinear MPC (NMPC) structure, a neural-network-based identifier is first established to reconstruct the unknown system dynamics. Then, an actor-critic scheme is adopted, with a critic network estimating the performance index function and an action network approximating the optimal control input. Since the MPC strategy determines the current control by solving a finite-horizon open-loop optimal control problem, the proposed algorithm decomposes the infinite horizon into a series of finite horizons to obtain the optimal control. In each finite horizon, the finite-horizon ADP algorithm solves the optimal control problem subject to the terminal constraint, the control constraint, and the disturbance. The uniform ultimate boundedness of the closed-loop system is verified by the Lyapunov approach. Finally, the ADP-based NMPC is tested on two different cases, and the simulation results demonstrate the quick response and strong robustness of the proposed method.
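
The receding-horizon decomposition can be illustrated on a toy scalar system: at each step a finite-horizon constrained problem is solved and only the first input is applied. Brute-force search over a coarse input grid stands in here for the paper's actor-critic solver; the dynamics, cost, horizon, and terminal penalty are assumptions.

```python
# Sketch of the receding-horizon decomposition: per step, solve a
# finite-horizon optimal control problem (brute force over a coarse input
# grid, standing in for the actor-critic solver) and apply only the first
# input, so the infinite-horizon task becomes a series of finite horizons.
import itertools
import numpy as np

def f(x, u):                      # assumed nonlinear discrete-time dynamics
    return 0.9 * x + 0.1 * np.sin(x) + u

def cost(x, u):
    return x**2 + 0.1 * u**2

U = np.linspace(-1.0, 1.0, 9)     # control constraint |u| <= 1, discretized
N = 4                             # finite prediction horizon

def best_first_input(x0):
    best, best_u0 = np.inf, 0.0
    for seq in itertools.product(U, repeat=N):
        x, J = x0, 0.0
        for u in seq:
            J += cost(x, u)
            x = f(x, u)
        J += 5.0 * x**2           # terminal penalty standing in for the terminal constraint
        if J < best:
            best, best_u0 = J, seq[0]
    return best_u0

x = 2.0
for k in range(20):               # receding-horizon loop
    u = best_first_input(x)
    x = f(x, u)
print(f"state after 20 steps: {x:.4f}")
```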

9. Liang Y, Zhang H, Cai Y, Sun S. A neural network-based approach for solving quantized discrete-time H∞ optimal control with input constraint over finite-horizon. Neurocomputing 2019. DOI: 10.1016/j.neucom.2018.12.031

10. Wang D, Mu C, Liu D, Ma H. On Mixed Data and Event Driven Design for Adaptive-Critic-Based Nonlinear H∞ Control. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:993-1005. PMID: 28166505. DOI: 10.1109/tnnls.2016.2642128
Abstract
In this paper, based on the adaptive critic learning technique, control of a class of unknown nonlinear dynamic systems is investigated by adopting a mixed data- and event-driven design approach. The nonlinear control problem is formulated as a two-player zero-sum differential game, and the adaptive critic method is employed to cope with the data-based optimization. The novelty lies in combining a data-driven learning identifier with an event-driven design formulation to develop the adaptive critic controller and thereby accomplish the nonlinear control. The event-driven optimal control law and the time-driven worst-case disturbance law are approximated by constructing and tuning a critic neural network. Applying the event-driven feedback control, the closed-loop system is built with stability analysis. Simulation studies are conducted to verify the theoretical results and illustrate the control performance. Notably, this research provides a new avenue for integrating data-based control and event-triggering mechanisms into advanced adaptive critic systems.
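
A minimal sketch of the event-driven update rule: the control is recomputed only when a state-dependent triggering condition fires, and held otherwise. The trigger form, dynamics, and controller below are common illustrative choices, not the paper's exact design.

```python
# Sketch of event-driven control: the input is recomputed only when the gap
# between the current state and the last sampled state exceeds a threshold;
# otherwise the last input is held, saving controller updates. The trigger
# form is a common event-triggered-ADP choice, not the paper's exact one.
import numpy as np

def f(x, u, d=0.0):                           # assumed dynamics with disturbance d
    return 0.8 * x + 0.2 * np.tanh(x) + u + d

def controller(x):                            # stand-in for the critic-derived control law
    return -0.5 * x

x, x_hat, u = 1.5, 1.5, 0.0
updates = 0
for k in range(50):
    if abs(x - x_hat) > 0.1 * abs(x) + 1e-3:  # event-triggering condition (assumed form)
        x_hat = x                             # sample the state at the event instant
        u = controller(x_hat)                 # recompute control only on events
        updates += 1
    x = f(x, u, d=0.01 * np.sin(k))
print(f"control updated {updates}/50 steps, final state {x:.4f}")
```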

11. Zhang H, Cui X, Luo Y, Jiang H. Finite-Horizon H∞ Tracking Control for Unknown Nonlinear Systems With Saturating Actuators. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1200-1212. PMID: 28362620. DOI: 10.1109/tnnls.2017.2669099
Abstract
In this paper, a neural network (NN)-based online model-free integral reinforcement learning algorithm is developed to solve the finite-horizon optimal tracking control problem for completely unknown nonlinear continuous-time systems with disturbances and saturating actuators (constrained control input). An augmented system is constructed from the tracking error system and the command generator system. A time-varying Hamilton-Jacobi-Isaacs (HJI) equation is formulated for the augmented problem, which is extremely difficult or impossible to solve analytically due to its time dependence and nonlinearity. An actor-critic-disturbance NN scheme is then proposed to learn the time-varying solution of the HJI equation in real time without using knowledge of the system dynamics. Since the solution of the HJI equation is time-dependent, an NN representation with constant weights and time-dependent activation functions is adopted. Furthermore, an extra error term is incorporated into the weight update law in order to satisfy the terminal constraints. Convergence and stability proofs are given based on Lyapunov theory for nonautonomous systems. Two simulation examples are provided to demonstrate the effectiveness of the designed algorithm.
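
For orientation, a time-varying HJI equation of the kind described can be written in the following standard finite-horizon form (the notation is generic rather than the paper's exact formulation; U(u) is a nonquadratic penalty encoding the actuator saturation and γ the disturbance attenuation level):

```latex
-\frac{\partial V}{\partial t}(x,t)
  = \min_{u}\,\max_{d}\Big[\, x^{\top} Q x + U(u) - \gamma^{2} d^{\top} d
      + \nabla_{x} V(x,t)^{\top} f(x,u,d) \,\Big],
\qquad V(x,T) = \psi(x).
```

The terminal condition V(x,T) = ψ(x) is what makes the solution time-dependent and motivates the NN representation with constant weights and time-dependent activation functions.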

12. Yang F, Wang C. Pattern-Based NN Control of a Class of Uncertain Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1108-1119. PMID: 28186912. DOI: 10.1109/tnnls.2017.2655503
Abstract
This paper presents a pattern-based neural network (NN) control approach for a class of uncertain nonlinear systems. The approach consists of two identification phases followed by a recognition phase and a control phase. First, in identification phase (i), adaptive NN controllers are designed to achieve closed-loop stability and tracking performance of the nonlinear systems for different control situations, and the corresponding closed-loop control system dynamics are identified via deterministic learning. The identified dynamics are stored in constant radial basis function (RBF) NNs, and a set of constant NN controllers is constructed from the obtained RBF networks. Second, in identification phase (ii), when the plant operates under different or abnormal conditions, the system dynamics under normal control are identified via deterministic learning, and a bank of dynamical estimators embedding the learned knowledge is constructed for all the abnormal conditions. Third, in the recognition phase, when an identified control situation recurs, it is rapidly recognized by the constructed estimators. Finally, in the pattern-based control phase, the constant NN controller corresponding to the recognized control situation is selected, achieving both closed-loop stability and improved control performance. The results show that pattern-based control realizes a humanlike control process and provides a new framework for fast decision and control in dynamic environments. A simulation example is included to demonstrate the effectiveness of the approach.
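
The recognition step can be sketched as a residual comparison: each stored estimator predicts the measured data, and the situation with the smallest prediction residual is selected together with its stored controller. The models, probing signal, and controller gains below are illustrative stand-ins for the learned RBF networks.

```python
# Sketch of the recognition step: a bank of stored dynamics models (stand-ins
# for the constant-weight RBF estimators obtained via deterministic learning)
# predicts the measured data; the situation with the smallest prediction
# residual is recognized and its stored controller selected. All models,
# gains, and signals here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

situations = {                      # learned closed-loop dynamics per situation
    "nominal": lambda x, u: 0.9 * x + u,
    "fault_a": lambda x, u: 0.7 * x + 0.5 * u,
    "fault_b": lambda x, u: 0.9 * x + np.sin(x) + u,
}
controllers = {                     # constant NN controllers stored per situation
    "nominal": lambda x: -0.9 * x,
    "fault_a": lambda x: -0.4 * x,
    "fault_b": lambda x: -1.2 * x,
}

def recognize(xs, us):
    """Return the situation whose estimator best explains the recorded data."""
    def residual(model):
        pred = np.array([model(x, u) for x, u in zip(xs[:-1], us)])
        return float(np.mean((xs[1:] - pred) ** 2))
    return min(situations, key=lambda name: residual(situations[name]))

# The true plant currently operates under "fault_a"; collect probing data.
xs, us, x = [1.0], [], 1.0
for _ in range(30):
    u = 0.1 * rng.standard_normal()
    us.append(u)
    x = situations["fault_a"](x, u) + 0.01 * rng.standard_normal()
    xs.append(x)

name = recognize(np.array(xs), np.array(us))
u_now = controllers[name](xs[-1])
print(f"recognized situation: {name}; selected control u = {u_now:.3f}")
```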

13. Cao Z, Xiao Q, Huang R, Zhou M. Robust Neuro-Optimal Control of Underactuated Snake Robots With Experience Replay. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:208-217. PMID: 29300697. DOI: 10.1109/tnnls.2017.2768820
Abstract
In this paper, the path-following problem for underactuated snake robots is investigated using approximate dynamic programming and neural networks (NNs). The lateral undulatory gait of a snake robot is stabilized on a virtual holonomic constraint manifold through a partial feedback linearizing control law. Based on a dynamic compensator and a line-of-sight guidance law, the path-following problem is transformed into a regulation problem for a nonlinear system with uncertainties. It is then solved by an infinite-horizon optimal control scheme using a single critic NN. A novel fluctuating learning algorithm is derived to approximate the associated cost function online and relax the requirement of an initial stabilizing control. The approximate optimal control input is obtained by solving a modified Hamilton-Jacobi-Bellman equation, and the conventional persistence-of-excitation condition is relaxed by an experience replay technique. Lyapunov analysis shows that all states of the snake robot are uniformly ultimately bounded and that the tracking error asymptotically converges to a residual set. Simulation results are presented to verify the effectiveness of the proposed method.
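
The experience-replay ingredient can be sketched as follows: past samples are stored and replayed so the critic weights are updated against a batch of recorded data rather than only the current measurement, which is what relaxes the persistence-of-excitation requirement. The linear-in-features critic, dynamics, and step sizes are assumptions.

```python
# Sketch of experience replay for a critic update: store (state, next state,
# cost) samples and replay random batches so the critic weights keep learning
# from past data, relaxing persistence of excitation. Critic basis, plant,
# and step sizes are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def features(x):                           # assumed critic basis functions
    return np.array([x**2, x**4])

w = np.zeros(2)                            # critic weights: V(x) ~ w @ features(x)
buffer = []                                # experience replay buffer

def bellman_residual(w, x, x_next, r, gamma=0.95):
    return w @ features(x) - (r + gamma * w @ features(x_next))

x = 1.0
for k in range(200):
    u = -0.5 * x                           # current (stabilizing) policy
    x_next = 0.9 * x + u + 0.05 * rng.standard_normal()
    r = x**2 + 0.1 * u**2                  # stage cost
    buffer.append((x, x_next, r))
    idx = rng.choice(len(buffer), size=min(16, len(buffer)), replace=False)
    for xb, xnb, rb in (buffer[i] for i in idx):   # replayed residual-gradient steps
        e = bellman_residual(w, xb, xnb, rb)
        w -= 0.01 * e * (features(xb) - 0.95 * features(xnb))
    x = x_next
print(f"learned critic weights: {w}")
```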

14. Wang D, He H, Liu D. Adaptive Critic Nonlinear Robust Control: A Survey. IEEE Transactions on Cybernetics 2017; 47:3429-3451. PMID: 28682269. DOI: 10.1109/tcyb.2017.2712188
Abstract
Adaptive dynamic programming (ADP) and reinforcement learning are closely related approaches to intelligent optimization. Both are regarded as promising methods built on the key components of evaluation and improvement, against the backdrop of information technology such as artificial intelligence, big data, and deep learning. Although great progress has been made and surveyed in addressing nonlinear optimal control problems, research on the robustness of ADP-based control strategies in uncertain environments has not been fully summarized. Hence, this survey reviews recent main results on adaptive-critic-based robust control design for continuous-time nonlinear systems. ADP-based nonlinear optimal regulation is reviewed first, followed by robust stabilization of nonlinear systems with matched uncertainties, guaranteed cost control design for unmatched plants, and decentralized stabilization of interconnected systems. Further comprehensive discussions are presented, including event-based robust control design, improvement of the critic learning rule, nonlinear H∞ control design, and several notes on future perspectives. Two typical examples, a practical power system and an overhead crane plant, are provided to verify the effectiveness of the theoretical results. Overall, this survey should help promote the development of adaptive critic control methods with robustness guarantees and the construction of higher-level intelligent systems.

15. Luo B, Liu D, Wu HN, Wang D, Lewis FL. Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control. IEEE Transactions on Cybernetics 2017; 47:3341-3354. PMID: 27893404. DOI: 10.1109/tcyb.2016.2623859
Abstract
The model-free optimal control problem for general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal controller. Using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, an adaptive control method is developed with an actor-critic structure and the method of weighted residuals, and its convergence properties are analyzed, showing that the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
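
A minimal sketch of the policy-gradient flavour of ADP on a scalar linear-quadratic example: a Q-function is fitted from data by temporal-difference steps, and the policy parameter is improved by descending the gradient of Q with respect to the control rather than by an exact minimization. The system, basis, and step sizes are assumptions.

```python
# Sketch of policy-gradient ADP: a linear-in-features Q-function is fitted
# from data (critic), and the policy gain is improved by gradient descent on
# Q along the control direction (actor), with no model of the dynamics used
# by the learner. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(2)

def phi(x, u):                     # quadratic basis: Q(x,u) ~ w @ phi(x,u)
    return np.array([x * x, x * u, u * u])

w = np.array([1.0, 0.0, 1.0])      # critic weights
theta = 0.0                        # policy: u = theta * x
gamma = 0.95

for it in range(2000):
    x = rng.uniform(-2, 2)                          # data sample
    u = theta * x + 0.1 * rng.standard_normal()     # exploratory input
    x_next = 0.8 * x + u                            # dynamics, unknown to the learner
    r = x * x + u * u
    # Critic: one TD step toward the Bellman target under the current policy.
    target = r + gamma * w @ phi(x_next, theta * x_next)
    w += 0.01 * (target - w @ phi(x, u)) * phi(x, u)
    # Actor: descend Q along u; dQ/du = w1*x + 2*w2*u, chain rule via u = theta*x.
    dQ_du = w[1] * x + 2 * w[2] * u
    theta -= 0.005 * dQ_du * x
print(f"learned policy gain theta = {theta:.3f}")
```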

16. Luo B, Liu D, Huang T, Yang X, Ma H. Multi-step heuristic dynamic programming for optimal control of nonlinear discrete-time systems. Information Sciences 2017. DOI: 10.1016/j.ins.2017.05.005

18. Sahin S, Guzelis C. Online Learning ARMA Controllers With Guaranteed Closed-Loop Stability. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:2314-2326. PMID: 26462245. DOI: 10.1109/tnnls.2015.2480764
Abstract
This paper presents a novel online block-adaptive learning algorithm for autoregressive moving average (ARMA) controller design based on real data measured from the plant. The method employs ARMA input-output models both for the plant and for the resulting closed-loop system. In a sliding window, the plant model parameters are first identified offline using a supervised learning algorithm that minimizes an ε-insensitive and regularized identification error, defined as the window average of the distances between the measured plant output and the model output for the input provided by the controller. The optimal controller parameters are then determined, again offline, for another sliding window as the solution to a constrained optimization problem, where the cost is the ε-insensitive and regularized output tracking error and the constraints, which are linear inequalities in the controller parameters, ensure that the closed-loop system is Schur stable. Both the identification phase and the controller design phase use input-output samples measured from the plant during online learning. In the developed method, the controller parameters are always kept in a parameter region that guarantees Schur stability of the closed-loop system. The ε-insensitiveness provides robustness against disturbances, and the regularization likewise improves generalization performance in both identification and control. The method is tested on benchmark plants, including inverted pendulum and DC motor models, as well as on an emulated and a real DC motor with online block-adaptive learning ARMA controllers, in particular proportional-integral-derivative (PID) controllers.
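
The sliding-window identification step can be sketched with a regularized least-squares fit of the ARMA plant parameters over the last W samples; a plain squared loss stands in for the paper's ε-insensitive loss (whose minimization requires a constrained solver), and the plant, model orders, and window length are assumptions.

```python
# Sketch of sliding-window plant identification: fit ARMA parameters by
# ridge-regularized least squares over the last W input-output samples.
# Squared loss stands in for the epsilon-insensitive loss of the paper.
import numpy as np

rng = np.random.default_rng(3)

N, W, lam = 200, 60, 1e-3                 # samples, window length, regularization
u = rng.uniform(-1, 1, size=N)            # probing input sequence
y = np.zeros(N)
for k in range(2, N):                     # true plant (unknown to the learner)
    y[k] = 1.2 * y[k-1] - 0.4 * y[k-2] + 0.5 * u[k-1] + 0.02 * rng.standard_normal()

rows = [[y[k-1], y[k-2], u[k-1]] for k in range(N - W, N)]
targets = [y[k] for k in range(N - W, N)]
A, b = np.array(rows), np.array(targets)
theta = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ b)   # ridge solution
print(f"identified [a1, a2, b1] = {np.round(theta, 3)}")       # true: [1.2, -0.4, 0.5]
```

The controller-design step would then minimize the tracking error over these identified models subject to linear inequalities keeping the closed-loop poles inside the unit circle (Schur stability), which requires a constrained QP solver and is not reproduced here.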

19. Luo B, Liu D, Huang T, Wang D. Model-Free Optimal Tracking Control via Critic-Only Q-Learning. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:2134-2144. PMID: 27416608. DOI: 10.1109/tnnls.2016.2585520
Abstract
Model-free control is an important and promising topic in the control field and has attracted extensive attention in the past few years. In this paper, we aim to solve the model-free optimal tracking control problem for nonaffine nonlinear discrete-time systems. A critic-only Q-learning (CoQL) method is developed, which learns the optimal tracking control from real system data and thus avoids solving the tracking Hamilton-Jacobi-Bellman equation. First, a Q-learning algorithm is proposed based on the augmented system, and its convergence is established. Using only one neural network to approximate the Q-function, the CoQL method is developed to implement the Q-learning algorithm, and its convergence is proved while accounting for the neural network approximation error. With the convergent Q-function obtained from the CoQL method, the adaptive optimal tracking control is designed based on a gradient descent scheme. Finally, the effectiveness of the developed CoQL method is demonstrated through simulation studies. Because the CoQL method learns from off-policy data and is implemented with a critic-only structure, it is easy to realize and it overcomes the inadequate-exploration problem.
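
The critic-only idea can be sketched as follows: given a converged Q-function (a hand-picked quadratic stand-in below), the tracking control at each step is obtained by a few gradient-descent iterations on Q with respect to u, so no separate actor network is needed. The Q-model, error dynamics, and reference are assumptions.

```python
# Sketch of the critic-only scheme: with a learned Q-function (here a known
# quadratic stand-in), the control is computed at each step by gradient
# descent on Q over u alone, removing the need for an actor network.
import numpy as np

# Assumed converged critic for the augmented (error, reference) state z:
# Q(z, u) = z^T P z + q*u^2 + 2*u*(s @ z).
P = np.array([[2.0, 0.3], [0.3, 1.0]])
q, s = 1.5, np.array([0.8, -0.2])

def dQ_du(z, u):
    return 2 * q * u + 2 * (s @ z)      # analytic gradient of the quadratic Q

def control_from_critic(z, iters=50, lr=0.2):
    u = 0.0
    for _ in range(iters):              # gradient descent over u only
        u -= lr * dQ_du(z, u)
    return u

e, r = 1.0, 0.0                         # tracking error and reference sample
for k in range(30):
    z = np.array([e, r])
    u = control_from_critic(z)
    e = 0.9 * e + 0.5 * u               # assumed tracking-error dynamics
    r = np.sin(0.1 * (k + 1))           # reference from the command generator
print(f"final tracking error: {e:.4f}")
```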

20. Cui X, Zhang H, Luo Y, Zu P. Online finite-horizon optimal learning algorithm for nonzero-sum games with partially unknown dynamics and constrained inputs. Neurocomputing 2016. DOI: 10.1016/j.neucom.2015.12.021