1. Zhao M, Wang D, Qiao J. Neural-network-based accelerated safe Q-learning for optimal control of discrete-time nonlinear systems with state constraints. Neural Netw 2025;186:107249. PMID: 39955957. DOI: 10.1016/j.neunet.2025.107249.
Abstract
For unknown nonlinear systems with state constraints, it is difficult to achieve safe optimal control using Q-learning methods based on traditional quadratic utility functions. To solve this problem, this article proposes an accelerated safe Q-learning (SQL) technique that addresses the concurrent requirements of safety and optimality for discrete-time nonlinear systems within an integrated framework. First, an adjustable control barrier function is designed and integrated into the cost function to transform the constrained optimal control problem into an unconstrained one. The augmented cost function is closely linked to the next state, which drives the state away from the constraint boundaries more quickly. Second, leveraging offline data that adhere to the safety constraints, an off-policy value-iteration SQL approach is introduced to search for a safe optimal policy, thus mitigating the risk of unsafe interactions caused by suboptimal iterative policies. Third, the vast amount of offline data and the complex augmented cost function can slow the learning. To address this issue, historical iteration information is integrated into the current iteration step to accelerate policy evaluation, and the Nesterov momentum technique is introduced to expedite policy improvement. Theoretical analysis establishes the convergence, optimality, and safety of the SQL algorithm. Finally, simulations on two nonlinear systems with state constraints, under different parameter settings, demonstrate the efficacy and advantages of the accelerated SQL approach: it requires fewer iterations while driving the system state to the equilibrium point more rapidly.
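The two ingredients described in the abstract, a barrier-augmented utility acting on the next state and momentum-accelerated iteration, can be sketched in a toy setting. The 1-D plant, grids, barrier weight, and momentum coefficient below are invented for illustration and are far simpler than the neural-network scheme in the paper:

```python
import math

# Hypothetical 1-D plant x+ = 0.8*x + 0.5*u with state constraint |x| < 1
A, B, C = 0.8, 0.5, 1.0
xs = [i / 10 for i in range(-9, 10)]    # state grid strictly inside the safe set
us = [i / 10 for i in range(-10, 11)]   # action grid

def barrier(x):
    # Log barrier: zero at the origin, grows without bound as |x| -> C
    return math.log(C * C / (C * C - x * x))

def utility(x, u, xn):
    # Quadratic utility augmented with a barrier on the NEXT state, so the
    # iteration penalizes actions that push the state toward the boundary
    return x * x + u * u + 0.5 * barrier(xn)

def nearest(x):
    return min(range(len(xs)), key=lambda i: abs(xs[i] - x))

def value_iteration(momentum, iters=200):
    V_prev = [0.0] * len(xs)
    V = [0.0] * len(xs)
    for _ in range(iters):
        # Momentum extrapolation of the value table (momentum = 0 is plain VI)
        V_hat = [v + momentum * (v - vp) for v, vp in zip(V, V_prev)]
        V_new = []
        for x in xs:
            best = float("inf")
            for u in us:
                xn = A * x + B * u
                if abs(xn) >= C:        # discard successors leaving the safe set
                    continue
                best = min(best, utility(x, u, xn) + V_hat[nearest(xn)])
            V_new.append(best)
        V_prev, V = V, V_new
    return V
```

With momentum = 0 this is ordinary safe value iteration; a small positive momentum reuses the previous iterate in the Nesterov spirit. The equilibrium keeps zero cost, and every grid state admits at least one safe action, so all values stay finite.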
Affiliation(s)
- Mingming Zhao
- School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang
- School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao
- School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
2. Xiang Z, Li P, Zou W, Ahn CK. Data-Based Optimal Switching and Control With Admissibility Guaranteed Q-Learning. IEEE Trans Neural Netw Learn Syst 2025;36:5963-5973. PMID: 38837921. DOI: 10.1109/tnnls.2024.3405739.
Abstract
This article addresses the data-based optimal switching and control codesign for discrete-time nonlinear switched systems via a two-stage approximate dynamic programming (ADP) algorithm. Through offline policy improvement and policy evaluation, the proposed algorithm iteratively determines the optimal hybrid control policy using system input/output data. Moreover, a strict proof of convergence is given for the two-stage ADP algorithm. Admissibility, an essential property of the hybrid control policy, must be ensured for practical application. To this end, the properties of the hybrid control policies are analyzed and an admissibility criterion is obtained. To realize the proposed Q-learning algorithm, an actor-critic neural network (NN) structure that employs multiple NNs to approximate the Q-functions and control policies for different subsystems is adopted. By applying the proposed admissibility criterion, the obtained hybrid control policy is guaranteed to be admissible. Finally, two numerical simulations verify the effectiveness of the proposed algorithm.
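A tabular caricature can illustrate the structure of the hybrid policy: one Q-function per subsystem, with the active mode and the control chosen greedily over all of them. The two scalar modes, grids, and discount below are invented stand-ins for the paper's multi-network implementation:

```python
# Two scalar subsystems x+ = a_s*x + b_s*u; the hybrid policy picks the
# (mode, control) pair with the smallest Q-value
modes = [(0.9, 0.2), (0.5, 1.0)]
xs = [i / 5 for i in range(-5, 6)]
us = [i / 5 for i in range(-5, 6)]
gamma = 0.95

def nearest(x):
    return min(range(len(xs)), key=lambda i: abs(xs[i] - x))

# One Q-table per subsystem, mirroring the multiple-NN structure
Q = [[[0.0] * len(us) for _ in xs] for _ in modes]
for _ in range(100):
    newQ = [[[0.0] * len(us) for _ in xs] for _ in modes]
    for s, (a, b) in enumerate(modes):
        for i, x in enumerate(xs):
            for j, u in enumerate(us):
                xn = max(-1.0, min(1.0, a * x + b * u))
                k = nearest(xn)
                # Greedy over BOTH the next mode and the next control
                nxt = min(min(Q[sp][k]) for sp in range(len(modes)))
                newQ[s][i][j] = x * x + u * u + gamma * nxt
    Q = newQ

def hybrid_policy(x):
    i = nearest(x)
    _, s, u = min((Q[s][i][j], s, us[j])
                  for s in range(len(modes)) for j in range(len(us)))
    return s, u
```

Simulating the greedy hybrid policy from x = 1 drives the state toward the origin, switching modes where one subsystem is cheaper to steer than the other.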
3. Zhao H, Shan J, Peng L, Yu H. Adaptive Event-Triggered Bipartite Formation for Multiagent Systems via Reinforcement Learning. IEEE Trans Neural Netw Learn Syst 2024;35:17817-17828. PMID: 37729566. DOI: 10.1109/tnnls.2023.3309326.
Abstract
This article investigates the online learning and energy-efficient control issues for nonlinear discrete-time multiagent systems (MASs) with unknown dynamics models and antagonistic interactions. First, a distributed combined measurement error function is formulated using signed graph theory to transform the bipartite formation problem into a consensus problem. Then, an enhanced linearization controller model for the controlled MASs is developed by employing dynamic linearization technology. After that, an online learning adaptive event-triggered (ET) actor-critic neural network (AC-NN) framework for the MASs to implement bipartite formation control tasks is proposed by employing the optimized NNs and the designed adaptive ET mechanism. Moreover, the convergence of the designed formation control framework is strictly proved via the constructed Lyapunov functions. Finally, simulation and experimental studies further demonstrate the effectiveness of the proposed algorithm.
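The role of the signed graph in converting bipartite formation into a consensus problem can be seen in a minimal numeric sketch. The three-agent signed graph and fixed gain below are invented; the paper's controller is a learned NN, not this fixed update:

```python
# Signed adjacency: positive = cooperative link, negative = antagonistic link.
# Agents 1 and 2 cooperate; both are antagonistic toward agent 3.
A = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
x = [1.0, -0.5, 0.3]
eps = 0.2
for _ in range(200):
    e = []
    for i in range(3):
        # Combined measurement error over the signed graph: the neighbor state
        # enters with the sign of the link, so antagonists track the negative
        ei = sum(abs(a) * (x[i] - (1 if a > 0 else -1) * x[j])
                 for j, a in enumerate(A[i]) if a != 0)
        e.append(ei)
    x = [xi - eps * ei for xi, ei in zip(x, e)]
```

Because the graph is structurally balanced, driving the combined errors to zero yields bipartite consensus: agents 1 and 2 agree on a value c, and agent 3 converges to -c.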
4. Qiao P, Liu X, Zhang Q, Xu B. An optimal control algorithm toward unknown constrained nonlinear systems based on the sequential sampling and updating of surrogate model. ISA Trans 2024;153:117-132. PMID: 39030118. DOI: 10.1016/j.isatra.2024.07.012.
Abstract
The application of optimal control theory in practical engineering is often limited by the modeling cost and complexity of the mathematical model of the controlled plant, as well as by various constraints. To bridge the gap between theory and practice, this paper proposes a model-free direct method based on the sequential sampling and updating of a surrogate model, and extends the ability of direct methods to solve model-free optimal control problems with general constraints. The algorithm selects sample points from the current actual trajectory data to update the surrogate model of the controlled plant, and solves the optimal control problem of the constantly refined surrogate model until the result converges. The presented initial and subsequent sampling strategies eliminate the dependence on the model. Furthermore, the new stopping criteria ensure the overlap of the final actual and planned trajectories. Several examples illustrate that the presented algorithm can obtain constrained solutions with greater accuracy while requiring fewer sample data.
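The loop structure (fit surrogate, plan on it, roll out on the real plant, resample from the trajectory, stop when planned and actual trajectories overlap) can be sketched on a deliberately tiny linear example. All dynamics and numbers are invented; with exact linear data the surrogate is identified after the first refinement, so the sketch stops immediately, whereas the paper handles nonlinear plants and repeated refinements:

```python
# True plant, unknown to the algorithm and used only to generate samples
a_true, b_true = 0.9, 0.4
def plant(x, u):
    return a_true * x + b_true * u

# Two initial input/output samples (the initial sampling strategy)
data = [(1.0, 0.5, plant(1.0, 0.5)), (0.5, -0.3, plant(0.5, -0.3))]

def fit(samples):
    # Least-squares surrogate x+ ~ a*x + b*u via 2x2 normal equations
    sxx = sum(x * x for x, u, y in samples)
    sxu = sum(x * u for x, u, y in samples)
    suu = sum(u * u for x, u, y in samples)
    sxy = sum(x * y for x, u, y in samples)
    suy = sum(u * y for x, u, y in samples)
    det = sxx * suu - sxu * sxu
    return (sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det

for _ in range(10):
    a_hat, b_hat = fit(data)
    x, planned, actual = 1.0, [], []
    for _ in range(5):
        u = -(a_hat / b_hat) * x          # plan on the surrogate (deadbeat law)
        planned.append(a_hat * x + b_hat * u)
        xn = plant(x, u)                  # roll out on the real plant
        actual.append(xn)
        data.append((x, u, xn))           # sequential sampling from the trajectory
        x = xn
    # Stopping criterion: planned and actual trajectories overlap
    if max(abs(p - q) for p, q in zip(planned, actual)) < 1e-6:
        break
```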
Affiliation(s)
- Ping Qiao
- School of Mechanical Engineering, Suzhou University of Science and Technology, 215101 Suzhou, People's Republic of China.
- Xin Liu
- Guizhou Xiaozhi Tongxie Technology Co., Ltd, 550081 Guiyang, People's Republic of China.
- Qi Zhang
- School of Cyber Science and Engineering, Huazhong University of Science and Technology, 430074 Wuhan, People's Republic of China.
- Bing Xu
- School of Mechanical Engineering, Suzhou University of Science and Technology, 215101 Suzhou, People's Republic of China.
5. Wang Y, Wang D, Zhao M, Liu N, Qiao J. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate. Neural Netw 2024;175:106274. PMID: 38583264. DOI: 10.1016/j.neunet.2024.106274.
Abstract
In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem, which can accelerate the convergence rate of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under some conditions. Moreover, by employing neural networks, the model-free tracking control problem can be addressed for zero-sum games. Second, two practical algorithms are designed to guarantee convergence with accelerated learning. In one algorithm, an adjustable acceleration phase is added to the iteration process of Q-learning, which can be adaptively terminated with a convergence guarantee. In the other algorithm, a novel acceleration function is developed, which adjusts the relaxation factor to ensure convergence. Finally, through a simulation example with a practical physical background, the strong performance of the developed algorithm is demonstrated with neural networks.
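The effect of a relaxation factor on the convergence rate can be seen on a scalar stand-in for the Bellman operator. The operator and numbers below are illustrative, not the paper's game setting:

```python
gamma = 0.9

def T(q):
    # Scalar stand-in for a discounted Bellman operator: a gamma-contraction
    # with fixed point q* = 0.5 / (1 - gamma) = 5.0
    return gamma * q + 0.5

def iterate(eta, steps=30):
    # Relaxed iteration Q <- Q + eta * (T(Q) - Q); eta = 1 is plain Q-iteration
    q = 0.0
    for _ in range(steps):
        q += eta * (T(q) - q)
    return q

q_star = 0.5 / (1 - gamma)
plain = iterate(1.0)   # error contracts by |1 - 1.0*(1 - gamma)| = 0.90 per step
fast = iterate(1.5)    # error contracts by |1 - 1.5*(1 - gamma)| = 0.85 per step
```

The iteration converges whenever |1 - eta*(1 - gamma)| < 1, i.e. 0 < eta < 2/(1 - gamma); an acceleration function that adjusts eta online, as in the paper, keeps the factor inside this range while shrinking it.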
Affiliation(s)
- Yuan Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Mingming Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Nan Liu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
6. Lin M, Zhao B, Liu D. Policy gradient adaptive dynamic programming for nonlinear discrete-time zero-sum games with unknown dynamics. Soft Comput 2023. DOI: 10.1007/s00500-023-07817-6.
7. Xue S, Luo B, Liu D, Gao Y. Neural network-based event-triggered integral reinforcement learning for constrained H∞ tracking control with experience replay. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.09.119.
8. Duan D, Liu C. Event-based optimal guidance laws design for missile-target interception systems using fuzzy dynamic programming approach. ISA Trans 2022;128:243-255. PMID: 34801242. DOI: 10.1016/j.isatra.2021.10.037.
Abstract
In this paper, the guidance system with unknown dynamics is modeled as a partially unknown zero-sum differential game system. Then, a periodic event-triggered optimal control algorithm is designed to intercept the target under a plug-and-play framework. To realize this algorithm, generalized fuzzy hyperbolic models are employed to construct the identifier-critic structure, where the online identifier is used to estimate the unknown dynamics, while the generalized fuzzy hyperbolic model-based critic network is utilized to approximate the cost function. Note that the plug-and-play framework lets the designed identifier and critic network work simultaneously; in other words, prior system information is no longer required, which simplifies the network structure. Using the Lyapunov function method, the approximate optimal control strategy and the corresponding weight updating laws are derived to guarantee that the closed-loop system and the weight approximation errors are uniformly ultimately bounded, where an additional function is added to the weight updating laws to remove the requirement for an admissible initial control. Finally, to compare the interception performance and the utilization of communication resources of the periodic event-triggered control algorithm and the common adaptive dynamic programming algorithm, a missile interception system is introduced as an example.
Affiliation(s)
- Dandan Duan
- College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China
- Chunsheng Liu
- College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China.
9. Yi X, Luo B, Zhao Y. Adaptive Dynamic Programming-Based Visual Servoing Control for Quadrotor. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.06.110.
10. Yu X, Hou Z, Polycarpou MM. A Data-Driven ILC Framework for a Class of Nonlinear Discrete-Time Systems. IEEE Trans Cybern 2022;52:6143-6157. PMID: 33571102. DOI: 10.1109/tcyb.2020.3029596.
Abstract
In this article, we propose a data-driven iterative learning control (ILC) framework for unknown nonlinear nonaffine repetitive discrete-time single-input-single-output systems by applying the dynamic linearization (DL) technique. The ILC law is constructed based on the equivalent DL expression of an unknown ideal learning controller in the iteration and time domains. The learning control gain vector is adaptively updated by using a Newton-type optimization method. The monotonic convergence on the tracking errors of the controlled plant is theoretically guaranteed with respect to the 2-norm under some conditions. In the proposed ILC framework, existing proportional, integral, and derivative type ILC, and high-order ILC can be considered as special cases. The proposed ILC framework is a pure data-driven ILC, that is, the ILC law is independent of the physical dynamics of the controlled plant, and the learning control gain updating algorithm is formulated using only the measured input-output data of the nonlinear system. The proposed ILC framework is effectively verified by two illustrative examples on a complicated unknown nonlinear system and on a linear time-varying system.
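In its simplest P-type form, the iteration-domain learning idea behind this framework reduces to the sketch below. The plant, reference, and fixed gain are invented stand-ins; the paper's data-driven law additionally adapts the gain with a Newton-type update rather than fixing it:

```python
# Unknown SISO repetitive plant, used only to generate output data:
# y(t+1) = 0.5*y(t) + u(t)
def run(u):
    y, out = 0.0, []
    for ut in u:
        y = 0.5 * y + ut
        out.append(y)
    return out

ref = [1.0, 1.5, 1.0, 0.5, 1.0]   # reference repeated every trial
u = [0.0] * len(ref)
L = 0.8                            # fixed learning gain (illustrative)
errs = []
for _ in range(25):
    y = run(u)                                   # one trial on the plant
    e = [r - yi for r, yi in zip(ref, y)]
    errs.append(max(abs(ei) for ei in e))
    u = [ui + L * ei for ui, ei in zip(u, e)]    # P-type iterative learning law
```

The input is corrected across iterations using only measured input/output data, and the trial-to-trial tracking error contracts toward zero.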
11. Tang F, Niu B, Zong G, Zhao X, Xu N. Periodic event-triggered adaptive tracking control design for nonlinear discrete-time systems via reinforcement learning. Neural Netw 2022;154:43-55. PMID: 35853319. DOI: 10.1016/j.neunet.2022.06.039.
Abstract
In this paper, an event-triggered control scheme with a periodic characteristic is developed for nonlinear discrete-time systems under an actor-critic architecture of reinforcement learning (RL). The periodic event-triggered mechanism (ETM) decides whether the sampled data are delivered to the controller or not. Meanwhile, the controller is updated only when the event-triggered condition deviates from a prescribed threshold. Compared with traditional continuous ETMs, the proposed periodic ETM guarantees a minimal lower bound on the inter-event intervals and avoids point-by-point evaluation of the trigger condition, which means that part of the communication resources can be efficiently economized. The critic and actor neural networks (NNs), consisting of radial basis function neural networks (RBFNNs), aim to approximate the unknown long-term performance index function and the ideal event-triggered controller, respectively. A rigorous stability analysis based on the Lyapunov difference method is provided to substantiate that the closed-loop system can be stabilized. All error signals of the closed-loop system are uniformly ultimately bounded (UUB) under the proposed control scheme. Finally, two simulation examples are given to validate the effectiveness of the control design.
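The mechanics of a periodic ETM, checking the trigger condition only every h steps and transmitting only when the state has drifted from its held value, can be sketched on a scalar plant. All numbers and the fixed stabilizing law are illustrative assumptions (the paper learns the controller with actor-critic NNs):

```python
# Plant x+ = 0.95*x + 0.1*u with stabilizing law u = -4*x_held, where x_held
# is the last transmitted state
x, x_held = 1.0, 1.0
h = 3            # trigger condition is only checked every h steps (periodic ETM)
sigma = 0.1      # relative event threshold
updates = 0
for k in range(60):
    if k % h == 0 and abs(x - x_held) > sigma * abs(x):
        x_held = x          # event: transmit the state, refresh the controller
        updates += 1
    u = -4.0 * x_held       # controller acts on the held (last transmitted) state
    x = 0.95 * x + 0.1 * u
```

Between checks the controller holds its last input, so at most 60/h transmissions can occur; the closed loop still contracts because the held state is refreshed often enough. This is the sense in which a periodic ETM bounds inter-event intervals from below while economizing communication.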
Affiliation(s)
- Fanghua Tang
- College of Control Science and Engineering, Bohai University, Jinzhou 121013, Liaoning, China.
- Ben Niu
- School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China.
- Guangdeng Zong
- School of Engineering, Qufu Normal University, Rizhao 276826, China.
- Xudong Zhao
- College of Control Science and Engineering, Bohai University, Jinzhou 121013, Liaoning, China; Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China.
- Ning Xu
- Institute of Information and Control, Hangzhou Dianzi University, Hangzhou 310018, China.
12. Song S, Zhu M, Dai X, Gong D. Model-Free Optimal Tracking Control of Nonlinear Input-Affine Discrete-Time Systems via an Iterative Deterministic Q-Learning Algorithm. IEEE Trans Neural Netw Learn Syst 2022;PP:999-1012. PMID: 35657846. DOI: 10.1109/tnnls.2022.3178746.
Abstract
In this article, a novel model-free dynamic inversion-based Q-learning (DIQL) algorithm is proposed to solve the optimal tracking control (OTC) problem of unknown nonlinear input-affine discrete-time (DT) systems. Compared with the existing DIQL algorithm and the discount factor-based Q-learning (DFQL) algorithm, the proposed algorithm can eliminate the tracking error while ensuring that it is model-free and off-policy. First, a new deterministic Q-learning iterative scheme is presented, and based on this scheme, a model-based off-policy DIQL algorithm is designed. The advantage of this new scheme is that it can avoid the training of unusual data and improve data utilization, thereby saving computing resources. Simultaneously, the convergence and stability of the designed algorithm are analyzed, and the proof that adding probing noise into the behavior policy does not affect the convergence is presented. Then, by introducing neural networks (NNs), the model-free version of the designed algorithm is further proposed so that the OTC problem can be solved without any knowledge about the system dynamics. Finally, three simulation examples are given to demonstrate the effectiveness of the proposed algorithm.
13. Fu Y, Hong C, Fu J, Chai T. Approximate Optimal Tracking Control of Nondifferentiable Signals for a Class of Continuous-Time Nonlinear Systems. IEEE Trans Cybern 2022;52:4441-4450. PMID: 33141675. DOI: 10.1109/tcyb.2020.3027344.
Abstract
In this article, for a class of continuous-time nonlinear nonaffine systems with unknown dynamics, a robust approximate optimal tracking controller (RAOTC) is proposed in the framework of adaptive dynamic programming (ADP). The distinguishing contribution of this article is that a new Lyapunov function is constructed, by using which the derivative information of the tracking errors is not required when computing its time derivative along the solution of the closed-loop system. Thus, the proposed method can make the system states follow nondifferentiable reference signals, which removes the common assumption in the literature that reference signals have to be continuous for tracking control of continuous-time nonlinear systems. The theoretical analysis, simulation, and application results illustrate the effectiveness and superiority of the proposed method.
14. Wang H, Li M. Model-Free Reinforcement Learning for Fully Cooperative Consensus Problem of Nonlinear Multiagent Systems. IEEE Trans Neural Netw Learn Syst 2022;33:1482-1491. PMID: 33338022. DOI: 10.1109/tnnls.2020.3042508.
Abstract
This article presents an off-policy model-free algorithm based on reinforcement learning (RL) to optimize the fully cooperative (FC) consensus problem of nonlinear continuous-time multiagent systems (MASs). First, the optimal FC consensus problem is transformed into solving the coupled Hamilton-Jacobi-Bellman (HJB) equation. Then, we propose a policy iteration (PI)-based algorithm, which is further proved to be effective in solving the coupled HJB equation. To implement this scheme in a model-free way, a model-free Bellman equation is derived to find the optimal value function and the optimal control policy for each agent. Then, based on the least-squares approach, the tuning law for actor and critic weights is derived by employing actor and critic neural networks in the model-free Bellman equation to approximate the target policies and the value function. Finally, we propose an off-policy model-free integral RL (IRL) algorithm, which can be used to optimize the FC consensus problem of the whole system in real time by using measured data. The effectiveness of the proposed algorithm is verified by the simulation results.
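The least-squares, data-only policy-evaluation step at the core of such methods can be shown for a single agent with a quadratic critic. The scalar plant, policy, and discount below are invented; the learner sees only measured transitions and stage costs, never the model:

```python
import random
random.seed(0)

# Unknown plant x+ = 0.9*x + 0.5*u; fixed policy u = -0.3*x under evaluation
a, b, gamma, k = 0.9, 0.5, 0.95, 0.3
samples = []
for _ in range(50):
    x = random.uniform(-2.0, 2.0)     # measured state
    u = -k * x
    xn = a * x + b * u                # measured next state
    r = x * x + u * u                 # measured stage cost
    samples.append((x, xn, r))

# Least-squares solution of the Bellman equation w*x^2 = r + gamma*w*xn^2,
# linear in the critic weight w for the feature phi(x) = x^2
num = sum(r * x * x for x, xn, r in samples)
den = sum((x * x - gamma * xn * xn) * x * x for x, xn, r in samples)
w = num / den   # critic weight for V(x) = w*x^2, identified from data alone
```

With noise-free data the fit recovers the exact discounted value of the policy; the full algorithm couples such critic fits with actor updates across agents.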
15. Zhang H, Wang H, Niu B, Zhang L, Ahmad AM. Sliding-mode surface-based adaptive actor-critic optimal control for switched nonlinear systems with average dwell time. Inf Sci 2021. DOI: 10.1016/j.ins.2021.08.062.
16. Ai M, Xie Y, Tang Z, Zhang J, Gui W. Deep learning feature-based setpoint generation and optimal control for flotation processes. Inf Sci 2021. DOI: 10.1016/j.ins.2021.07.060.
17. Wen G, Chen CLP, Ge SS. Simplified Optimized Backstepping Control for a Class of Nonlinear Strict-Feedback Systems With Unknown Dynamic Functions. IEEE Trans Cybern 2021;51:4567-4580. PMID: 32639935. DOI: 10.1109/tcyb.2020.3002108.
Abstract
In this article, a control scheme based on the optimized backstepping (OB) technique is developed for a class of nonlinear strict-feedback systems with unknown dynamic functions. Reinforcement learning (RL) is employed to achieve the optimized control, and it is designed on the basis of neural-network (NN) approximations under an identifier-critic-actor architecture, where the identifier, critic, and actor are utilized for estimating the unknown dynamics, evaluating the system performance, and implementing the control action, respectively. The idea of OB control is to design all virtual controls and the actual control of backstepping as the optimized solutions of the corresponding subsystems. If the control were developed by employing existing RL-based optimal control methods, it would become very intricate, because their critic and actor updating laws are derived by applying a gradient descent algorithm to the square of the Bellman residual error, which is the approximation error of the Hamilton-Jacobi-Bellman (HJB) equation and contains multiple nonlinear terms. In order to accomplish the optimized control effectively, a simplified RL algorithm is designed by deriving the updating laws from the negative gradient of a simple positive function, which is generated from the partial derivative of the HJB equation. Meanwhile, the design also relaxes the persistence-of-excitation condition, which is required in most existing optimal controls. Finally, effectiveness is demonstrated by both theory and simulation.
18. Xue S, Luo B, Liu D. Event-Triggered Adaptive Dynamic Programming for Unmatched Uncertain Nonlinear Continuous-Time Systems. IEEE Trans Neural Netw Learn Syst 2021;32:2939-2951. PMID: 32721899. DOI: 10.1109/tnnls.2020.3009015.
Abstract
In this article, an event-triggered adaptive dynamic programming (ADP) method is proposed to solve the robust control problem of unmatched uncertain systems. First, the robust control problem with unmatched uncertainties is transformed into the optimal control design for an auxiliary system. Subsequently, to reduce controller executions and save computational and communication resources, an event-triggering mechanism is introduced. By using a critic neural network (NN) to approximate the value function, novel concurrent learning is developed to learn NN weights, which avoids the requirement of an initial admissible control and the persistence of excitation condition. Moreover, it is proven that the developed event-triggered ADP controller guarantees the robustness of the uncertain system and the uniform ultimate boundedness of the NN weight estimation error. Finally, by using the F-16 aircraft and the inverted pendulum with unmatched uncertainties as examples, the simulation results show the effectiveness of the developed event-triggered ADP method.
19. Wei Q, Li H, Yang X, He H. Continuous-Time Distributed Policy Iteration for Multicontroller Nonlinear Systems. IEEE Trans Cybern 2021;51:2372-2383. PMID: 32248139. DOI: 10.1109/tcyb.2020.2979614.
Abstract
In this article, a novel distributed policy iteration algorithm is established for infinite-horizon optimal control problems of continuous-time nonlinear systems. In each iteration of the developed distributed policy iteration algorithm, only one controller's control law is updated while the other controllers' control laws remain unchanged. The main contribution of the present algorithm is to improve the iterative control laws one by one, instead of updating all the control laws in each iteration as in traditional policy iteration algorithms, which effectively reduces the computational burden of each iteration. The properties of the distributed policy iteration algorithm for continuous-time nonlinear systems, including admissibility, are analyzed. Monotonicity, convergence, and optimality are also established, showing that the iterative value function is nonincreasingly convergent to the solution of the Hamilton-Jacobi-Bellman equation. Finally, numerical simulations are conducted to illustrate the effectiveness of the proposed method.
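The one-controller-at-a-time update has the flavor of coordinate descent on the cost, which a discrete-time scalar analog can illustrate. The two-input plant, discount, gain grid, and initial gains below are invented; the cost of a gain pair is available in closed form for this toy:

```python
gamma, a = 0.95, 1.2   # discount and open-loop gain of a scalar two-input plant

def J(k1, k2):
    # Closed-form discounted cost of gains (k1, k2) on x+ = (a - k1 - k2)*x
    # with stage cost x^2 + u1^2 + u2^2 and unit initial state
    c = a - k1 - k2
    if gamma * c * c >= 1:
        return float("inf")    # inadmissible pair: the cost diverges
    return (1 + k1 * k1 + k2 * k2) / (1 - gamma * c * c)

grid = [i / 100 for i in range(201)]        # candidate gains in [0, 2]
k1, k2 = 0.0, 1.0                           # admissible initial control laws
hist = [J(k1, k2)]
for _ in range(10):
    k1 = min(grid, key=lambda g: J(g, k2))  # update controller 1 only
    hist.append(J(k1, k2))
    k2 = min(grid, key=lambda g: J(k1, g))  # then controller 2 only
    hist.append(J(k1, k2))
```

Because each half-step minimizes the cost over one gain while the other is frozen, the cost sequence is nonincreasing, mirroring the monotonicity property proved in the paper.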
20. Yang X, Wei Q. Adaptive Critic Learning for Constrained Optimal Event-Triggered Control With Discounted Cost. IEEE Trans Neural Netw Learn Syst 2021;32:91-104. PMID: 32167914. DOI: 10.1109/tnnls.2020.2976787.
Abstract
This article studies an optimal event-triggered control (ETC) problem of nonlinear continuous-time systems subject to asymmetric control constraints. The present nonlinear plant differs from many studied systems in that its equilibrium point is nonzero. First, we introduce a discounted cost for such a system in order to obtain the optimal ETC without making coordinate transformations. Then, we present an event-triggered Hamilton-Jacobi-Bellman equation (ET-HJBE) arising in the discounted-cost constrained optimal ETC problem. After that, we propose an event-triggering condition guaranteeing a positive lower bound for the minimal intersample time. To solve the ET-HJBE, we construct a critic network under the framework of adaptive critic learning. The critic network weight vector is tuned through a modified gradient descent method, which simultaneously uses historical and instantaneous state data. By employing the Lyapunov method, we prove that the uniform ultimate boundedness of all signals in the closed-loop system is guaranteed. Finally, we provide simulations of a pendulum system and an oscillator system to validate the obtained optimal ETC strategy.
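The benefit of tuning the critic with historical as well as instantaneous data can be shown with a scalar regression stand-in, where the Bellman target is replaced by known value samples. The weight, samples, and learning rate are invented for illustration:

```python
# Concurrent-learning flavor: the gradient step uses a recorded history stack
# together with the instantaneous sample, relaxing persistent excitation
w_true = 2.0                                   # unknown critic weight, V(x) = w*x^2
hist_data = [(x, w_true * x * x) for x in (0.5, 1.0, 1.5)]  # informative history
x_now = 0.01                                   # current state: barely exciting

def train(with_history):
    w = 0.0
    for _ in range(400):
        batch = (hist_data if with_history else []) + [(x_now, w_true * x_now ** 2)]
        # Gradient of the summed squared regression error over the batch
        grad = sum((w * x * x - y) * x * x for x, y in batch)
        w -= 0.05 * grad
    return w

w_cl = train(True)     # converges: the history stack supplies the excitation
w_inst = train(False)  # barely moves: the instantaneous sample is uninformative
```

With the state parked near the origin, the instantaneous regressor alone carries almost no information; replaying a few recorded, informative samples restores convergence, which is the role of the historical data in the modified gradient method.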
21. Wei Q, Wang L, Liu Y, Polycarpou MM. Optimal Elevator Group Control via Deep Asynchronous Actor-Critic Learning. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5245-5256. [PMID: 32071000] [DOI: 10.1109/tnnls.2020.2965208]
Abstract
In this article, a deep reinforcement learning (RL) method, the asynchronous advantage actor-critic (A3C) method, is developed to solve the optimal control problem of elevator group control systems (EGCSs). The main contribution of this article is that the optimal control law of EGCSs is designed via deep RL, such that the elevator system sends passengers to their desired destination floors as quickly as possible. Deep convolutional and recurrent neural networks, which can update themselves during operation, are designed to dispatch elevators. Then, the structure of the A3C method is developed, and the training phase for learning the optimal control law is discussed. Finally, simulation results illustrate that the developed method effectively reduces the average waiting time in a complex building environment, and comparisons with traditional algorithms further verify its effectiveness.
22. Nguyen TT, Nguyen ND, Nahavandi S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Transactions on Cybernetics 2020; 50:3826-3839. [PMID: 32203045] [DOI: 10.1109/tcyb.2020.2977374]
Abstract
Reinforcement learning (RL) algorithms have been around for decades and have been employed to solve various sequential decision-making problems. These algorithms, however, have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated, capable agents that perform efficiently in these challenging environments. This article addresses an important aspect of deep RL: situations that require multiple agents to communicate and cooperate to solve complex tasks. A survey of approaches to problems in multiagent deep RL (MADRL) is presented, including nonstationarity, partial observability, continuous state and action spaces, multiagent training schemes, and multiagent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, and their corresponding applications are explored. It is envisaged that this review provides insights into various MADRL methods and can guide the future development of more robust and highly useful multiagent learning methods for solving real-world problems.
23. Jiang H, Zhang H, Xie X. Critic-only adaptive dynamic programming algorithms' applications to the secure control of cyber-physical systems. ISA Transactions 2020; 104:138-144. [PMID: 30853105] [DOI: 10.1016/j.isatra.2019.02.012]
Abstract
Industrial cyber-physical systems generally suffer from malicious attacks and unmatched perturbations, so security is a core research topic in the related fields. This paper proposes a novel intelligent secure control scheme that integrates optimal control theory, zero-sum game theory, reinforcement learning, and neural networks. First, the secure control problem of the compromised system is converted into a zero-sum game for a nominal auxiliary system, and then both policy-iteration-based and value-iteration-based adaptive dynamic programming methods are introduced to solve the Hamilton-Jacobi-Isaacs equations. The proposed secure control scheme mitigates the effects of actuator attacks and unmatched perturbations, and stabilizes the compromised cyber-physical system by tuning the system performance parameters, which is proved through Lyapunov stability theory. Finally, the proposed approach is applied to a Quanser helicopter to verify its effectiveness.
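The zero-sum formulation in this abstract boils down to a min-max Bellman recursion: the controller minimizes while the attack/disturbance maximizes. The toy below runs minimax value iteration on an invented 4-state game (it illustrates the recursion behind Hamilton-Jacobi-Isaacs-type problems, not the authors' ADP scheme; states, costs, and dynamics are made up).

```python
import numpy as np

# Toy minimax (zero-sum) value iteration: the controller u pushes the state
# down, the disturbance w pushes it up.  The disturbance has no cost term of
# its own here; it only shapes the dynamics.

GAMMA = 0.9

def cost(s, u):
    return s + 0.1 * u  # controller pays for the state level and its own effort

def nxt(s, u, w):
    return min(3, max(0, s - u + w))

V = np.zeros(4)
for _ in range(500):
    V_new = np.array([
        min(                                   # controller minimizes ...
            max(                               # ... the disturbance's best response
                cost(s, u) + GAMMA * V[nxt(s, u, w)]
                for w in (0, 1)
            )
            for u in (0, 1)
        )
        for s in range(4)
    ])
    converged = np.max(np.abs(V_new - V)) < 1e-10
    V = V_new
    if converged:
        break
```

The minimax Bellman operator is still a gamma-contraction, so the iteration converges to a unique game value; in this toy the fixed point works out to V = [1, 11, 21, 30].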
Affiliation(s)
- He Jiang, College of Information Science and Engineering, Northeastern University, Box 134, 110819, Shenyang, PR China.
- Huaguang Zhang, College of Information Science and Engineering, Northeastern University, Box 134, 110819, Shenyang, PR China.
- Xiangpeng Xie, Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, 210003, Nanjing, PR China.
24. Robust optimal control for a class of nonlinear systems with unknown disturbances based on disturbance observer and policy iteration. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.082]
25. Davoud S, Gao W, Riveros-Perez E. Adaptive optimal target controlled infusion algorithm to prevent hypotension associated with labor epidural: An adaptive dynamic programming approach. ISA Transactions 2020; 100:74-81. [PMID: 31813558] [DOI: 10.1016/j.isatra.2019.11.017]
Abstract
Patients receiving labor epidurals commonly experience arterial hypotension as a complication of neuraxial block. The purpose of this study was to design an adaptive optimal controller for an infusion system to regulate mean arterial pressure. A state-space model relating mean arterial pressure to the norepinephrine (NE) infusion rate was derived for controller design, and a data-driven adaptive optimal control algorithm was developed based on adaptive dynamic programming (ADP). The stability and disturbance rejection of the closed-loop system were tested via a simulation model calibrated with available clinical data. Simulation results indicated a settling time of six minutes and effective disturbance rejection. The results also demonstrate that the adaptive optimal control algorithm can achieve individualized control of mean arterial pressure in pregnant patients with no prior knowledge of patient parameters.
Affiliation(s)
- Sherwin Davoud, Department of Anesthesiology and Perioperative Medicine, Medical College of Georgia, Augusta University, 1120 15th St, Augusta, GA 30912, United States of America.
- Weinan Gao, Department of Electrical and Computer Engineering, Allen E. Paulson College of Engineering and Computing, Georgia Southern University, 1100 IT Drive, Statesboro, GA 30460, United States of America.
- Efrain Riveros-Perez, Department of Anesthesiology and Perioperative Medicine, Medical College of Georgia, Augusta University, 1120 15th St, Augusta, GA 30912, United States of America.
26. Zhang Y, Zhao B, Liu D. Deterministic policy gradient adaptive dynamic programming for model-free optimal control. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.032]
27. Xu B, Zhang R, Li S, He W, Shi Z. Composite Neural Learning-Based Nonsingular Terminal Sliding Mode Control of MEMS Gyroscopes. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:1375-1386. [PMID: 31251201] [DOI: 10.1109/tnnls.2019.2919931]
Abstract
Efficient driving control of MEMS gyroscopes is an attractive way to improve precision without hardware redesign. This paper investigates sliding mode control (SMC) for the dynamics of MEMS gyroscopes using neural networks (NNs). To handle the dynamics uncertainty, composite neural learning is constructed to obtain higher tracking precision using a serial-parallel estimation model (SPEM). Furthermore, nonsingular terminal SMC (NTSMC) is proposed to achieve finite-time convergence. To obtain the prescribed performance, a time-varying barrier Lyapunov function (BLF) is introduced into the control scheme. Simulation tests show that, under the BLF-based NTSMC with the composite learning design, the tracking precision of MEMS gyroscopes is greatly improved.
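For readers unfamiliar with sliding mode control, the core mechanism can be shown on a disturbed double integrator. The sketch below is conventional first-order SMC with a linear surface and a boundary-layer saturation to soften chattering — deliberately not the paper's nonsingular terminal variant or its neural-learning components; the plant and all gains are illustrative.

```python
import math

# First-order SMC sketch on a disturbed double integrator x'' = u + d.
# The switching gain K must dominate the disturbance bound (|d| <= 0.3 here).

K, PHI, DT = 1.0, 0.05, 0.001
sat = lambda z: max(-1.0, min(1.0, z))  # boundary-layer approximation of sign()

x, v, t = 1.0, 0.0, 0.0
for _ in range(10000):            # simulate 10 s with forward Euler
    d = 0.3 * math.sin(t)         # matched, bounded disturbance
    sigma = v + x                 # linear sliding surface: sigma = x' + x
    u = -v - K * sat(sigma / PHI) # equivalent control + saturated switching term
    x += DT * v
    v += DT * (u + d)
    t += DT
```

Once sigma is driven into the boundary layer, the motion obeys x' ≈ -x, so the state decays regardless of the disturbance; a terminal (nonlinear) surface, as in the paper, additionally makes this convergence finite-time.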
28. New insight into the simultaneous policy update algorithms related to H∞ state feedback control. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.01.060]
29. Luo B, Yang Y, Liu D. Adaptive Q-Learning for Data-Based Optimal Output Regulation With Experience Replay. IEEE Transactions on Cybernetics 2018; 48:3337-3348. [PMID: 29994038] [DOI: 10.1109/tcyb.2018.2821369]
Abstract
In this paper, the data-based optimal output regulation problem of discrete-time systems is investigated. An off-policy adaptive Q-learning (QL) method is developed using real system data, without requiring knowledge of the system dynamics or a mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter in the policy evaluation is used to achieve a tradeoff between the current and future Q-functions. The convergence of the adaptive QL algorithm is proved and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, an actor-critic neural network (NN) structure is developed: a least-squares scheme updates the critic NN weights and a batch gradient descent method updates the actor NN weights. The experience replay technique is employed in the learning process, which leads to simple and convenient implementation of the adaptive QL method. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
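The off-policy QL-plus-replay combination in this abstract can be illustrated in tabular form. The sketch below uses a made-up 5-state chain task and a uniform-random behavior policy; it mirrors only the generic "learn the Q-function off-policy, reusing stored transitions" idea, not the paper's actor-critic NN structure or least-squares critic update.

```python
import random
import numpy as np

# Tabular off-policy Q-learning with an experience replay buffer on an
# invented 5-state chain: moving right from state 3 reaches the goal
# (state 4) for reward 1, after which the episode restarts at state 0.

random.seed(0)
GAMMA, ALPHA = 0.9, 0.1

def step(s, a):
    s_next = min(4, s + 1) if a == 1 else max(0, s - 1)
    reward = 1.0 if s_next == 4 else 0.0
    return s_next, reward, s_next == 4

Q = np.zeros((5, 2))
buffer = []
s = 0
for _ in range(3000):
    a = random.randrange(2)                   # uniform behavior policy (off-policy)
    s_next, r, done = step(s, a)
    buffer.append((s, a, r, s_next, done))
    # replay a small batch of stored transitions at every step
    for s_b, a_b, r_b, sn_b, d_b in random.sample(buffer, min(8, len(buffer))):
        target = r_b if d_b else r_b + GAMMA * Q[sn_b].max()
        Q[s_b, a_b] += ALPHA * (target - Q[s_b, a_b])
    s = 0 if done else s_next

greedy = Q.argmax(axis=1)                     # learned target policy
```

Because the max-backup target is independent of the behavior policy, the greedy policy recovered from Q ("always move right") differs from the random policy that generated the data — the defining feature of off-policy learning, with replay simply reusing each stored transition many times.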
30. Zhang H, Qu Q, Xiao G, Cui Y. Optimal Guaranteed Cost Sliding Mode Control for Constrained-Input Nonlinear Systems With Matched and Unmatched Disturbances. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2112-2126. [PMID: 29771665] [DOI: 10.1109/tnnls.2018.2791419]
Abstract
Based on integral sliding mode and approximate dynamic programming (ADP) theory, a novel optimal guaranteed cost sliding mode control is designed for constrained-input nonlinear systems with matched and unmatched disturbances. When the system moves on the sliding surface, the optimal guaranteed cost control problem of the sliding mode dynamics is transformed into the optimal control problem of a reformulated auxiliary system with a modified cost function. An ADP algorithm based on a single critic neural network (NN) is applied to obtain the approximate optimal control law for the auxiliary system. Lyapunov techniques are used to demonstrate the convergence of the NN weight errors. In addition, the derived approximate optimal control is shown to keep the sliding mode dynamics stable in the sense of uniform ultimate boundedness. Simulation results are presented to verify the feasibility of the proposed control scheme.