1. Xin P, Wang D, Liu A, Qiao J. Neural critic learning with accelerated value iteration for nonlinear model predictive control. Neural Netw 2024;176:106364. [PMID: 38754288] [DOI: 10.1016/j.neunet.2024.106364]
Abstract
In practical industrial processes, the receding-horizon optimization of nonlinear model predictive control (NMPC) is a notoriously difficult problem. Based on adaptive dynamic programming, the accelerated value iteration predictive control (AVI-PC) algorithm is developed in this paper. By integrating iterative learning with the receding-horizon mechanism of NMPC, a novel receding optimization pattern is exploited to obtain the optimal control law in each prediction horizon. In addition, the basic architecture and the specific form of the AVI-PC algorithm are presented, including the relationships among the iterative learning, prediction, and control processes. On this basis, convergence and admissibility conditions are established, and the relevant properties are comprehensively analyzed when the acceleration factor satisfies the established conditions. Furthermore, the accelerated value iteration function is approximated by a single critic network constructed with the multiple linear regression method. Finally, extensive simulation experiments are conducted from various perspectives to verify the effectiveness and superiority of the AVI-PC algorithm.
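The core idea of accelerating value iteration with a relaxation factor can be sketched on a toy problem. The MDP below is an illustrative example, not the paper's NMPC setting, and the factor `eta` is a generic over-relaxation step, not the paper's specific acceleration scheme:

```python
import numpy as np

# Toy MDP, chosen for illustration only: 4 states, 2 actions, deterministic
# transitions, discount gamma = 0.5.
n_states, gamma = 4, 0.5
rng = np.random.default_rng(0)
R = rng.uniform(0.0, 1.0, (n_states, 2))        # reward table R[s, a]
nxt = rng.integers(0, n_states, (n_states, 2))  # successor state s' = nxt[s, a]

def bellman(V):
    # Classic Bellman optimality backup over the two actions.
    return np.max(R + gamma * V[nxt], axis=1)

def value_iteration(eta, tol=1e-10, max_iter=10_000):
    # eta = 1 recovers standard value iteration; eta > 1 over-relaxes each
    # sweep. The relaxed map is a sup-norm contraction when
    # |1 - eta| + eta * gamma < 1, which holds here for eta = 1.3.
    V = np.zeros(n_states)
    for k in range(max_iter):
        V_new = V + eta * (bellman(V) - V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k + 1
        V = V_new
    return V, max_iter

V_std, it_std = value_iteration(eta=1.0)
V_acc, it_acc = value_iteration(eta=1.3)  # both reach the same fixed point
```

Both runs converge to the same optimal value function; only the iteration path differs.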
Affiliation(s)
- Peng Xin
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ao Liu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
2. Shen Y, Wu ZG, Wang X. Distributed Lebesgue Approximation Model for Distributed Continuous-Time Nonlinear Systems. IEEE Trans Cybern 2024;54:3752-3764. [PMID: 37027284] [DOI: 10.1109/tcyb.2023.3262632]
Abstract
Approximation models play a crucial role in model-based methods, as they determine both accuracy and computational efficiency. This article studies distributed and asynchronous discretized models for approximating continuous-time nonlinear systems. The considered continuous-time system consists of distributed but physically coupled nonlinear subsystems that exchange information. We propose two Lebesgue approximation models (LAMs): 1) the unconditionally triggered LAM (UT-LAM) and 2) the conditionally triggered LAM (CT-LAM). In both approaches, a specific LAM approximates an individual subsystem. The iteration of each LAM is triggered by either itself or its neighbors. The collection of different LAMs executing asynchronously together forms the approximation of the overall distributed continuous-time system. The aperiodic nature of LAMs reduces the number of iterations in the approximation process, particularly when the system has slow dynamics. The difference between the UT-LAM and the CT-LAM is that the latter checks an "importance" condition, further reducing the computational effort in individual LAMs. Furthermore, the proposed LAMs are analyzed by constructing a distributed event-triggered system that is proved to have the same state trajectories as the LAMs under linear interpolation. Through this event-triggered system, we derive conditions on the quantization sizes in the LAMs that ensure asymptotic stability, boundedness of the state errors, and prevention of Zeno behavior. Finally, simulations are carried out on a quarter-car suspension system to show the advantages and efficiency of the proposed approaches.
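The Lebesgue idea of stepping in state rather than in time can be illustrated on a single scalar system. This is a first-order, single-subsystem sketch under assumed dynamics `f(x) = -x`, not the paper's distributed, asynchronous formulation:

```python
import numpy as np

# Lebesgue-style (state-quantized) simulation of dx/dt = f(x): each
# iteration advances the state by exactly one quantum D and computes how
# long that traversal takes, so iterations concentrate where |f| is large.
def f(x):
    return -x  # stable scalar dynamics, chosen for illustration

def lebesgue_simulate(x0, D, t_end):
    t, x = 0.0, x0
    ts, xs = [t], [x]
    while t < t_end and abs(f(x)) > 1e-8:
        dt = D / abs(f(x))          # time to traverse one quantum
        x = x + np.sign(f(x)) * D   # state moves by exactly one quantum
        t += dt
        ts.append(t)
        xs.append(x)
    return np.array(ts), np.array(xs)

ts, xs = lebesgue_simulate(x0=1.0, D=0.05, t_end=3.0)
# Time steps grow as |f(x)| shrinks near the equilibrium, so few iterations
# are spent on the slow part of the trajectory.
```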
3. Wang W, Gu H, Mei J, Hu J. Output information-based intermittent optimal control for continuous-time nonlinear systems with unmatched uncertainties via adaptive dynamic programming. ISA Trans 2024;147:163-175. [PMID: 38368145] [DOI: 10.1016/j.isatra.2024.02.009]
Abstract
Intermittent control is a valuable strategy for resource conservation and cost reduction across diverse systems. Nonetheless, prevailing research does not address the challenges posed by robust optimal intermittent control of nonlinear input-affine systems with unmatched uncertainties. This paper aims to fill this gap. Initially, we introduce an enhanced finite-time intermittent control approach to ensure stability of nonlinear dynamic systems with bounded errors. A neural network (NN) state observer is constructed to estimate the system information. Subsequently, an optimal intermittent controller operating within a finite time span is designed to guarantee system stability via the Hamilton-Jacobi-Bellman (HJB) methodology. Furthermore, we devise an output information-based event-triggered intermittent (ETI) approach rooted in the robust adaptive dynamic programming (ADP) algorithm, furnishing an optimal intermittent control law. In this process, a critic NN is introduced to estimate the cost function and the optimal intermittent controller. Simulation results show that the proposed method is superior to existing intermittent control strategies.
Affiliation(s)
- Weifeng Wang
- School of Mathematics and Statistics, South-Central Minzu University, Wuhan 430074, China.
- Heping Gu
- Department of Mathematics and Statistics, Sichuan Minzu College, Kangding City 626001, China.
- Jun Mei
- School of Mathematics and Statistics, South-Central Minzu University, Wuhan 430074, China; Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA.
- Junhao Hu
- School of Mathematics and Statistics, South-Central Minzu University, Wuhan 430074, China.
4. Ding F, Sun C, He S. Anti-Swing Control for Quadrotor-Slung Load Transportation System with Underactuated State Constraints. Sensors (Basel) 2023;23:8995. [PMID: 37960694] [PMCID: PMC10647520] [DOI: 10.3390/s23218995]
Abstract
Quadrotors play a crucial role in the national economy, and control technology for quadrotor-slung load transportation systems has become a research hotspot. However, the underactuated load's swing poses significant challenges to the stability of the system. In this paper, we propose a Lyapunov-based control strategy to ensure the stability of the quadrotor-slung load transportation system while satisfying constraints on the load's swing angles. First, a position controller without swing-angle constraints is proposed to ensure the stability of the system. Then, a barrier Lyapunov function based on the load's swing-angle constraints is constructed, and an anti-swing controller is designed to guarantee asymptotic stability of the states. Finally, a PD controller is designed to drive the actual angles to the virtual ones extracted from the position controller. The effectiveness of the control method is verified by comparison with the LQR algorithm. The proposed method not only guarantees the payload's swing-angle constraints but also reduces energy consumption.
Affiliation(s)
- Chong Sun
- Hubei Provincial Engineering Research Center for Intelligent Management of Manufacturing Enterprises, School of Computer Science, South-Central Minzu University, Wuhan 430074, China; (F.D.); (S.H.)
5. Yuan X, Wang Y, Liu J, Sun C. Action Mapping: A Reinforcement Learning Method for Constrained-Input Systems. IEEE Trans Neural Netw Learn Syst 2023;34:7145-7157. [PMID: 35025751] [DOI: 10.1109/tnnls.2021.3138924]
Abstract
Existing approaches to constrained-input optimal control problems mainly focus on systems with input saturation, whereas other constraints, such as combined inequality constraints and state-dependent constraints, are seldom discussed. In this article, a reinforcement learning (RL)-based algorithm is developed for constrained-input optimal control of discrete-time (DT) systems. The deterministic policy gradient (DPG) is introduced to iteratively search for the optimal solution to the Hamilton-Jacobi-Bellman (HJB) equation. To deal with input constraints, an action mapping (AM) mechanism is proposed. The objective of this mechanism is to transform the exploration space from the subspace generated by the given inequality constraints to the standard Cartesian product space, which can be searched effectively by existing algorithms. With the proposed architecture, the learned policy outputs control signals satisfying the given constraints, and the original reward function can be kept unchanged. A convergence analysis shows that the iterative algorithm converges to the optimal solution of the HJB equation. In addition, the continuity of the iteratively estimated Q-function is investigated. Two numerical examples are provided to demonstrate the effectiveness of the approach.
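The action-mapping mechanism can be sketched with a simple fixed map: the learner explores the standard box [-1, 1]^n, and the map sends each explored point into the true feasible set. The diamond constraint |u1| + |u2| <= 1 below is an assumed example of a combined inequality constraint, not the constraint set used in the paper:

```python
import numpy as np

# Radial action map from the box [-1, 1]^2 onto the diamond |u1|+|u2| <= 1:
# scaling by (inf-norm / 1-norm) preserves direction, keeps the origin
# fixed, and sends the box boundary onto the diamond boundary, so the
# learned policy's raw output always maps to a feasible control.
def action_map(z):
    z = np.asarray(z, dtype=float)
    l_inf = np.max(np.abs(z))
    l_1 = np.sum(np.abs(z))
    if l_1 == 0.0:
        return z
    return z * (l_inf / l_1)

u = action_map([1.0, -1.0])  # a box corner lands on the diamond boundary
```

The same pattern generalizes: any surjective map from the standard box onto the feasible set lets an off-the-shelf exploration scheme search the box while the plant only ever receives feasible inputs.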
6. Yang X, Zhang H, Wang Z, Yan H, Zhang C. Data-Based Predictive Control via Multistep Policy Gradient Reinforcement Learning. IEEE Trans Cybern 2023;53:2818-2828. [PMID: 34752414] [DOI: 10.1109/tcyb.2021.3121078]
Abstract
In this article, a model-free predictive control algorithm for real-time systems is presented. The algorithm is data driven and improves system performance via multistep policy gradient reinforcement learning. Because it learns from an offline dataset and from real-time data, no knowledge of the system dynamics is required in the algorithm's design or application. Cooperative games of multiple players over the time horizon are presented to model predictive control as a multiagent optimization problem and to guarantee the optimality of the predictive control policy. To implement the algorithm, neural networks are used to approximate the action-state value function and the predictive control policy, respectively; their weights are determined by the weighted residual method. Numerical results show the effectiveness of the proposed algorithm.
7. Li F, Peng H, Song X, Liu J, Tan S, Ju Z. A Physics-Guided Coordinated Distributed MPC Method for Shape Control of an Antenna Reflector. IEEE Trans Cybern 2022;52:10263-10275. [PMID: 33784630] [DOI: 10.1109/tcyb.2021.3064071]
Abstract
Active shape control of an antenna reflector is a significant procedure used to compensate for the impacts of a complicated space environment. In this article, a physics-guided distributed model predictive control (DMPC) framework for reflector shape control with input saturation is proposed. First, guided by the actual physical characteristics, the overall structural system is decomposed into multilevel subsystems with the help of a so-called substructuring technique. For each subsystem, a prediction model with information interaction is discretized by an explicit Newmark-β method. Then, to improve the system-wide control performance, a coordinator among all the subsystems is designed in an iterative fashion. The input saturation constraints are addressed by transforming the original problem into a linear complementarity problem (LCP). Finally, by solving the LCP, the input trajectory is obtained. The performance of the proposed DMPC algorithm is validated through an experiment on shape control of an antenna reflector structure.
8. Yang X, Zhu Y, Dong N, Wei Q. Decentralized Event-Driven Constrained Control Using Adaptive Critic Designs. IEEE Trans Neural Netw Learn Syst 2022;33:5830-5844. [PMID: 33861716] [DOI: 10.1109/tnnls.2021.3071548]
Abstract
We study the decentralized event-driven control problem of nonlinear dynamical systems with mismatched interconnections and asymmetric input constraints. To begin with, by introducing a discounted cost function for each auxiliary subsystem, we transform the decentralized event-driven constrained control problem into a group of nonlinear H2-constrained optimal control problems. Then, we develop the event-driven Hamilton-Jacobi-Bellman equations (ED-HJBEs) that arise in these problems. Meanwhile, we demonstrate that the solutions of the ED-HJBEs together keep the overall system stable in the sense of uniform ultimate boundedness (UUB). To solve the ED-HJBEs, we build a critic-only architecture under the framework of adaptive critic designs. The architecture employs only critic neural networks and updates their weight vectors via the gradient descent method. Based on the Lyapunov approach, we then prove that all signals in the closed-loop auxiliary subsystems are UUB stable. Finally, simulations of an illustrative nonlinear interconnected plant validate the presented designs.
9. Zhu D, Yang SX, Biglarbegian M. A Fuzzy Logic-based Cascade Control without Actuator Saturation for the Unmanned Underwater Vehicle Trajectory Tracking. J Intell Robot Syst 2022. [DOI: 10.1007/s10846-022-01742-w]
10. Li T, Yang D, Xie X, Zhang H. Event-Triggered Control of Nonlinear Discrete-Time System With Unknown Dynamics Based on HDP(λ). IEEE Trans Cybern 2022;52:6046-6058. [PMID: 33531312] [DOI: 10.1109/tcyb.2020.3044595]
Abstract
The heuristic dynamic programming HDP(λ)-based optimal control strategy, which takes a long-term prediction parameter λ into account in an iterative manner, clearly accelerates the learning rate. It also reduces the computational complexity caused by the state-associated extra variable in the λ-return computation of the traditional value-gradient learning method. However, as the iteration number increases, the calculation cost grows dramatically, which poses a huge challenge for optimal control with limited bandwidth and computational units. In this article, we propose an event-triggered HDP (ETHDP)(λ) optimal control strategy for nonlinear discrete-time (NDT) systems with unknown dynamics. The iterative relation for the λ-return of the final target value is derived first. An event-triggered condition ensuring system stability is designed to reduce the computation and communication requirements. Next, we build a model-actor-critic neural network (NN) structure, in which the model NN evaluates the system state to obtain the λ-return of the current target value, which is used to compute the critic NN's real-time update errors. The event-triggered optimal control signal and the one-step-return value are approximated by the actor and critic NNs, respectively. Then, event-triggered uniformly ultimately bounded (UUB) stability of the system state and the NN weight errors is demonstrated by applying the Lyapunov technique. Finally, we illustrate the effectiveness of the proposed ETHDP(λ) strategy with two cases.
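The λ-return that serves as the critic's learning target in HDP(λ)-style methods follows the standard backward recursion G_t = r_t + γ((1-λ)V(s_{t+1}) + λG_{t+1}). A minimal sketch, with illustrative rewards and value estimates rather than numbers from the paper:

```python
import numpy as np

# Backward computation of lambda-returns over a finite trajectory.
# lam = 0 gives one-step TD targets; lam = 1 gives full (bootstrapped)
# multi-step returns; intermediate lam blends the two.
def lambda_returns(rewards, values_next, gamma=0.95, lam=0.7):
    G = np.empty(len(rewards))
    g = values_next[-1]  # bootstrap the tail with the final value estimate
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * ((1.0 - lam) * values_next[t] + lam * g)
        G[t] = g
    return G

r = np.array([1.0, 0.0, 2.0])        # rewards r_t (illustrative)
v_next = np.array([0.5, 0.4, 0.3])   # critic estimates of V(s_{t+1})
G = lambda_returns(r, v_next)
```

In an event-triggered variant, this target would only be recomputed when the triggering condition fires, which is where the communication savings come from.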
11. Liu X, Ma L, Kong X, Lee KY. An Efficient Iterative Learning Predictive Functional Control for Nonlinear Batch Processes. IEEE Trans Cybern 2022;52:4147-4160. [PMID: 33055043] [DOI: 10.1109/tcyb.2020.3021978]
Abstract
Iterative learning model predictive control (ILMPC) is very popular for controlling batch processes, since it possesses not only learning ability along batches but also strong time-domain tracking properties. However, for a fast batch process with strongly nonlinear dynamics, applying ILMPC is challenging because of the difficulty of balancing computational efficiency against tracking accuracy. In this article, an efficient iterative learning predictive functional control (ILPFC) is proposed. The original nonlinear system is linearized along the reference trajectory to derive a 2-D tracking-error predictive model. The linearization error is compensated by utilizing the Lipschitz condition, so that the objective function can be formulated with the upper bound of the actual tracking error. To enhance control efficiency, predictive functional control (PFC) is applied in the time domain, which reduces the dimension of the decision variable and thereby effectively cuts down the computational burden. The stability and convergence of this ILPFC with a terminal constraint are analyzed theoretically. Simulations on an unmanned ground vehicle and a typical fast batch reactor verify the effectiveness of the proposed control algorithm.
12. Ye J, Bian Y, Luo B, Hu M, Xu B, Ding R. Costate-Supplement ADP for Model-Free Optimal Control of Discrete-Time Nonlinear Systems. IEEE Trans Neural Netw Learn Syst 2022;PP:45-59. [PMID: 35544498] [DOI: 10.1109/tnnls.2022.3172126]
Abstract
In this article, an adaptive dynamic programming (ADP) scheme utilizing a costate function is proposed for optimal control of unknown discrete-time nonlinear systems. The state-action data are obtained by interacting with the environment under the iterative scheme without any model information. In contrast with the traditional ADP scheme, the collected data in the proposed algorithm are generated with different policies, which improves data utilization in the learning process. In order to approximate the cost function more accurately and to achieve a better policy improvement direction in the case of insufficient data, a separate costate network is introduced to approximate the costate function under the actor-critic framework, and the costate is utilized as supplementary information to estimate the cost function more precisely. Furthermore, convergence properties of the proposed algorithm are analyzed, demonstrating under a mild assumption that the costate function plays a positive role in the convergence of the cost function through the alternating iteration of the costate and cost functions. The uniform ultimate boundedness (UUB) of all variables is proven using the Lyapunov approach. Finally, two numerical examples demonstrate the effectiveness and computational efficiency of the proposed method.
13. Yang Y, Fan X, Xu C, Wu J, Sun B. State consensus cooperative control for a class of nonlinear multi-agent systems with output constraints via ADP approach. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.046]
14. Dong L, Li Y, Zhou X, Wen Y, Guan K. Intelligent Trainer for Dyna-Style Model-Based Deep Reinforcement Learning. IEEE Trans Neural Netw Learn Syst 2021;32:2758-2771. [PMID: 32866102] [DOI: 10.1109/tnnls.2020.3008249]
Abstract
Model-based reinforcement learning (MBRL) has been proposed as a promising alternative for tackling the high sampling cost of canonical RL by leveraging a system dynamics model to generate synthetic data for policy training. The MBRL framework, nevertheless, is inherently limited by the convoluted process of jointly optimizing the control policy, learning the system dynamics, and sampling data from two sources controlled by complicated hyperparameters. As such, the training process involves overwhelming manual tuning and is prohibitively costly. In this research, we propose a "reinforcement on reinforcement" (RoR) architecture to decompose the convoluted tasks into two decoupled layers of RL. The inner layer is the canonical MBRL training process, formulated as a Markov decision process called the training process environment (TPE). The outer layer serves as an RL agent, called the intelligent trainer, which learns an optimal hyperparameter configuration for the inner TPE. This decomposition provides much-needed flexibility to implement different trainer designs, referred to as "train the trainer." We propose and optimize two alternative trainer designs: 1) a unihead trainer and 2) a multihead trainer. The RoR framework is evaluated on five tasks in OpenAI Gym. Compared with three baseline methods, the proposed intelligent trainers deliver competitive autotuning performance, with up to 56% expected sampling cost savings without knowing the best parameter configurations in advance. The trainer framework can be easily extended to tasks that require costly hyperparameter tuning.
15. An Advanced Angular Velocity Error Prediction Horizon Self-Tuning Nonlinear Model Predictive Speed Control Strategy for PMSM System. Electronics 2021. [DOI: 10.3390/electronics10091123]
Abstract
In nonlinear model predictive control (NMPC), higher accuracy can be obtained with a shorter prediction horizon in steady state, better dynamics can be obtained with a longer prediction horizon in a transient state, and the calculation burden is proportional to the prediction horizon, which is usually pre-selected as a constant according to the dynamics of the controlled system. With a fixed horizon, minimal calculation and high prediction accuracy are hard to ensure across all operating states; this can be improved by changing the prediction horizon online. A nonlinear model predictive speed control (NMPSC) strategy with an advanced angular velocity error (AAVE) prediction horizon self-tuning method is proposed, in which the prediction horizon is treated as a discrete-time integer variable that can be adjusted in each sampling period. A permanent magnet synchronous motor (PMSM) rotor position control system with the proposed strategy is implemented. Tracking performance, including the rotor position integral of time-weighted absolute error (ITAE), maximal delay time, and static error, is improved by about 15.033%, 23.077%, and 10.294%, respectively, compared with a conventional NMPSC strategy using a fixed prediction horizon. Better disturbance rejection, lower weighting factor sensitivity, and higher servo stiffness are also achieved. Simulation and experimental results demonstrate the effectiveness and correctness of the method.
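The horizon self-tuning idea can be sketched with a simple error-driven integer update. The thresholds, bounds, and ±1 step below are illustrative assumptions, not the paper's AAVE tuning law:

```python
# The prediction horizon N is an integer adjusted once per sampling period
# from the magnitude of the tracking (angular velocity) error: lengthen it
# during transients for better dynamics, shorten it near steady state for
# accuracy and lower computational burden, and clamp it to [N_min, N_max].
def tune_horizon(N, err, N_min=2, N_max=20, grow=0.5, shrink=0.05):
    if abs(err) > grow:       # transient: lengthen the horizon
        N += 1
    elif abs(err) < shrink:   # near steady state: shorten the horizon
        N -= 1
    return max(N_min, min(N_max, N))

N = 5
for err in [1.2, 0.9, 0.6, 0.2, 0.03, 0.01]:  # a decaying error sequence
    N = tune_horizon(N, err)
# The horizon grows during the transient and shrinks again near steady state.
```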
16. Kang E, Qiao H, Gao J, Yang W. Neural network-based model predictive tracking control of an uncertain robotic manipulator with input constraints. ISA Trans 2021;109:89-101. [PMID: 33616059] [DOI: 10.1016/j.isatra.2020.10.009]
Abstract
This paper proposes a neural network-based model predictive control (MPC) method for robotic manipulators with model uncertainty and input constraints. In the presented NN-based MPC structure, two groups of radial basis function neural networks (RBFNNs) are considered for online model estimation and effective optimization. The first group of RBFNNs serves as a predictive model for the robotic system, with online learning strategies to handle system uncertainty and improve model estimation accuracy. The second is developed for solving the optimization problem. By adopting an actor-critic scheme with different weights and the same activation function, adaptive learning strategies are established that balance optimal tracking performance against predictive system stability. In addition, to guarantee the input constraints, a nonquadratic cost function is adopted for the NN-based MPC. The uniform ultimate boundedness (UUB) of all variables is verified through the Lyapunov approach. Simulation studies demonstrate the effectiveness of the proposed method.
Affiliation(s)
- Erlong Kang
- The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Beijing Key Laboratory of Research and Application for Robotic Intelligence of Hand-Eye-Brain Interaction, Beijing 100190, China
- Hong Qiao
- The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China.
- Jie Gao
- The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Beijing Key Laboratory of Research and Application for Robotic Intelligence of Hand-Eye-Brain Interaction, Beijing 100190, China
- Wenjing Yang
- State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China
17. Wei Q, Liao Z, Yang Z, Li B, Liu D. Continuous-Time Time-Varying Policy Iteration. IEEE Trans Cybern 2020;50:4958-4971. [PMID: 31329153] [DOI: 10.1109/tcyb.2019.2926631]
Abstract
A novel policy iteration algorithm, called the continuous-time time-varying (CTTV) policy iteration algorithm, is presented in this paper to obtain optimal control laws for infinite-horizon CTTV nonlinear systems. The adaptive dynamic programming (ADP) technique is utilized to obtain iterative control laws that optimize the performance index function. The monotonicity, convergence, and optimality of the iterative value function are analyzed, and the iterative value function is proven to converge monotonically to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Furthermore, the iterative control laws are guaranteed to be admissible and to stabilize the nonlinear system. In the implementation of the presented CTTV policy iteration algorithm, the iterative control laws and iterative value function are approximated by neural networks. Finally, numerical results verify the effectiveness of the presented method.
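The evaluate/improve cycle underlying policy iteration can be illustrated in tabular form. The paper treats continuous-time time-varying systems with neural approximators; the deterministic toy MDP below (action a jumps to state a) is only a sketch of the iteration structure and its termination at a greedy fixed point:

```python
import numpy as np

# Tabular policy iteration: exact evaluation followed by greedy improvement,
# repeated until the policy stops changing. Rewards are illustrative.
n, gamma = 3, 0.9
rng = np.random.default_rng(1)
R = rng.uniform(0.0, 1.0, (n, n))   # R[s, a]: reward for jumping s -> a
pi = np.zeros(n, dtype=int)         # initial (arbitrary) policy

for _ in range(50):                 # PI terminates in finitely many steps
    # Policy evaluation: solve (I - gamma * P_pi) V = r_pi exactly.
    P_pi = np.eye(n)[pi]            # row s is the indicator vector of pi(s)
    V = np.linalg.solve(np.eye(n) - gamma * P_pi, R[np.arange(n), pi])
    # Policy improvement: act greedily w.r.t. the evaluated value function.
    pi_new = np.argmax(R + gamma * V[None, :], axis=1)
    if np.array_equal(pi_new, pi):
        break                       # greedy fixed point: policy is optimal
    pi = pi_new
```

At termination, V satisfies the Bellman optimality equation, mirroring the convergence-to-HJB-solution property the abstract describes in the continuous-time setting.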
18. Zhang H, Li S, Zheng Y. Q-Learning-Based Model Predictive Control for Nonlinear Continuous-Time Systems. Ind Eng Chem Res 2020. [DOI: 10.1021/acs.iecr.0c02321]
Affiliation(s)
- Hao Zhang
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
- Shaoyuan Li
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Zheng
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
19. Wei Q, Song R, Liao Z, Li B, Lewis FL. Discrete-Time Impulsive Adaptive Dynamic Programming. IEEE Trans Cybern 2020;50:4293-4306. [PMID: 30990209] [DOI: 10.1109/tcyb.2019.2906694]
Abstract
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite-horizon discrete-time nonlinear systems. Considering the constraint on the impulsive interval, in each iteration the iterative impulsive value function under each possible impulsive interval is obtained, and then the iterative value function and iterative control law are derived. A new convergence analysis method proves that the iterative value function converges to the optimum as the iteration index increases to infinity. The properties of the iterative control law are analyzed, and a detailed implementation of the optimal impulsive control law is presented. Finally, two simulation examples with comparisons are given to show the effectiveness of the developed method.