1. Zhang H, Zhao X, Wang H, Zong G, Xu N. Hierarchical Sliding-Mode Surface-Based Adaptive Actor-Critic Optimal Control for Switched Nonlinear Systems With Unknown Perturbation. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:1559-1571. [PMID: 35834452] [DOI: 10.1109/tnnls.2022.3183991]
Abstract
This article studies the hierarchical sliding-mode surface (HSMS)-based adaptive optimal control problem for a class of switched continuous-time (CT) nonlinear systems with unknown perturbation under an actor-critic (AC) neural network (NN) architecture. First, a novel perturbation observer with a nested parameter adaptive law is designed to estimate the unknown perturbation. Then, by constructing a special cost function related to the HSMS, the original control issue is converted into the problem of finding a series of optimal control policies. The solution to the Hamilton-Jacobi-Bellman (HJB) equation is identified by the HSMS-based AC NNs, where the actor and critic updating laws are developed to implement the reinforcement learning (RL) strategy simultaneously. The critic update law is designed via the gradient descent approach and the principle of standardization, so that the persistence of excitation (PE) condition is no longer needed. Based on Lyapunov stability theory, all the signals of the closed-loop switched nonlinear system are strictly proved to be bounded in the sense of uniform ultimate boundedness (UUB). Finally, simulation results are presented to verify the validity of the proposed adaptive optimal control scheme.
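For orientation, a normalized critic update of the kind mentioned above (gradient descent plus standardization) is commonly realized as below. This is a generic sketch with an assumed quadratic utility and assumed polynomial features, not the paper's switched-system design; dphi, f, g, and the gains are illustrative.

```python
import numpy as np

def dphi(x):
    """Jacobian of assumed features phi(x) = [x1^2, x1*x2, x2^2]."""
    return np.array([[2*x[0], 0.0],
                     [x[1],  x[0]],
                     [0.0,  2*x[1]]])

def critic_step(W, x, u, f, g, lr=0.05):
    """One normalized gradient-descent step on the squared HJB residual
    e = W^T d/dt phi(x) + r(x,u); the (1 + sigma^T sigma)^2 factor is the
    kind of standardization that bounds the regressor without requiring
    a persistence-of-excitation condition."""
    sigma = dphi(x) @ (f(x) + g(x) @ u)   # d/dt phi along the trajectory
    e = W @ sigma + x @ x + u @ u         # residual with quadratic utility
    return W - lr * e * sigma / (1.0 + sigma @ sigma) ** 2
```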

2. Lin Z, Duan J, Li SE, Ma H, Li J, Chen J, Cheng B, Ma J. Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5255-5267. [PMID: 37015565] [DOI: 10.1109/tnnls.2022.3225090]
Abstract
The Hamilton-Jacobi-Bellman (HJB) equation serves as the necessary and sufficient condition for the optimal solution to the continuous-time (CT) optimal control problem (OCP). Compared with the infinite-horizon HJB equation, solving the finite-horizon (FH) HJB equation has been a long-standing challenge, because the partial time derivative of the value function is involved as an additional unknown term. To address this problem, this study bridges, for the first time, the link between the partial time derivative and the terminal-time utility function, and thus facilitates the use of the policy iteration (PI) technique to solve CT FH OCPs. Based on this key finding, an FH approximate dynamic programming (ADP) algorithm is proposed that leverages an actor-critic framework. It is shown that the algorithm exhibits important properties in terms of convergence and optimality. Importantly, with the use of multilayer neural networks (NNs) in the actor-critic architecture, the algorithm is suitable for CT FH OCPs of more general nonlinear and complex systems. Finally, the effectiveness of the proposed algorithm is demonstrated by a series of simulations on both a linear quadratic regulator (LQR) problem and a nonlinear vehicle tracking problem.
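For reference, the finite-horizon HJB condition the abstract refers to can be written in its standard form as follows (notation assumed here: V is the value function, f the dynamics, l the running cost, and h the terminal cost):

```latex
\min_{u}\left[\frac{\partial V}{\partial t}(x,t)
      + \nabla_x V(x,t)^{\top} f(x,u) + l(x,u)\right] = 0,
\qquad V(x,T) = h(x).
```

The term ∂V/∂t is exactly the additional unknown that is absent from the stationary infinite-horizon HJB equation, which is what the paper's link to the terminal-time utility function is designed to handle.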

3. Xie T, Xian B, Gu X. Fixed-time convergence attitude control for a tilt trirotor unmanned aerial vehicle based on reinforcement learning. ISA Transactions 2023; 132:477-489. [PMID: 35753810] [DOI: 10.1016/j.isatra.2022.06.006]
Abstract
This paper presents a new nonlinear robust attitude control strategy for the tilt trirotor unmanned aerial vehicle (UAV). The proposed design achieves fixed-time convergence of the UAV's attitude tracking errors under model uncertainties and unknown external disturbances. Actor-critic (AC) neural networks are trained using only the UAV's input and output data to handle the UAV's modeling uncertainties with bounded estimation error. A sliding-mode-based fixed-time controller is then designed to compensate for the approximation error of the neural networks and the unknown external disturbances. Based on Lyapunov stability theory, the stability analysis of the closed-loop system is presented. The performance of the presented nonlinear robust control strategy is validated through real-time flight experiments.
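The authors' exact surface is not reproduced in the abstract; as a rough illustration of the fixed-time mechanism only, one commonly used fixed-time-convergent sliding variable (assumed form and gains, not the paper's design) is:

```python
import numpy as np

def sliding_variable(e, de, k1=1.0, k2=1.0, a=0.6, b=1.5):
    """s = de + k1*|e|^a*sgn(e) + k2*|e|^b*sgn(e) with 0 < a < 1 < b.
    The b-term dominates far from the origin and the a-term near it,
    which is what bounds the convergence time independently of e(0)."""
    return de + k1 * np.abs(e)**a * np.sign(e) + k2 * np.abs(e)**b * np.sign(e)
```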
Affiliation(s)
- Tian Xie, Tianjin University, Tianjin 300072, PR China
- Bin Xian, Tianjin University, Tianjin 300072, PR China
- Xu Gu, Tianjin University, Tianjin 300072, PR China; Guiyang University, Guiyang 550005, Guizhou Province, PR China

4. Xian B, Zhang X, Zhang H, Gu X. Robust Adaptive Control for a Small Unmanned Helicopter Using Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:7589-7597. [PMID: 34125690] [DOI: 10.1109/tnnls.2021.3085767]
Abstract
This article presents a novel adaptive controller for a small-size unmanned helicopter using the reinforcement learning (RL) control methodology. The helicopter is subject to system uncertainties and unknown external disturbances. The unmodeled dynamic uncertainties of the system are estimated online by the actor network, and the tracking performance function is optimized via the critic network. The estimation error of the actor-critic network and the unknown external disturbances are compensated via a nonlinear robust component based on the sliding mode control method. The stability of the closed-loop system and the asymptotic convergence of the attitude tracking error are proved via Lyapunov-based stability analysis. Finally, real-time experiments are performed on a helicopter control testbed. The experimental results show that the proposed controller achieves good control performance.

5. Mao R, Cui R, Chen CLP. Broad Learning With Reinforcement Learning Signal Feedback: Theory and Applications. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:2952-2964. [PMID: 33460385] [DOI: 10.1109/tnnls.2020.3047941]
Abstract
Broad learning systems (BLSs) have attracted considerable attention due to their powerful ability in efficient discriminative learning. In this article, a modified BLS with reinforcement learning signal feedback (BLRLF) is proposed as an efficient method for improving the performance of the standard BLS. The main differences from BLS are as follows. First, we add weight optimization after adding additional nodes or new training samples. Motivated by iterative weight optimization in convolutional neural networks (CNNs), we use the output of the network as feedback while employing value iteration (VI)-based adaptive dynamic programming (ADP) to compute near-optimal increments of the connection weights. Second, unlike the homogeneous incremental algorithms in the standard BLS, we integrate the broad expansion methods and use a heuristic search to enable the proposed BLRLF to optimize the network structure autonomously. Although the training time increases somewhat compared with BLS, the proposed BLRLF still retains a fast computational nature. Finally, the proposed BLRLF is evaluated using popular benchmarks from the UC Irvine Machine Learning Repository and many other challenging data sets. The results show that BLRLF outperforms many state-of-the-art deep learning algorithms and shallow networks proposed in recent years.
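For context, the core of a standard BLS (the baseline this paper modifies) solves for the output weights by ridge regression over the concatenated feature and enhancement nodes; a minimal sketch with assumed names:

```python
import numpy as np

def bls_output_weights(Z, H, Y, lam=1e-3):
    """Standard BLS readout: A = [Z | H] stacks mapped-feature nodes Z and
    enhancement nodes H; W solves the ridge-regularized least squares
    W = (A^T A + lam*I)^(-1) A^T Y, the quantity BLRLF then refines."""
    A = np.hstack([Z, H])
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
```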

6. Han HG, Zhang L, Zhang LL, He Z, Qiao JF. Cooperative Optimal Controller and Its Application to Activated Sludge Process. IEEE Transactions on Cybernetics 2021; 51:3938-3951. [PMID: 31329145] [DOI: 10.1109/tcyb.2019.2925143]
Abstract
With the increasing complexity and scale of the activated sludge process (ASP), it is quite challenging to coordinate performance indices with different time scales. To address this problem, a cooperative optimal controller (COC) is proposed in this paper to improve operation performance. First, a cooperative optimal scheme is developed for designing the control system, where the different time-scale performance indices are formulated at two levels. Second, a data-driven surrogate-assisted optimization (DDSAO) algorithm is provided to optimize the cooperative objectives, where a surrogate model is established for evaluating the feasibility of optimal solutions based on the minimum squared error. Third, an adaptive predictive control strategy is investigated to derive the control laws for improving the tracking control performance. Finally, the proposed COC is tested on benchmark simulation model No. 1 (BSM1). The results demonstrate that the proposed COC is able to coordinate the multiple time-scale performance indices and achieve competitive optimal control performance.

7. Bai W, Zhou Q, Li T, Li H. Adaptive Reinforcement Learning Neural Network Control for Uncertain Nonlinear System With Input Saturation. IEEE Transactions on Cybernetics 2020; 50:3433-3443. [PMID: 31251205] [DOI: 10.1109/tcyb.2019.2921057]
Abstract
In this paper, an adaptive neural network (NN) control problem is investigated for discrete-time nonlinear systems with input saturation. Radial-basis-function (RBF) NNs, comprising critic NNs and action NNs, are employed to approximate the utility functions and system uncertainties, respectively. In previous works, a gradient descent scheme is applied to update the weight vectors, which may lead to a local-optimum problem. To circumvent this, a multigradient recursive (MGR) reinforcement learning scheme is proposed, which utilizes both the current gradient and past gradients. As a consequence, the MGR scheme not only eliminates the local-optimum problem but also guarantees a faster convergence rate than the gradient descent scheme. Moreover, the constraint of actuator input saturation is considered. Closed-loop stability is established using Lyapunov stability theory, and it is proved that all the signals in the closed-loop system are semiglobally uniformly ultimately bounded (SGUUB). Finally, the effectiveness of the proposed approach is validated via simulation results.
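A minimal sketch of the multigradient recursive idea, under the assumption (mine, not taken from the paper) that the update direction is the average of the last m gradients rather than the current gradient alone:

```python
import numpy as np
from collections import deque

def make_mgr_step(m=5, lr=0.01):
    """Returns an update rule that descends along the mean of the last m
    gradients; keeping past gradients in the direction damps oscillation,
    which is the mechanism that lets an MGR-style scheme escape solutions
    that trap plain gradient descent."""
    history = deque(maxlen=m)
    def step(w, grad):
        history.append(np.asarray(grad, dtype=float))
        return w - lr * np.mean(history, axis=0)
    return step
```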

8. Robust optimal control for a class of nonlinear systems with unknown disturbances based on disturbance observer and policy iteration. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.082]

9. Shi W, Song S, Wu C, Chen CLP. Multi Pseudo Q-Learning-Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:3534-3546. [PMID: 30602426] [DOI: 10.1109/tnnls.2018.2884797]
Abstract
This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Different from existing policy gradient methods, which employ a single actor-critic pair and cannot realize satisfactory tracking accuracy and stable learning, the proposed algorithm achieves high tracking control accuracy for AUVs and stable learning by applying a hybrid actors-critics architecture, where multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, for the critics, an expected-absolute-Bellman-error-based updating rule is used to choose the worst critic to be updated in each time step. Subsequently, to compute the loss function with a more accurate target value for the chosen critic, Pseudo Q-learning, which replaces the greedy policy in Q-learning with a subgreedy policy, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce the overestimation of the action-value function and to stabilize learning. As for the actors, the deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but bad updates. Moreover, a qualitative stability analysis of the learning is given. The effectiveness and generality of the proposed MPQ-based deterministic policy gradient (MPQ-DPG) algorithm are verified on an AUV with two different reference trajectories. The results demonstrate high tracking control accuracy and stable learning of MPQ-DPG, and also validate that increasing the number of actors and critics further improves performance.
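The worst-critic selection rule described above can be sketched as follows (assumed interface: each critic is a callable Q(s, a), and the batch holds (s, a, r, s', a') tuples):

```python
import numpy as np

def select_worst_critic(critics, batch, gamma=0.99):
    """Return the index of the critic with the largest expected absolute
    Bellman error on the batch; per the abstract, only this critic is
    updated in the current time step."""
    errors = []
    for Q in critics:
        td = [abs(r + gamma * Q(s2, a2) - Q(s, a))
              for (s, a, r, s2, a2) in batch]
        errors.append(np.mean(td))
    return int(np.argmax(errors))
```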

10. Dong L, Yan J, Yuan X, He H, Sun C. Functional Nonlinear Model Predictive Control Based on Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2019; 49:4206-4218. [PMID: 30130246] [DOI: 10.1109/tcyb.2018.2859801]
Abstract
This paper presents a functional model predictive control (MPC) approach based on an adaptive dynamic programming (ADP) algorithm, with the ability to handle control constraints and disturbances, for the optimal control of nonlinear discrete-time systems. In the proposed ADP-based nonlinear MPC (NMPC) structure, a neural-network-based identifier is established first to reconstruct the unknown system dynamics. Then, the actor-critic scheme is adopted, with a critic network to estimate the performance index function and an action network to approximate the optimal control input. Meanwhile, since the MPC strategy determines the current control by solving a finite-horizon open-loop optimal control problem, the infinite horizon is decomposed in the proposed algorithm into a series of finite horizons to obtain the optimal control. In each finite horizon, the finite-horizon ADP algorithm solves the optimal control problem subject to the terminal constraint, the control constraint, and the disturbance. The uniform ultimate boundedness of the closed-loop system is verified by the Lyapunov approach. Finally, the ADP-based NMPC is applied to two different cases, and the simulation results demonstrate the quick response and strong robustness of the proposed method.
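The receding-horizon decomposition described above follows the generic MPC pattern sketched below; solve_fh_ocp stands in for the paper's finite-horizon ADP solver and is a hypothetical name:

```python
def receding_horizon_control(x0, solve_fh_ocp, step, n_steps, N=20):
    """Generic NMPC loop: at each instant solve a finite-horizon optimal
    control problem of length N, apply only the first control, and
    re-solve from the resulting state, so the infinite horizon is covered
    by a series of finite ones."""
    x = x0
    for _ in range(n_steps):
        u_seq = solve_fh_ocp(x, N)   # assumed: returns N optimal controls
        x = step(x, u_seq[0])        # advance the plant by one step
    return x
```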

11. Wei C, Luo J, Dai H, Duan G. Learning-Based Adaptive Attitude Control of Spacecraft Formation With Guaranteed Prescribed Performance. IEEE Transactions on Cybernetics 2019; 49:4004-4016. [PMID: 30072354] [DOI: 10.1109/tcyb.2018.2857400]
Abstract
This paper investigates a novel leader-following attitude control approach for spacecraft formation under preassigned two-layer performance, with consideration of unknown inertial parameters, external disturbance torque, and unmodeled uncertainty. First, two-layer prescribed performance is preselected for both the attitude angle and angular velocity tracking errors. Subsequently, a distributed two-layer performance controller is devised, which guarantees that all the involved closed-loop signals are uniformly ultimately bounded. To overcome the limitations of the static two-layer performance controller, a learning-based control strategy is introduced as an adaptive supplementary controller based on the adaptive dynamic programming technique. This dramatically enhances the adaptiveness of the static two-layer performance controller with respect to unexpected uncertainty, without any prior knowledge of the inertial information. Furthermore, by employing robustly positively invariant theory, input-to-state stability is rigorously proven under the designed learning-based distributed controller. Finally, two groups of simulation examples validate the feasibility and effectiveness of the proposed distributed control approach.

12. Synchronous optimal control method for nonlinear systems with saturating actuators and unknown dynamics using off-policy integral reinforcement learning. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.04.036]

13. Neural-network-based learning algorithms for cooperative games of discrete-time multi-player systems with control constraints via adaptive dynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.02.107]

14. Song R, Zhu L. Stable value iteration for two-player zero-sum game of discrete-time nonlinear systems based on adaptive dynamic programming. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.03.002]

15. Li L, Li D, Song T, Xu X. Actor-Critic Learning Control Based on ℓ2-Regularized Temporal-Difference Prediction With Gradient Correction. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:5899-5909. [PMID: 29993664] [DOI: 10.1109/tnnls.2018.2808203]
Abstract
Actor-critic methods based on the policy gradient (PG-based AC) have been widely studied to solve learning control problems. To increase the data efficiency of learning prediction in the critic of PG-based AC, studies on how to use recursive least-squares temporal difference (RLS-TD) algorithms for policy evaluation have been conducted in recent years. In such contexts, the RLS-TD critic evaluates an unknown mixed policy generated by a series of different actors, rather than one fixed policy generated by the current actor. Therefore, this AC framework with an RLS-TD critic cannot be proved to converge to the optimal fixed point of the learning problem. To address this problem, this paper proposes a new AC framework named critic-iteration PG (CIPG), which learns the state-value function of the current policy in an on-policy way and performs gradient ascent in the direction of improving the discounted total reward. During each iteration, CIPG keeps the policy parameters fixed and evaluates the resulting fixed policy by an ℓ2-regularized RLS-TD critic. Our convergence analysis extends previous convergence analysis of PG with function approximation to the case of an RLS-TD critic. The simulation results demonstrate that the ℓ2-regularization term in the critic of CIPG is undamped during the learning process, and that CIPG has better learning efficiency and a faster convergence rate than conventional AC learning control methods.
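A compact sketch of RLS-TD(0) policy evaluation, the building block the CIPG critic regularizes (the ℓ2 prior is assumed here to enter through the initial covariance P0 = beta*I; this is an illustrative form, not the paper's exact recursion):

```python
import numpy as np

def make_rls_td(n, beta=1.0, gamma=0.99):
    """RLS-TD(0) for V(s) ~ theta^T phi(s): recursive least squares on the
    TD fixed-point equations; a smaller beta acts like a stronger l2 prior
    on theta."""
    theta = np.zeros(n)
    P = beta * np.eye(n)
    def update(phi_s, r, phi_s2):
        nonlocal theta, P
        x = phi_s - gamma * phi_s2            # TD feature difference
        k = P @ x / (1.0 + x @ P @ x)         # RLS gain
        theta = theta + k * (r - x @ theta)   # innovation = TD error
        P = P - np.outer(k, x @ P)
        return theta
    return update
```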

16. Jiang H, Zhang H, Han J, Zhang K. Iterative adaptive dynamic programming methods with neural network implementation for multi-player zero-sum games. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.04.005]

17. Liu Y, Wang Z, Yuan Y, Alsaadi FE. Partial-Nodes-Based State Estimation for Complex Networks With Unbounded Distributed Delays. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:3906-3912. [PMID: 28910779] [DOI: 10.1109/tnnls.2017.2740400]
Abstract
In this brief, the new problem of partial-nodes-based (PNB) state estimation is investigated for a class of complex networks with unbounded distributed delays and energy-bounded measurement noises. The main novelty lies in that the states of the complex network are estimated through the measurement outputs of a fraction of the network nodes. This fraction of the nodes is determined by either practical availability or computational necessity. The PNB state estimator is designed such that the error dynamics of the network state estimation is exponentially ultimately bounded in the presence of measurement errors. Sufficient conditions are established to ensure the existence of the PNB state estimators, and the explicit expression of the gain matrices of such estimators is then characterized. When the network measurements are free of noise, the main results specialize to the case of exponential stability for the error dynamics. Numerical examples are presented to verify the theoretical results.

18. Talaei B, Jagannathan S, Singler J. Boundary Control of 2-D Burgers' PDE: An Adaptive Dynamic Programming Approach. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:3669-3681. [PMID: 28866603] [DOI: 10.1109/tnnls.2017.2736786]
Abstract
In this paper, an adaptive dynamic programming-based near-optimal boundary controller is developed for partial differential equations (PDEs) modeled by the uncertain Burgers' equation under a Neumann boundary condition in 2-D. Initially, the Hamilton-Jacobi-Bellman equation is derived in infinite-dimensional space. Subsequently, a novel neural network (NN) identifier is introduced to approximate the nonlinear dynamics in the 2-D PDE. The optimal control input is derived by online estimation of the value function through an additional NN-based forward-in-time estimator and the approximated dynamic model. Novel update laws are developed for online estimation of the identifier and the value function. The designed control policy can be applied using a finite number of actuators at the boundaries. Local ultimate boundedness of the closed-loop system is studied in detail using Lyapunov theory. Simulation results confirm the optimizing performance of the proposed controller on an unstable 2-D Burgers' equation.

19. Yang X, He H. Adaptive critic designs for optimal control of uncertain nonlinear systems with unmatched interconnections. Neural Networks 2018; 105:142-153. [PMID: 29843095] [DOI: 10.1016/j.neunet.2018.05.005]
Abstract
In this paper, we develop a novel optimal control strategy for a class of uncertain nonlinear systems with unmatched interconnections. To begin with, we present a stabilizing feedback controller for the interconnected nonlinear systems by modifying an array of optimal control laws of auxiliary subsystems. We also prove that this feedback controller ensures a specified cost function to achieve optimality. Then, under the framework of adaptive critic designs, we use critic networks to solve the Hamilton-Jacobi-Bellman equations associated with auxiliary subsystem optimal control laws. The critic network weights are tuned through the gradient descent method combined with an additional stabilizing term. By using the newly established weight tuning rules, we no longer need the initial admissible control condition. In addition, we demonstrate that all signals in the closed-loop auxiliary subsystems are stable in the sense of uniform ultimate boundedness by using classic Lyapunov techniques. Finally, we provide an interconnected nonlinear plant to validate the present control scheme.
Affiliation(s)
- Xiong Yang, School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA
- Haibo He, Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA

20. Talaei B, Jagannathan S, Singler J. Boundary Control of Linear Uncertain 1-D Parabolic PDE Using Approximate Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1213-1225. [PMID: 28278484] [DOI: 10.1109/tnnls.2017.2669944]
Abstract
This paper develops a near-optimal boundary control method for distributed parameter systems governed by uncertain linear 1-D parabolic partial differential equations (PDEs) using approximate dynamic programming. A quadratic surface integral is proposed to express the optimal cost functional for the infinite-dimensional state space. Accordingly, the Hamilton-Jacobi-Bellman (HJB) equation is formulated in the infinite-dimensional domain without any model reduction. Subsequently, a neural network identifier is developed to estimate the unknown spatially varying coefficient in the PDE dynamics. A novel tuning law is proposed to guarantee the boundedness of the identifier approximation error in the PDE domain. A radial basis network (RBN) is subsequently proposed to generate an approximate solution for the optimal surface kernel function online. The tuning law for the near-optimal RBN weights is designed such that the HJB equation error is minimized while the dynamics are identified and the closed-loop system remains stable. Ultimate boundedness (UB) of the closed-loop system is verified using Lyapunov theory. The performance of the proposed controller is confirmed by simulation on an unstable diffusion-reaction process.

21. Liu L, Wang Z, Zhang H. Neural-Network-Based Robust Optimal Tracking Control for MIMO Discrete-Time Systems With Unknown Uncertainty Using Adaptive Critic Design. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1239-1251. [PMID: 28362616] [DOI: 10.1109/tnnls.2017.2660070]
Abstract
This paper is concerned with a robust optimal tracking control strategy for a class of nonlinear multi-input multi-output discrete-time systems with unknown uncertainty via an adaptive critic design (ACD) scheme. The main purpose is to establish an adaptive actor-critic control method such that the cost function incurred in dealing with the uncertainty is minimized and the closed-loop system is stable. Based on the neural network approximator, an action network is applied to generate the optimal control signal and a critic network is used to approximate the cost function, respectively. In contrast to previous methods, the main features of this paper are: 1) the ACD scheme is integrated into the controllers to cope with the uncertainty and 2) a novel cost function, which is not in quadratic form, is proposed so that the total cost in the design procedure is reduced. It is proved that the optimal control signals and the tracking errors are uniformly ultimately bounded even when the uncertainty exists. Finally, a numerical simulation is developed to show the effectiveness of the present approach.

22. Zhang H, Cui X, Luo Y, Jiang H. Finite-Horizon H∞ Tracking Control for Unknown Nonlinear Systems With Saturating Actuators. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1200-1212. [PMID: 28362620] [DOI: 10.1109/tnnls.2017.2669099]
Abstract
In this paper, a neural network (NN)-based online model-free integral reinforcement learning algorithm is developed to solve the finite-horizon optimal tracking control problem for completely unknown nonlinear continuous-time systems with disturbance and saturating actuators (constrained control input). An augmented system is constructed from the tracking error system and the command generator system. A time-varying Hamilton-Jacobi-Isaacs (HJI) equation is formulated for the augmented problem, which is extremely difficult or impossible to solve analytically due to its time-dependent property and nonlinearity. Then, an actor-critic-disturbance NN scheme is proposed to learn the time-varying solution to the HJI equation in real time without using knowledge of the system dynamics. Since the solution to the HJI equation is time-dependent, an NN representation with constant weights and time-dependent activation functions is considered. Furthermore, an extra error term is incorporated in the weight update law in order to satisfy the terminal constraints. Convergence and stability proofs are given based on Lyapunov theory for nonautonomous systems. Two simulation examples demonstrate the effectiveness of the designed algorithm.

23. Wei Q, Liu D, Lin Q, Song R. Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:957-969. [PMID: 28141530] [DOI: 10.1109/tnnls.2016.2638863]
Abstract
In this paper, a novel adaptive dynamic programming (ADP) algorithm, called the "iterative zero-sum ADP algorithm," is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee that the upper and lower iterative value functions converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, where existence criteria for the saddle-point equilibrium are not required. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, and are proved not to be equivalent. Finally, simulation results and comparisons illustrate the performance of the present method.
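A toy tabular rendering of the upper/lower iterations described above (a discount factor and random data are added here only so the miniature example converges; the paper treats general nonlinear systems):

```python
import numpy as np

nS, nU, nW = 4, 3, 3
rng = np.random.default_rng(0)
r = rng.random((nS, nU, nW))               # stage cost r(s, u, w)
nxt = rng.integers(nS, size=(nS, nU, nW))  # deterministic transitions
gamma = 0.9

V_up = np.zeros(nS)   # arbitrary positive semidefinite initializations
V_lo = np.zeros(nS)
for _ in range(200):
    Q_up = r + gamma * V_up[nxt]
    Q_lo = r + gamma * V_lo[nxt]
    V_up = Q_up.max(axis=2).min(axis=1)    # upper value: min_u max_w
    V_lo = Q_lo.min(axis=1).max(axis=1)    # lower value: max_w min_u

# V_up == V_lo signals a saddle-point equilibrium; otherwise only the
# upper and lower performance index functions are obtained.
```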

24. Wei Q, Li B, Song R. Discrete-Time Stable Generalized Self-Learning Optimal Control With Approximation Errors. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1226-1238. [PMID: 28362617] [DOI: 10.1109/tnnls.2017.2661865]
Abstract
In this paper, a generalized policy iteration (GPI) algorithm with approximation errors is developed for solving infinite-horizon optimal control problems for nonlinear systems. The developed stable GPI algorithm provides a general structure for discrete-time iterative adaptive dynamic programming algorithms, by which most discrete-time reinforcement learning algorithms can be described. For the first time, approximation errors are explicitly considered in the GPI algorithm, and the properties of the stable GPI algorithm with approximation errors are analyzed. The admissibility of the approximate iterative control law can be guaranteed if the approximation errors satisfy the admissibility criteria. The convergence of the developed algorithm is established, showing that the iterative value function converges to a finite neighborhood of the optimal performance index function if the approximation errors satisfy the convergence criterion. Finally, numerical examples and comparisons are presented.

25. Jiang H, Zhang H. Iterative ADP learning algorithms for discrete-time multi-player games. Artificial Intelligence Review 2018. [DOI: 10.1007/s10462-017-9603-1]

26. Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.09.020]

27. Jiang H, Zhang H, Cui Y, Xiao G. Robust control scheme for a class of uncertain nonlinear systems with completely unknown dynamics using data-driven reinforcement learning method. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.07.058]

28. Wei Q, Liu D, Lin Q. Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Admissibility and Termination Analysis. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:2490-2502. [PMID: 27529879] [DOI: 10.1109/tnnls.2016.2593743]
Abstract
In this paper, a novel local value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon optimal control problems for discrete-time nonlinear systems. The focuses of this paper are to study admissibility properties and the termination criteria of discrete-time local value iteration ADP algorithms. In the discrete-time local value iteration ADP algorithm, the iterative value functions and the iterative control laws are both updated in a given subset of the state space in each iteration, instead of the whole state space. For the first time, admissibility properties of iterative control laws are analyzed for the local value iteration ADP algorithm. New termination criteria are established, which terminate the iterative local ADP algorithm with an admissible approximate optimal control law. Finally, simulation results are given to illustrate the performance of the developed algorithm.
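The "update only on a subset of the state space" mechanism can be pictured with a toy finite MDP (a discount factor is added so this miniature converges; the subset here is an arbitrary random half of the states each iteration):

```python
import numpy as np

nS, nA = 10, 3
rng = np.random.default_rng(1)
r = rng.random((nS, nA))
nxt = rng.integers(nS, size=(nS, nA))
gamma = 0.95

V = np.zeros(nS)
pi = np.zeros(nS, dtype=int)
for _ in range(500):
    sub = rng.choice(nS, size=nS // 2, replace=False)  # local region
    Q = r[sub] + gamma * V[nxt[sub]]
    V[sub] = Q.min(axis=1)        # update the value only on the subset
    pi[sub] = Q.argmin(axis=1)    # and the control law likewise
```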
Affiliation(s)
- Qinglai Wei, The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Derong Liu, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
- Qiao Lin, The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

29. Wei Q, Liu D, Lin Q, Song R. Discrete-Time Optimal Control via Local Policy Iteration Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2017; 47:3367-3379. [PMID: 27448382] [DOI: 10.1109/tcyb.2016.2586082]
Abstract
In this paper, a discrete-time optimal control scheme is developed via a novel local policy iteration adaptive dynamic programming algorithm. In the discrete-time local policy iteration algorithm, the iterative value function and iterative control law can be updated in a subset of the state space, which reduces the computational burden compared with the traditional policy iteration algorithm. Convergence properties of the local policy iteration algorithm are presented to show that the iterative value function is monotonically nonincreasing and converges to the optimum under some mild conditions. The admissibility of the iterative control law is proven, which shows that the control system can be stabilized under any of the iterative control laws, even if the iterative control law is updated in a subset of the state space. Finally, two simulation examples are given to illustrate the performance of the developed method.

30. Luo B, Liu D, Wu HN, Wang D, Lewis FL. Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control. IEEE Transactions on Cybernetics 2017; 47:3341-3354. [PMID: 27893404] [DOI: 10.1109/tcyb.2016.2623859]
Abstract
The model-free optimal control problem of general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal control method. Using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, the adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, showing that the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
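A minimal sketch of a policy-gradient step through a learned Q-function, under assumptions of my own (a linear policy a = K s and a helper dQ_da returning the gradient of the fitted Q with respect to the action); the paper's construction uses NN approximators and the method of weighted residuals:

```python
import numpy as np

def pg_adp_actor_step(K, states, dQ_da, lr=1e-2):
    """Improve the policy a = K s by descending the learned Q: for each
    sampled state, dQ/dK = (dQ/da) s^T by the chain rule; averaging over
    data gives a model-free gradient step of the kind PGADP relies on."""
    grad = np.zeros_like(K)
    for s in states:
        grad += np.outer(dQ_da(s, K @ s), s)
    return K - lr * grad / len(states)
```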

31. Wang Z, Liu X, Liu K, Li S, Wang H. Backstepping-Based Lyapunov Function Construction Using Approximate Dynamic Programming and Sum of Square Techniques. IEEE Transactions on Cybernetics 2017; 47:3393-3403. [PMID: 27337732] [DOI: 10.1109/tcyb.2016.2574747]
Abstract
In this paper, backstepping for a class of block strict-feedback nonlinear systems is considered. Since the input function could be zero at each backstepping step, the backstepping technique cannot be applied directly. Under the assumption that the nonlinear systems are polynomial, a Lyapunov function can be constructed in polynomial form for each backstepping step by the sum-of-squares (SOS) technique. The virtual control can be obtained by the Sontag feedback formula, which is equivalent to an optimal control, namely the solution of a Hamilton-Jacobi-Bellman equation. Thus, approximate dynamic programming (ADP) can be used to estimate the value functions (Lyapunov functions) instead of SOS. Through the backstepping technique, the control Lyapunov function (CLF) of the full system is finally constructed by making use of the strict-feedback structure, and a stabilizing controller can be obtained through the constructed CLF. The contributions of the proposed method are twofold. On one hand, introducing ADP into backstepping broadens the application of the backstepping technique: a class of block strict-feedback systems can be handled by the proposed method, and the requirement of a nonzero input function at each backstepping step can be relaxed. On the other hand, backstepping with dynamic surface control reduces the computational complexity of ADP by constructing one part of the CLF through solving a semidefinite program using SOS. Simulation results verify the contributions of the proposed method.

32. Luo B, Liu D, Huang T, Yang X, Ma H. Multi-step heuristic dynamic programming for optimal control of nonlinear discrete-time systems. Information Sciences 2017. [DOI: 10.1016/j.ins.2017.05.005]

33. Leader–follower optimal coordination tracking control for multi-agent systems with unknown internal states. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.03.066]

34. Tracking control optimization scheme of continuous-time nonlinear system via online single network adaptive critic design method. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.04.008]

35. Mu C, Ni Z, Sun C, He H. Data-Driven Tracking Control With Adaptive Dynamic Programming for a Class of Continuous-Time Nonlinear Systems. IEEE Transactions on Cybernetics 2017; 47:1460-1470. [PMID: 27116758] [DOI: 10.1109/tcyb.2016.2548941]
Abstract
A data-driven adaptive tracking control approach is proposed for a class of continuous-time nonlinear systems using a recently developed goal representation heuristic dynamic programming (GrHDP) architecture. The major focus of this paper is on designing a multivariable tracking scheme, including the filter-based action network (FAN) architecture, and on the stability analysis in a continuous-time setting. In this design, the FAN is used to observe the system function and then generates the corresponding control action together with the reference signals. The goal network provides an internal reward signal adaptively based on the current system states and the control action. This internal reward signal is used as the input to the critic network, which approximates the cost function over time. We demonstrate improved tracking performance in comparison with the existing heuristic dynamic programming (HDP) approach under the same parameter and environment settings. Simulation results for multivariable tracking control on two examples show that the proposed scheme achieves better control in terms of learning speed and overall performance.

36. Song R, Wei Q, Song B. Neural-network-based synchronous iteration learning method for multi-player zero-sum games. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.02.051]

37. Jiang H, Zhang H, Luo Y, Cui X. H∞ control with constrained input for completely unknown nonlinear systems using data-driven reinforcement learning method. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.11.041]

38. Jiang H, Zhang H, Liu Y, Han J. Neural-network-based control scheme for a class of nonlinear systems with actuator faults via data-driven reinforcement learning method. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.047]

39. Wei Q, Lewis FL, Sun Q, Yan P, Song R. Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis. IEEE Transactions on Cybernetics 2017; 47:1224-1237. [PMID: 27093714] [DOI: 10.1109/tcyb.2016.2542923]
Abstract
In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed algorithm, the iterative Q function is updated over the entire state and control spaces, instead of for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, and the convergence criterion on the learning rates of traditional Q-learning algorithms is simplified. During the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties for the undiscounted case of the deterministic Q-learning algorithm are developed first. Then, considering the discount factor, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, to facilitate the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons illustrate the performance of the developed algorithm.
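The full-sweep update that distinguishes this algorithm from classical Q-learning can be seen in a tabular toy (discounted case; the deterministic dynamics and costs below are random placeholders, not from the paper):

```python
import numpy as np

nS, nA = 8, 4
rng = np.random.default_rng(2)
r = rng.random((nS, nA))
nxt = rng.integers(nS, size=(nS, nA))
gamma = 0.95

Q = np.zeros((nS, nA))
for _ in range(300):
    # One iteration updates Q at EVERY (state, control) pair:
    # Q_{k+1}(s, a) = r(s, a) + gamma * min_a' Q_k(s', a')
    Q = r + gamma * Q[nxt].min(axis=2)
```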

40. Mu C, Ni Z, Sun C, He H. Air-Breathing Hypersonic Vehicle Tracking Control Based on Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:584-598. [PMID: 26863677] [DOI: 10.1109/tnnls.2016.2516948]
Abstract
In this paper, we propose a data-driven supplementary control approach with adaptive learning capability for air-breathing hypersonic vehicle tracking control based on action-dependent heuristic dynamic programming (ADHDP). The control action is generated by the combination of sliding mode control (SMC) and the ADHDP controller to track the desired velocity and the desired altitude. In particular, the ADHDP controller observes the differences between the actual velocity/altitude and the desired velocity/altitude, and then provides a supplementary control action accordingly. The ADHDP controller does not rely on an accurate mathematical model and is data driven. Meanwhile, it is capable of adjusting its parameters online under various working conditions, which makes it well suited to hypersonic vehicle systems with parameter uncertainties and disturbances. We compare the adaptive supplementary control approach with traditional SMC in cruising flight, and provide three simulation studies to illustrate the improved performance of the proposed approach.

41. Song R, Lewis FL, Wei Q. Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:704-713. [PMID: 27448374] [DOI: 10.1109/tnnls.2016.2582849]
Abstract
This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control, and off-policy learning is used to allow the dynamics to be completely unknown. Off-policy IRL performs both policy evaluation and policy improvement in the policy iteration algorithm. Critic and action networks are used to obtain the performance index and control for each player. A gradient descent algorithm updates the critic and action weights simultaneously. The convergence analysis of the weights is given, and the asymptotic stability of the closed-loop system and the existence of the Nash equilibrium are proved. The simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.
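The integral reinforcement learning identity underpinning methods of this kind can be stated in its standard form as follows (notation assumed here: V_i is player i's value function, r_i its utility, and T > 0 the reinforcement interval):

```latex
V_i\big(x(t)\big) = \int_{t}^{t+T} r_i\big(x(\tau), u_1(\tau), \dots, u_N(\tau)\big)\, d\tau
                    + V_i\big(x(t+T)\big).
```

Because this relation involves only measured trajectories and not the drift dynamics, evaluating it off-policy is what allows the system model to remain completely unknown.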

42. Zhu Y, Zhao D. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. Artificial Intelligence Review 2017. [DOI: 10.1007/s10462-017-9548-4]

43. Liu YJ, Tong S. Optimal Control-Based Adaptive NN Design for a Class of Nonlinear Discrete-Time Block-Triangular Systems. IEEE Transactions on Cybernetics 2016; 46:2670-2680. [PMID: 26929080] [DOI: 10.1109/tcyb.2015.2494007]
Abstract
In this paper, we propose an optimal-control-based adaptive neural network design for a class of unknown nonlinear discrete-time systems. The controlled systems are in a block-triangular multi-input multi-output pure-feedback structure, i.e., both state and input couplings and nonaffine functions are included in every equation of each subsystem. The design objective is to provide a control scheme that not only guarantees the stability of the systems but also achieves optimal control performance. The main contribution of this paper is that optimal performance is achieved for such a class of systems for the first time. Owing to the interactions among subsystems, designing an optimal control signal is a difficult task. The design ideas are as follows: 1) the systems are transformed into an output predictor form; 2) for the output predictor, the ideal control signal and the strategic utility function are approximated by an action network and a critic network, respectively; and 3) an optimal control signal is constructed with weight update rules designed based on a gradient descent method. The stability of the systems is proved based on the difference Lyapunov method. Finally, a numerical simulation is given to illustrate the performance of the proposed scheme.

44. Hui G, Xie X. Novel observer-based output feedback control synthesis of discrete-time nonlinear control systems via a fuzzy approach. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.05.033]

45. Yang X, Liu D, Luo B, Li C. Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning. Information Sciences 2016. [DOI: 10.1016/j.ins.2016.07.051]

46. Xie S, Zhong W, Xie K, Yu R, Zhang Y. Fair Energy Scheduling for Vehicle-to-Grid Networks Using Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:1697-1707. [PMID: 26930694] [DOI: 10.1109/tnnls.2016.2526615]
Abstract
Research on the smart grid is receiving enormous support worldwide due to its great significance in solving environmental and energy crises. Electric vehicles (EVs), which are powered by clean energy, are being adopted in increasing numbers year by year. The huge charging load caused by high EV penetration can be expected to have a considerable impact on the reliability of the smart grid. Therefore, fair energy scheduling for EV charge and discharge is proposed in this paper. Using vehicle-to-grid technology, the scheduler controls the electricity loads of EVs with consideration of fairness in the residential distribution network. We propose contribution-based fairness, in which EVs with high contributions have high priority to obtain charging energy. The contribution value is defined by both the charge/discharge energy and the timing of the action. EVs achieve higher contribution values when discharging during the load peak hours, whereas charging during this time decreases the contribution values considerably. We formulate the fair energy scheduling problem as an infinite-horizon Markov decision process and employ the methodology of adaptive dynamic programming to maximize the long-term fairness through online network training. The numerical results illustrate that the proposed EV energy scheduling is able to mitigate and flatten the peak load in the distribution network. Furthermore, contribution-based fairness achieves fast recovery of EV batteries that have deeply discharged and guarantees fairness in the full charge time of all EVs.

47. Wang L, Gong D, Zhang B, Ma T. Novel pinning control strategy for coupled neural networks with communication column graphs. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.02.015]

48. Song R, Lewis FL, Wei Q, Zhang H. Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances. IEEE Transactions on Cybernetics 2016; 46:1041-1050. [PMID: 25935054] [DOI: 10.1109/tcyb.2015.2421338]
Abstract
An optimal control method is developed in this paper for unknown continuous-time systems with unknown disturbances. The integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control, and off-policy learning is used to allow the dynamics to be completely unknown. Neural networks are used to construct the critic and action networks. It is shown that if there are unknown disturbances, off-policy IRL may not converge or may be biased. To reduce the influence of unknown disturbances, a disturbance compensation controller is added. It is proven that the weight errors are uniformly ultimately bounded based on Lyapunov techniques. Convergence of the Hamiltonian function is also proven. The simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.

49. Wei Q, Liu D, Lin H. Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems. IEEE Transactions on Cybernetics 2016; 46:840-853. [PMID: 26552103] [DOI: 10.1109/tcyb.2015.2492242]
Abstract
In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semidefinite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. Depending on the initial function, the iterative value function is proven to be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic, and in all cases to converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms, and new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and to compute the iterative control law, respectively, to facilitate the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.
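As a concrete miniature of value iteration started from an arbitrary positive semidefinite function, the linear-quadratic special case reduces to the Riccati-type recursion below (an LQR specialization assumed here for illustration; the paper addresses general nonlinear systems):

```python
import numpy as np

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.eye(1)

P = 10.0 * np.eye(2)   # arbitrary PSD initialization; P = 0 also works
for _ in range(500):
    # V_{k+1}(x) = min_u [x'Qx + u'Ru + V_k(Ax + Bu)] with V_k(x) = x'P x
    BtP = B.T @ P
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + BtP @ B, BtP @ A)
# Starting above or below the optimum gives monotone iterates; in all
# cases P converges to the optimal cost matrix.
```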