1. Jiang Y, Liu L, Feng G. Adaptive Optimal Control of Networked Nonlinear Systems With Stochastic Sensor and Actuator Dropouts Based on Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2024;35:3107-3120. PMID: 35731768. DOI: 10.1109/tnnls.2022.3183020.
Abstract
This article investigates the adaptive optimal control problem for networked discrete-time nonlinear systems with stochastic packet dropouts in both the controller-to-actuator and sensor-to-controller channels. A Bernoulli-model-based Hamilton-Jacobi-Bellman (BHJB) equation is first developed to deal with the corresponding nonadaptive optimal control problem with known system dynamics and known probability models of packet dropouts. The solvability of the nonadaptive optimal control problem is analyzed, and the stability and optimality of the resulting closed-loop system are proven. Two reinforcement learning (RL)-based policy iteration (PI) and value iteration (VI) algorithms are then developed to solve the BHJB equation, and their convergence analysis is provided. Furthermore, in the absence of a priori knowledge of partial system dynamics and packet dropout probabilities, two more online RL-based PI and VI algorithms are developed by using critic-actor approximators and a packet dropout probability estimator. It is shown that the adaptive optimal control problem of interest can be solved by the proposed online RL-based PI and VI algorithms. Finally, simulation studies of a single-link manipulator are provided to illustrate the effectiveness of the proposed approaches.
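The dropout-aware Bellman recursion is easiest to see in the scalar linear-quadratic special case. Below is a minimal value-iteration sketch for x_{k+1} = a*x_k + gamma_k*b*u_k with gamma_k ~ Bernoulli(p); all constants are illustrative assumptions, and the paper's nonlinear BHJB setting and actor-critic approximators are not reproduced.

```python
# Minimal value-iteration sketch for a scalar linear system
# x_{k+1} = a*x_k + gamma_k*b*u_k, gamma_k ~ Bernoulli(p) (actuator dropout),
# with stage cost q*x^2 + r*u^2. All constants are illustrative assumptions.
a, b, p = 1.2, 1.0, 0.8          # dynamics and packet-delivery probability
q, r = 1.0, 0.5                  # stage-cost weights

P = 0.0                          # value function V(x) = P*x^2, initialized at zero
for _ in range(1000):
    # Taking the expectation over the Bernoulli dropout gives this update
    P_next = q + a * a * P - (p * a * b * P) ** 2 / (r + p * b * b * P)
    if abs(P_next - P) < 1e-10:
        break
    P = P_next

K = -p * a * b * P / (r + p * b * b * P)   # dropout-aware feedback gain u = K*x
print(f"P* = {P:.6f}, K* = {K:.6f}")
```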
2. Ming Z, Zhang H, Luo Y, Wang W. Dynamic Event-Based Control for Stochastic Optimal Regulation of Nonlinear Networked Control Systems. IEEE Transactions on Neural Networks and Learning Systems 2023;34:7299-7308. PMID: 35038299. DOI: 10.1109/tnnls.2022.3140478.
Abstract
In this article, a dynamic event-triggered stochastic adaptive dynamic programming (ADP) control problem is investigated for nonlinear systems with a communication network in the feedback loop. First, a novel discrete-time condition for stochastic input-to-state stability (SISS) is established. Then, an event-triggered control strategy is devised, and a near-optimal control policy is designed using identifier-actor-critic neural networks (NNs) with an event-sampled state vector. An adaptive static event-sampling condition is designed using the Lyapunov technique to ensure ultimate boundedness (UB) of the closed-loop system. However, since the static event-triggered rule depends only on the current state, regardless of previous values, this article presents an explicit dynamic event-triggered rule. Furthermore, we prove that the lower bound on the sampling interval for the proposed dynamic event-triggered control strategy is greater than one, which avoids the so-called triviality phenomenon. Finally, the effectiveness of the proposed near-optimal control scheme is verified by a simulation example.
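A dynamic event-triggered rule of the kind described augments a static state-dependent threshold with an internal dynamic variable, so the trigger also remembers past behavior. The sketch below is a scalar stand-in with assumed parameters lam, sigma, and theta (not the paper's SISS-derived ones); it only shows the triggering mechanics.

```python
# Minimal dynamic event-triggering sketch in the style of Girard's dynamic
# trigger; plant, gain, and trigger parameters are illustrative assumptions.
A, B, K = 0.9, 1.0, -0.5        # scalar plant x+ = A x + B u, u = K * (sampled x)
lam, sigma, theta = 0.8, 0.3, 1.0

x, x_hat, eta = 1.0, 1.0, 0.5    # state, last transmitted state, dynamic variable
events = 0
for k in range(100):
    e = x_hat - x                                  # event error since last sample
    if eta + theta * (sigma * x * x - e * e) < 0:  # dynamic triggering rule
        x_hat = x                                  # transmit: reset the error
        events += 1
    eta = lam * eta + sigma * x * x - (x_hat - x) ** 2
    eta = max(eta, 0.0)                            # keep the dynamic variable nonnegative
    x = A * x + B * K * x_hat
print(f"events: {events}, final |x| = {abs(x):.4f}")
```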
3. Niu H, Bhowmick C, Jagannathan S. Attack Detection and Approximation in Nonlinear Networked Control Systems Using Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2020;31:235-245. PMID: 30892252. DOI: 10.1109/tnnls.2019.2900430.
Abstract
In networked control systems (NCS), a certain class of attacks on the communication network is known to increase traffic flow, causing delays and packet losses to grow. This paper presents a novel neural network (NN)-based attack detection and estimation scheme that captures the abnormal traffic flow due to a class of attacks on the communication links within the feedback loop of an NCS. By modeling the unknown network flow as a nonlinear function at the bottleneck node and using an NN observer, a network attack detection residual is defined and used to declare the onset of an attack when the residual exceeds a predefined threshold. Upon detection, another NN is used to estimate the flow injected by the attack. For the physical system, we develop an attack detection scheme by using an adaptive dynamic programming-based optimal event-triggered NN controller in the presence of network delays and packet losses. Attacks on the network as well as on the sensors of the physical system can be detected and estimated with the proposed scheme. The simulation results confirm the theoretical conclusions.
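The detection logic reduces to comparing an observer residual against a threshold. The sketch below substitutes a simple linear-in-parameters estimator for the paper's NN observer; the traffic-flow model, warm-up length, and threshold are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative residual-based detector: an online linear-in-parameters
# approximator stands in for the paper's NN observer of bottleneck traffic.
def true_flow(k, attacked):
    base = 2.0 + np.sin(0.1 * k)          # nominal traffic flow (assumed model)
    return base + (3.0 if attacked else 0.0) + 0.05 * rng.standard_normal()

w = np.zeros(3)                            # observer/approximator weights
threshold, lr = 0.5, 0.05
for k in range(400):
    attacked = k >= 250                    # attack injected after step 250
    phi = np.array([1.0, np.sin(0.1 * k), np.cos(0.1 * k)])  # regressor features
    y = true_flow(k, attacked)
    residual = y - w @ phi                 # detection residual
    if k > 150 and abs(residual) > threshold:   # detect only after warm-up
        print(f"k={k}: attack flagged, residual={residual:.2f}")
        break
    w += lr * residual * phi               # observer weight update (LMS)
```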
4. Yuan Y, Wang Z, Zhang P, Liu H. Near-Optimal Resilient Control Strategy Design for State-Saturated Networked Systems Under Stochastic Communication Protocol. IEEE Transactions on Cybernetics 2019;49:3155-3167. PMID: 29994413. DOI: 10.1109/tcyb.2018.2840430.
Abstract
In this paper, the near-optimal resilient control strategy design problem is investigated for a class of discrete time-varying systems in the simultaneous presence of stochastic communication protocols (SCPs), gain perturbations, state saturations, and additive nonlinearities. In the sensor-to-controller network, only one sensor is permitted to access the communication medium at a time so as to avoid possible data collisions. Described by a Markov chain, the SCP determines which sensor obtains access to the network at a given time. Furthermore, two well-recognized complexities (state saturations and additive nonlinearities) are considered in the system model, and the phenomenon of controller gain perturbation is also taken into special consideration. Accordingly, the resilient control strategy is designed by: 1) deriving an upper bound on the associated cost function of the underlying systems and 2) minimizing this upper bound through the completing-the-square technique and the Moore-Penrose pseudoinverse. The resilient control strategy is obtained iteratively by solving a set of coupled backward Riccati-like recursions. Based on the proposed control strategies, the infinite-horizon case is then considered and the corresponding upper bound on the cost function is explicitly provided. Finally, numerical simulations are carried out on power systems to verify the validity of the proposed resilient control algorithms.
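The backward recursions at the heart of the design are structural relatives of the standard finite-horizon Riccati recursion, sketched below for an assumed plant; the protocol, saturation, and perturbation couplings of the paper are omitted.

```python
import numpy as np

# Structural sketch: a plain finite-horizon backward Riccati recursion, the
# skeleton that the paper's coupled Riccati-like recursions build on.
# System matrices, weights, and horizon are illustrative assumptions.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2); R = np.array([[0.1]]); QT = np.eye(2)
N = 50                                        # horizon length

P = QT                                        # terminal cost weight
gains = []
for k in reversed(range(N)):
    S = R + B.T @ P @ B
    K = np.linalg.solve(S, B.T @ P @ A)       # time-varying feedback gain
    P = Q + A.T @ P @ A - A.T @ P @ B @ K     # backward Riccati step
    gains.append(K)
gains.reverse()
print("gain at k=0:", gains[0])
```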
5. Xu X, Chen H, Lian C, Li D. Learning-Based Predictive Control for Discrete-Time Nonlinear Systems With Stochastic Disturbances. IEEE Transactions on Neural Networks and Learning Systems 2018;29:6202-6213. PMID: 29993751. DOI: 10.1109/tnnls.2018.2820019.
Abstract
In this paper, a learning-based predictive control (LPC) scheme is proposed for adaptive optimal control of discrete-time nonlinear systems under stochastic disturbances. The proposed LPC scheme differs from conventional model predictive control (MPC), which uses open-loop optimization or simplified closed-loop optimal control techniques in each horizon. In LPC, the control task in each horizon is formulated as a closed-loop nonlinear optimal control problem, and a finite-horizon iterative reinforcement learning (RL) algorithm is developed to obtain the closed-loop optimal/suboptimal solutions. Therefore, in LPC, RL and adaptive dynamic programming (ADP) are used as a new class of closed-loop learning-based optimization techniques for nonlinear predictive control with stochastic disturbances. Moreover, LPC decomposes the infinite-horizon optimal control problem of previous RL and ADP methods into a series of finite-horizon problems, so that computational costs are reduced and learning efficiency is improved. Convergence of the finite-horizon iterative RL algorithm in each prediction horizon and Lyapunov stability of the closed-loop control system are proved. By using successive policy updates between adjacent time horizons, LPC also has lower computational costs than conventional MPC, which runs independent optimization procedures in different prediction horizons. Simulation results illustrate that, compared with conventional nonlinear MPC as well as ADP, the proposed LPC scheme obtains better performance in terms of both policy optimality and computational efficiency.
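The receding-horizon idea, solving each horizon by a few learning iterations warm-started from the previous horizon's policy, can be sketched on a scalar linear-quadratic stand-in (assumed plant and weights; the paper's nonlinear finite-horizon RL is not reproduced):

```python
# Receding-horizon sketch of the LPC idea: each horizon runs a few policy
# iterations, warm-started from the previous horizon's policy, then only the
# first control is applied. Scalar LQR stands in for the nonlinear setting.
a, b, q, r = 1.1, 1.0, 1.0, 0.2    # assumed scalar plant and cost weights

def policy_eval(K):
    # Exact evaluation of V(x) = P x^2 under u = K x:
    # P = (q + r K^2) / (1 - (a + b K)^2)
    acl = a + b * K
    assert abs(acl) < 1.0, "policy must be stabilizing"
    return (q + r * K * K) / (1.0 - acl * acl)

def policy_improve(P):
    return -a * b * P / (r + b * b * P)

x, K = 5.0, -0.9                   # initial state and an initial stabilizing gain
for t in range(20):
    for _ in range(3):             # a few PI sweeps per horizon (warm-started)
        K = policy_improve(policy_eval(K))
    x = a * x + b * (K * x)        # apply the first control, shift the horizon
print(f"final gain {K:.4f}, final state {x:.2e}")
```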
6. Lewis FL. Tracking Control for Linear Discrete-Time Networked Control Systems With Unknown Dynamics and Dropout. IEEE Transactions on Neural Networks and Learning Systems 2018;29:4607-4620. PMID: 29990205. DOI: 10.1109/tnnls.2017.2771459.
Abstract
This paper develops a new method for solving the optimal control tracking problem for networked control systems (NCSs), where network-induced dropout can occur and the system dynamics are unknown. First, a novel dropout Smith predictor is designed to predict the current state based on historical data measurements over the communication network. Then, it is shown that the quadratic form of the performance index is preserved even with dropout, and the optimal tracker solution with dropout is given based on a novel dropout generalized algebraic Riccati equation. New algorithms for off-line policy iteration (PI), online PI, and Q-learning PI are presented for NCS with dropout. The Q-learning algorithm adaptively learns the optimal control online using data measured over the communication network based on reinforcement learning, including dropout, without requiring any knowledge of the system dynamics. Simulation results are provided to show that the proposed approaches give proper optimal tracking performance for the NCS with unknown dynamics and dropout.
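The Q-learning ingredient, learning a quadratic Q-function from data and improving the policy without a model, looks as follows in the dropout-free scalar case; the dropout Smith predictor and dropout Riccati equation are omitted, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Model-free Q-learning policy-iteration sketch for scalar LQR. The plant
# parameters are unknown to the learner and are used only to simulate data.
a, b, q, r = 0.9, 0.5, 1.0, 0.1

K = 0.0                             # initial stabilizing policy (a = 0.9 is stable)
for it in range(10):
    Phi, y = [], []
    x = 1.0
    for k in range(60):
        u = K * x + 0.1 * rng.standard_normal()    # exploration noise
        xn = a * x + b * u                          # measured over the network
        un = K * xn
        # Q(x,u) = h1 x^2 + 2 h2 x u + h3 u^2; Bellman residual equation:
        # Q(x,u) - Q(xn, K xn) = q x^2 + r u^2
        Phi.append([x*x - xn*xn, 2*(x*u - xn*un), u*u - un*un])
        y.append(q * x * x + r * u * u)
        x = xn
    h1, h2, h3 = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)[0]
    K = -h2 / h3                                    # policy improvement
print(f"learned gain K = {K:.4f}")
```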
7. Narayanan V, Jagannathan S. Event-Triggered Distributed Approximate Optimal State and Output Control of Affine Nonlinear Interconnected Systems. IEEE Transactions on Neural Networks and Learning Systems 2018;29:2846-2856. PMID: 28613181. DOI: 10.1109/tnnls.2017.2693205.
Abstract
This paper presents an approximate optimal distributed control scheme for a known interconnected system composed of input-affine nonlinear subsystems, using event-triggered state and output feedback via a novel hybrid learning scheme. First, the cost function for the overall system is redefined as the sum of the cost functions of the individual subsystems. A distributed optimal control policy for the interconnected system is developed using the optimal value function of each subsystem. To generate the optimal control policy, neural networks are employed to reconstruct the unknown optimal value function at each subsystem online and forward in time. In order to retain the advantages of event-triggered feedback for an adaptive optimal controller, a novel hybrid learning scheme is proposed to reduce the convergence time of the learning algorithm. The development is based on the observation that, in event-triggered feedback, the sampling instants are dynamic and result in variable interevent times. To relax the requirement of full state measurements, an extended nonlinear observer is designed at each subsystem to recover the internal system states from the measurable feedback. Using a Lyapunov-based analysis, it is demonstrated that the system states and the observer errors remain locally uniformly ultimately bounded and that the control policy converges to a neighborhood of the optimal policy. Simulation results are presented to demonstrate the performance of the developed controller.
8. Liu L, Wang Z, Zhang H. Neural-Network-Based Robust Optimal Tracking Control for MIMO Discrete-Time Systems With Unknown Uncertainty Using Adaptive Critic Design. IEEE Transactions on Neural Networks and Learning Systems 2018;29:1239-1251. PMID: 28362616. DOI: 10.1109/tnnls.2017.2660070.
Abstract
This paper is concerned with a robust optimal tracking control strategy for a class of nonlinear multi-input multi-output discrete-time systems with unknown uncertainty via an adaptive critic design (ACD) scheme. The main purpose is to establish an adaptive actor-critic control method such that the cost incurred while dealing with the uncertainty is minimized and the closed-loop system is stable. Based on a neural network approximator, an action network is applied to generate the optimal control signal and a critic network is used to approximate the cost function. In contrast to previous methods, the main features of this paper are: 1) the ACD scheme is integrated into the controllers to cope with the uncertainty and 2) a novel cost function, which is not of quadratic form, is proposed so that the total cost in the design procedure is reduced. It is proved that the optimal control signals and the tracking errors are uniformly ultimately bounded even when the uncertainty exists. Finally, a numerical simulation is presented to show the effectiveness of the proposed approach.
9. Ligeiro R, Vilela Mendes R. Detecting and quantifying ambiguity: a neural network approach. Soft Computing 2018. DOI: 10.1007/s00500-017-2525-7.
10. Wei Q, Liu D, Lin Q, Song R. Discrete-Time Optimal Control via Local Policy Iteration Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2017;47:3367-3379. PMID: 27448382. DOI: 10.1109/tcyb.2016.2586082.
Abstract
In this paper, a discrete-time optimal control scheme is developed via a novel local policy iteration adaptive dynamic programming algorithm. In the discrete-time local policy iteration algorithm, the iterative value function and iterative control law can be updated in a subset of the state space, where the computational burden is relaxed compared with the traditional policy iteration algorithm. Convergence properties of the local policy iteration algorithm are presented to show that the iterative value function is monotonically nonincreasing and converges to the optimum under some mild conditions. The admissibility of the iterative control law is proven, which shows that the control system can be stabilized under any of the iterative control laws, even if the iterative control law is updated in a subset of the state space. Finally, two simulation examples are given to illustrate the performance of the developed method.
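The distinguishing feature, updating the value function and control law only on a subset of the state space per sweep, can be illustrated on a small tabular problem (an assumed shortest-path MDP, not the paper's continuous-state setting):

```python
import numpy as np

# Tabular sketch of local policy iteration: value and policy are updated only
# on a subset of the state space per sweep (Gauss-Seidel-style local updates).
n = 8                               # states 0..7, state 0 is the goal (absorbing)
V = np.zeros(n)
policy = np.zeros(n, dtype=int)     # stores the locally improved action per state

def step(s, a):
    return max(0, min(n - 1, s + a))

for sweep in range(20):
    subset = range(1 + (sweep % 2), n, 2)   # alternate subsets: local updates only
    for s in subset:
        # one-step lookahead evaluation/improvement, restricted to the subset
        costs = {a: 1.0 + V[step(s, a)] for a in (-1, +1)}
        policy[s] = min(costs, key=costs.get)
        V[s] = costs[policy[s]]
print("V:", V[1:], "\npolicy:", policy[1:])
```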
11. Balasubramaniyan S, Srinivasan S, Kebraei H, Subathra B, Balas VE, Glielmo L. Stochastic optimal controller design for medium access constrained networked control systems with unknown dynamics. Intelligent Decision Technologies 2017;11:253-264. DOI: 10.3233/idt-170293.
Affiliations: Hamed Kebraei, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran; Subathra B, Kalasalingam University, Krishnan Kovil, Srivilliputtur, Tamil Nadu, India; Luigi Glielmo, Department of Engineering, University of Sannio, Benevento, Italy.
12. Balasubramaniyan S, Srinivasan S, Kebraei H, Subathra B, Balas VE, Glielmo L. Stochastic optimal controller design for medium access constrained networked control systems with unknown dynamics. Intelligent Decision Technologies 2017;11:223-233. DOI: 10.3233/idt-170290.
Affiliations: Hamed Kebraei, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran; Subathra B, Kalasalingam University, Krishnan Kovil, Srivilliputtur, Tamil Nadu, India; Luigi Glielmo, Department of Engineering, University of Sannio, Benevento, Italy.
13. Rajagopal K, Balakrishnan SN, Busemeyer JR. Neural Network-Based Solutions for Stochastic Optimal Control Using Path Integrals. IEEE Transactions on Neural Networks and Learning Systems 2017;28:534-545. PMID: 28212072. DOI: 10.1109/tnnls.2016.2544787.
Abstract
In this paper, an offline approximate dynamic programming approach using neural networks is proposed for solving a class of finite-horizon stochastic optimal control problems. Two approaches are available in the literature: one based on the stochastic maximum principle (SMP) formalism and the other based on solving the stochastic Hamilton-Jacobi-Bellman (HJB) equation. However, in the presence of noise, the SMP formalism becomes complex and requires solving a pair of backward stochastic differential equations. Hence, current solution methodologies typically ignore the noise effect. On the other hand, the inclusion of noise in the HJB framework is straightforward. Furthermore, the stochastic HJB equation of a control-affine nonlinear stochastic system with a quadratic control cost and an arbitrary state cost can be formulated as a path integral (PI) problem. However, due to the curse of dimensionality, it might not be possible to use the PI formulation to obtain comprehensive solutions over the entire operating domain. A neural network structure called the adaptive critic design paradigm is used to handle this difficulty effectively. In this paper, a novel adaptive critic approach using the PI formulation is proposed for solving stochastic optimal control problems. The potential of the algorithm is demonstrated through simulation results on a pair of benchmark problems.
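The PI formulation evaluates the control as a cost-weighted average over sampled trajectories. A minimal Monte Carlo sketch for a scalar integrator with quadratic state cost follows; the dynamics, horizon, and temperature are assumptions, and the adaptive critic layer is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo path-integral sketch: the control is a cost-weighted average of
# sampled input sequences, with weights exp(-S/lambda). All parameters assumed.
dt, T, lam, nsamp = 0.1, 30, 1.0, 256

x = 2.0
for step in range(100):
    eps = rng.standard_normal((nsamp, T))            # sampled control sequences
    paths = x + dt * np.cumsum(eps, axis=1)          # rollouts of dx = u dt
    S = 0.5 * dt * (paths ** 2).sum(axis=1)          # path costs (quadratic state cost)
    w = np.exp(-(S - S.min()) / lam)
    w /= w.sum()                                     # softmin path weights
    u = w @ eps[:, 0]                                # PI estimate of the current control
    x += u * dt
print(f"final state: {x:.4f}")
```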
14. Sahoo A, Jagannathan S. Stochastic Optimal Regulation of Nonlinear Networked Control Systems by Using Event-Driven Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2017;47:425-438. PMID: 26891488. DOI: 10.1109/tcyb.2016.2519445.
Abstract
In this paper, an event-driven stochastic adaptive dynamic programming (ADP)-based technique is introduced for nonlinear systems with a communication network within their feedback loop. A near-optimal control policy is designed using an actor-critic framework and ADP with an event-sampled state vector. First, the system dynamics are approximated by a novel neural network (NN) identifier with an event-sampled state vector. The optimal control policy is generated via an actor NN by using the NN identifier and a value function approximated by a critic NN through ADP. The stochastic NN identifier, actor, and critic NN weights are tuned at the event-sampled instants, leading to aperiodic weight-tuning laws. In addition, an adaptive event-sampling condition based on the estimated NN weights is designed using the Lyapunov technique to ensure ultimate boundedness of all closed-loop signals along with the approximation accuracy. The net result is an event-driven stochastic ADP technique that can significantly reduce computation and network transmissions. Finally, the analytical design is substantiated with simulation results.
15. Optimization of electricity consumption in office buildings based on adaptive dynamic programming. Soft Computing 2016. DOI: 10.1007/s00500-016-2194-y.
16. Wei Q, Song R, Yan P. Data-Driven Zero-Sum Neuro-Optimal Control for a Class of Continuous-Time Unknown Nonlinear Systems With Disturbance Using ADP. IEEE Transactions on Neural Networks and Learning Systems 2016;27:444-458. PMID: 26292346. DOI: 10.1109/tnnls.2015.2464080.
Abstract
This paper is concerned with a new data-driven zero-sum neuro-optimal control problem for continuous-time unknown nonlinear systems with disturbance. Based on the input-output data of the nonlinear system, an effective recurrent neural network is introduced to reconstruct the dynamics of the nonlinear system. Considering the system disturbance as a control input, a two-player zero-sum optimal control problem is established. Adaptive dynamic programming (ADP) is developed to obtain the optimal control under the worst-case disturbance. Three single-layer neural networks, including one critic and two action networks, are employed to approximate the performance index function, the optimal control law, and the disturbance, respectively, to facilitate the implementation of the ADP method. Convergence properties of the ADP method are developed to show that the system state converges to a finite neighborhood of the equilibrium. The weight matrices of the critic and the two action networks also converge to finite neighborhoods of their optimal values. Finally, simulation results show the effectiveness of the developed data-driven ADP methods.
17. Wei Q, Liu D, Lewis FL. Optimal distributed synchronization control for continuous-time heterogeneous multi-agent differential graphical games. Information Sciences 2015. DOI: 10.1016/j.ins.2015.04.044.
18. Alanis AY, Rios JD, Arana-Daniel N, Lopez-Franco C. Neural identifier for unknown discrete-time nonlinear delayed systems. Neural Computing and Applications 2015. DOI: 10.1007/s00521-015-2016-7.
19. Xu H, Zhao Q, Jagannathan S. Finite-Horizon Near-Optimal Output Feedback Neural Network Control of Quantized Nonlinear Discrete-Time Systems With Input Constraint. IEEE Transactions on Neural Networks and Learning Systems 2015;26:1776-1788. PMID: 25794403. DOI: 10.1109/tnnls.2015.2409301.
Abstract
The output-feedback-based near-optimal regulation of uncertain and quantized nonlinear discrete-time systems in affine form with control constraints over a finite horizon is addressed in this paper. First, the effect of the input constraint is handled using a nonquadratic cost functional. Next, a neural network (NN)-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix so that a separate identifier is not needed. Then, an approximate dynamic programming-based actor-critic framework is utilized to approximate the time-varying solution of the Hamilton-Jacobi-Bellman equation using NNs with constant weights and time-dependent activation functions. A new error term is defined and incorporated in the NN update law so that the terminal constraint error is also minimized over time. Finally, a novel dynamic quantizer for the control inputs with adaptive step size is designed to eliminate the quantization error over time, thus overcoming the drawback of the traditional uniform quantizer. The proposed scheme functions in a forward-in-time manner without an offline training phase. Lyapunov analysis is used to investigate the stability. Simulation results are given to show the effectiveness and feasibility of the proposed method.
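The adaptive-step-size idea behind the dynamic quantizer can be shown in a few lines: shrinking the step geometrically drives the worst-case quantization error to zero. The decay schedule below is an illustrative assumption, not the paper's update law.

```python
# Sketch of a dynamic uniform quantizer with adaptive (shrinking) step size;
# the geometric decay schedule is an assumed stand-in for the paper's rule.
def quantize(u, step):
    return step * round(u / step)

u_true, step, decay = 0.7318, 1.0, 0.8
for k in range(8):
    uq = quantize(u_true, step)
    print(f"k={k} step={step:.4f} u_q={uq:.4f} error={abs(uq - u_true):.4f}")
    step *= decay              # adaptive step size: error bound shrinks each step
```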
20. Song R, Wei Q, Xiao W. ADP-based optimal sensor scheduling for target tracking in energy harvesting wireless sensor networks. Neural Computing and Applications 2015. DOI: 10.1007/s00521-015-1954-4.
21. Wei Q, Liu D, Yang X. Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems 2015;26:866-879. PMID: 25751877. DOI: 10.1109/tnnls.2015.2401334.
Abstract
In this paper, a novel iterative adaptive dynamic programming (ADP)-based infinite-horizon self-learning optimal control algorithm, called the generalized policy iteration algorithm, is developed for nonaffine discrete-time (DT) nonlinear systems. Generalized policy iteration interleaves the policy iteration and value iteration algorithms of ADP. The developed algorithm permits an arbitrary positive semidefinite function to initialize it, and two iteration indices are used for policy improvement and policy evaluation, respectively. This is the first time that the convergence, admissibility, and optimality properties of the generalized policy iteration algorithm for DT nonlinear systems are analyzed. Neural networks are used to implement the developed algorithm. Finally, numerical examples are presented to illustrate its performance.
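The two iteration indices can be made concrete on a tabular stand-in: an outer index improves the policy, while an inner index performs only a finite number of evaluation sweeps, interpolating between value iteration (one sweep) and policy iteration (evaluation to convergence). The MDP below is assumed for illustration.

```python
import numpy as np

# Generalized policy iteration sketch with two iteration indices:
# outer index i improves the policy, inner index j partially evaluates it.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.1, 0.9], [0.7, 0.3]]])   # P[a, s, s']: assumed transitions
c = np.array([[1.0, 2.0], [2.0, 0.5]])     # c[a, s]: assumed stage costs
gamma, n_eval = 0.95, 3                    # inner index runs to n_eval, not to convergence

V = np.zeros(2)                            # arbitrary positive semidefinite initialization
for i in range(30):                        # outer index: policy improvement
    pi = (c + gamma * P @ V).argmin(axis=0)
    for j in range(n_eval):                # inner index: partial policy evaluation
        V = c[pi, np.arange(2)] + gamma * (P[pi, np.arange(2)] * V).sum(axis=1)
print("V* ~", V, "policy:", pi)
```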
22. Xu H, Jagannathan S. Neural network-based finite horizon stochastic optimal control design for nonlinear networked control systems. IEEE Transactions on Neural Networks and Learning Systems 2015;26:472-485. PMID: 25720004. DOI: 10.1109/tnnls.2014.2315622.
Abstract
The stochastic optimal control of nonlinear networked control systems (NNCSs) using neuro-dynamic programming (NDP) over a finite time horizon is a challenging problem due to terminal constraints, system uncertainties, and unknown network imperfections, such as network-induced delays and packet losses. Since the traditional iteration or time-based infinite horizon NDP schemes are unsuitable for NNCS with terminal constraints, a novel time-based NDP scheme is developed to solve finite horizon optimal control of NNCS by mitigating the above-mentioned challenges. First, an online neural network (NN) identifier is introduced to approximate the control coefficient matrix that is subsequently utilized in conjunction with the critic and actor NNs to determine a time-based stochastic optimal control input over finite horizon in a forward-in-time and online manner. Eventually, Lyapunov theory is used to show that all closed-loop signals and NN weights are uniformly ultimately bounded with ultimate bounds being a function of initial conditions and final time. Moreover, the approximated control input converges close to optimal value within finite time. The simulation results are included to show the effectiveness of the proposed scheme.
23. Zhao Q, Xu H, Jagannathan S. Neural network-based finite-horizon optimal control of uncertain affine nonlinear discrete-time systems. IEEE Transactions on Neural Networks and Learning Systems 2015;26:486-499. PMID: 25720005. DOI: 10.1109/tnnls.2014.2315646.
Abstract
In this paper, the finite-horizon optimal control design for nonlinear discrete-time systems in affine form is presented. In contrast with the traditional approximate dynamic programming methodology, which requires at least partial knowledge of the system dynamics, the need for the complete system dynamics is relaxed here by utilizing a neural network (NN)-based identifier to learn the control coefficient matrix. The identifier is then used together with an actor-critic-based scheme to learn the time-varying solution, referred to as the value function, of the Hamilton-Jacobi-Bellman (HJB) equation in an online and forward-in-time manner. Since the solution of the HJB equation is time-varying, NNs with constant weights and time-varying activation functions are considered. To properly satisfy the terminal constraint, an additional error term is incorporated in the novel update law so that the terminal constraint error is also minimized over time. Policy and/or value iterations are not needed, and the NN weights are updated once per sampling instant. The uniform ultimate boundedness of the closed-loop system is verified by standard Lyapunov stability theory under nonautonomous analysis. Numerical examples are provided to illustrate the effectiveness of the proposed method.
24. Wei Q, Wang FY, Liu D, Yang X. Finite-approximation-error-based discrete-time iterative adaptive dynamic programming. IEEE Transactions on Cybernetics 2014;44:2820-2833. PMID: 25265640. DOI: 10.1109/tcyb.2014.2354377.
Abstract
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm permits an arbitrary positive semidefinite function to initialize it, which overcomes the disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function cannot be obtained accurately in each iteration, a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established for the first time. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.
25. Zhang H, Qin C, Jiang B, Luo Y. Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems. IEEE Transactions on Cybernetics 2014;44:2706-2718. PMID: 25095274. DOI: 10.1109/tcyb.2014.2313915.
Abstract
The problem of H∞ state feedback control of affine nonlinear discrete-time systems with unknown dynamics is investigated in this paper. An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed for learning in real time the solution to the Hamilton-Jacobi-Isaacs (HJI) equation, which appears in the H∞ control problem. In the proposed algorithm, three neural networks (NNs) are utilized to find suitable approximations of the optimal value function and of the saddle-point feedback control and disturbance policies. Novel weight updating laws are given to tune the critic, actor, and disturbance NNs simultaneously by using data generated in real time along the system trajectories. Considering NN approximation errors, we provide a stability analysis of the proposed algorithm with a Lyapunov approach. Moreover, the need for the system input dynamics in the proposed algorithm is relaxed by using an NN identification scheme. Finally, simulation examples show the effectiveness of the proposed algorithm.
26. Neuro-optimal tracking control for a class of discrete-time nonlinear systems via generalized value iteration adaptive dynamic programming approach. Soft Computing 2014. DOI: 10.1007/s00500-014-1533-0.
27. Comprehensive control of networked control systems with multistep delay. The Scientific World Journal 2014;2014:814245. PMID: 25101322. PMCID: PMC4102029. DOI: 10.1155/2014/814245.
Abstract
In networked control systems with multistep delay, long time delays cause vacant sampling and make controller design difficult. To solve these problems, comprehensive control methods are proposed in this paper. Time-delay compensation control and linear-quadratic-Gaussian (LQG) optimal control are adopted, and the system switches between the two controllers according to its state. The LQG optimal controller is used with probability 1-α in the normal state and is shown to render the system mean-square exponentially stable. The time-delay compensation controller is used with probability α in the abnormal state to compensate for vacant sampling and long time delays. In addition, a buffer window is established at the actuator to store recent control inputs, which are used to estimate the control input of the present sampling period under vacant sampling. The comprehensive control methods simplify the control design, making it easier to implement in engineering, and the performance of the system is also improved. Simulation results verify the validity of the proposed theory.
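The switching-plus-buffer mechanism can be sketched as follows; the scalar plant, the fixed feedback gain standing in for the LQG design, and the dropout probability α are all assumptions for illustration.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(3)

# Sketch of the switching idea: with probability 1-alpha the loop is in the
# normal state and applies state feedback; with probability alpha the sample
# is vacant and the actuator reuses the most recent buffered input.
a, b, K, alpha = 0.95, 1.0, -0.6, 0.2
buffer = deque([0.0], maxlen=5)          # buffer window of applied control inputs

x = 3.0
for k in range(50):
    if rng.random() < alpha:             # abnormal state: vacant sampling
        u = buffer[-1]                   # compensate using the buffered input
    else:                                # normal state: fresh measurement arrives
        u = K * x
    buffer.append(u)
    x = a * x + b * u + 0.01 * rng.standard_normal()
print(f"|x| after 50 steps: {abs(x):.3f}")
```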
28. Liu D, Wei Q. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems 2014;25:621-634. PMID: 24807455. DOI: 10.1109/tnnls.2013.2281663.
Abstract
This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law that optimizes the iterative performance index function. The main contribution of this paper is to analyze, for the first time, the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear system. Neural networks are used to approximate the performance index function and compute the optimal control law, respectively, to facilitate the implementation of the iterative ADP algorithm, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
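The two properties proved here, nonincreasing iterative cost and stability (admissibility) of every iterate, are visible even in a tiny tabular policy iteration; the MDP below is assumed for illustration, and the paper's NN implementation is omitted.

```python
import numpy as np

# Tabular policy-iteration sketch: each sweep does exact policy evaluation
# followed by greedy improvement, and the printed values are nonincreasing.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # P[a, s, s']: action 0 stays put
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1 swaps states
c = np.array([[2.0, 1.0],                  # c[a, s]: assumed stage costs
              [1.0, 3.0]])
gamma = 0.9

pi = np.zeros(2, dtype=int)                # initial policy: action 0 everywhere
for it in range(10):
    # exact policy evaluation: V = (I - gamma * P_pi)^(-1) c_pi
    P_pi = P[pi, np.arange(2)]
    c_pi = c[pi, np.arange(2)]
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, c_pi)
    Q = c + gamma * P @ V                  # Q[a, s]
    new_pi = Q.argmin(axis=0)              # greedy policy improvement
    print(f"iter {it}: V = {V}, policy = {pi}")
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi
```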