1
Wen G, Niu B. Optimized distributed formation control using identifier-critic-actor reinforcement learning for a class of stochastic nonlinear multi-agent systems. ISA TRANSACTIONS 2024; 155:1-10. [PMID: 39472256 DOI: 10.1016/j.isatra.2024.10.004]
Abstract
This article proposes an adaptive reinforcement learning (RL)-based optimized distributed formation control for multi-agent systems (MASs) with unknown stochastic nonlinear single-integrator dynamics. To handle the unknown dynamics, an adaptive identifier neural network (NN) is developed to learn the stochastic MAS in the sense of expectation. Then, to derive the optimized formation control, RL is implemented by constructing a pair of critic and actor NNs. Traditional RL optimal controls suffer from inherent algorithmic complexity because their adaptive RL laws are derived from the negative gradient of the square of the Hamilton-Jacobi-Bellman (HJB) equation; as a result, these methods are difficult to extend to stochastic dynamical systems. In contrast, since the adaptive RL laws here are derived from a simple positive function rather than the square of the HJB equation, optimal control is obtained with a simple algorithm. Therefore, the optimized formation scheme can be smoothly applied to the stochastic MAS. Finally, theoretical proof and computer simulation show that the optimized method achieves the required control objective.
Affiliation(s)
- Guoxing Wen
- Shandong University of Aeronautics, Binzhou, 256600, Shandong, China.
- Ben Niu
- Dalian University of Technology, Dalian, Liaoning, 116024, China.
2
Du X, Zhan X, Wu J, Yan H. Effects of Two-Channel Noise and Packet Loss on Performance of Information Time Delay Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:8549-8556. [PMID: 37015669 DOI: 10.1109/tnnls.2022.3230648]
Abstract
The performance limitations of multiple-input multiple-output (MIMO) information time delay systems (ITDSs) with packet loss, codecs, and white Gaussian noise (WGN) are investigated in this article. Using the spectral decomposition technique, inner-outer factorization, and partial factorization, an expression for the performance limitations is obtained under a two-degree-of-freedom (2DOF) compensator. The theoretical analysis demonstrates that the system performance depends on the time delay, the non-minimum phase (NMP) zeros and their directions in a given plant. In addition, WGN, packet loss, and the codec also impact the performance. Finally, the theoretical results are verified by simulation examples, which show that the packet loss rate and the encoding/decoding process have the greater impact on system performance.
3
Ming Z, Zhang H, Luo Y, Wang W. Dynamic Event-Based Control for Stochastic Optimal Regulation of Nonlinear Networked Control Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:7299-7308. [PMID: 35038299 DOI: 10.1109/tnnls.2022.3140478]
Abstract
In this article, a dynamic event-triggered stochastic adaptive dynamic programming (ADP) problem is investigated for nonlinear systems controlled over a communication network. First, a novel condition for a discrete-time version of stochastic input-to-state stability (SISS) is established. Then, an event-triggered control strategy is devised, and a near-optimal control policy is designed using identifier-actor-critic neural networks (NNs) with an event-sampled state vector. An adaptive static event-sampling condition is designed using the Lyapunov technique to ensure ultimate boundedness (UB) of the closed-loop system. However, since the static event-triggered rule depends only on the current state, regardless of past values, this article presents an explicit dynamic event-triggered rule. Furthermore, we prove that the lower bound on the sampling interval for the proposed dynamic event-triggered control strategy is greater than one, which avoids the so-called triviality phenomenon. Finally, the effectiveness of the proposed near-optimal control scheme is verified by a simulation example.
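As a rough illustration of why a dynamic rule fires less often than a static one, the following Girard-style sketch augments the static condition with an internal dynamic variable. The gains `beta`, `sigma`, `theta` and the trajectory are illustrative assumptions, not the design from the cited article:

```python
import numpy as np

def simulate_dynamic_trigger(x_traj, beta=0.8, sigma=0.5, theta=1.0):
    """Run a generic dynamic event-trigger over a state trajectory.

    x_traj: sequence of state vectors x(k).
    Returns the indices k at which a new sample is transmitted.
    All constants here are illustrative, not taken from the cited paper.
    """
    eta = 0.0                      # internal dynamic variable
    x_hat = x_traj[0]              # last transmitted state
    events = [0]
    for k in range(1, len(x_traj)):
        e = x_hat - x_traj[k]      # sampling-induced error
        gap = sigma * np.dot(x_traj[k], x_traj[k]) - np.dot(e, e)
        # dynamic rule: transmit only when eta + theta*gap < 0,
        # which fires no more often than the static rule gap < 0
        if eta + theta * gap < 0:
            x_hat = x_traj[k]
            events.append(k)
            gap = sigma * np.dot(x_traj[k], x_traj[k])  # error reset to zero
        eta = max(beta * eta + gap, 0.0)  # keep the auxiliary variable nonnegative
    return events
```

Because the auxiliary variable `eta` stores "credit" from past samples, the dynamic condition tolerates transient error growth that would immediately fire the static rule.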
4
Gao X, Deng F, Zeng P, Zhang H. Adaptive Neural Event-Triggered Control of Networked Markov Jump Systems Under Hybrid Cyberattacks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:1502-1512. [PMID: 34428162 DOI: 10.1109/tnnls.2021.3105532]
Abstract
This article is concerned with the neural network (NN)-based event-triggered control problem for discrete-time networked Markov jump systems with hybrid cyberattacks and unmeasured states. The event-triggered mechanism (ETM) is used to reduce the communication load, and a Luenberger observer is introduced to estimate the unmeasured states. Two kinds of cyberattacks, denial-of-service (DoS) attacks and deception attacks, are investigated due to the vulnerability of the cyber layer. To mitigate the impact of these two types of cyberattacks on system performance, the ETM under DoS jamming attacks is discussed first, and a new estimation of such a mechanism is given. Then, the NN technique is applied to approximate the injected false information. Some sufficient conditions are derived to guarantee the boundedness of the closed-loop system, and the observer and controller gains are obtained by solving a set of matrix inequalities. The effectiveness of the presented control method is demonstrated by a numerical example.
5
Wen G, Xu L, Li B. Optimized Backstepping Tracking Control Using Reinforcement Learning for a Class of Stochastic Nonlinear Strict-Feedback Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:1291-1303. [PMID: 34437076 DOI: 10.1109/tnnls.2021.3105176]
Abstract
In this article, an optimized backstepping (OB) control scheme is proposed for a class of stochastic nonlinear strict-feedback systems with unknown dynamics by using a reinforcement learning (RL) strategy with the identifier-critic-actor architecture, where the identifier compensates for the unknown dynamics, the critic evaluates the control performance and gives feedback to the actor, and the actor performs the control action. The basic control idea is that all virtual controls and the actual control of backstepping are designed as the optimized solutions of the corresponding subsystems so that the entire backstepping control is optimized. Different from the deterministic case, stochastic system control must consider not only the stochastic disturbance, depicted by a Wiener process, but also the Hessian term in the stability analysis. If the backstepping control were developed on the basis of the published RL optimization methods, it would be difficult to achieve because, on the one hand, the RL algorithms of these methods are very complex, since their critic and actor updating laws are derived from the negative gradient of the square of the approximated Hamilton-Jacobi-Bellman (HJB) equation; on the other hand, these methods require persistent excitation and known dynamics, where persistent excitation is needed to train the adaptive parameters sufficiently. In this research, both critic and actor updating laws are derived from the negative gradient of a simple positive function, which is obtained from a partial derivative of the HJB equation. As a result, the RL algorithm is significantly simplified, and the two requirements of persistent excitation and known dynamics are removed. Therefore, it is a natural choice for stochastic optimization control. Finally, both theory and simulation demonstrate that the proposed control achieves the desired system performance.
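To make the idea of critic and actor updating laws driven by negative gradients concrete, here is a heavily simplified scalar actor-critic sketch on a deterministic toy system. The system x' = 0.8x + u, the quadratic features, and all gains are illustrative assumptions, not the stochastic strict-feedback design of the cited work:

```python
import numpy as np

def train_actor_critic(episodes=100, steps=20, alpha=0.05, beta=0.1,
                       gamma=0.95, seed=0):
    """Toy actor-critic tuning on x' = 0.8*x + u with stage cost x^2 + u^2.

    The critic weight wc descends the squared Bellman residual and the
    actor weight wa descends the resulting Q-gradient -- a drastically
    simplified echo of gradient-based critic/actor laws; none of these
    gains or features come from the cited paper.
    """
    rng = np.random.default_rng(seed)
    wc, wa = 0.0, 0.0              # V(x) ~= wc*x^2,  u = -wa*x
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)             # restart to keep exciting data
        for _ in range(steps):
            u = -wa * x + 0.01 * rng.standard_normal()   # small exploration
            x_next = 0.8 * x + u
            # Bellman residual for the quadratic value approximation
            delta = x * x + u * u + gamma * wc * x_next ** 2 - wc * x * x
            wc -= alpha * delta * (gamma * x_next ** 2 - x * x)  # critic step
            dq_du = 2.0 * u + 2.0 * gamma * wc * x_next          # dQ/du
            wa -= beta * dq_du * (-x)          # actor step (du/dwa = -x)
            x = x_next
    return wc, wa
```

On this toy problem the actor gain settles near the quadratic-cost feedback gain, while the critic weight approaches the discounted value coefficient of the closed loop.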
6
Liu XF, Zhan ZH, Zhang J. Resource-Aware Distributed Differential Evolution for Training Expensive Neural-Network-Based Controller in Power Electronic Circuit. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:6286-6296. [PMID: 33961568 DOI: 10.1109/tnnls.2021.3075205]
Abstract
The neural-network (NN)-based control method is an emerging and promising technique for controller design in power electronic circuits (PECs). However, the optimization of NN-based controllers (NNCs) faces two significant challenges. The first is that the search space of the NNC optimization problem is so complex that the global optimization ability of existing algorithms still needs to be improved. The second is that training the NNC parameters is computationally expensive and requires a long execution time. Thus, in this article, we develop a powerful evolutionary-computation-based algorithm to find high-quality solutions in less computational time. First, the differential evolution (DE) algorithm is adopted because it is a powerful global optimizer for complex optimization problems, which helps overcome premature convergence to local optima when training the NNC parameters. Second, to reduce the computational time, DE is extended to distributed DE (DDE) by dispatching all individuals to different distributed computing resources for parallel evaluation. Moreover, a resource-aware strategy (RAS) is designed to utilize the resources more efficiently by adaptively dispatching individuals according to the real-time performance of the resources, accounting for both the computing ability and the load state of each resource. Experimental results show that, compared with other typical evolutionary algorithms, the proposed algorithm obtains significantly better solutions within a shorter computational time.
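For reference, the classic DE/rand/1/bin scheme that DDE parallelizes can be sketched as follows; the population size, scale factor `F`, and crossover rate `CR` are generic textbook settings, and the distributed, resource-aware parts of the cited work are not reproduced:

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=20, F=0.6, CR=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin minimizer of f over box bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([f(ind) for ind in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # pick three distinct partners, none equal to the target index i
            a, b, c = rng.choice([j for j in range(pop_size) if j != i],
                                 size=3, replace=False)
            mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True     # ensure at least one mutated gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            if f_trial <= fit[i]:               # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = fit.argmin()
    return pop[best], fit[best]
```

In DDE, the inner fitness evaluations `f(trial)` (the expensive NNC training runs) are what get farmed out to distributed workers; the greedy selection logic is unchanged.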
7
Zaniolo M, Giuliani M, Castelletti A. Neuro-Evolutionary Direct Policy Search for Multiobjective Optimal Control. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:5926-5938. [PMID: 33882008 DOI: 10.1109/tnnls.2021.3071960]
Abstract
Direct policy search (DPS) is emerging as one of the most effective and widely applied reinforcement learning (RL) methods to design optimal control policies for multiobjective Markov decision processes (MOMDPs). Traditionally, DPS defines the control policy within a preselected functional class and searches its optimal parameterization with respect to a given set of objectives. The functional class should be tailored to the problem at hand, and its selection is crucial, as it determines the search space within which solutions can be found. In MOMDP problems, each objective tradeoff determines a different fitness landscape, requiring a tradeoff-dynamic selection of the functional class. Yet, in state-of-the-art applications, the policy class is generally selected a priori and kept constant across the multidimensional objective space. In this work, we present a novel policy search routine called neuro-evolutionary multiobjective DPS (NEMODPS), which extends the DPS problem formulation to conjunctively search the policy functional class and its parameterization in a hyperspace containing policy architectures and coefficients. NEMODPS begins with a population of minimally structured approximating networks and progressively builds more sophisticated architectures through topological and parametrical mutation and crossover, and selection of the fittest individuals with respect to multiple objectives. We tested NEMODPS on the problem of designing the control policy of a multipurpose water system. Numerical results show that the tradeoff-dynamic structural and parametrical policy search of NEMODPS is consistent across multiple runs and outperforms the solutions designed via traditional DPS with predefined policy topologies.
8
Data-Based Security Fault Tolerant Iterative Learning Control under Denial-of-Service Attacks. ACTUATORS 2022. [DOI: 10.3390/act11070178]
Abstract
This paper mainly studies the data-based security fault-tolerant iterative learning control (SFTILC) problem for nonlinear networked control systems (NCSs) under sensor failures and denial-of-service (DoS) attacks. First, a radial basis function neural network (RBFNN) is used to approximate the sensor failure function, and a DoS attack compensation mechanism is proposed in the iterative domain to lessen the impact of DoS attacks. Then, using dynamic linearization technology, the nonlinear system subject to failures and network attacks is transformed into a linear data model. Further, based on the designed linearization model, a new data-based SFTILC algorithm is designed to ensure satisfactory tracking performance. This process uses only the input and output data of the system, and the stability of the system is proved using the contraction mapping principle. Finally, a digital simulation demonstrates the effectiveness of the proposed SFTILC algorithm.
9
Convex Neural Networks Based Reinforcement Learning for Load Frequency Control under Denial of Service Attacks. ALGORITHMS 2022. [DOI: 10.3390/a15020034]
Abstract
With the increase in the complexity and informatization of power grids, new challenges, such as access to a large number of distributed energy sources and cyber attacks on power grid control systems, are brought to load-frequency control. As load-frequency control methods, both aggregated distributed energy sources (ADES) and artificial intelligence techniques provide flexible solution strategies to mitigate the frequency deviation of power grids. This paper proposes an ADES-based reinforcement-learning load-frequency control strategy designed to reduce the impact of denial-of-service (DoS) attacks. Reinforcement learning is used to evaluate the pros and cons of the proposed frequency control strategy, and the entire evaluation process is realized by the approximation of convex neural networks. Convex neural networks convert the nonlinear optimization problems of reinforcement learning for long-term performance into corresponding convex optimization problems. Thus, local optima are avoided, the optimization of the strategy utility function is accelerated, and the response ability of controllers is improved. The stability of power grids and the convergence of convex neural networks under the proposed frequency control strategy are studied by constructing Lyapunov functions to obtain sufficient conditions for the steady states of ADES and the weight convergence of the actor-critic networks. The article uses the IEEE 14-, 57-, and 118-bus test systems to verify the proposed strategy. Our experimental results confirm that the proposed frequency control strategy can effectively reduce the frequency deviation of power grids under DoS attacks.
10
Ma H, Zhang Q. Threshold dynamics and optimal control on an age-structured SIRS epidemic model with vaccination. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:9474-9495. [PMID: 34814354 DOI: 10.3934/mbe.2021465]
Abstract
We introduce vaccination control into an age-structured susceptible-infective-recovered-susceptible (SIRS) model and study the global stability of the endemic equilibrium by an iterative method. The basic reproduction number $ R_0 $ is obtained. It is shown that if $ R_0 < 1 $, then the disease-free equilibrium is globally asymptotically stable; if $ R_0 > 1 $, the disease-free and endemic equilibria coexist, and the global asymptotic stability of the endemic equilibrium is also established. Additionally, the Hamilton-Jacobi-Bellman (HJB) equation is derived by employing Bellman's principle of optimality. By proving the existence of a viscosity solution of the HJB equation, we obtain the optimal vaccination control strategy. Finally, numerical simulations are performed to illustrate the corresponding analytical results.
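The $ R_0 $ threshold behavior can be illustrated with a plain (non-age-structured) SIRS simulation; the model below and all of its parameter values are simplified stand-ins for the age-structured system of the paper:

```python
import numpy as np

def simulate_sirs(beta, gamma=0.2, delta=0.05, mu=0.01, t_end=2000.0, dt=0.1):
    """Euler integration of a basic SIRS model with vital dynamics.

    beta: transmission rate, gamma: recovery rate, delta: immunity-loss
    rate, mu: birth/death rate.  Here R0 = beta / (gamma + mu).
    Returns the final infected fraction.  Parameters are illustrative.
    """
    s, i, r = 0.99, 0.01, 0.0
    for _ in range(int(t_end / dt)):
        ds = mu - beta * s * i + delta * r - mu * s
        di = beta * s * i - (gamma + mu) * i
        dr = gamma * i - (delta + mu) * r
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
    return i
```

With these rates, beta = 0.1 gives R0 below one (the infection dies out), while beta = 0.5 gives R0 above one (the infection settles at an endemic level), matching the threshold dichotomy described above.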
Affiliation(s)
- Han Ma
- School of Mathematics and Statistics, Ningxia University, Yinchuan, 750021, China
- Qimin Zhang
- School of Mathematics and Statistics, Ningxia University, Yinchuan, 750021, China
11
Song R, Wei Q, Zhang H, Lewis FL. Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:2929-2943. [PMID: 31902792 DOI: 10.1109/tcyb.2019.2957406]
Abstract
In this article, an off-policy reinforcement learning (RL) algorithm is established to solve discrete-time N-player nonzero-sum (NZS) games with completely unknown dynamics. The N coupled generalized algebraic Riccati equations (GAREs) are derived, and a policy iteration (PI) algorithm is then used to obtain the N-tuple of iterative controls and iterative value functions. Since the system dynamics are required by the PI algorithm, an off-policy RL method is developed for discrete-time N-player NZS games. The off-policy N-coupled Hamilton-Jacobi (HJ) equations are derived based on quadratic value functions. Using the Kronecker product, the N-coupled HJ equations are decomposed into an unknown-parameter part and a system-operation-data part, which makes solving them independent of the system dynamics. Least squares is used to calculate the iterative value functions and the N-tuple of iterative controls. The existence of a Nash equilibrium is proved, and the proposed method for discrete-time NZS games with unknown dynamics is illustrated by simulation examples.
12
Wei Q, Song R, Liao Z, Li B, Lewis FL. Discrete-Time Impulsive Adaptive Dynamic Programming. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:4293-4306. [PMID: 30990209 DOI: 10.1109/tcyb.2019.2906694]
Abstract
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite-horizon discrete-time nonlinear systems. Considering the constraint on the impulsive interval, in each iteration, the iterative impulsive value function under each possible impulsive interval is obtained, and then the iterative value function and iterative control law are derived. A new convergence analysis method is developed, which proves that the iterative value function converges to the optimum as the iteration index increases to infinity. The properties of the iterative control law are analyzed, and a detailed implementation of the optimal impulsive control law is presented. Finally, two simulation examples with comparisons are given to show the effectiveness of the developed method.
13
Liu Y, Li T, Shan Q, Yu R, Wu Y, Chen C. Online optimal consensus control of unknown linear multi-agent systems via time-based adaptive dynamic programming. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.119]
14
Razmi M, Macnab C. Near-optimal neural-network robot control with adaptive gravity compensation. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.026]
15
Hu R, Chang S, Wang H, He J, Huang Q. Efficient Multispike Learning for Spiking Neural Networks Using Probability-Modulated Timing Method. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:1984-1997. [PMID: 30418889 DOI: 10.1109/tnnls.2018.2875471]
Abstract
Error functions in supervised learning algorithms for spiking neural networks (SNNs) are normally based on the distance between output spikes and target spikes. Due to the discontinuous nature of the internal state of a spiking neuron, it is challenging to ensure that the number of output spikes matches the number of target spikes in multispike learning. Conventionally, this problem is dealt with by using the smaller of the number of desired spikes and the number of actual output spikes during learning. However, this approach loses information, as some spikes are neglected. In this paper, a probability-modulated timing mechanism is built on stochastic neurons, where discontinuous spike patterns are converted into the likelihood of generating the desired output spike trains. By applying this mechanism to a probability-modulated spiking classifier, a probability-modulated SNN (PMSNN) is constructed. In its multilayer and multispike learning structure, more inputs are incorporated and mapped to the target spike trains. A clustering-rule connection mechanism is also applied to a reservoir to improve the efficiency of information transmission among synapses, mapping highly correlated inputs to adjacent neurons. Comparisons between the proposed method and popular SNN algorithms show that the PMSNN yields higher efficiency and requires fewer parameters.
16
Ding D, Wang Z, Han QL, Wei G. Neural-Network-Based Output-Feedback Control Under Round-Robin Scheduling Protocols. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:2372-2384. [PMID: 29994553 DOI: 10.1109/tcyb.2018.2827037]
Abstract
The neural-network (NN)-based output-feedback control is considered for a class of stochastic nonlinear systems under round-robin (RR) scheduling protocols. To effectively mitigate data congestion and save energy, the RR protocols are implemented and the resulting nonlinear systems become so-called protocol-induced periodic systems. Taking this periodic characteristic into account, an NN-based observer is first proposed to reconstruct the system states, where a novel adaptive tuning law on the NN weights is adopted to cater to the requirements of the performance analysis. In addition, with the established boundedness of the periodic systems in the mean-square sense, the desired observer gain is obtained by solving a set of matrix inequalities. Then, an actor-critic NN scheme with a time-varying step length in the adaptive law is developed to handle the considered control problem with terminal constraints over a finite horizon. Some sufficient conditions are derived to guarantee the boundedness of the estimation errors of the critic and actor NN weights. In view of these conditions, some key parameters in the adaptive tuning laws are easily determined via elementary algebraic operations. Furthermore, stability in the mean-square sense is investigated for the discussed problem over the infinite horizon. Finally, a simulation example illustrates the applicability of the proposed control scheme.
17
Narayanan V, Sahoo A, Jagannathan S, George K. Approximate Optimal Distributed Control of Nonlinear Interconnected Systems Using Event-Triggered Nonzero-Sum Games. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:1512-1522. [PMID: 30296241 DOI: 10.1109/tnnls.2018.2869896]
Abstract
In this paper, approximate optimal distributed control schemes for a class of nonlinear interconnected systems with strong interconnections are presented using continuous and event-sampled feedback information. The optimal control design is formulated as an N-player nonzero-sum game where the control policies of the subsystems act as players. An approximate Nash equilibrium solution to the game, which is the solution to the coupled Hamilton-Jacobi equation, is obtained using an approximate dynamic programming-based approach. A critic neural network (NN) at each subsystem is utilized to approximate the Nash solution, and novel event-sampling conditions, which are decentralized, are designed to asynchronously orchestrate the sampling and transmission of the state vector at each subsystem. To ensure the local ultimate boundedness of the closed-loop system state and NN parameter estimation errors, a hybrid-learning scheme is introduced, and stability is guaranteed using Lyapunov-based analysis. Finally, implementation of the proposed event-based distributed control scheme for linear interconnected systems is discussed. For completeness, Zeno-free behavior of the event-sampled system is shown analytically, and a numerical example is included to support the analytical results.
18
Zhang H, Qu Q, Xiao G, Cui Y. Optimal Guaranteed Cost Sliding Mode Control for Constrained-Input Nonlinear Systems With Matched and Unmatched Disturbances. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2112-2126. [PMID: 29771665 DOI: 10.1109/tnnls.2018.2791419]
Abstract
Based on integral sliding mode and approximate dynamic programming (ADP) theory, a novel optimal guaranteed cost sliding mode control is designed for constrained-input nonlinear systems with matched and unmatched disturbances. When the system moves on the sliding surface, the optimal guaranteed cost control problem of sliding mode dynamics is transformed into the optimal control problem of a reformulated auxiliary system with a modified cost function. The ADP algorithm based on single critic neural network (NN) is applied to obtain the approximate optimal control law for the auxiliary system. Lyapunov techniques are used to demonstrate the convergence of the NN weight errors. In addition, the derived approximate optimal control is verified to guarantee the sliding mode dynamics system to be stable in the sense of uniform ultimate boundedness. Some simulation results are presented to verify the feasibility of the proposed control scheme.
19
Yang F, Wang C. Pattern-Based NN Control of a Class of Uncertain Nonlinear Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:1108-1119. [PMID: 28186912 DOI: 10.1109/tnnls.2017.2655503]
Abstract
This paper presents a pattern-based neural network (NN) control approach for a class of uncertain nonlinear systems. The approach consists of two phases of identification and another two phases of recognition and control. First, in the phase (i) of identification, adaptive NN controllers are designed to achieve closed-loop stability and tracking performance of nonlinear systems for different control situations, and the corresponding closed-loop control system dynamics are identified via deterministic learning. The identified control system dynamics are stored in constant radial basis function (RBF) NNs, and a set of constant NN controllers are constructed by using the obtained constant RBF networks. Second, in the phase (ii) of identification, when the plant is operated under different or abnormal conditions, the system dynamics under normal control are identified via deterministic learning. A bank of dynamical estimators is constructed for all the abnormal conditions and the learned knowledge is embedded in the estimators. Third, in the phase of recognition, when one identified control situation recurs, by using the constructed estimators, the recurred control situation will be rapidly recognized. Finally, in the phase of pattern-based control, based on the rapid recognition, the constant NN controller corresponding to the current control situation is selected, and both closed-loop stability and improved control performance can be achieved. The results presented show that the pattern-based control realizes a humanlike control process, and will provide a new framework for fast decision and control in dynamic environments. A simulation example is included to demonstrate the effectiveness of the approach.
20
Qu Q, Zhang H, Yu R, Liu Y. Neural network-based H∞ sliding mode control for nonlinear systems with actuator faults and unmatched disturbances. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.10.041]
21
Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.09.020]
22
Wei Q, Liu D, Lin Q. Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Admissibility and Termination Analysis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:2490-2502. [PMID: 27529879 DOI: 10.1109/tnnls.2016.2593743]
Abstract
In this paper, a novel local value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon optimal control problems for discrete-time nonlinear systems. The focuses of this paper are to study admissibility properties and the termination criteria of discrete-time local value iteration ADP algorithms. In the discrete-time local value iteration ADP algorithm, the iterative value functions and the iterative control laws are both updated in a given subset of the state space in each iteration, instead of the whole state space. For the first time, admissibility properties of iterative control laws are analyzed for the local value iteration ADP algorithm. New termination criteria are established, which terminate the iterative local ADP algorithm with an admissible approximate optimal control law. Finally, simulation results are given to illustrate the performance of the developed algorithm.
Affiliation(s)
- Qinglai Wei
- The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Derong Liu
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
- Qiao Lin
- The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
24
Qu Q, Zhang H, Feng T, Jiang H. Decentralized adaptive tracking control scheme for nonlinear large-scale interconnected systems via adaptive dynamic programming. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.10.058]
25
Wei Q, Liu D, Lin H. Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems. IEEE TRANSACTIONS ON CYBERNETICS 2016; 46:840-853. [PMID: 26552103 DOI: 10.1109/tcyb.2015.2492242]
Abstract
In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. Initialized by different initial functions, it is proven that the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and compute the iterative control law, respectively, for facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.
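The value-iteration recursion V_{j+1}(x) = min_u [U(x,u) + V_j(F(x,u))] can be illustrated on a lookup-table grid instead of the paper's neural-network approximators; the 1-D system, cost, and grids below are illustrative assumptions:

```python
import numpy as np

def value_iteration_1d(f, cost, x_grid, u_grid, v0, sweeps=200):
    """Tabular value-iteration sketch for x(k+1) = f(x, u) on a state grid.

    v0 is the (arbitrary positive semidefinite) initial value function,
    echoing the arbitrary-initialization property discussed above; the
    cited paper uses NNs rather than a lookup table with interpolation.
    """
    v = np.array([v0(x) for x in x_grid], dtype=float)
    for _ in range(sweeps):
        v_new = np.empty_like(v)
        for ix, x in enumerate(x_grid):
            # one Bellman backup: minimize stage cost plus interpolated V_j
            q = [cost(x, u) + np.interp(f(x, u), x_grid, v) for u in u_grid]
            v_new[ix] = min(q)
        v = v_new

    def policy(x):
        # greedy control law extracted from the converged value function
        return min(u_grid, key=lambda u: cost(x, u) + np.interp(f(x, u), x_grid, v))

    return v, policy
```

Starting from V_0 = 0, each sweep is monotonically nondecreasing toward the optimum, which mirrors the monotone-convergence case analyzed in the paper.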