1.
Zhang L, Lin R, Xie L, Dai W, Su H. Event-Triggered Constrained Optimal Control for Organic Rankine Cycle Systems via Safe Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7126-7137. PMID: 37015440. DOI: 10.1109/tnnls.2022.3213825.
Abstract
The organic Rankine cycle (ORC) is an effective technology for converting low-grade heat sources into power and is crucial for environmentally friendly production and energy recovery. However, the inherent complexity of its mechanism, its strong and unidentified nonlinearity, and the presence of control constraints severely complicate the design of its optimal controller. To address these issues, this study presents a novel event-triggered (ET) constrained optimal control approach for ORC systems based on a safe reinforcement learning technique to find the optimal control law. Instead of employing the usual non-quadratic integral form to solve control-limited optimal control problems, a constraint-handling strategy based on a relaxed weighted barrier function (BF) technique is proposed. By adding the BF terms to the original value function, a modified value iteration algorithm is developed so that control input solutions that tend to violate the constraints are pushed back into, and maintained within, their safe sets. In addition, the ET mechanism proposed in this article is critical for ORC systems, as it can significantly reduce the computational load. The combination of these two techniques allows ORC systems to achieve set-point tracking control while satisfying the control restrictions. The proposed approach is implemented within a heuristic dynamic programming framework involving three neural networks (NNs). The safety and convergence of the proposed approach and the stability of the closed-loop system are analyzed. Simulation results and comparisons are presented to demonstrate its effectiveness.
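As an illustrative aside, the relaxed weighted barrier idea described in this abstract can be sketched in a few lines of Python. Everything below is an assumption for illustration (a scalar state, a box input constraint, a quadratic stage cost, and a standard relaxed log-barrier with relaxation parameter delta), not the paper's actual formulation:

```python
import numpy as np

def relaxed_log_barrier(u, u_max, delta=0.1):
    # Exact -log barrier strictly inside the safe set; a quadratic
    # extension (matching value and slope at z = delta) takes over near
    # the boundary, so the value stays finite even if an iterate
    # briefly violates the constraint. Scalar inputs only.
    z = 1.0 - (u / u_max) ** 2          # > 0 strictly inside |u| < u_max
    if z > delta:
        return -np.log(z)
    return 0.5 * (((z - 2.0 * delta) / delta) ** 2 - 1.0) - np.log(delta)

def augmented_stage_cost(x, u, Q=1.0, R=0.1, u_max=1.0, mu=0.05):
    # Original quadratic cost plus the weighted barrier term: inputs
    # drifting toward the constraint boundary are penalized and pushed
    # back into the safe set during value iteration.
    return Q * x ** 2 + R * u ** 2 + mu * relaxed_log_barrier(u, u_max)
```

A value-iteration sweep would simply use `augmented_stage_cost` in place of the original stage cost; the weight `mu` trades optimality against constraint margin.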
2.
Zhou Y. Efficient Online Globalized Dual Heuristic Programming With an Associated Dual Network. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:10079-10090. PMID: 35436197. DOI: 10.1109/tnnls.2022.3164727.
Abstract
Globalized dual heuristic programming (GDHP) is the most comprehensive adaptive critic design; it employs its critic to minimize the error with respect to both the cost-to-go and its derivatives simultaneously. Its implementation, however, confronts a dilemma: either introduce more computational load by explicitly calculating the second partial derivative term, or sacrifice accuracy by loosening the association between the cost-to-go and its derivatives. This article aims to increase the online learning efficiency of GDHP while retaining its analytical accuracy by introducing a novel GDHP design based on a critic network and an associated dual network. The associated dual network is derived from the critic network explicitly and precisely, and its structure is at the same level of complexity as dual heuristic programming critics. Three simulation experiments are conducted to validate the learning ability, efficiency, and feasibility of the proposed GDHP critic design.
3.
Yuan X, Wang Y, Liu J, Sun C. Action Mapping: A Reinforcement Learning Method for Constrained-Input Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7145-7157. PMID: 35025751. DOI: 10.1109/tnnls.2021.3138924.
Abstract
Existing approaches to constrained-input optimal control problems mainly focus on systems with input saturation, whereas other constraints, such as combined inequality constraints and state-dependent constraints, are seldom discussed. In this article, a reinforcement learning (RL)-based algorithm is developed for constrained-input optimal control of discrete-time (DT) systems. The deterministic policy gradient (DPG) is introduced to iteratively search for the optimal solution to the Hamilton-Jacobi-Bellman (HJB) equation. To deal with input constraints, an action mapping (AM) mechanism is proposed. The objective of this mechanism is to transform the exploration space from the subspace generated by the given inequality constraints to the standard Cartesian product space, which can be searched effectively by existing algorithms. By using the proposed architecture, the learned policy can output control signals satisfying the given constraints, and the original reward function can be kept unchanged. A convergence analysis is given, showing that the iterative algorithm converges to the optimal solution of the HJB equation. In addition, the continuity of the iteratively estimated Q-function is investigated. Two numerical examples are provided to demonstrate the effectiveness of our approach.
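As a hedged illustration of the action-mapping idea (a tanh squashing onto a box constraint; the paper's construction also covers combined and state-dependent constraints, which this sketch does not), the agent can explore an unconstrained space while its mapped actions always satisfy the constraint:

```python
import numpy as np

def action_map(a_raw, u_low, u_high):
    """Map an unconstrained exploration variable onto [u_low, u_high].

    The RL agent searches the standard Cartesian space (a_raw is any
    real number); the mapped action always satisfies the constraint,
    so the original reward function needs no penalty term.
    """
    return u_low + 0.5 * (u_high - u_low) * (np.tanh(a_raw) + 1.0)

# Exploration noise can be any real-valued distribution:
samples = action_map(np.random.default_rng(0).normal(size=1000), -2.0, 4.0)
```

Because tanh is smooth and strictly monotone, the mapping is differentiable (usable inside a policy-gradient update) and never produces an infeasible action.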
4.
Hu X, Zhang H, Ma D, Wang R, Wang T, Xie X. Real-Time Leak Location of Long-Distance Pipeline Using Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7004-7013. PMID: 34971544. DOI: 10.1109/tnnls.2021.3136939.
Abstract
In traditional leak location methods, the position of the leak point is determined from the time difference between the pressure change points at the two ends of the pipeline, so inaccurate estimation of these pressure change points leads to wrong leak location results. To address this, adaptive dynamic programming is applied to the pipeline leak location problem in this article. First, a pipeline model is proposed to describe the pressure change along the pipeline, which is used to reflect the iterative behavior of the logarithmic form of the pressure change. Then, under the Bellman optimality principle, a value iteration (VI) scheme is proposed to provide the optimal sequence of the nominal parameter and obtain the pipeline leak point. Furthermore, neural networks are built as the VI scheme structure to ensure the iterative performance of the proposed method. By transforming the task into a dynamic optimization problem, the proposed method uses estimates of the logarithmic form of the pressure changes at both ends of the pipeline to locate the leak point, avoiding the wrong results caused by unclear pressure change points. It can therefore be applied to real-time leak location in long-distance pipelines. Finally, experimental cases are given to illustrate the effectiveness of the proposed method.
5.
Xu B, Wang X, Sun F, Shi Z. Intelligent Control of Flexible Hypersonic Flight Dynamics With Input Dead Zone Using Singular Perturbation Decomposition. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5926-5936. PMID: 34932488. DOI: 10.1109/tnnls.2021.3131578.
Abstract
This article studies robust intelligent control for the longitudinal dynamics of a flexible hypersonic flight vehicle with input dead zone. Considering the different time-scale characteristics of the system states, singular perturbation decomposition is employed to transform the rigid-elastic coupling model into slow dynamics and fast dynamics. For the slow dynamics with unknown system nonlinearities, a robust neural control is constructed using a switching mechanism to achieve coordination between robust design and neural learning. For the time-varying control gain caused by the unknown dead-zone input, a stable control is presented with an adaptive estimation design. For the fast dynamics, a sliding mode control is constructed to make the elastic modes stable and convergent. The elevator deflection is obtained by combining the two control signals. The stability of the dynamics is analyzed through the Lyapunov approach, and the system tracking errors are shown to be bounded. Simulations are conducted to demonstrate the effectiveness of the proposed approach.
6.
Zhang Y, Niu B, Zhao X, Duan P, Wang H, Gao B. Global Predefined-Time Adaptive Neural Network Control for Disturbed Pure-Feedback Nonlinear Systems With Zero Tracking Error. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:6328-6338. PMID: 34951856. DOI: 10.1109/tnnls.2021.3135582.
Abstract
This article presents a global adaptive neural-network-based control algorithm for disturbed pure-feedback nonlinear systems to achieve zero tracking error in a predefined time. Different from traditional works that only solve the semiglobal bounded tracking problem for pure-feedback systems, this work not only achieves global convergence of the tracking error to zero but also guarantees that the convergence time can be predefined according to user specification. To obtain the desired predefined-time controller, a mild semibound assumption on the nonaffine functions is first proposed so that the design difficulty caused by the pure-feedback structure can be easily resolved. Then, we apply the properties of radial basis function (RBF) neural networks (NNs) and Young's inequality to derive the upper bound of the term that contains the unknown nonlinear function and external disturbances, and the designed adaptive parameters determine the derived upper bound and the robust control gain. Finally, the predefined-time virtual control inputs are presented, whose derivatives are further estimated using finite-time differentiators. It is strictly proved that the proposed predefined-time controller guarantees that the tracking error globally converges to zero within the predefined time, and a practical example is shown to verify the effectiveness and practicability of the proposed predefined-time control method.
7.
Wang Z, Lee J, Wei Q, Zhang A. Event-Triggered Near-Optimal Tracking Control Based on Adaptive Dynamic Programming for Discrete-Time Systems. Neurocomputing 2023. DOI: 10.1016/j.neucom.2023.03.045.
8.
Zhao S, Wang J, Xu H, Wang B. Composite Observer-Based Optimal Attitude-Tracking Control With Reinforcement Learning for Hypersonic Vehicles. IEEE Transactions on Cybernetics 2023; 53:913-926. PMID: 35969557. DOI: 10.1109/tcyb.2022.3192871.
Abstract
This article proposes an observer-based reinforcement learning (RL) control approach to address the optimal attitude-tracking problem and its application to hypersonic vehicles in the reentry phase. Due to the unknown uncertainty and nonlinearity caused by parameter perturbation and external disturbance, accurate model information of hypersonic vehicles in the reentry phase is generally unavailable. For this reason, a novel synchronous estimation is proposed to construct a composite observer for hypersonic vehicles, which consists of a neural-network (NN)-based Luenberger-type observer and a synchronous disturbance observer. This solves the identification problem of the nonlinear dynamics in the reference control and realizes estimation of the system state when unknown nonlinear dynamics and unknown disturbances exist simultaneously. By synthesizing the information from the composite observer, an RL tracking controller is developed to solve the optimal attitude-tracking control problem. To improve the convergence performance of the critic network weights, concurrent learning is employed to replace the traditional persistent excitation condition with a historical experience replay mechanism. In addition, this article proves that the weight estimation error is bounded when the learning rate satisfies the given sufficient condition. Finally, numerical simulation demonstrates the effectiveness and superiority of the proposed approaches for attitude-tracking control systems of hypersonic vehicles.
9.
Li H, Wu Y, Chen M, Lu R. Adaptive Multigradient Recursive Reinforcement Learning Event-Triggered Tracking Control for Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:144-156. PMID: 34197328. DOI: 10.1109/tnnls.2021.3090570.
Abstract
This article proposes a fault-tolerant adaptive multigradient recursive reinforcement learning (RL) event-triggered tracking control scheme for strict-feedback discrete-time multiagent systems. The multigradient recursive RL algorithm is used to avoid the local optimum problem that may exist in the gradient descent scheme. Different from existing event-triggered control results, a new lemma on the relative-threshold event-triggered control strategy is proposed to handle the compensation error, which can improve the utilization of communication resources and weaken the negative impact on tracking accuracy and closed-loop system stability. To overcome the difficulty caused by sensor faults, a distributed control method is introduced by adopting the adaptive compensation technique, which can effectively decrease the number of online estimation parameters. Furthermore, by using the multigradient recursive RL algorithm with fewer learning parameters, the online estimation time can be effectively reduced. The stability of the closed-loop multiagent systems is proved by using the Lyapunov stability theorem, and it is verified that all signals are semiglobally uniformly ultimately bounded. Finally, two simulation examples are given to demonstrate the effectiveness of the presented control scheme.
10.
Rizvi SAA, Pertzborn AJ, Lin Z. Reinforcement Learning Based Optimal Tracking Control Under Unmeasurable Disturbances With Application to HVAC Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:7523-7533. PMID: 34129505. PMCID: PMC9703879. DOI: 10.1109/tnnls.2021.3085358.
Abstract
This paper presents the design of an optimal controller for solving tracking problems subject to unmeasurable disturbances and unknown system dynamics using reinforcement learning (RL). Many existing RL control methods take disturbance into account by directly measuring it and manipulating it for exploration during the learning process, thereby preventing any disturbance-induced bias in the control estimates. However, in most practical scenarios, disturbance is neither measurable nor manipulable. The main contribution of this article is the introduction of a combination of a bias compensation mechanism and integral action in the Q-learning framework to remove the need to measure or manipulate the disturbance, while preventing disturbance-induced bias in the optimal control estimates. A bias-compensated Q-learning scheme is presented that learns the disturbance-induced bias terms separately from the optimal control parameters and ensures the convergence of the control parameters to the optimal solution even in the presence of unmeasurable disturbances. Both state feedback and output feedback algorithms are developed based on policy iteration (PI) and value iteration (VI) that guarantee the convergence of the tracking error to zero. The feasibility of the design is validated on a practical optimal control application of a heating, ventilating, and air conditioning (HVAC) zone controller.
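The bias-compensation idea (estimating the disturbance-induced bias as a separate parameter so it does not corrupt the learned control parameters) can be illustrated with a toy least-squares identification under a constant unmeasured disturbance. The data, dimensions, and regressor below are invented for illustration and are not the paper's Q-learning recursion:

```python
import numpy as np

# Identify y = theta^T phi + b when a constant unmeasured disturbance
# adds the bias b. Appending a constant 1 to the regressor lets least
# squares learn b as a separate parameter, so theta stays unbiased.
rng = np.random.default_rng(0)
theta_true, bias_true = np.array([2.0, -1.0]), 0.7
Phi = rng.normal(size=(200, 2))                 # regressor samples
y = Phi @ theta_true + bias_true                # disturbance-shifted targets
Phi_aug = np.hstack([Phi, np.ones((200, 1))])   # bias-compensation column
est, *_ = np.linalg.lstsq(Phi_aug, y, rcond=None)
theta_hat, bias_hat = est[:2], est[2]           # recovered separately
```

Without the compensation column, the constant offset would be forced into the residual and leak into the parameter estimates; with it, the bias is absorbed exactly.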
11.
Liu JJR, Kwok KW, Cui Y, Shen J, Lam J. Consensus of Positive Networked Systems on Directed Graphs. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4575-4583. PMID: 33646958. DOI: 10.1109/tnnls.2021.3058184.
Abstract
This article addresses the distributed consensus problem for identical continuous-time positive linear systems with state-feedback control. Existing works on this problem mainly focus on the case where the networked communication topologies are either undirected, incomplete graphs or strongly connected directed graphs. In this work, by contrast, the communication topologies of the networked system are described by directed graphs, each containing a spanning tree, which is a more general and new scenario due to the interplay between the eigenvalues of the Laplacian matrix and the controller gains. Specifically, the problem involves complex eigenvalues, the Hurwitzness of complex matrices, and positivity constraints, which complicate the analysis of the Laplacian matrix. First, a necessary and sufficient condition for the consensus analysis of directed networked systems with positivity constraints is given, using positive systems theory and graph theory. Unlike general Riccati design methods that involve solving an algebraic Riccati equation (ARE), a condition represented by an algebraic Riccati inequality (ARI) is obtained for the existence of a solution. Subsequently, an equivalent condition, which corresponds to the consensus design condition, is derived, and a semidefinite programming algorithm is developed. It is shown that, when a protocol is solved by the algorithm for the networked system on a specific communication graph, there exists a set of graphs on which the positive consensus problem can be solved as well.
12.
Wei Q, Yang Z, Su H, Wang L. Monte Carlo-Based Reinforcement Learning Control for Unmanned Aerial Vehicle Systems. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.08.011.
13.
Peng Z, Luo R, Hu J, Shi K, Nguang SK, Ghosh BK. Optimal Tracking Control of Nonlinear Multiagent Systems Using Internal Reinforce Q-Learning. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4043-4055. PMID: 33587710. DOI: 10.1109/tnnls.2021.3055761.
Abstract
In this article, a novel reinforcement learning (RL) method is developed to solve the optimal tracking control problem of unknown nonlinear multiagent systems (MASs). Different from representative RL-based optimal control algorithms, an internal reinforce Q-learning (IrQL) method is proposed, in which an internal reinforce reward (IRR) function is introduced for each agent to improve its capability of receiving more long-term information from the local environment. In the IrQL design, a Q-function is defined on the basis of the IRR function, and an iterative IrQL algorithm is developed to learn the optimal distributed control scheme, followed by rigorous convergence and stability analysis. Furthermore, a distributed online learning framework, namely reinforce-critic-actor neural networks, is established to implement the proposed approach, aimed at estimating the IRR function, the Q-function, and the optimal control scheme, respectively. The implementation procedure is designed in a data-driven way without requiring knowledge of the system dynamics. Finally, simulations and comparison results with the classical method are given to demonstrate the effectiveness of the proposed tracking control method.
14.
Li T, Yang D, Xie X, Zhang H. Event-Triggered Control of Nonlinear Discrete-Time System With Unknown Dynamics Based on HDP(λ). IEEE Transactions on Cybernetics 2022; 52:6046-6058. PMID: 33531312. DOI: 10.1109/tcyb.2020.3044595.
Abstract
The heuristic dynamic programming (HDP)(λ)-based optimal control strategy, which accounts for a long-term prediction parameter λ in an iterative manner, noticeably accelerates learning. The computational complexity caused by the state-associated extra variable in computing the λ-return value of the traditional value-gradient learning method can be reduced. However, as the iteration number increases, calculation costs grow dramatically, posing a huge challenge for the optimal control process under limited bandwidth and computational units. In this article, we propose an event-triggered HDP (ETHDP)(λ) optimal control strategy for nonlinear discrete-time (NDT) systems with unknown dynamics. The iterative relation for the λ-return of the final target value is derived first. An event-triggered condition ensuring system stability is designed to reduce the computation and communication requirements. Next, we build a model-actor-critic neural network (NN) structure, in which the model NN evaluates the system state to obtain the λ-return of the current target value, which is used to compute the real-time update errors of the critic NN. The event-triggered optimal control signal and the one-step-return value are approximated by the actor and critic NNs, respectively. Then, event-trigger-based uniformly ultimately bounded (UUB) stability of the system state and NN weight errors is demonstrated by applying Lyapunov techniques. Finally, we illustrate the effectiveness of our proposed ETHDP(λ) strategy with two cases.
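As a generic sketch of an event-triggered condition of this kind (the relative-threshold form below is a common choice in the ADP literature, not necessarily the exact condition derived in the paper; the plant, gains, and threshold parameter are invented):

```python
import numpy as np

def should_trigger(x, x_last, alpha):
    # Fire an event (recompute the control, update the NNs) only when
    # the state has drifted far enough from its last sampled value;
    # otherwise the previous control is held, saving computation and
    # communication bandwidth.
    return np.linalg.norm(x - x_last) > alpha * np.linalg.norm(x)

# Held-control simulation of x(t+1) = 0.9 x(t) + u with u = -0.5 x_sampled:
x, x_last, events = np.array([5.0]), np.array([5.0]), 0
for _ in range(50):
    if should_trigger(x, x_last, alpha=2.0):
        x_last, events = x.copy(), events + 1   # sample and recompute
    x = 0.9 * x - 0.5 * x_last                  # control held between events
```

With these invented numbers the state still converges to the origin while the control is recomputed at only a fraction of the 50 time steps; choosing alpha is the usual trade-off between event frequency and stability margin.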
15.
Adaptive Fault-Tolerant Control for Flexible Variable Structure Spacecraft with Actuator Saturation and Multiple Faults. Applied Sciences-Basel 2022. DOI: 10.3390/app12115319.
Abstract
This study investigates adaptive fault-tolerant control (FTC) for a flexible variable structure spacecraft in the presence of external disturbance, multiple actuator faults, and saturation. The attitude system model of a variable structure spacecraft and the actuator fault model are first given. A sliding-mode-based fault detection observer and a radial basis function-based fault estimation observer are designed to detect the time of actuator fault occurrence and to estimate the amplitude of an unknown fault, respectively. Then, an adaptive FTC with variable structure harmonic functions is proposed to automatically accommodate multiple actuator faults, which first guarantees that the state trajectory of the attitude system without actuator saturation converges to a neighborhood of the origin. Another improved adaptive FTC scheme is further proposed for the actuator saturation constraint case, ensuring that all closed-loop signals converge in finite time. Finally, simulation results are given to illustrate the effectiveness of the proposed method.
16.
Mu C, Wang K, Ma S, Chong Z, Ni Z. Adaptive composite frequency control of power systems using reinforcement learning. CAAI Transactions on Intelligence Technology 2022. DOI: 10.1049/cit2.12103.
Affiliation(s)
- Chaoxu Mu, Ke Wang: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Shiqian Ma, Zhiqiang Chong: State Grid Tianjin Electric Power Company Electric Power Research Institute, Tianjin, China
- Zhen Ni: Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
17.
Angular-Accelerometer-Based Flexible-State Estimation and Tracking Controller Design for Hypersonic Flight Vehicle. Aerospace 2022. DOI: 10.3390/aerospace9040206.
Abstract
The controller design of hypersonic flight vehicles is a challenging task, especially when the flexible states are immeasurable; unfortunately, the flexible states are difficult to measure directly. In this paper, an angular-accelerometer-based method for estimating the flexible states is proposed. By adding a pitch-axis angular accelerometer and designing an Extended Kalman Filter-based online estimation method, the flexible states can be obtained in real time. Then, based on the estimated flexible states, a stable inversion-based controller design method is utilized, and a robust tracking controller is designed for hypersonic flight vehicles. The proposed method provides an effective means of estimating flexible states and conducting observer-based controller design for hypersonic flight vehicles. Finally, a numerical simulation is given to show the effectiveness of the proposed control method.
18.
Chai R, Tsourdos A, Savvaris A, Chai S, Xia Y, Chen CLP. Design and Implementation of Deep Neural Network-Based Control for Automatic Parking Maneuver Process. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1400-1413. PMID: 33332277. DOI: 10.1109/tnnls.2020.3042120.
Abstract
This article focuses on the design, test, and validation of a deep neural network (DNN)-based control scheme capable of predicting optimal motion commands for autonomous ground vehicles (AGVs) during the parking maneuver process. The proposed design uses a multilayer structure. In the first layer, a desensitized trajectory optimization method is iteratively performed to establish a set of time-optimal parking trajectories that account for noise-perturbed initial configurations. Subsequently, using the preplanned optimal parking trajectory data set, several DNNs are trained in the second layer to learn the functional relationship between the system states and control actions. To further improve the DNN performance, a simple yet effective data aggregation approach is designed and applied. The trained DNNs are then utilized as motion controllers to generate feedback actions in real time. Numerical studies were conducted to demonstrate the effectiveness and real-time applicability of the proposed control scheme for planning and steering the AGV parking maneuver. Experimental results are also provided to validate the algorithm's performance in real-world implementations.
19.
Zhang C, Zhang G, Dong Q. Fixed-time disturbance observer-based nearly optimal control for reusable launch vehicle with input constraints. ISA Transactions 2022; 122:182-197. PMID: 33962796. DOI: 10.1016/j.isatra.2021.04.031.
Abstract
In this paper, a fixed-time disturbance observer-based nearly optimal control (FTDO-NOC) scheme is proposed for a reusable launch vehicle (RLV) subject to model uncertainties, input constraints, and unknown mismatched/matched disturbances. The dynamics of the RLV attitude motion are divided into an outer-loop subsystem and an inner-loop subsystem. For the outer-loop subsystem, to address the problems of unknown mismatched disturbances and model uncertainties, a novel adaptive-gain multivariable generalized super-twisting (AMGST) controller is proposed. Two modified gain-adaptation laws are derived for tuning the control gains of the AMGST controller, which attenuates chattering efficiently. For the inner-loop subsystem, considering the effect of unknown matched disturbances, a fixed-time disturbance observer (FTDO) is utilized to estimate the matched disturbances and the time derivative of the virtual control input. Incorporating the designed FTDO, a nearly optimal controller (NOC) based on critic-actor neural networks (NNs) is utilized to generate the approximate optimal control moments satisfying the input constraints. The tracking errors of the inner-loop subsystem and the weight estimation errors of the critic-actor NNs are proved to be uniformly ultimately bounded (UUB) via the Lyapunov technique. Finally, we provide simulation results to validate the effectiveness and superiority of the proposed control scheme.
Affiliation(s)
- Chaofan Zhang, Guoshan Zhang: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Qi Dong: China Academy of Electronics and Information Technology, Beijing 100041, China
20.
Zhang S, Wang L, Wang H, Xue B. Consensus Control for Heterogeneous Multivehicle Systems: An Iterative Learning Approach. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:5356-5368. PMID: 33857003. DOI: 10.1109/tnnls.2021.3071413.
Abstract
This article investigates the consensus tracking problem of heterogeneous multivehicle systems (MVSs) under a repeatable control environment. First, a unified iterative learning control (ILC) algorithm is presented for all autonomous vehicles, each of which is governed by either discrete- or continuous-time nonlinear dynamics. Then, several consensus criteria for MVSs with switching topology and external disturbances are established based on our proposed distributed ILC protocols. For discrete-time systems, all vehicles can perfectly track the common reference trajectory over a specified finite time interval, and the corresponding digraphs need not have spanning trees. Existing approaches for continuous-time systems generally require that all vehicles have strictly identical initial conditions, which is too idealistic in practice. We relax this impractical assumption and propose an extra distributed initial-state learning protocol such that vehicles can take different initial states, so that finite-time tracking is ultimately achieved regardless of the initial errors. Finally, a numerical example demonstrates the effectiveness of our theoretical results.
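For context, the core ILC idea of learning over repeated trials can be sketched on a scalar discrete-time plant. This is a plain P-type law with an invented plant and gains (it converges here because the lifted iteration is a contraction); the paper's distributed protocols, switching topologies, and initial-state learning are considerably richer:

```python
import numpy as np

# P-type iterative learning control on a scalar plant that repeats the
# same finite-duration task: x(t+1) = a*x(t) + b*u(t), t = 0..T-1.
# Trial-to-trial update: u_{k+1}(t) = u_k(t) + gamma * e_k(t+1),
# with e_k = r - x_k; convergence requires |1 - gamma*b| < 1.
a, b, gamma, T = 0.5, 1.0, 0.9, 20
r = np.sin(np.linspace(0.0, 2.0 * np.pi, T + 1))   # reference, r(0) = x(0) = 0

def run_trial(u):
    x = np.zeros(T + 1)            # identical initial condition each trial
    for t in range(T):
        x[t + 1] = a * x[t] + b * u[t]
    return x

u = np.zeros(T)
for k in range(150):               # repeated trials of the same task
    e = r - run_trial(u)
    u += gamma * e[1:]             # error at t+1 corrects the input at t

final_err = float(np.max(np.abs(r - run_trial(u))))
```

Each trial replays the same finite-duration task from the same initial state; the input is corrected trial by trial using the previous trial's error, driving the tracking error over the whole horizon to numerically zero.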
21.
Zhang K, Su R, Zhang H, Tian Y. Adaptive Resilient Event-Triggered Control Design of Autonomous Vehicles With an Iterative Single Critic Learning Framework. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:5502-5511. PMID: 33534717. DOI: 10.1109/tnnls.2021.3053269.
Abstract
This article investigates adaptive resilient event-triggered control for rear-wheel-drive autonomous (RWDA) vehicles based on an iterative single critic learning framework, which can effectively balance the frequency of, and changes in, the vehicle's control adjustments during operation. According to the kinematic equation of RWDA vehicles and the desired trajectory, the tracking error system for the autonomous driving process is first built, where denial-of-service (DoS) attack signals are injected into the networked communication and transmission. Combining the event-triggered sampling mechanism with the iterative single critic learning framework, a new event-triggered condition is developed for the adaptive resilient control algorithm, and a novel utility function is designed for driving the autonomous vehicle, where the control input is guaranteed to remain within an applicable saturation bound. Finally, we apply the new adaptive resilient control scheme to a case of driving RWDA vehicles, and the simulation results illustrate its effectiveness and practicality.
|
22
|
Kong L, He W, Yang C, Sun C. Robust Neurooptimal Control for a Robot via Adaptive Dynamic Programming. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:2584-2594. [PMID: 32941154 DOI: 10.1109/tnnls.2020.3006850] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
We aim to optimize the tracking control of a robot to improve its robustness under unknown nonlinear perturbations. First, an auxiliary system is introduced whose optimal control can be viewed as an approximate optimal control of the robot. Then, neural networks (NNs) are employed to approximate the solution of the Hamilton-Jacobi-Isaacs equation within the framework of adaptive dynamic programming. Next, based on the standard gradient attenuation algorithm and adaptive critic design, the NNs are trained according to the designed updating law, relaxing the requirement of an initial stabilizing control. In light of Lyapunov stability theory, all error signals are proved to be uniformly ultimately bounded. A series of simulation studies is carried out to show the effectiveness of the proposed control.
|
23
|
Xiao F. CED: A Distance for Complex Mass Functions. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:1525-1535. [PMID: 32310802 DOI: 10.1109/tnnls.2020.2984918] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]
Abstract
Evidence theory is an effective methodology for modeling and processing uncertainty that has been widely applied in various fields. In evidence theory, a number of distance measures have been presented, which play an important role in representing the degree of difference between pieces of evidence. However, the existing evidential distances focus on traditional basic belief assignments (BBAs) modeled in terms of real numbers and are not compatible with complex BBAs (CBBAs) extended to the complex plane. Therefore, in this article, a generalized evidential distance measure called the complex evidential distance (CED) is proposed, which can measure the difference or dissimilarity between CBBAs in complex evidence theory. This is the first work to consider distance measures for CBBAs, and it provides a promising way to measure the differences between pieces of evidence in a more general framework of complex plane space. Furthermore, the CED is a strict distance metric with the properties of nonnegativity, nondegeneracy, symmetry, and triangle inequality that satisfies the axioms of a distance. In particular, when the CBBAs degenerate into classical BBAs, the CED will degenerate into Jousselme et al.'s distance. Therefore, the proposed CED is a generalization of the traditional evidential distance, but it has a greater ability to measure the difference or dissimilarity between pieces of evidence. Finally, a decision-making algorithm for pattern recognition is devised based on the CED and is applied to a medical diagnosis problem to illustrate its practicability.
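The abstract states that the CED degenerates to Jousselme et al.'s distance for classical real-valued BBAs. That classical special case can be sketched directly; the two-element frame and the mass values below are made up for illustration.

```python
import numpy as np

# Jousselme et al.'s distance between two classical BBAs on the frame
# {a, b}: d(m1, m2) = sqrt(0.5 * (m1-m2)' D (m1-m2)), where D holds the
# Jaccard similarities between focal elements. Focal elements and the
# mass values are illustrative assumptions.
focal = [frozenset('a'), frozenset('b'), frozenset('ab')]

def jaccard(A, B):
    """|A intersect B| / |A union B| for nonempty focal sets."""
    return len(A & B) / len(A | B)

D = np.array([[jaccard(A, B) for B in focal] for A in focal])
m1 = np.array([0.6, 0.3, 0.1])   # BBA 1 over the focal elements (sums to 1)
m2 = np.array([0.2, 0.5, 0.3])   # BBA 2
diff = m1 - m2
d = np.sqrt(0.5 * diff @ D @ diff)   # ≈ 0.316 for these masses
```

The matrix D is symmetric with unit diagonal, which is what gives the distance its nonnegativity and symmetry properties mentioned in the abstract.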
|
24
|
Zhao W, Xu C, Guan Z, Liu Y. Multiview Concept Learning Via Deep Matrix Factorization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:814-825. [PMID: 32275617 DOI: 10.1109/tnnls.2020.2979532] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Multiview representation learning (MVRL) leverages information from multiple views to obtain a common representation summarizing the consistency and complementarity in multiview data. Most previous matrix factorization-based MVRL methods are shallow models that neglect the complex hierarchical information. The recently proposed deep multiview factorization models cannot explicitly capture consistency and complementarity in multiview data. We present the deep multiview concept learning (DMCL) method, which hierarchically factorizes the multiview data, and tries to explicitly model consistent and complementary information and capture semantic structures at the highest abstraction level. We explore two variants of the DMCL framework, DMCL-L and DMCL-N, with respectively linear/nonlinear transformations between adjacent layers. We propose two block coordinate descent-based optimization methods for DMCL-L and DMCL-N. We verify the effectiveness of DMCL on three real-world data sets for both clustering and classification tasks.
|
25
|
Dong G, Li H, Ma H, Lu R. Finite-Time Consensus Tracking Neural Network FTC of Multi-Agent Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:653-662. [PMID: 32481227 DOI: 10.1109/tnnls.2020.2978898] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The finite-time consensus fault-tolerant control (FTC) tracking problem is studied for the nonlinear multi-agent systems (MASs) in the nonstrict feedback form. The MASs are subject to unknown symmetric output dead zones, actuator bias and gain faults, and unknown control coefficients. According to the properties of the neural network (NN), the unstructured uncertainties problem is solved. The Nussbaum function is used to address the output dead zones and unknown control directions problems. By introducing an arbitrarily small positive number, the "singularity" problem caused by combining the finite-time control and backstepping design is solved. According to the backstepping design and Lyapunov stability theory, a finite-time adaptive NN FTC controller is obtained, which guarantees that the tracking error converges to a small neighborhood of zero in a finite time, and all signals in the closed-loop system are bounded. Finally, the effectiveness of the proposed method is illustrated via a physical example.
|
26
|
Chai R, Tsourdos A, Savvaris A, Chai S, Xia Y, Chen CLP. Six-DOF Spacecraft Optimal Trajectory Planning and Real-Time Attitude Control: A Deep Neural Network-Based Approach. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:5005-5013. [PMID: 31870996 DOI: 10.1109/tnnls.2019.2955400] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
This brief presents an integrated trajectory planning and attitude control framework for six-degree-of-freedom (6-DOF) hypersonic vehicle (HV) reentry flight. The proposed framework utilizes a bilevel structure incorporating desensitized trajectory optimization and deep neural network (DNN)-based control. In the upper level, a trajectory data set containing optimal system control and state trajectories is generated, while in the lower level control system, DNNs are constructed and trained using the pregenerated trajectory ensemble in order to represent the functional relationship between the optimized system states and controls. These well-trained networks are then used to produce optimal feedback actions online. A detailed simulation analysis was performed to validate the real-time applicability and the optimality of the designed bilevel framework. Moreover, a comparative analysis was also carried out between the proposed DNN-driven controller and other optimization-based techniques existing in related works. Our results verify the reliability of using the proposed bilevel design for the control of HV reentry flight in real time.
|
27
|
Zhang J, Peng Z, Hu J, Zhao Y, Luo R, Ghosh BK. Internal reinforcement adaptive dynamic programming for optimal containment control of unknown continuous-time multi-agent systems. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
28
|
Robust optimal control for a class of nonlinear systems with unknown disturbances based on disturbance observer and policy iteration. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.082] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
29
|
Feng C, Wang Q, Liu C, Hu C, Liang X. Variable-Structure Near-Space Vehicles with Time-Varying State Constraints Attitude Control Based on Switched Nonlinear System. SENSORS 2020; 20:s20030848. [PMID: 32033432 PMCID: PMC7038718 DOI: 10.3390/s20030848] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/01/1970] [Revised: 02/03/2020] [Accepted: 02/04/2020] [Indexed: 11/16/2022]
Abstract
This study is concerned with the attitude control problem of variable-structure near-space vehicles (VSNSVs) with time-varying state constraints, based on switched nonlinear systems. The full states of the vehicles are constrained in bounded sets with asymmetric time-varying boundaries. First, considering modeling uncertainties and external disturbances, an extended state observer (ESO) with two distinct linear regions is proposed, which avoids the peaking-value problem. The disturbance observer is utilized to estimate the total disturbances of the attitude angle and angular rate subsystems, which are described as switched nonlinear systems. Then, based on the estimated values, an asymmetric time-varying barrier Lyapunov function (BLF) is employed to construct the active disturbance rejection controller, which ensures that the full-state constraints are not violated. Furthermore, to resolve the 'explosion of complexity' problem in backstepping control, a modified dynamic surface control is proposed. Rigorous stability analysis proves that all signals of the closed-loop system are bounded. Numerical simulations demonstrate the effectiveness of the proposed control scheme.
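For reference, a common form of asymmetric time-varying BLF from the barrier-Lyapunov-function literature is reproduced below; it is illustrative of the construction named in the abstract, not necessarily the exact function used in the cited paper.

```latex
% Asymmetric time-varying BLF for a tracking error e(t) that must stay
% inside the time-varying band -k_a(t) < e(t) < k_b(t):
\[
V(e,t) = \frac{q(e)}{2}\,\log\!\frac{k_b^2(t)}{k_b^2(t)-e^2}
       + \frac{1-q(e)}{2}\,\log\!\frac{k_a^2(t)}{k_a^2(t)-e^2},
\qquad
q(e) = \begin{cases} 1, & e > 0,\\ 0, & e \le 0. \end{cases}
\]
% V grows without bound as e approaches either boundary, so proving that
% V stays bounded along closed-loop trajectories guarantees the
% asymmetric constraint is never violated.
```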
Affiliation(s)
- Cong Feng, Department of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China. Correspondence: ; Tel.: +86-189-1059-1055
- Qing Wang, Department of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
- Chen Liu, Science and Technology on Special System Simulation Laboratory, Beijing Simulation Center, Beijing 100854, China
- Changhua Hu, Department of Automation, High-Tech Institute of Xi'an, Xi'an 710000, China
- Xiaohui Liang, Department of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
|
30
|
Mu C, Zhang Y. Learning-Based Robust Tracking Control of Quadrotor With Time-Varying and Coupling Uncertainties. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:259-273. [PMID: 30908267 DOI: 10.1109/tnnls.2019.2900510] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
In this paper, a learning-based robust tracking control scheme is proposed for a quadrotor unmanned aerial vehicle system. The quadrotor dynamics are modeled including time-varying and coupling uncertainties. By designing position and attitude tracking error subsystems, the robust tracking control strategy is built around the approximately optimal control of the associated nominal error subsystems. Furthermore, an improved weight updating rule is adopted, and neural networks are applied in the learning-based control scheme to obtain the approximately optimal control laws of the nominal error subsystems. The stability of the tracking error subsystems with time-varying and coupling uncertainties is established as the theoretical guarantee of the learning-based robust tracking control scheme. Finally, considering variable disturbances in the actual environment, three simulation cases based on linear and nonlinear models of the quadrotor are presented, with competitive results demonstrating the effectiveness of the proposed control scheme.
|
31
|
Shi W, Song S, Wu C, Chen CLP. Multi Pseudo Q-Learning-Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:3534-3546. [PMID: 30602426 DOI: 10.1109/tnnls.2018.2884797] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Different from existing policy gradient methods, which employ a single actor-critic and cannot achieve satisfactory tracking accuracy and stable learning, the proposed algorithm achieves high tracking control accuracy and stable learning by applying a hybrid actors-critics architecture, where multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, for the critics, an expected absolute Bellman error-based updating rule is used to choose the worst critic to be updated in each time step. Subsequently, to compute the loss function with a more accurate target value for the chosen critic, Pseudo Q-learning, which uses a subgreedy policy to replace the greedy policy in Q-learning, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce the overestimation of the action-value function and to stabilize the learning. As for the actors, the deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but bad updates. Moreover, a qualitative stability analysis of the learning is given. The effectiveness and generality of the proposed MPQ-based deterministic policy gradient (MPQ-DPG) algorithm are verified by application to an AUV with two different reference trajectories. The results demonstrate high tracking control accuracy and stable learning of MPQ-DPG, and also validate that increasing the number of actors and critics further improves performance.
|
32
|
Li Y, Yang C, Yan W, Cui R, Annamalai A. Admittance-Based Adaptive Cooperative Control for Multiple Manipulators With Output Constraints. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:3621-3632. [PMID: 30843811 DOI: 10.1109/tnnls.2019.2897847] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
This paper proposes a novel adaptive control methodology based on the admittance model for multiple manipulators transporting a rigid object cooperatively along a predefined desired trajectory. First, an admittance model is creatively applied to generate reference trajectory online for each manipulator according to the desired path of the rigid object, which is the reference input of the controller. Then, an innovative integral barrier Lyapunov function is utilized to tackle the constraints due to the physical and environmental limits. Adaptive neural networks (NNs) are also employed to approximate the uncertainties of the manipulator dynamics. Different from the conventional NN approximation method, which is usually semiglobally uniformly ultimately bounded, a switching function is presented to guarantee the global stability of the closed loop. Finally, the simulation studies are conducted on planar two-link robot manipulators to validate the efficacy of the proposed approach.
|
33
|
An H, Xia H, Ma G, Wang C. Adaptive control of a switched hypersonic vehicle model robust to scramjet choking and elevator fault. ISA TRANSACTIONS 2019; 95:45-57. [PMID: 31160038 DOI: 10.1016/j.isatra.2019.05.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2019] [Revised: 05/19/2019] [Accepted: 05/25/2019] [Indexed: 06/09/2023]
Abstract
This paper studies the longitudinal control problem for air-breathing hypersonic vehicles (AHVs) with safety considerations on scramjet choking and elevator fault. A control-oriented switched model (COSM) is employed to better describe the full-envelope flight of AHVs. Unlike the recent control strategy that utilizes neural networks to approximate the unpredictable switching nonlinearities in the COSM, a control strategy free of neural networks is proposed for AHVs. The saturation characteristic of fuel-to-air equivalent ratio is accommodated to protect the scramjet from thermal choking by constructing an adaptive velocity reference generator, which adjusts the velocity reference according to the saturation level. Meanwhile, the time-varying efficiency ratio and bias of faulty elevator are lumped into the uncertain parameters, which are handled by a bound adaption mechanism. A simulation study verifies the developed control.
Affiliation(s)
- Hao An, Space Control and Inertial Technology Research Center, Harbin Institute of Technology, Harbin 150001, PR China
- Hongwei Xia, Space Control and Inertial Technology Research Center, Harbin Institute of Technology, Harbin 150001, PR China
- Guangcheng Ma, Space Control and Inertial Technology Research Center, Harbin Institute of Technology, Harbin 150001, PR China
- Changhong Wang, Space Control and Inertial Technology Research Center, Harbin Institute of Technology, Harbin 150001, PR China
|
34
|
Wei C, Luo J, Dai H, Duan G. Learning-Based Adaptive Attitude Control of Spacecraft Formation With Guaranteed Prescribed Performance. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:4004-4016. [PMID: 30072354 DOI: 10.1109/tcyb.2018.2857400] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper investigates a novel leader-following attitude control approach for spacecraft formation under preassigned two-layer performance, with consideration of unknown inertial parameters, external disturbance torque, and unmodeled uncertainty. First, two-layer prescribed performance is preselected for both the attitude angle and angular velocity tracking errors. Subsequently, a distributed two-layer performance controller is devised, which guarantees that all the involved closed-loop signals are uniformly ultimately bounded. In order to overcome the limitations of the static two-layer performance controller, a learning-based control strategy is introduced to serve as an adaptive supplementary controller based on the adaptive dynamic programming technique. This dramatically enhances the adaptiveness of the static two-layer performance controller with respect to unexpected uncertainty, without any prior knowledge of the inertial information. Furthermore, by employing robust positive invariance theory, input-to-state stability is rigorously proven under the designed learning-based distributed controller. Finally, two groups of simulation examples validate the feasibility and effectiveness of the proposed distributed control approach.
|
35
|
Mu C, Zhao Q, Sun C, Gao Z. An ADDHP-based Q-learning algorithm for optimal tracking control of linear discrete-time systems with unknown dynamics. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105593] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
36
|
Song R, Xie Y, Zhang Z. Data-driven finite-horizon optimal tracking control scheme for completely unknown discrete-time nonlinear systems. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.026] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
37
|
He S, Fang H, Zhang M, Liu F, Luan X, Ding Z. Online policy iterative-based H∞ optimization algorithm for a class of nonlinear systems. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.04.027] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
|
38
|
Rizvi SAA, Lin Z. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:1523-1536. [PMID: 30296242 DOI: 10.1109/tnnls.2018.2870075] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most of the existing RL and ADP methods make use of full-state feedback, a requirement that is often difficult to satisfy in practical applications. As a result, output feedback methods are more desirable as they relax this requirement. In this paper, we present a new output feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) control problem for discrete-time systems. The proposed scheme is completely online in nature and works without requiring the system dynamics information. More specifically, a new representation of the LQR Q-function is developed in terms of the input-output data. Based on this new Q-function representation, output feedback LQR controllers are designed. We present two output feedback iterative Q-learning algorithms based on the policy iteration and the value iteration methods. This scheme has the advantage that it does not incur any excitation noise bias, and therefore, the need of using discounted cost functions is circumvented, which in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out, which illustrates the proposed scheme.
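A minimal sketch of the policy-iteration Q-learning idea in the entry above, simplified to full-state feedback (the cited paper works with input-output data instead). The system matrices below are made up and are used only to generate simulation data, never inside the learning update, mirroring the model-free character of the scheme.

```python
import numpy as np

# Q-learning policy iteration for discrete-time LQR (state-feedback
# sketch). The Q-function is quadratic, Q(x,u) = z' H z with z = [x; u],
# so policy evaluation is a least-squares fit of the Bellman identity
# and policy improvement is u = -inv(H_uu) H_ux x.
A = np.array([[0.9, 0.2], [0.0, 0.8]])   # assumed plant (data generation only)
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)            # stage cost x'Qc x + u'Rc u
n, m = 2, 1
p = n + m                                # dimension of z = [x; u]
iu = np.triu_indices(p)

def feat(z):
    """Features such that feat(z) @ theta = z' H z for symmetric H."""
    Z = np.outer(z, z)
    return np.where(iu[0] == iu[1], 1.0, 2.0) * Z[iu]

rng = np.random.default_rng(0)
K = np.zeros((m, n))                     # initial gain (A itself is stable)
for it in range(10):
    Phi, r = [], []
    x = rng.standard_normal(n)
    for k in range(200):                 # one exploratory trajectory
        u = -K @ x + 0.5 * rng.standard_normal(m)
        x1 = A @ x + B @ u
        z = np.concatenate([x, u])
        z1 = np.concatenate([x1, -K @ x1])   # on-policy action at next state
        Phi.append(feat(z) - feat(z1))       # z'Hz - z1'Hz1 = stage cost
        r.append(x @ Qc @ x + u @ Rc @ u)
        x = x1
    theta = np.linalg.lstsq(np.array(Phi), np.array(r), rcond=None)[0]
    H = np.zeros((p, p)); H[iu] = theta
    H = H + H.T - np.diag(np.diag(H))        # rebuild symmetric H
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # greedy improvement
```

Because the exploration noise enters through the action argument of the Q-function, the Bellman regression is unbiased for this deterministic system, which is the mechanism the abstract highlights for avoiding excitation-noise bias.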
|
39
|
He S, Zhang M, Fang H, Liu F, Luan X, Ding Z. Reinforcement learning and adaptive optimization of a class of Markov jump systems with completely unknown dynamic information. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04180-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|
40
|
He W, Yan Z, Sun Y, Ou Y, Sun C. Neural-Learning-Based Control for a Constrained Robotic Manipulator With Flexible Joints. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:5993-6003. [PMID: 29993842 DOI: 10.1109/tnnls.2018.2803167] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The control technology for robotic manipulators with flexible joints (RMFJs) is not yet mature. The flexible-joint manipulator dynamic system possesses many uncertainties, which poses a great challenge to controller design; this paper is motivated by that problem. To address it and enhance system robustness, full-state feedback neural network (NN) control is proposed. Moreover, output constraints of the RMFJ are enforced, which improves the safety of the robot. Through Lyapunov stability analysis, we show that the proposed controller guarantees not only the stability of the flexible-joint manipulator system but also the boundedness of the system state variables under appropriately chosen control gains. We then conduct simulation experiments to verify the design, and finally a series of control experiments is conducted on the Baxter robot. Comparisons with proportional-derivative control and with NN control based on a rigid manipulator model verify the feasibility and effectiveness of the NN control based on the flexible-joint manipulator model.
|
41
|
Sun C, Gao H, He W, Yu Y. Fuzzy Neural Network Control of a Flexible Robotic Manipulator Using Assumed Mode Method. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:5214-5227. [PMID: 29994372 DOI: 10.1109/tnnls.2017.2743103] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In this paper, the assumed mode method is employed to develop a dynamic model of a single-link flexible structure. Based on the discrete dynamic model, fuzzy neural network (NN) control is investigated to track the desired trajectory accurately and to maximally suppress the flexible vibration. To rigorously ensure stability, the system is proved to be uniformly ultimately bounded by Lyapunov's stability method. Simulations verify that the proposed control strategy is effective, and its performance is compared with proportional-derivative control. Experiments are implemented on the Quanser platform to further demonstrate the feasibility of the proposed fuzzy NN control.
|
42
|
Mu C, Wang D, He H. Data-Driven Finite-Horizon Approximate Optimal Control for Discrete-Time Nonlinear Systems Using Iterative HDP Approach. IEEE TRANSACTIONS ON CYBERNETICS 2018; 48:2948-2961. [PMID: 29028219 DOI: 10.1109/tcyb.2017.2752845] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
This paper presents a data-based finite-horizon optimal control approach for discrete-time nonlinear affine systems. Iterative adaptive dynamic programming (ADP) is used to approximately solve the Hamilton-Jacobi-Bellman equation by minimizing the cost function in finite time. The idea is implemented within a heuristic dynamic programming (HDP) structure involving a model network, so that the iterative control at the first step can be obtained without knowledge of the system function; meanwhile, the action network is used to obtain the approximate optimal control law and the critic network approximates the optimal cost function. The convergence of the iterative ADP algorithm and the stability of the weight estimation errors under the HDP structure are analyzed in detail. Finally, two simulation examples are provided to demonstrate the theoretical results and show the performance of the proposed method.
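As a hedged stand-in for the HDP iteration described above, the sketch below runs plain grid-based value iteration on a made-up scalar system; the cited paper uses three neural networks instead of a grid, and a general nonlinear affine model instead of this toy one.

```python
import numpy as np

# Value iteration V_{i+1}(x) = min_u [x^2 + u^2 + V_i(f(x,u))] on a
# state/action grid, with f(x,u) = 0.8x + u (all numbers are assumptions).
xs = np.linspace(-2, 2, 201)                     # state grid
us = np.linspace(-2, 2, 201)                     # action grid
V = np.zeros_like(xs)                            # V_0 = 0
for i in range(200):
    Xn = 0.8 * xs[:, None] + us[None, :]         # next state for every (x, u)
    Vn = np.interp(np.clip(Xn, -2, 2), xs, V)    # V_i(f(x,u)) by interpolation
    Qv = xs[:, None] ** 2 + us[None, :] ** 2 + Vn
    V = Qv.min(axis=1)                           # Bellman backup
u_greedy = us[Qv.argmin(axis=1)]                 # greedy control on the grid
```

Since this toy system is linear with quadratic cost, the converged value can be checked against the scalar Riccati solution, V(x) ≈ 1.37 x²; in the nonlinear case no such closed form exists, which is why the paper resorts to NN approximation.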
|
43
|
Near-optimal output tracking controller design for nonlinear systems using an event-driven ADP approach. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.05.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
44
|
Zhai D, An L, Li X, Zhang Q. Adaptive Fault-Tolerant Control for Nonlinear Systems With Multiple Sensor Faults and Unknown Control Directions. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:4436-4446. [PMID: 29990256 DOI: 10.1109/tnnls.2017.2766283] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper investigates the problem of adaptive fault-tolerant control for a class of nonlinear parametric strict-feedback systems with multiple unknown control directions. Multiple sensor faults are first considered such that all real state variables are unavailable. Then, a constructive design method for the problem is set up by exploiting a parameter separation and regrouping technique. To circumvent the main obstacle caused by the coupling effects of multiple unknown control directions and sensor faults, a region-dependent segmentation analysis method is proposed. It is proven that the closed-loop system is globally exponentially stable. Simulation results are presented to illustrate the effectiveness of the proposed scheme.
|
45
|
Xu B, Yang D, Shi Z, Pan Y, Chen B, Sun F. Online Recorded Data-Based Composite Neural Control of Strict-Feedback Systems With Application to Hypersonic Flight Dynamics. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:3839-3849. [PMID: 28952951 DOI: 10.1109/tnnls.2017.2743784] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
This paper investigates the online recorded data-based composite neural control of uncertain strict-feedback systems using the backstepping framework. In each step of the virtual control design, a neural network (NN) is employed for uncertainty approximation. Most previous designs aim directly at system stability, ignoring how the NN actually works as an approximator. In this paper, to enhance the learning ability, a novel prediction error signal is constructed to provide additional correction information for the NN weight update using online recorded data. In this way, the neural approximation precision is greatly improved, and the convergence speed is faster. Furthermore, a sliding mode differentiator is employed to approximate the derivative of the virtual control signal, so that the complex analysis of the backstepping design can be avoided. The closed-loop stability is rigorously established, and the boundedness of the tracking error is guaranteed. Through simulation of hypersonic flight dynamics, the proposed approach exhibits better tracking performance.
|
46
|
Wang Y, Hu J. Improved prescribed performance control for air-breathing hypersonic vehicles with unknown deadzone input nonlinearity. ISA TRANSACTIONS 2018; 79:95-107. [PMID: 29789154 DOI: 10.1016/j.isatra.2018.05.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/27/2018] [Revised: 04/27/2018] [Accepted: 05/10/2018] [Indexed: 06/08/2023]
Abstract
An improved prescribed performance controller is proposed for the longitudinal model of an air-breathing hypersonic vehicle (AHV) subject to uncertain dynamics and input nonlinearity. Unlike the traditional non-affine model, which requires the non-affine functions to be differentiable, this paper utilizes a semi-decomposed non-affine model whose non-affine functions are only locally semi-bounded and possibly non-differentiable. A new error transformation combined with novel prescribed performance functions is proposed to bypass the complex deductions caused by conventional error-constraint approaches and to circumvent high-frequency chattering in the control inputs. On the basis of the backstepping technique, an improved prescribed performance controller with low structural and computational complexity is designed. The methodology keeps the altitude and velocity tracking errors within transient and steady-state performance envelopes and exhibits excellent robustness against uncertain dynamics and deadzone input nonlinearity. Simulation results demonstrate the efficacy of the proposed method.
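The prescribed performance mechanism can be sketched on a scalar error dynamic: the tracking error must stay inside a decaying envelope rho(t), and a transformed error that grows without bound near the envelope edge supplies the restoring feedback. The plant constant, gain, and envelope parameters below are hypothetical choices, not the paper's AHV model.

```python
import math

def rho(t, rho0=1.0, rho_inf=0.05, decay=1.0):
    # decaying performance envelope: |e(t)| must stay below rho(t)
    return (rho0 - rho_inf) * math.exp(-decay * t) + rho_inf

def transformed(e, r):
    # maps e in (-r, r) onto the real line; blows up near the funnel edge
    z = max(min(e / r, 0.999), -0.999)   # clamp for numerical safety
    return math.atanh(z)

dt, a, k = 0.01, 0.5, 2.0                # hypothetical error dynamic e' = a*e + u
e, inside = 0.8, True
for step in range(1000):
    t = step * dt
    u = -k * transformed(e, rho(t))      # pushes back hard near the edge
    e += dt * (a * e + u)
    inside = inside and abs(e) < rho(t + dt)
```

Because the transformed error dominates the unstable term near the boundary, the error never escapes the funnel, which is the transient- and steady-state guarantee the abstract refers to.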
Affiliation(s)
- Yingyang Wang
- Equipment Management and Unmanned Aerial Vehicle Engineering College, Air Force Engineering University, Xi'an, 710051, China.
- Jianbo Hu
- Equipment Management and Unmanned Aerial Vehicle Engineering College, Air Force Engineering University, Xi'an, 710051, China
|
47
|
Guo W, Si J, Liu F, Mei S. Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2794-2807. [PMID: 28600262 DOI: 10.1109/tnnls.2017.2702566] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking the policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming at practical implementation of an approximate policy, we consider using Volterra series, which has been extensively covered in the controls literature for its good theoretical properties and its success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples, including a practical problem of excitation control of a hydrogenerator.
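The policy evaluation/improvement loop at the heart of policy iteration can be shown exactly on a scalar discrete-time linear-quadratic problem, where the fixed point is the discrete Riccati solution. The system and cost numbers below are hypothetical, chosen only so the result is checkable; this is a sketch of the generic algorithm, not the paper's Volterra-series implementation.

```python
# x_{k+1} = a*x_k + b*u_k, undiscounted cost sum of q*x^2 + r*u^2
a, b, q, r = 0.9, 1.0, 1.0, 0.5

def policy_iteration(k0, sweeps=30):
    k = k0                              # linear policy u = -k*x; k0 must stabilize
    for _ in range(sweeps):
        # policy evaluation: V(x) = P*x^2 for the closed loop x+ = (a - b*k)*x
        ac = a - b * k
        P = (q + r * k * k) / (1.0 - ac * ac)
        # policy improvement: minimize q*x^2 + r*u^2 + P*(a*x + b*u)^2 over u
        k = a * b * P / (r + b * b * P)
    return k, P

k_star, P_star = policy_iteration(k0=0.5)
# residual of the discrete algebraic Riccati equation at the fixed point
dare_residual = abs(P_star - (q + a * a * P_star
                              - (a * b * P_star) ** 2 / (r + b * b * P_star)))
```

Each sweep evaluates the current policy exactly and then improves it greedily; on this problem the iterates converge rapidly to the Riccati fixed point.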
|
48
|
Deptula P, Rosenfeld JA, Kamalapurkar R, Dixon WE. Approximate Dynamic Programming: Combining Regional and Local State Following Approximations. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2154-2166. [PMID: 29771668 DOI: 10.1109/tnnls.2018.2808102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
An infinite-horizon optimal regulation problem for a control-affine deterministic system is solved online using a local state following (StaF) kernel and a regional model-based reinforcement learning (R-MBRL) method to approximate the value function. Unlike traditional methods such as R-MBRL, which aim to approximate the value function over a large compact set, the StaF kernel approach approximates the value function in a local neighborhood of the state as it travels within a compact set. In this paper, the value function is approximated using a state-dependent convex combination of the StaF-based and the R-MBRL-based approximations. As the state enters a neighborhood containing the origin, the value function transitions from being approximated by the StaF approach to the R-MBRL approach. Semiglobal uniformly ultimately bounded (SGUUB) convergence of the system states to the origin is established using a Lyapunov-based analysis. Simulation results are provided for two-, three-, six-, and ten-state dynamical systems to demonstrate the scalability and performance of the developed method.
|
49
|
Wang D, Mu C, Liu D, Ma H. On Mixed Data and Event Driven Design for Adaptive-Critic-Based Nonlinear $H_{\infty}$ Control. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:993-1005. [PMID: 28166505 DOI: 10.1109/tnnls.2016.2642128] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
In this paper, based on the adaptive critic learning technique, control of a class of unknown nonlinear dynamic systems is investigated through a mixed data- and event-driven design. The nonlinear control problem is formulated as a two-player zero-sum differential game, and the adaptive critic method is employed to cope with the data-based optimization. The novelty lies in combining a data-driven learning identifier with the event-driven design formulation to develop the adaptive critic controller, thereby accomplishing the nonlinear control. The event-driven optimal control law and the time-driven worst-case disturbance law are approximated by constructing and tuning a critic neural network. With the event-driven feedback control applied, the closed-loop system is constructed and its stability is analyzed. Simulation studies are conducted to verify the theoretical results and illustrate the control performance. Notably, this research provides a new avenue for integrating data-based control and an event-triggering mechanism into advanced adaptive critic systems.
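The event-driven ingredient shared by this work and the ET-ORC paper in the header can be sketched in a few lines: the control input is recomputed only when the gap between the current state and the last-sampled state violates a triggering condition, so the plant runs on a held input between events. The scalar plant, gain, and static threshold below are hypothetical illustration values, not either paper's design.

```python
def simulate(threshold, steps=300):
    a, b, k = 1.05, 1.0, 0.6     # unstable open loop, stabilizing gain
    x = 1.0
    x_s = x                       # last sampled (broadcast) state
    u = -k * x_s
    events = 0
    for _ in range(steps):
        if abs(x - x_s) > threshold:   # triggering condition violated
            x_s = x                    # sample the state, recompute control
            u = -k * x_s
            events += 1
        x = a * x + b * u              # plant runs on the held input
    return abs(x), events

x_et, ev_et = simulate(threshold=0.05)
x_tt, ev_tt = simulate(threshold=0.0)  # (near) time-triggered baseline
```

The event-triggered run converges to a small neighborhood of the origin while recomputing the control far less often than the every-step baseline, which is the computational saving the abstract highlights.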
|
50
|
|