1. Wu H, Hu Q, Zheng J, Dong F, Ouyang Z, Li D. Discounted Inverse Reinforcement Learning for Linear Quadratic Control. IEEE Transactions on Cybernetics 2025; 55:1995-2007. PMID: 40036510. DOI: 10.1109/tcyb.2025.3540967.
Abstract
Linear quadratic control with unknown value functions and dynamics is extremely challenging, and most existing studies have focused on the regulation problem and cannot handle the tracking problem. To solve both the linear quadratic regulation and tracking problems for continuous-time systems with unknown value functions, this article develops a discounted inverse reinforcement learning (DIRL) method that inherits the model-independent property of reinforcement learning (RL). More specifically, we first formulate a standard paradigm for solving linear quadratic control using DIRL. To recover the value function and the target control gain, an error metric is carefully constructed and minimized with a quasi-Newton algorithm. Furthermore, three DIRL algorithms are proposed: model-based, model-free off-policy, and model-free on-policy. The latter two rely on the expert's demonstration data or online observed data and require no prior knowledge of the system dynamics or the value function. The stability, convergence, and existence conditions of multiple solutions are thoroughly analyzed. Finally, numerical simulations demonstrate the effectiveness of the theoretical results.
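For orientation, the discounted continuous-time linear quadratic cost that such methods are built around is typically of the following form; the specific weights and discount rate used in the paper are not reproduced in this entry, so Q, R, and gamma below are illustrative placeholders:

J(x_0) = \int_0^{\infty} e^{-\gamma t} \left( x^{\top}(t) Q x(t) + u^{\top}(t) R u(t) \right) dt,
\qquad \text{subject to } \dot{x}(t) = A x(t) + B u(t), \quad Q \succeq 0,\ R \succ 0,\ \gamma > 0.

In the inverse problem the weights (equivalently, the value function) are unknown and are recovered from demonstration data, which is what the error metric and the quasi-Newton minimization described above operate on.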
2. Wang J, Wu J, Cao J, Chadli M, Shen H. Nonfragile Output Feedback Tracking Control for Markov Jump Fuzzy Systems Based on Integral Reinforcement Learning Scheme. IEEE Transactions on Cybernetics 2023; 53:4521-4530. PMID: 36194715. DOI: 10.1109/tcyb.2022.3203795.
Abstract
In this article, a novel integral reinforcement learning (RL)-based nonfragile output feedback tracking control algorithm is proposed for uncertain Markov jump nonlinear systems represented by the Takagi-Sugeno fuzzy model. The nonfragile control problem is converted into a zero-sum game in which the control input and the uncertain disturbance input act as two rival players. Based on the RL architecture, an offline parallel output feedback tracking learning algorithm is first designed to solve the fuzzy stochastic coupled algebraic Riccati equations for Markov jump fuzzy systems. Then, to remove the requirement of precise system information and transition probabilities, an online parallel integral RL-based algorithm is designed. The tracking objective is achieved, and stochastic asymptotic stability and the expected H∞ performance of the considered systems are ensured via Lyapunov stability theory and stochastic analysis. Finally, the effectiveness of the proposed control algorithm is verified on a robot arm system.
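The zero-sum game mentioned above is conventionally posed with the control and the disturbance minimizing and maximizing a single quadratic index; the exact index used in the paper is not reproduced in this entry, so the following is only a generic illustration:

\min_{u}\ \max_{w}\ J(u, w) = \int_0^{\infty} \left( z^{\top}(t) z(t) - \gamma^{2}\, w^{\top}(t) w(t) \right) dt,

where z is the performance (tracking error) output, w the disturbance input, and gamma the prescribed H∞ attenuation level.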
3. Arogeti SA, Lewis FL. Static Output-Feedback H∞ Control Design Procedures for Continuous-Time Systems With Different Levels of Model Knowledge. IEEE Transactions on Cybernetics 2023; 53:1432-1446. PMID: 34570712. DOI: 10.1109/tcyb.2021.3103148.
Abstract
This article suggests a collection of model-based and model-free output-feedback optimal solutions to a general H∞ control design criterion of a continuous-time linear system. The goal is to obtain a static output-feedback controller while the design criterion is formulated with an exponential term, divergent or convergent, depending on the designer's choice. Two offline policy-iteration algorithms are presented first, which form the foundations for a family of online off-policy designs. These algorithms cover all different cases of partial or complete model knowledge and provide the designer with a collection of design alternatives. It is shown that such a design for partial model knowledge can reduce the number of unknown matrices to be solved online. In particular, if the disturbance input matrix of the model is given, off-policy learning can be done with no disturbance excitation. This alternative is useful in situations where a measurable disturbance is not available in the learning phase. The utility of these design procedures is demonstrated for the case of an optimal lane tracking controller of an automated car.
4. Mazouchi M, Yang Y, Modares H. Data-Driven Dynamic Multiobjective Optimal Control: An Aspiration-Satisfying Reinforcement Learning Approach. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6183-6193. PMID: 33886483. DOI: 10.1109/tnnls.2021.3072571.
Abstract
This article presents an iterative data-driven algorithm for solving dynamic multiobjective (MO) optimal control problems arising in the control of nonlinear continuous-time systems. It is first shown that the Hamiltonian functional corresponding to each objective can be leveraged to compare the performance of admissible policies. Hamiltonian inequalities are then used whose satisfaction guarantees that the objectives' aspirations are met. Relaxed Hamilton-Jacobi-Bellman (HJB) equations, in the form of HJB inequalities, are then solved in a dynamic constrained MO framework to find Pareto optimal solutions, and the relation to the satisficing (good-enough) decision-making framework is shown. A sum-of-squares (SOS)-based iterative algorithm is developed to solve the formulated aspiration-satisfying MO optimization. To obviate the requirement of complete knowledge of the system dynamics, a data-driven satisficing reinforcement learning approach is proposed that solves the SOS optimization problem in real time using only system trajectories measured over a time interval. Finally, two simulation examples are used to verify the analytical results of the proposed algorithm.
5. Cheng Y, Huang L, Wang X. Authentic Boundary Proximal Policy Optimization. IEEE Transactions on Cybernetics 2022; 52:9428-9438. PMID: 33705327. DOI: 10.1109/tcyb.2021.3051456.
Abstract
In recent years, the proximal policy optimization (PPO) algorithm has received considerable attention because of its excellent performance on many challenging tasks. However, the mechanism of PPO's horizontal clipping operation, a key means of improving its performance, still lacks a thorough theoretical explanation. In addition, although PPO is inspired by the learning theory of trust region policy optimization (TRPO), the theoretical connection between PPO's clipping operation and TRPO's trust region constraint has not been well studied. In this article, we first analyze the effect of PPO's clipping operation on the objective function of conservative policy iteration and rigorously establish the theoretical relationship between PPO and TRPO. Then, a novel first-order policy gradient algorithm called authentic boundary PPO (ABPPO) is proposed, based on the authentic boundary setting rule. To better keep the difference between the new and old policies within the clipping range, we further propose two improved PPO algorithms that borrow the idea of ABPPO: rollback mechanism-based ABPPO (RMABPPO) and penalized point policy difference-based ABPPO (P3DABPPO), which build on rollback clipping and the penalized point policy difference, respectively. Experiments on continuous robotic control tasks implemented in MuJoCo show that the proposed algorithms effectively improve learning stability and accelerate learning compared with the original PPO.
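For reference, the clipping operation discussed above is PPO's clipped surrogate objective; a minimal NumPy sketch of that loss follows (the sample data and the clip range eps = 0.2 are illustrative assumptions, not values from the paper):

import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate loss, returned as a quantity to minimize."""
    ratio = np.exp(logp_new - logp_old)                      # probability ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the elementwise minimum, so the loss is its negated mean.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage with synthetic numbers (illustrative only).
rng = np.random.default_rng(0)
logp_old = rng.normal(size=64)
logp_new = logp_old + 0.05 * rng.normal(size=64)
advantages = rng.normal(size=64)
print(ppo_clip_loss(logp_new, logp_old, advantages))

The rollback and penalized-point variants described in the abstract presumably change how this objective behaves once the ratio leaves the [1 - eps, 1 + eps] band.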
6. Wang N, Gao Y, Yang C, Zhang X. Reinforcement learning-based finite-time tracking control of an unknown unmanned surface vehicle with input constraints. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.04.133.
7. Duan K, Fong S, Chen CP. Reinforcement learning based model-free optimized trajectory tracking strategy design for an AUV. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.10.056.
8. Zhou P, Zhao W, Li J, Li A, Du W, Wen S. Massive Maritime Path Planning: A Contextual Online Learning Approach. IEEE Transactions on Cybernetics 2021; 51:6262-6273. PMID: 32112685. DOI: 10.1109/tcyb.2019.2959543.
Abstract
The ocean has been studied for centuries across the world, and planning travel paths for vessels has become a hot topic in recent decades with the growth of worldwide trade. Planning suitable paths typically relies on big data processing in cybernetics, yet few investigations have been carried out. We attempt to find optimal paths for vessels by proposing an online learning dispatch approach built on a mission-executing-feedback (MEF) model. The proposed approach explores the ocean subdomain (OS) to achieve the largest average traveling feedback for different vessels; it balances the path search between depth and breadth and accounts for adaptation across vessels. Further, we propose a contextual multiarmed bandit-based algorithm that provides accurate exploration results with sublinear regret and significantly improves the learning speed. Experimental results show that the proposed MEF approach achieves a 90% accuracy gain over random exploration and about a 25% accuracy improvement over other contextual bandit models for big data online learning.
9. Integral reinforcement learning-based optimal output feedback control for linear continuous-time systems with input delay. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.06.073.
10. Hao Y, Wang T, Li G, Wen C. Linear Quadratic Optimal Control of Time-Invariant Linear Networks With Selectable Input Matrix. IEEE Transactions on Cybernetics 2021; 51:4743-4754. PMID: 31804949. DOI: 10.1109/tcyb.2019.2953218.
Abstract
Optimal control of networks aims to minimize the cost function of a network in a dynamical process with an optimal control strategy. For time-invariant linear systems ẋ(t) = Ax(t) + Bu(t), the traditional linear quadratic regulator (LQR), which minimizes a quadratic cost function, is well established when both the adjacency matrix A and the control input matrix B are given. However, this conventional approach is not applicable when we have the freedom to design B. In this article, we investigate the situation in which the input matrix B is a variable to be designed to reduce the control cost. First, the problem is formulated and we establish an equivalent expression of the quadratic cost function with respect to B, which is difficult to obtain within the traditional theoretical framework as it requires an explicit solution of a Riccati differential equation (RDE). Next, we derive the gradient of the quadratic cost function with respect to the matrix variable B analytically. Further, we obtain three inequalities on the cost functions, after which several possible design (optimization) problems are discussed and gradient-based algorithms are proposed. It is shown that the cost of controlling LTI systems can be significantly reduced when the input matrix becomes "designable." We find that the nodes connected to input sources can be sparsely identified and that they should be distributed as evenly as possible in the LTI network if one wants to control the network at the lowest cost. Our findings help us better understand how LTI systems should be controlled through design of the input matrix.
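For background, the finite-horizon quadratic cost and the Riccati differential equation (RDE) referred to above take the standard form below; the terminal weight F and the weights Q and R are generic symbols rather than the paper's specific choices:

J = x^{\top}(t_f) F x(t_f) + \int_{0}^{t_f} \left( x^{\top}(t) Q x(t) + u^{\top}(t) R u(t) \right) dt,
\qquad u^{*}(t) = -R^{-1} B^{\top} P(t)\, x(t),

-\dot{P}(t) = A^{\top} P(t) + P(t) A - P(t) B R^{-1} B^{\top} P(t) + Q, \qquad P(t_f) = F.

The dependence of P, and hence of the optimal cost, on B is what the gradient computation described in the abstract makes explicit.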
11. Calafiore GC, Possieri C. Output Feedback Q-Learning for Linear-Quadratic Discrete-Time Finite-Horizon Control Problems. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3274-3281. PMID: 32745011. DOI: 10.1109/tnnls.2020.3010304.
Abstract
An algorithm is proposed to determine output feedback policies that solve finite-horizon linear-quadratic (LQ) optimal control problems without requiring knowledge of the system dynamical matrices. To reach this goal, the Q-factors arising from finite-horizon LQ problems are first characterized in the state feedback case. It is then shown how they can be parameterized as functions of the input-output vectors. A procedure is then proposed for estimating these functions from input/output data and using these estimates for computing the optimal control via the measured inputs and outputs.
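In the state feedback case mentioned first, the LQ Q-factor is a quadratic form in the state-input pair; a generic parameterization (the paper's input-output reparameterization is not reproduced in this entry) is:

Q_k(x_k, u_k) = \begin{bmatrix} x_k \\ u_k \end{bmatrix}^{\top}
\begin{bmatrix} H_k^{xx} & H_k^{xu} \\ H_k^{ux} & H_k^{uu} \end{bmatrix}
\begin{bmatrix} x_k \\ u_k \end{bmatrix},
\qquad u_k^{*} = -\big(H_k^{uu}\big)^{-1} H_k^{ux}\, x_k,

so estimating the kernel matrices H_k from data is enough to compute the optimal inputs without knowing the system matrices.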
12. Na J, Zhao J, Gao G, Li Z. Output-Feedback Robust Control of Uncertain Systems via Online Data-Driven Learning. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:2650-2662. PMID: 32706646. DOI: 10.1109/tnnls.2020.3007414.
Abstract
Although robust control has been studied for decades, the output-feedback robust control design is still challenging in the control field. This article proposes a new approach to address the output-feedback robust control for continuous-time uncertain systems. First, we transform the robust control problem into an optimal control problem of the nominal linear system with a constructive cost function, which allows simplifying the control design. Then, a modified algebraic Riccati equation (MARE) is constructed by further investigating the corresponding relationship with the state-feedback optimal control. To solve the derived MARE online, the vectorization operation and Kronecker's product are applied to reformulate the output Lyapunov function, and then, a new online data-driven learning method is suggested to learn its solution. Consequently, only the measurable system input and output are used to derive the solution of the MARE. In this case, the output-feedback robust control gain can be obtained without using the unknown system states. The control system stability and convergence of the derived solution are rigorously proved. Two simulation examples are provided to demonstrate the efficacy of the suggested methods.
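The vectorization and Kronecker-product step mentioned above rests on the identity vec(AXB) = (Bᵀ ⊗ A) vec(X), which turns an unknown matrix in a linear matrix equation into an ordinary unknown vector; a quick NumPy check of the identity (all matrices are random placeholders):

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
X = rng.normal(size=(4, 5))
B = rng.normal(size=(5, 2))

# vec() stacks columns, which is Fortran ("F") order in NumPy.
vec = lambda M: M.reshape(-1, order="F")

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))   # True: vec(A X B) = (B^T kron A) vec(X)

In data-driven schemes of this kind, the unknown kernel then enters the measured data linearly, which is what makes online estimation tractable.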
13. Yang X, He H, Zhong X. Approximate Dynamic Programming for Nonlinear-Constrained Optimizations. IEEE Transactions on Cybernetics 2021; 51:2419-2432. PMID: 31329149. DOI: 10.1109/tcyb.2019.2926248.
Abstract
In this paper, we study the constrained optimization problem of a class of uncertain nonlinear interconnected systems. First, we prove that the solution of the constrained optimization problem can be obtained through solving an array of optimal control problems of constrained auxiliary subsystems. Then, under the framework of approximate dynamic programming, we present a simultaneous policy iteration (SPI) algorithm to solve the Hamilton-Jacobi-Bellman equations corresponding to the constrained auxiliary subsystems. By building an equivalence relationship, we demonstrate the convergence of the SPI algorithm. Meanwhile, we implement the SPI algorithm via an actor-critic structure, where actor networks are used to approximate optimal control policies and critic networks are applied to estimate optimal value functions. By using the least squares method and the Monte Carlo integration technique together, we are able to determine the weight vectors of actor and critic networks. Finally, we validate the developed control method through the simulation of a nonlinear interconnected plant.
14. Zhu P, Zeng J. Observer-based control for nonlinear parameter-varying systems: A sum-of-squares approach. ISA Transactions 2021; 111:121-131. PMID: 33220944. DOI: 10.1016/j.isatra.2020.11.010.
Abstract
This paper investigates the design of nonlinear, time-varying observer-based controllers for nonlinear parameter-varying systems with and without input constraints. With the aid of Lyapunov stability theory, state-and-parameter-dependent linear matrix inequality conditions are obtained. These conditions are cast as convex programming problems, and a feasible solution can be obtained via sum-of-squares techniques; the commonly used backstepping/iterative methods are thus avoided. In addition, the effect of the bilinear product forms involving the controller gain matrix and the Lyapunov functional is eliminated. A remarkable advantage of the proposed approach is that the state-and-parameter-dependent observer and the state-feedback controller can be designed independently, which significantly reduces the computational complexity. Finally, the feasibility and validity of the proposed method are illustrated by simulation results.
Affiliations: Pingfang Zhu and Jianping Zeng, Department of Automation, Xiamen University, Xiamen, Fujian 361005, China.
15. Ballesteros M, Chairez I, Poznyak A. Robust optimal feedback control design for uncertain systems based on artificial neural network approximation of the Bellman's value function. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.06.085.
16. Huang M, Liu C, He X, Ma L, Lu Z, Su H. Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.03.061.
17. Zhou Y, Wang H, Li L, Lian J. Bench calibration method for automotive electric motors based on deep reinforcement learning. Journal of Intelligent & Fuzzy Systems 2020. DOI: 10.3233/jifs-191567.
Affiliations: Yafu Zhou, Hantao Wang, Linhui Li, and Jing Lian, School of Automotive Engineering, Faculty of Vehicle Engineering and Mechanics, State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Liaoning Province, China.
18. Fuzzy Reinforcement Learning and Curriculum Transfer Learning for Micromanagement in Multi-Robot Confrontation. Information 2019. DOI: 10.3390/info10110341.
Abstract
Multi-robot confrontation on physics-based simulators is a complex and time-consuming task, but simulators are required to evaluate the performance of advanced algorithms. Recently, a few advanced algorithms have been able to handle considerably complex levels of the robot confrontation system when agents face multiple opponents. Meanwhile, current confrontation decision-making systems suffer from difficulties in optimization and generalization. In this paper, fuzzy reinforcement learning (RL) and curriculum transfer learning are applied to micromanagement in a robot confrontation system. First, an improved Q-learning scheme in the semi-Markov decision process is designed to train the agent, and an efficient RL model is defined to avoid the curse of dimensionality. Second, a multi-agent RL algorithm with parameter sharing is proposed to train the agents; a neural network with adaptive momentum acceleration serves as the function approximator for the state-action function, and fuzzy logic is used to regulate the learning rate. Third, a curriculum transfer learning method is used to extend the RL model to more difficult scenarios, which ensures the generalization of the decision-making system. Experimental results show that the proposed method is effective.
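As a point of reference, the tabular Q-learning update that semi-Markov variants build on is the standard rule below, with the discount applied over the sojourn time tau of a temporally extended action; the specific improvements introduced in the paper are not reproduced here:

Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma^{\tau} \max_{a'} Q(s', a') - Q(s, a) \right].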
19. Banerjee S, Chatterjee A. ALERA. ACM Transactions on Intelligent Systems and Technology 2019. DOI: 10.1145/3338123.
Abstract
The successful deployment of autonomous real-time systems is contingent on their ability to recover from performance degradation of sensors, actuators, and other electro-mechanical subsystems with low latency. In this article, we introduce ALERA, a novel framework for real-time control law adaptation in nonlinear control systems assisted by system state encodings that generate an error signal when the code properties are violated in the presence of failures. The fundamental contributions of this methodology are twofold—first, we show that the time-domain error signal contains perturbed system parameters’ diagnostic information that can be used for quick control law adaptation to failure conditions and second, this quick adaptation is performed via reinforcement learning algorithms that relearn the control law of the perturbed system from a starting condition dictated by the diagnostic information, thus achieving significantly faster recovery. The fast (up to 80X faster than traditional reinforcement learning paradigms) performance recovery enabled by ALERA is demonstrated on an inverted pendulum balancing problem, a brake-by-wire system, and a self-balancing robot.
20. Rizvi SAA, Lin Z. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:1523-1536. PMID: 30296242. DOI: 10.1109/tnnls.2018.2870075.
Abstract
Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most of the existing RL and ADP methods make use of full-state feedback, a requirement that is often difficult to satisfy in practical applications. As a result, output feedback methods are more desirable as they relax this requirement. In this paper, we present a new output feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) control problem for discrete-time systems. The proposed scheme is completely online in nature and works without requiring the system dynamics information. More specifically, a new representation of the LQR Q-function is developed in terms of the input-output data. Based on this new Q-function representation, output feedback LQR controllers are designed. We present two output feedback iterative Q-learning algorithms based on the policy iteration and the value iteration methods. This scheme has the advantage that it does not incur any excitation noise bias, and therefore, the need of using discounted cost functions is circumvented, which in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out, which illustrates the proposed scheme.
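To make the flavor of such schemes concrete, below is a minimal state-feedback sketch of Q-function-based policy iteration for a discrete-time LQR learned from simulated data; the plant, weights, initial stabilizing gain, noise level, and sample counts are all illustrative assumptions, and the paper's actual output-feedback parameterization in terms of input-output data is not reproduced.

import numpy as np

# Illustrative plant and weights (assumed for this sketch, not from the paper).
A = np.array([[1.0, 0.5],
              [0.1, 1.2]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                      # state weight
R = np.eye(1)                      # input weight
n, m = 2, 1

rng = np.random.default_rng(0)
K = np.array([[0.2, 1.5]])         # assumed initial stabilizing gain for u = -K x

def phi(x, u):
    """Quadratic features: z^T H z = kron(z, z) . vec(H) with z = [x; u]."""
    z = np.concatenate([x, u])
    return np.kron(z, z)

for _ in range(10):                            # policy iteration
    features, costs = [], []
    x = rng.normal(size=n)
    for t in range(400):                       # collect data under exploration
        if t % 20 == 0:
            x = rng.normal(size=n)             # periodic resets keep data bounded
        u = -K @ x + 0.5 * rng.normal(size=m)  # behavior policy = policy + noise
        x_next = A @ x + B @ u
        u_next = -K @ x_next                   # target policy action (no noise)
        features.append(phi(x, u) - phi(x_next, u_next))
        costs.append(x @ Q @ x + u @ R @ u)
        x = x_next
    # Policy evaluation: solve the Q-function Bellman equation by least squares.
    theta, *_ = np.linalg.lstsq(np.array(features), np.array(costs), rcond=None)
    H = 0.5 * (theta.reshape(n + m, n + m) + theta.reshape(n + m, n + m).T)
    # Policy improvement: u = -(H_uu)^{-1} H_ux x.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

print("learned feedback gain:", K)

At convergence the learned gain matches the one obtained from the discrete-time algebraic Riccati equation, which is the property the convergence analysis summarized in the abstract establishes.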
21. Rizvi SAA, Lin Z. Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback. IEEE Transactions on Cybernetics 2019; 50:4670-4679. PMID: 30605117. DOI: 10.1109/tcyb.2018.2886735.
Abstract
In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input-output data, without requiring model information. A state parametrization scheme is presented which reconstructs the system state based on the filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem based on policy iteration and value iteration (VI). Unlike the existing output feedback methods for continuous-time systems, the need to apply discrete approximation is obviated. In contrast with the static output feedback controllers, the proposed method can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it stands immune to the exploration bias issue. Moreover, it does not require a discounted cost function and, thus, ensures the closed-loop stability and the optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms.
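For context, continuous-time formulations of this kind typically rest on an integral Bellman (policy evaluation) equation over a reinforcement interval T, which removes the explicit dependence on the drift matrix A; a generic state-feedback form (not the paper's output-feedback parameterization) is:

x^{\top}(t) P x(t) = \int_{t}^{t+T} \left( x^{\top} Q x + u^{\top} R u \right) d\tau + x^{\top}(t+T) P x(t+T), \qquad u = -K x,

with policy improvement K^{+} = R^{-1} B^{\top} P once P has been estimated for the current gain K.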
22. Training a robust reinforcement learning controller for the uncertain system based on policy gradient method. Neurocomputing 2018. DOI: 10.1016/j.neucom.2018.08.007.
23. Li X, Xue L, Sun C. Linear quadratic tracking control of unknown discrete-time systems using value iteration algorithm. Neurocomputing 2018. DOI: 10.1016/j.neucom.2018.05.111.
24. Narayanan V, Jagannathan S. Event-Triggered Distributed Control of Nonlinear Interconnected Systems Using Online Reinforcement Learning With Exploration. IEEE Transactions on Cybernetics 2018; 48:2510-2519. PMID: 28885167. DOI: 10.1109/tcyb.2017.2741342.
Abstract
In this paper, a distributed control scheme for an interconnected system composed of uncertain input-affine nonlinear subsystems with event-triggered state feedback is presented using a novel hybrid-learning-scheme-based approximate dynamic programming with online exploration. First, an approximate solution to the Hamilton-Jacobi-Bellman equation is generated with event-sampled neural network (NN) approximation, and subsequently a near-optimal control policy is derived for each subsystem. Artificial NNs are utilized as function approximators to develop a suite of identifiers and learn the dynamics of each subsystem. The NN weight tuning rules for the identifiers and the event-triggering condition are derived using Lyapunov stability theory. Taking into account the effects of NN approximation of the system dynamics and of bootstrapping, a novel NN weight update is presented to approximate the optimal value function. Finally, a novel strategy to incorporate exploration into the online control framework, using the identifiers, is introduced to reduce the overall cost at the expense of additional computations during the initial online learning phase. The system states and NN weight estimation errors are regulated, and locally uniformly ultimately bounded results are achieved. The analytical results are substantiated by simulation studies.
25. Qi Q, Zhang H. Output Feedback Control and Stabilization for Multiplicative Noise Systems With Intermittent Observations. IEEE Transactions on Cybernetics 2018; 48:2128-2138. PMID: 28767382. DOI: 10.1109/tcyb.2017.2728078.
Abstract
This paper focuses on the optimal output feedback control and stabilization problems for discrete-time multiplicative noise systems with intermittent observations. The main contributions are as follows. First, different from the previous literature, this paper overcomes the barrier that the celebrated separation principle does not apply to stochastic control problems for multiplicative noise systems. Based on the measurement process, the optimal estimator is presented, and by using the dynamic programming principle, the optimal output feedback controller is designed with a feedback gain based on the given coupled Riccati equations. Second, necessary and sufficient conditions for mean-square stabilization of multiplicative noise systems with intermittent observations are developed for the first time. Finally, the results developed in this paper can be applied to the output feedback control and stabilization problems for general networked control systems over user datagram protocol (UDP) networks. The range of the packet loss rate and the maximum allowable packet loss rate are presented explicitly.
26. Wang Z, Liu L, Wu Y, Zhang H. Optimal Fault-Tolerant Control for Discrete-Time Nonlinear Strict-Feedback Systems Based on Adaptive Critic Design. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2179-2191. PMID: 29771670. DOI: 10.1109/tnnls.2018.2810138.
Abstract
This paper investigates the problem of optimal fault-tolerant control (FTC) for a class of unknown nonlinear discrete-time systems with actuator faults in the framework of adaptive critic design (ACD). A pivotal highlight is the adaptive auxiliary signal for the actuator fault, which is designed to offset the effect of the fault. The considered systems are in strict-feedback form and involve unknown nonlinear functions, which results in a causality problem. To solve this problem, the original nonlinear systems are transformed into a novel system by employing diffeomorphism theory. Besides, action neural networks (ANNs) are utilized to approximate a predefined unknown function in the backstepping design procedure. Combining the strategic utility function and the ACD technique, a reinforcement learning algorithm is proposed to set up an optimal FTC, in which critic neural networks (CNNs) provide an approximate structure of the cost function. This not only guarantees the stability of the systems but also achieves optimal control performance. Finally, two simulation examples are used to show the effectiveness of the proposed optimal FTC strategy.
27. Yang Y, Modares H, Wunsch DC, Yin Y. Leader-Follower Output Synchronization of Linear Heterogeneous Systems With Active Leader Using Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2139-2153. PMID: 29771667. DOI: 10.1109/tnnls.2018.2803059.
Abstract
This paper develops optimal control protocols for the distributed output synchronization problem of leader-follower multiagent systems with an active leader. Agents are assumed to be heterogeneous with different dynamics and dimensions. The desired trajectory is assumed to be preplanned and is generated by the leader. Other follower agents autonomously synchronize to the leader by interacting with each other using a communication network. The leader is assumed to be active in the sense that it has a nonzero control input so that it can act independently and update its control to keep the followers away from possible danger. A distributed observer is first designed to estimate the leader's state and generate the reference signal for each follower. Then, the output synchronization of leader-follower systems with an active leader is formulated as a distributed optimal tracking problem, and inhomogeneous algebraic Riccati equations (AREs) are derived to solve it. The resulting distributed optimal control protocols not only minimize the steady-state error but also optimize the transient response of the agents. An off-policy reinforcement learning algorithm is developed to solve the inhomogeneous AREs online in real time and without requiring any knowledge of the agents' dynamics. Finally, two simulation examples are conducted to illustrate the effectiveness of the proposed algorithm.
28. Yang X, He H. Adaptive critic designs for optimal control of uncertain nonlinear systems with unmatched interconnections. Neural Networks 2018; 105:142-153. PMID: 29843095. DOI: 10.1016/j.neunet.2018.05.005.
Abstract
In this paper, we develop a novel optimal control strategy for a class of uncertain nonlinear systems with unmatched interconnections. To begin with, we present a stabilizing feedback controller for the interconnected nonlinear systems by modifying an array of optimal control laws of auxiliary subsystems. We also prove that this feedback controller ensures a specified cost function to achieve optimality. Then, under the framework of adaptive critic designs, we use critic networks to solve the Hamilton-Jacobi-Bellman equations associated with auxiliary subsystem optimal control laws. The critic network weights are tuned through the gradient descent method combined with an additional stabilizing term. By using the newly established weight tuning rules, we no longer need the initial admissible control condition. In addition, we demonstrate that all signals in the closed-loop auxiliary subsystems are stable in the sense of uniform ultimate boundedness by using classic Lyapunov techniques. Finally, we provide an interconnected nonlinear plant to validate the present control scheme.
Affiliations: Xiong Yang, School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China, and Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA; Haibo He, Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA.