1. Wang W, Li Y. Distributed Fuzzy Optimal Consensus Control of State-Constrained Nonlinear Strict-Feedback Systems. IEEE Transactions on Cybernetics 2023; 53:2914-2929. PMID: 35077380. DOI: 10.1109/tcyb.2021.3140104.
Abstract
This article investigates the distributed fuzzy optimal consensus control problem for state-constrained nonlinear strict-feedback systems under an identifier-actor-critic architecture. First, a fuzzy identifier is designed to approximate each agent's unknown nonlinear dynamics. Then, by defining multiple barrier-type local optimal performance indexes for each agent, the optimal virtual and actual control laws are obtained, where two fuzzy-logic systems working as the actor network and critic network are used to execute control behavior and evaluate control performance, respectively. It is proved that the proposed control protocol can drive all agents to reach consensus without violating state constraints, and make the local performance indexes reach the Nash equilibrium simultaneously. Simulation studies are given to verify the effectiveness of the developed fuzzy optimal consensus control approach.
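The barrier-type performance index mentioned above penalizes states that approach their constraint bounds. As a rough illustration (not the paper's exact index), a logarithmic barrier term of the following hypothetical form is zero at the origin and grows without bound as the state nears its constraint, so any policy with a finite index must keep the state strictly inside the constraint set:

```python
import math

def barrier_cost(x, k_b):
    """Barrier-type penalty log(k_b^2 / (k_b^2 - x^2)).

    Zero at x = 0 and unbounded as |x| -> k_b, so a finite
    performance index forces |x| < k_b along the trajectory.
    (Illustrative form only; the paper's index differs.)
    """
    if abs(x) >= k_b:
        raise ValueError("state violates the constraint |x| < k_b")
    return math.log(k_b ** 2 / (k_b ** 2 - x ** 2))
```

Summing such terms over the tracking errors of all agents yields a performance index that is finite only for constraint-satisfying trajectories, which is the mechanism that lets an optimal controller enforce state constraints.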
2. Wang Z, Wang X, Pang N. Dynamic event-triggered controller design for nonlinear systems: Reinforcement learning strategy. Neural Networks 2023; 163:341-353. PMID: 37099897. DOI: 10.1016/j.neunet.2023.04.008.
Abstract
This paper addresses the optimal control problem for discrete-time nonstrict-feedback nonlinear systems by invoking a reinforcement learning-based backstepping technique with neural networks. The dynamic event-triggered control strategy introduced here reduces the communication frequency between the actuator and controller. Based on the reinforcement learning strategy, actor-critic neural networks are employed to implement the n-order backstepping framework. A neural network weight-update algorithm is then developed to reduce the computational burden and avoid local optima. Furthermore, a novel dynamic event-triggered strategy is introduced that can remarkably outperform the previously studied static event-triggered strategy. Moreover, combined with Lyapunov stability theory, all signals in the closed-loop system are strictly proven to be semiglobally uniformly ultimately bounded. Finally, the practicality of the proposed control algorithms is further elucidated by numerical simulation examples.
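As a rough sketch of the dynamic event-triggering idea, the toy simulation below runs a scalar plant under a trigger rule with an internal auxiliary variable eta: instead of transmitting whenever the network-induced error exceeds a static threshold, eta accumulates slack and absorbs small errors, so transmissions occur less often. All gains and thresholds are hypothetical illustrative choices, not values from the paper:

```python
# Scalar plant x_{k+1} = a*x_k + b*u_k with u_k = -K*xhat_k, where
# xhat_k is the last state transmitted to the controller.
def simulate(steps=60):
    a, b, K = 0.9, 1.0, 0.5            # plant and feedback gain (hypothetical)
    sigma, lam, theta = 0.2, 0.8, 1.0  # trigger parameters (hypothetical)
    x, xhat, eta = 1.0, 1.0, 0.1
    events = 1                          # count the initial transmission
    for _ in range(steps):
        e = xhat - x                    # network-induced error
        # Dynamic rule: transmit only when eta can no longer absorb the
        # error (a static rule would compare |e| to sigma*|x| directly).
        if eta + theta * (sigma * abs(x) - abs(e)) < 0:
            xhat, e = x, 0.0
            events += 1
        eta = lam * eta + sigma * abs(x) - abs(e)  # auxiliary dynamics
        x = a * x + b * (-K * xhat)
    return x, events
```

In this run the state still converges while the controller transmits on only a fraction of the sampling instants, which is the communication saving the abstract refers to.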
Affiliation(s)
- Zichen Wang: College of Westa, Southwest University, Chongqing 400715, China
- Xin Wang: College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
- Ning Pang: College of Westa, Southwest University, Chongqing 400715, China
3. Wu Y, Niu W, Kong L, Yu X, He W. Fixed-time neural network control of a robotic manipulator with input deadzone. ISA Transactions 2023; 135:449-461. PMID: 36272839. DOI: 10.1016/j.isatra.2022.09.030.
Abstract
In this paper, a fixed-time control method is proposed for an uncertain robotic system with actuator saturation and constraints that arise some time after the system begins operating. A model-based controller and a neural network-based learning approach are proposed under the framework of fixed-time convergence. Neural networks are used to handle the uncertainty, and an adaptive law driven by approximation errors is designed to compensate for the input deadzone. In addition, a new stabilizing-function structure combined with an error shifting function is introduced to establish the stability of the robotic system and the boundedness of all error signals. It is proved via Lyapunov stability theory that all tracking errors converge into compact sets near zero in fixed time. Simulations on a two-joint robot manipulator and experiments on a six-joint robot manipulator verify the effectiveness of the proposed fixed-time control algorithm.
Affiliation(s)
- Yifan Wu: School of Intelligence Science and Technology, University of Science & Technology Beijing, Beijing 100083, China; Institute of Artificial Intelligence, University of Science & Technology Beijing, Beijing 100083, China
- Wenkai Niu: School of Intelligence Science and Technology, University of Science & Technology Beijing, Beijing 100083, China; Institute of Artificial Intelligence, University of Science & Technology Beijing, Beijing 100083, China
- Linghuan Kong: School of Intelligence Science and Technology, University of Science & Technology Beijing, Beijing 100083, China; Institute of Artificial Intelligence, University of Science & Technology Beijing, Beijing 100083, China
- Xinbo Yu: Institute of Artificial Intelligence, University of Science & Technology Beijing, Beijing 100083, China
- Wei He: School of Intelligence Science and Technology, University of Science & Technology Beijing, Beijing 100083, China; Institute of Artificial Intelligence, University of Science & Technology Beijing, Beijing 100083, China
4. Wang Z, Wang X. Fault-tolerant control for nonlinear systems with a dead zone: Reinforcement learning approach. Mathematical Biosciences and Engineering 2023; 20:6334-6357. PMID: 37161110. DOI: 10.3934/mbe.2023274.
Abstract
This paper focuses on the adaptive reinforcement learning-based optimal control problem for nonstrict-feedback nonlinear systems with an actuator fault and an unknown dead zone. To simultaneously reduce the computational complexity and avoid local optima, a novel neural network weight-update algorithm is presented to replace the classic gradient descent method. By utilizing the backstepping technique, an actor-critic-based reinforcement learning control strategy is developed for high-order nonlinear nonstrict-feedback systems. In addition, two auxiliary parameters are introduced to deal with the input dead zone and the actuator fault, respectively. All signals in the system are proven to be semiglobally uniformly ultimately bounded by Lyapunov analysis. Finally, simulation results illustrate the effectiveness of the proposed approach.
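The dead-zone issue can be illustrated with a symmetric dead-zone model and its inverse. The paper estimates the unknown dead-zone parameters adaptively via an auxiliary parameter; the sketch below assumes the slope and break-point are known, which is a simplifying assumption made only for illustration:

```python
def dead_zone(v, m=1.0, b=0.3):
    """Symmetric actuator dead zone: the output is zero for |v| <= b,
    otherwise a line of slope m shifted by the break-point b."""
    if v > b:
        return m * (v - b)
    if v < -b:
        return m * (v + b)
    return 0.0

def compensate(u, m=1.0, b=0.3):
    """Inverse of the dead zone for *known* m and b, so that
    dead_zone(compensate(u)) == u. The cited paper instead estimates
    the compensation online, since m and b are unknown."""
    if u == 0.0:
        return 0.0
    return u / m + (b if u > 0 else -b)
```

Pre-shaping the commanded input this way recovers the desired actuator output exactly when the parameters are known; with unknown parameters, the adaptive auxiliary parameter plays the role of this inverse.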
Affiliation(s)
- Zichen Wang: College of Westa, Southwest University, Chongqing 400715, China
- Xin Wang: College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
5. Li H, Wu Y, Chen M, Lu R. Adaptive Multigradient Recursive Reinforcement Learning Event-Triggered Tracking Control for Multiagent Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:144-156. PMID: 34197328. DOI: 10.1109/tnnls.2021.3090570.
Abstract
This article proposes a fault-tolerant adaptive multigradient recursive reinforcement learning (RL) event-triggered tracking control scheme for strict-feedback discrete-time multiagent systems. The multigradient recursive RL algorithm is used to avoid the local optimum that may exist in the gradient descent scheme. Different from existing event-triggered control results, a new lemma about the relative threshold event-triggered control strategy is proposed to handle the compensation error, which can improve the utilization of communication resources and weaken the negative impact on tracking accuracy and closed-loop system stability. To overcome the difficulty caused by sensor faults, a distributed control method is introduced by adopting the adaptive compensation technique, which can effectively decrease the number of online estimation parameters. Furthermore, by using the multigradient recursive RL algorithm with fewer learning parameters, the online estimation time can be effectively reduced. The stability of the closed-loop multiagent systems is proved by using the Lyapunov stability theorem, and it is verified that all signals are semiglobally uniformly ultimately bounded. Finally, two simulation examples demonstrate the effectiveness of the presented control scheme.
6. Xian B, Zhang X, Zhang H, Gu X. Robust Adaptive Control for a Small Unmanned Helicopter Using Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:7589-7597. PMID: 34125690. DOI: 10.1109/tnnls.2021.3085767.
Abstract
This article presents a novel adaptive controller for a small-size unmanned helicopter using the reinforcement learning (RL) control methodology. The helicopter is subject to system uncertainties and unknown external disturbances. The unmodeled dynamic uncertainties of the system are estimated online by the actor network, and the tracking performance function is optimized via the critic network. The estimation error of the actor-critic network and the unknown external disturbances are compensated via a nonlinear robust component based on the sliding mode control method. The stability of the closed-loop system and the asymptotic convergence of the attitude tracking error are proved via a Lyapunov-based stability analysis. Finally, real-time experiments are performed on a helicopter control testbed. The experimental results show that the proposed controller achieves good control performance.
7. Reinforcement learning for industrial process control: A case study in flatness control in steel industry. Computers in Industry 2022. DOI: 10.1016/j.compind.2022.103748.
8. Yang X, Zhu Y, Dong N, Wei Q. Decentralized Event-Driven Constrained Control Using Adaptive Critic Designs. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5830-5844. PMID: 33861716. DOI: 10.1109/tnnls.2021.3071548.
Abstract
We study the decentralized event-driven control problem of nonlinear dynamical systems with mismatched interconnections and asymmetric input constraints. To begin with, by introducing a discounted cost function for each auxiliary subsystem, we transform the decentralized event-driven constrained control problem into a group of nonlinear H2-constrained optimal control problems. Then, we develop the event-driven Hamilton-Jacobi-Bellman equations (ED-HJBEs), which arise in the nonlinear H2-constrained optimal control problems. Meanwhile, we demonstrate that the solutions of the ED-HJBEs together keep the overall system stable in the sense of uniform ultimate boundedness (UUB). To solve the ED-HJBEs, we build a critic-only architecture under the framework of adaptive critic designs. The architecture employs only critic neural networks and updates their weight vectors via the gradient descent method. After that, based on the Lyapunov approach, we prove that the UUB stability of all signals in the closed-loop auxiliary subsystems is assured. Finally, simulations of an illustrative nonlinear interconnected plant are provided to validate the present designs.
9. Zhang Y, Li S, Weng J. Learning and Near-Optimal Control of Underactuated Surface Vessels With Periodic Disturbances. IEEE Transactions on Cybernetics 2022; 52:7453-7463. PMID: 33400666. DOI: 10.1109/tcyb.2020.3041368.
Abstract
In this article, we propose a novel learning and near-optimal control approach for underactuated surface vessels (USVs) with unknown mismatched periodic external disturbances and unknown hydrodynamic parameters. Given prior knowledge of the periods of the disturbances, an analytical near-optimal control law is derived through the approximation of an integral-type quadratic performance index with respect to the tracking error, where the equivalent unknown parameters are generated online by an auxiliary system that learns the dynamics of the controlled system. It is proved that the state differences between the auxiliary system and the corresponding controlled USV converge globally and asymptotically to zero. Besides, the approach theoretically guarantees asymptotic optimality of the performance index. The efficacy of the method is demonstrated via simulations based on the real parameters of a USV.
10. Yuan L, Li T, Tong S, Xiao Y, Gao X. NN adaptive optimal tracking control for a class of uncertain nonstrict feedback nonlinear systems. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.03.049.
11. Chen Q, Jin Y, Song Y. Fault-tolerant adaptive tracking control of Euler-Lagrange systems – An echo state network approach driven by reinforcement learning. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.10.083.
12. Li XJ, Wang N. Data-Driven Fault Estimation and Control for Unknown Discrete-Time Systems via Multiobjective Optimization Method. IEEE Transactions on Cybernetics 2022; 52:3289-3301. PMID: 32784145. DOI: 10.1109/tcyb.2020.3010222.
Abstract
This article investigates the fault estimation and control problem for linear discrete-time systems with completely unknown system dynamics. The considered problem is formulated as a multiobjective composite optimization problem, and a data-driven H∞/H∞ controller is then designed to ensure the fault estimation and control performances. Different from existing multiobjective optimization strategies, where only one system performance can be optimized, a two-step design method is introduced in this article to optimize different system performances. In particular, each step contains a novel constraint-type optimization algorithm, and the matrix inequality involved in the constraint condition has no structural restriction. In addition, by applying policy iteration (PI) and Q-learning techniques, the controller parameters are obtained by solving a set of linear matrix inequalities (LMIs) relying only on the system states and inputs. Finally, the effectiveness of the proposed approach is illustrated through three examples.
13. Fu H, Chen X, Wang W, Wu M. Observer-Based Adaptive Synchronization Control of Unknown Discrete-Time Nonlinear Heterogeneous Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:681-693. PMID: 33079683. DOI: 10.1109/tnnls.2020.3028569.
Abstract
This article is concerned with the optimal synchronization problem for discrete-time nonlinear heterogeneous multiagent systems (MASs) with an active leader. To overcome the difficulty in deriving optimal control protocols for these systems, we develop an observer-based adaptive synchronization control approach, including the designs of a distributed observer and a distributed model reference adaptive controller with no prior knowledge of any agent's dynamics. To begin with, for the purpose of estimating the state of a nonlinear active leader for each follower, an adaptive neural network distributed observer is designed. Such an observer serves as a reference model in the distributed model reference adaptive control (MRAC). Then, a reinforcement learning-based distributed MRAC algorithm is presented to make every follower track the behavior of its corresponding reference model in real time. In this algorithm, a distributed actor-critic network is employed to approximate the optimal distributed control protocols and the cost function. Through convergence analysis, the overall observer estimation error, the model reference tracking error, and the weight estimation errors are proved to be uniformly ultimately bounded. Synchronization is then achieved by synthesizing these results. The effectiveness of the developed approach is verified through a numerical example.
14. Naeem M, De Pietro G, Coronato A. Application of Reinforcement Learning and Deep Learning in Multiple-Input and Multiple-Output (MIMO) Systems. Sensors 2021; 22:309. PMID: 35009848. PMCID: PMC8749942. DOI: 10.3390/s22010309.
Abstract
The current wireless communication infrastructure faces exponential growth in mobile traffic, which demands high data rates, reliability, and low latency. MIMO systems and their variants (i.e., Multi-User MIMO and Massive MIMO) are the most promising 5G wireless communication technology due to their high system throughput and data rate. However, the most significant challenges in MIMO communication lie in exploiting the multiple antennas and in computational complexity. The recent success of RL and DL introduces novel and powerful tools that mitigate issues in MIMO communication systems. This article focuses on RL and DL techniques for MIMO systems by presenting a comprehensive review of the integration between the two areas. We first briefly provide the necessary background on RL, DL, and MIMO. Second, potential RL and DL applications for different MIMO issues, such as detection, classification, and compression; channel estimation; positioning, sensing, and localization; CSI acquisition and feedback; security and robustness; mmWave communication; and resource allocation, are presented.
15. Single-network ADP for solving optimal event-triggered tracking control problem of completely unknown nonlinear systems. International Journal of Intelligent Systems 2021. DOI: 10.1002/int.22491.
16. Liu Q, Li T, Shan Q, Yu R, Gao X. Virtual guide automatic berthing control of marine ships based on heuristic dynamic programming iteration method. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.01.022.
17. Li H, Wu Y, Chen M. Adaptive Fault-Tolerant Tracking Control for Discrete-Time Multiagent Systems via Reinforcement Learning Algorithm. IEEE Transactions on Cybernetics 2021; 51:1163-1174. PMID: 32386171. DOI: 10.1109/tcyb.2020.2982168.
Abstract
This article investigates the adaptive fault-tolerant tracking control problem for a class of discrete-time multiagent systems via a reinforcement learning algorithm. The action neural networks (NNs) are used to approximate unknown and desired control input signals, and the critic NNs are employed to estimate the cost function in the design procedure. Furthermore, the direct adaptive optimal controllers are designed by combining the backstepping technique with the reinforcement learning algorithm. Compared with existing reinforcement learning algorithms, the computational burden is effectively reduced by using fewer learning parameters. The adaptive auxiliary signals are established to compensate for the influence of the dead zones and actuator faults on the control performance. Based on the Lyapunov stability theory, it is proved that all signals of the closed-loop system are semiglobally uniformly ultimately bounded. Finally, some simulation results are presented to illustrate the effectiveness of the proposed approach.
18. Long T, Li E, Hu Y, Yang L, Fan J, Liang Z, Guo R. A Vibration Control Method for Hybrid-Structured Flexible Manipulator Based on Sliding Mode Control and Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:841-852. PMID: 32275619. DOI: 10.1109/tnnls.2020.2979600.
Abstract
The hybrid-structured flexible manipulator has a complex structure and strong coupling between state variables. Meanwhile, the natural frequency of the hybrid-structured flexible manipulator varies with the motion of the telescopic joint, so it is difficult to suppress the vibration quickly. In this article, the tip state signal of the hybrid-structured flexible manipulator is decomposed into elastic vibration signal and tip vibration equilibrium position signal, and a combined control method is proposed to improve tip positioning accuracy and trajectory tracking accuracy. In the proposed combined control method, an improved nominal model-based sliding mode controller (NMBSMC) is used as the main controller to output the driving torque, and an actor-critic-based reinforcement learning controller (ACBRLC) is used as an auxiliary controller to output small compensation torque. The improved NMBSMC can be divided into a nominal model-based sliding mode robust controller and a practical model-based integral sliding mode controller. Two sliding mode controllers with different structures make full use of the mathematical model and the measured data of the actual system to improve the vibration equilibrium position tracking accuracy. The ACBRLC uses the tip elastic vibration signal and the prioritized experience replay method to obtain the small reverse compensation torque, which is superimposed with the output of the NMBSMC to suppress tip vibration and improve the positioning accuracy of the hybrid-structured flexible manipulator. Finally, several groups of experiments are designed to verify the effectiveness and robustness of the proposed combined control method.
19. Bai W, Li T, Tong S. NN Reinforcement Learning Adaptive Control for a Class of Nonstrict-Feedback Discrete-Time Systems. IEEE Transactions on Cybernetics 2020; 50:4573-4584. PMID: 31995515. DOI: 10.1109/tcyb.2020.2963849.
Abstract
This article investigates an adaptive reinforcement learning (RL) optimal control design problem for a class of nonstrict-feedback discrete-time systems. Based on the neural network (NN) approximating ability and the RL control design technique, an adaptive backstepping RL optimal controller and a minimal learning parameter (MLP) adaptive RL optimal controller are developed by establishing a novel strategic utility function and introducing external function terms. It is proved that the proposed adaptive RL optimal controllers can guarantee that all signals in the closed-loop systems are semiglobally uniformly ultimately bounded (SGUUB). The main feature is that the proposed schemes can solve optimal control problems that the previous literature could not deal with. Furthermore, the proposed MLP adaptive optimal control scheme reduces the number of adaptive laws, and thus the computational complexity is decreased. Finally, the simulation results illustrate the validity of the proposed optimal control schemes.
20. Guo X, Yan W, Cui R. Reinforcement Learning-Based Nearly Optimal Control for Constrained-Input Partially Unknown Systems Using Differentiator. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:4713-4725. PMID: 31880567. DOI: 10.1109/tnnls.2019.2957287.
Abstract
In this article, a synchronous reinforcement-learning-based algorithm is developed for input-constrained partially unknown systems. The proposed control also alleviates the need for an initial stabilizing control. A first-order robust exact differentiator is employed to approximate unknown drift dynamics. Critic, actor, and disturbance neural networks (NNs) are established to approximate the value function, the control policy, and the disturbance policy, respectively. The Hamilton-Jacobi-Isaacs equation is solved by applying the value function approximation technique. The stability of the closed-loop system can be ensured. The state and weight errors of the three NNs are all uniformly ultimately bounded. Finally, the simulation results are provided to verify the effectiveness of the proposed method.
21. Neural networks-based optimal tracking control for nonzero-sum games of multi-player continuous-time nonlinear systems via reinforcement learning. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.06.083.
22. Yu J, Shi P, Lin C, Yu H. Adaptive Neural Command Filtering Control for Nonlinear MIMO Systems With Saturation Input and Unknown Control Direction. IEEE Transactions on Cybernetics 2020; 50:2536-2545. PMID: 30872252. DOI: 10.1109/tcyb.2019.2901250.
Abstract
In this paper, the tracking control problem is considered for a class of multiple-input multiple-output (MIMO) nonlinear systems with input saturation and unknown direction control gains. A command filtered adaptive neural network (NN) control method is presented for the MIMO systems by designing the virtual controllers and error compensation signals. First, command filtering is used to solve the "explosion of complexity" problem in the conventional backstepping design, and the nonlinearities are approximated by NNs. Then, the error compensation signals are developed to overcome the shortcoming of the dynamic surface method. In addition, Nussbaum-type functions are utilized to cope with the unknown direction control gains. The effectiveness of the proposed design scheme is illustrated by simulation examples.
23. A Confrontation Decision-Making Method with Deep Reinforcement Learning and Knowledge Transfer for Multi-Agent System. Symmetry 2020. DOI: 10.3390/sym12040631.
Abstract
In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent for confrontation in multi-agent systems. First, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve multi-agent confrontation decision-making. During training, the information of other agents is introduced into the critic network to improve the confrontation strategy, and the parameter-sharing mechanism can reduce the cost of experience storage. In the DDPG algorithm, four neural networks generate real-time actions and Q-value estimates, respectively, and a momentum mechanism is used to optimize the training process and accelerate the convergence rate of the neural networks. Second, an auxiliary controller using a policy-based reinforcement learning (RL) method is introduced to assist the decision-making of the game agent. In addition, an effective reward function is used to help agents balance enemy and friendly losses. Furthermore, a knowledge transfer method is used to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight its competitors and achieves a good winning rate. For large-scale confrontation scenarios, the knowledge transfer method gradually improves the decision-making level of the learning agent.
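The momentum mechanism mentioned above can be sketched on a toy problem. The snippet below applies heavy-ball momentum to an ill-conditioned quadratic loss and compares the iteration count against plain gradient descent; the loss, learning rate, and momentum coefficient are illustrative choices only, not values or networks from the paper:

```python
def train(beta, lr=0.01, tol=1e-3, max_iters=10000):
    """Minimise the ill-conditioned quadratic 0.5*(100*w1^2 + w2^2)
    with heavy-ball momentum; beta = 0 recovers plain gradient descent.
    Returns the number of iterations needed to bring the weight norm
    below tol."""
    w1, w2 = 1.0, 1.0
    v1 = v2 = 0.0
    for k in range(max_iters):
        if (w1 * w1 + w2 * w2) ** 0.5 < tol:
            return k
        g1, g2 = 100.0 * w1, w2       # gradient of the quadratic
        v1 = beta * v1 - lr * g1      # velocity accumulates past gradients
        v2 = beta * v2 - lr * g2
        w1 += v1
        w2 += v2
    return max_iters
```

On this toy loss the momentum run converges in noticeably fewer iterations than plain gradient descent, which is the acceleration effect the abstract appeals to when training the actor and critic networks.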
24. Yang T, Sun N, Chen H, Fang Y. Neural Network-Based Adaptive Antiswing Control of an Underactuated Ship-Mounted Crane With Roll Motions and Input Dead Zones. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:901-914. PMID: 31059458. DOI: 10.1109/tnnls.2019.2910580.
Abstract
As indispensable oceanic transportation tools, ship-mounted crane systems are widely employed to transport cargoes and containers on vessels due to their extraordinary flexibility. However, various working requirements and the oceanic environment may cause some uncertain and unfavorable factors for ship-mounted crane control. In particular, to accomplish different control tasks, some plant parameters (e.g., boom lengths, payload masses, and so on) frequently change; hence, most existing model-based controllers cannot ensure satisfactory control performance any longer. For example, inaccurate gravity compensation may result in positioning errors. Additionally, due to ship roll motions caused by sea waves, residual payload swing generally exists, which may result in safety risks in practice. To solve the above-mentioned issues, this paper designs a neural network-based adaptive control method that can provide effective control for both actuated and unactuated state variables based on the original nonlinear ship-mounted crane dynamics without any linearizing operations. In particular, the proposed update law effectively compensates for parameter/structure uncertainties in ship-mounted crane systems. Based on a 2-D sliding surface, the boom and rope can arrive at their preset positions in finite time, and the payload swing can be completely suppressed. Furthermore, the problem of nonlinear input dead zones is also taken into account. The stability of the equilibrium point of all state variables in ship-mounted crane systems is theoretically proven by a rigorous Lyapunov-based analysis. The hardware experimental results verify the practicability and robustness of the presented control approach.
Collapse
|
25
|
Shao S, Chen M, Zhang Y. Adaptive Discrete-Time Flight Control Using Disturbance Observer and Neural Networks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:3708-3721. [PMID: 30763247 DOI: 10.1109/tnnls.2019.2893643] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
This paper studies the adaptive neural control (ANC)-based tracking problem for the discrete-time nonlinear dynamics of an unmanned aerial vehicle subject to system uncertainties, bounded time-varying disturbances, and input saturation by using a discrete-time disturbance observer (DTDO). System uncertainties are handled approximately via neural network approximation. To restrain the negative effects of the bounded disturbances, a nonlinear DTDO is designed. Then, a backstepping-based ANC strategy is proposed by utilizing a constructed auxiliary system and a discrete-time tracking differentiator. The boundedness of all signals in the closed-loop system is proven under a discrete-time Lyapunov analysis. Finally, the feasibility of the proposed ANC technique is further demonstrated by numerical simulation results.
Collapse
|
26
|
Abstract
The optimal tracking problem is addressed in the robotics literature using a variety of robust and adaptive control approaches. However, these schemes face implementation limitations, such as applicability in uncertain dynamical environments with complete or partial model-based control structures, complexity and integrity in discrete-time environments, and scalability in complex coupled dynamical systems. An online adaptive learning mechanism is developed to tackle these limitations and provide a generalized solution platform for a class of tracking control problems. This scheme minimizes the tracking errors and optimizes the overall dynamical behavior using simultaneous linear feedback control strategies. Reinforcement learning approaches based on value iteration processes are adopted to solve the underlying Bellman optimality equations. The resulting control strategies are updated in real time in an interactive manner without requiring any information about the dynamics of the underlying systems. Adaptive critics are employed to approximate the optimal value functions and the associated control strategies in real time. The proposed adaptive tracking mechanism is illustrated in simulation by controlling a flexible wing aircraft in an uncertain aerodynamic learning environment.
Collapse
|
27
|
Zhang S, Zhang D, Chang C, Fu Q, Wang Y. Adaptive neural control of quadruped robots with input deadzone. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.09.032] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
28
|
Liu YJ, Li S, Tong S, Chen CLP. Adaptive Reinforcement Learning Control Based on Neural Approximation for Nonlinear Discrete-Time Systems With Unknown Nonaffine Dead-Zone Input. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:295-305. [PMID: 29994726 DOI: 10.1109/tnnls.2018.2844165] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In this paper, an optimal control algorithm is designed for uncertain discrete-time nonlinear systems that are in nonaffine form and subject to an unknown dead zone. The main contributions are that an optimal control algorithm is framed for the first time for nonlinear systems with a nonaffine dead zone, and that the adaptive parameter law for the dead zone is calculated using gradient rules. The mean value theorem is employed to deal with the nonaffine dead-zone input, and implicit function theory based on reinforcement learning is introduced to find an unknown ideal controller, which is approximated by the action network. Other neural networks serve as critic networks to approximate the strategic utility functions. Based on Lyapunov stability analysis, the stability of the system is proved; that is, the optimal control laws guarantee that all signals in the closed-loop system are bounded and the tracking errors converge to a small compact set. Finally, two simulation examples demonstrate the effectiveness of the designed algorithm.
Collapse
|
29
|
Xie K, Chen C, Lewis FL, Xie S. Adaptive Asymptotic Neural Network Control of Nonlinear Systems With Unknown Actuator Quantization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:6303-6312. [PMID: 29994544 DOI: 10.1109/tnnls.2018.2828315] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/05/2023]
Abstract
In this paper, we propose an adaptive neural-network-based asymptotic control algorithm for a class of nonlinear systems subject to unknown actuator quantization. To this end, we exploit the sector property of the quantization nonlinearity and transform actuator quantization control problem into analyzing its upper bounds, which are then handled by a dynamic loop gain function-based approach. In our adaptive control scheme, there is only one parameter required to be estimated online for updating weights of neural networks. Within the framework of Lyapunov theory, it is shown that the proposed algorithm ensures that all the signals in the closed-loop system are ultimately bounded. Moreover, an asymptotic tracking error is obtained by means of introducing Barbalat's lemma to the proposed adaptive law.
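The sector property exploited above can be seen with a simple uniform quantizer: the quantization error is bounded by half a step, which is what lets the analysis work with upper bounds instead of the exact nonlinearity. A minimal illustration follows; the step size `delta` is a hypothetical parameter, and the paper's quantizer and adaptive law are more general.

```python
def quantize(u, delta=0.2):
    """Uniform (mid-tread) quantizer with step size delta.
    Satisfies the sector-type bound |quantize(u) - u| <= delta / 2."""
    return delta * round(u / delta)
```

Checking the bound numerically over a grid of inputs confirms the property the adaptive design relies on.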
Collapse
|
30
|
Tang L, Liu YJ, Chen CLP. Adaptive Critic Design for Pure-Feedback Discrete-Time MIMO Systems Preceded by Unknown Backlashlike Hysteresis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:5681-5690. [PMID: 29993785 DOI: 10.1109/tnnls.2018.2805689] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This paper concentrates on the adaptive critic design (ACD) issue for a class of uncertain multi-input multioutput (MIMO) nonlinear discrete-time systems preceded by unknown backlashlike hysteresis. The considered systems are in a block-triangular pure-feedback form, in which there exist nonaffine functions and couplings between states and inputs. This makes the ACD-based optimal control design very difficult and complicated. To this end, the mean value theorem is employed to transform the original systems into input-output models. Based on the reinforcement learning algorithm, the optimal control strategy is established with an actor-critic structure. Not only is the stability of the systems ensured, but the performance index is also minimized. In contrast to previous results, the main contributions are: 1) an ACD framework is built for the first time for such MIMO systems with unknown hysteresis and 2) an adaptive auxiliary signal is developed to compensate for the influence of the hysteresis. In the end, a numerical study is provided to demonstrate the effectiveness of the presented method.
Collapse
|
31
|
Yang Y, Arias G. Identification of hinging hyperplane autoregressive exogenous model using efficient mixed-integer programming. ISA TRANSACTIONS 2018; 81:18-31. [PMID: 30100238 DOI: 10.1016/j.isatra.2018.07.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/23/2017] [Revised: 04/13/2018] [Accepted: 07/20/2018] [Indexed: 06/08/2023]
Abstract
A computationally efficient algorithm for hinging hyperplane autoregressive exogenous (HHARX) model identification via mixed-integer programming is proposed in this paper. The HHARX model is attractive because it accurately approximates a general nonlinear process as a sum of hinge functions and preserves continuity even in piecewise affine form. Traditional mixed-integer programming-based methods for HHARX model identification can only be applied to small-scale input/output datasets due to their significant computational demands. The contribution of this paper is a sequential optimization approach that builds accurate HHARX models more efficiently from a relatively large number of experimental data. Moreover, the proposed framework can handle more difficult and practical cases in piecewise model identification, such as limited submodel switching, missing output data, and a specified steady state. Finally, the efficiency and accuracy of the proposed computational scheme are demonstrated through the modeling of two simulated examples and a pilot-scale heat exchanger.
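A hinging-hyperplane model sums a linear term and hinge functions max(0, θᵀφ), which keeps the map piecewise affine yet continuous. A minimal evaluation sketch is below; the identification itself, fitting the θ vectors via mixed-integer programming, is the hard part the paper addresses and is not shown, and the parameter values in the usage note are illustrative only.

```python
import numpy as np

def hharx_predict(theta0, hinges, phi):
    """Hinging-hyperplane ARX prediction at regressor phi:
    y = theta0 . phi + sum_i max(0, theta_i . phi)."""
    y = float(np.dot(theta0, phi))
    for theta in hinges:
        y += max(0.0, float(np.dot(theta, phi)))
    return y
```

For example, with `theta0 = [1, 0]` and one hinge `[-1, 1]`, the hinge is inactive at `phi = [2, 1]` (prediction 2) and active at `phi = [1, 3]` (prediction 3), while the overall map stays continuous across the switching boundary.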
Collapse
Affiliation(s)
- Yu Yang
- Chemical Engineering Department, California State University Long Beach, CA 90840, USA.
- Gabriel Arias
- Chemical Engineering Department, California State University Long Beach, CA 90840, USA
Collapse
|
32
|
Pandian BJ, Noel MM. Tracking Control of a Continuous Stirred Tank Reactor Using Direct and Tuned Reinforcement Learning Based Controllers. CHEMICAL PRODUCT AND PROCESS MODELING 2018. [DOI: 10.1515/cppm-2017-0040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
The need for a linear model of the nonlinear system while tuning controllers limits the use of classic controllers, and the tuning procedure involves complex computations. This is further complicated when the nonlinear system must operate under different operating constraints. The continuous stirred tank reactor (CSTR) is one such nonlinear system, studied extensively in control and chemical engineering due to its highly nonlinear characteristics and diverse operating range. This paper proposes two control schemes based on reinforcement learning to achieve both servo and regulatory control: one is the direct application of reinforcement learning (RL) with ANN approximation, and the other tunes PID controller parameters using reinforcement learning. The main objective of this paper is to handle multiple set-point control for the CSTR system using RL; the temperature of the CSTR is controlled for multiple set-point changes. A comparative study between the two proposed algorithms shows that the direct RL approach with approximation performs better than tuning a PID using RL, with fewer oscillations and less overshoot. The learning time for the direct RL-based controller is also shorter than that of the latter.
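The "tune a PID with a learner" idea can be sketched with a toy stand-in: a PI controller on a first-order plant, with random search over the gains in place of the paper's RL update and the CSTR dynamics. Everything here (plant, gain ranges, cost) is a hypothetical simplification for illustration.

```python
import random

def simulate(kp, ki, T=50, dt=0.1, setpoint=1.0):
    """Closed-loop tracking cost of a PI controller on the toy plant
    x' = -x + u (Euler-discretised); lower cost is better."""
    x, integ, cost = 0.0, 0.0, 0.0
    for _ in range(T):
        err = setpoint - x
        integ += err * dt
        u = kp * err + ki * integ
        x += dt * (-x + u)
        cost += err * err * dt
    return cost

def tune(trials=200, seed=0):
    """Random-search stand-in for the RL gain update: keep the best gains."""
    rng = random.Random(seed)
    best = (1.0, 0.0)
    best_cost = simulate(*best)
    for _ in range(trials):
        cand = (rng.uniform(0.0, 10.0), rng.uniform(0.0, 5.0))
        c = simulate(*cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost
```

An actual RL tuner would replace the blind random search with a value-guided update, but the interface, propose gains, evaluate closed-loop cost, keep the improvement, is the same.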
Collapse
|
33
|
Gai K, Qiu M. Optimal resource allocation using reinforcement learning for IoT content-centric services. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.03.056] [Citation(s) in RCA: 153] [Impact Index Per Article: 21.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
34
|
Wang H, Liu PX, Li S, Wang D. Adaptive Neural Output-Feedback Control for a Class of Nonlower Triangular Nonlinear Systems With Unmodeled Dynamics. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:3658-3668. [PMID: 28866601 DOI: 10.1109/tnnls.2017.2716947] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
This paper presents the development of an adaptive neural controller for a class of nonlinear systems with unmodeled dynamics and immeasurable states. An observer is designed to estimate the system states. The structure consistency of virtual control signals and the variable partition technique are combined to overcome the difficulties arising from the nonlower triangular form. An adaptive neural output-feedback controller is developed based on the backstepping technique and the universal approximation property of radial basis function (RBF) neural networks. By Lyapunov stability analysis, the semiglobal uniform ultimate boundedness of all signals in the closed-loop system is guaranteed. The simulation results show that the controlled system converges quickly and all signals are bounded. This paper is novel in at least two aspects: 1) an output-feedback control strategy is developed for a class of nonlower triangular nonlinear systems with unmodeled dynamics and 2) the nonlinear disturbances and their bounds are functions of all states, which is a more general form than in existing results.
Collapse
|
35
|
Liang Y, Zhang H, Xiao G, Jiang H. Reinforcement learning-based online adaptive controller design for a class of unknown nonlinear discrete-time systems with time delays. Neural Comput Appl 2018. [DOI: 10.1007/s00521-018-3537-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
36
|
Hu Y, Si B. A Reinforcement Learning Neural Network for Robotic Manipulator Control. Neural Comput 2018; 30:1983-2004. [DOI: 10.1162/neco_a_01079] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
We propose a neural network model for reinforcement learning to control a robotic manipulator with unknown parameters and dead zones. The model is composed of three networks. The state of the robotic manipulator is predicted by the state network of the model, the action policy is learned by the action network, and the performance index of the action policy is estimated by a critic network. The three networks work together to optimize the performance index based on the reinforcement learning control scheme. The convergence of the learning methods is analyzed. Application of the proposed model on a simulated two-link robotic manipulator demonstrates the effectiveness and the stability of the model.
Collapse
Affiliation(s)
- Yazhou Hu
- State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, P.R.C., and University of Chinese Academy of Sciences, Beijing 100049, P.R.C
- Bailu Si
- State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, P.R.C.
Collapse
|
37
|
Luo B, Liu D, Wu HN. Adaptive Constrained Optimal Control Design for Data-Based Nonlinear Discrete-Time Systems With Critic-Only Structure. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2099-2111. [PMID: 28981435 DOI: 10.1109/tnnls.2017.2751018] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem of nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed by using value iteration-based Q-learning (VIQL) with a critic-only structure. Most existing constrained control methods require a certain performance index and are only suitable for linear or affine nonlinear systems, which is restrictive in practice. To overcome this, a system transformation is first introduced with a general performance index, converting the constrained optimal control problem into an unconstrained one. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. Convergence results for the VIQL algorithm are established under an easy-to-realize initial condition. To implement the VIQL algorithm, a critic-only structure is developed, where only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on a gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is tested on three examples with computer simulation.
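The value-iteration Q-learning recursion can be illustrated on a finite deterministic MDP, starting from the zero Q-function as the easy-to-realize initial condition. The paper works with a neural-network Q-function on continuous states; the tabular, cost-minimising version below is only a sketch of the same fixed-point iteration.

```python
import numpy as np

def viql(P, R, gamma=0.9, iters=500):
    """Value iteration on the Q-function (cost-minimising form):
    Q_{k+1}(s, a) = R(s, a) + gamma * min_{a'} Q_k(P[s, a], a'),
    starting from Q_0 = 0. P[s, a] is the deterministic next state."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        Q = R + gamma * Q[P].min(axis=-1)
    return Q
```

Since the update is a gamma-contraction, the iterates converge to the unique fixed point of the Bellman optimality equation regardless of the (zero) initialisation.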
Collapse
|
38
|
Robust adaptive neural tracking control for a class of nonlinear systems with unmodeled dynamics using disturbance observer. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.02.082] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
39
|
Mannucci T, van Kampen EJ, de Visser C, Chu Q. Safe Exploration Algorithms for Reinforcement Learning Controllers. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:1069-1081. [PMID: 28182560 DOI: 10.1109/tnnls.2017.2654539] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that requires neither global safety functions nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by identifying temporary safety functions, called backups. SHERPA is demonstrated on a simulated, simplified quadrotor task, in which dangerous states are avoided. The second algorithm, named OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient, through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task.
Collapse
|
40
|
He W, Dong Y. Adaptive Fuzzy Neural Network Control for a Constrained Robot Using Impedance Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:1174-1186. [PMID: 28362618 DOI: 10.1109/tnnls.2017.2665581] [Citation(s) in RCA: 141] [Impact Index Per Article: 20.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
This paper investigates adaptive fuzzy neural network (NN) control using impedance learning for a constrained robot subject to unknown system dynamics, the effect of state constraints, and the uncertain compliant environment with which the robot comes into contact. A fuzzy NN learning algorithm is developed to identify the uncertain plant model. A prominent feature of the fuzzy NN is that it requires neither prior knowledge of the uncertainty nor a large amount of observed data. Impedance learning is also introduced to handle the interaction between the robot and its environment, so that the robot follows a desired destination generated by impedance learning. A barrier Lyapunov function is used to address the effect of state constraints. With the proposed control, the stability of the closed-loop system is achieved via Lyapunov's stability theory, and the tracking performance is guaranteed under state constraints and uncertainty. Simulation studies are carried out to illustrate the effectiveness of the proposed scheme.
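A log-type barrier Lyapunov function of the kind used for state constraints stays finite while the error e remains inside the bound |e| < k and grows without bound at the boundary, which is what prevents constraint violation along trajectories on which V stays bounded. A minimal numeric illustration follows; k is a hypothetical constraint bound, and the exact BLF form in the paper may differ.

```python
import math

def barrier_lyapunov(e, k):
    """Log-type BLF: V(e) = 0.5 * ln(k^2 / (k^2 - e^2)), defined for |e| < k.
    V(0) = 0, V is increasing in |e|, and V -> infinity as |e| -> k."""
    assert abs(e) < k, "error must stay inside the constraint band"
    return 0.5 * math.log(k * k / (k * k - e * e))
```

If a Lyapunov argument shows V stays bounded along the closed-loop trajectory, the error can never reach |e| = k, so the state constraint is never violated.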
Collapse
|
41
|
Fan QY, Yang GH, Ye D. Quantization-Based Adaptive Actor-Critic Tracking Control With Tracking Error Constraints. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:970-980. [PMID: 28166508 DOI: 10.1109/tnnls.2017.2651104] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/05/2023]
Abstract
In this paper, the problem of adaptive actor-critic (AC) tracking control is investigated for a class of continuous-time nonlinear systems with unknown nonlinearities and quantized inputs. Different from the existing results based on reinforcement learning, the tracking error constraints are considered and new critic functions are constructed to improve the performance further. To ensure that the tracking errors keep within the predefined time-varying boundaries, a tracking error transformation technique is used to constitute an augmented error system. Specific critic functions, rather than the long-term cost function, are introduced to supervise the tracking performance and tune the weights of the AC neural networks (NNs). A novel adaptive controller with a special structure is designed to reduce the effect of the NN reconstruction errors, input quantization, and disturbances. Based on the Lyapunov stability theory, the boundedness of the closed-loop signals and the desired tracking performance can be guaranteed. Finally, simulations on two connected inverted pendulums are given to illustrate the effectiveness of the proposed method.
Collapse
|
42
|
Adaptive neural network tracking control-based reinforcement learning for wheeled mobile robots with skidding and slipping. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.12.051] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
43
|
Cao Z, Xiao Q, Huang R, Zhou M. Robust Neuro-Optimal Control of Underactuated Snake Robots With Experience Replay. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:208-217. [PMID: 29300697 DOI: 10.1109/tnnls.2017.2768820] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
In this paper, the problem of path following for underactuated snake robots is investigated by using approximate dynamic programming and neural networks (NNs). The lateral undulatory gait of a snake robot is stabilized in a virtual holonomic constraint manifold through a partial feedback linearizing control law. Based on a dynamic compensator and a Line-of-Sight guidance law, the path-following problem is transformed into a regulation problem for a nonlinear system with uncertainties. Subsequently, it is solved by an infinite-horizon optimal control scheme using a single critic NN. A novel fluctuating learning algorithm is derived to approximate the associated cost function online and relax the initial stabilizing control requirement. The approximate optimal control input is derived by solving a modified Hamilton-Jacobi-Bellman equation. The conventional persistence of excitation condition is relaxed by using an experience replay technique. The proposed control scheme ensures that all states of the snake robot are uniformly ultimately bounded, which is analyzed using the Lyapunov approach, and the tracking error asymptotically converges to a residual set. Simulation results are presented to verify the effectiveness of the proposed method.
Collapse
|
44
|
Skach J, Kiumarsi B, Lewis FL, Straka O. Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems. IEEE TRANSACTIONS ON CYBERNETICS 2018; 48:29-40. [PMID: 27831897 DOI: 10.1109/tcyb.2016.2618926] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
In this paper, motivated by human neurocognitive experiments, a model-free off-policy reinforcement learning algorithm is developed to solve the optimal tracking control of multiple-model linear discrete-time systems. First, an adaptive self-organizing map neural network is used to determine the system behavior from measured data and to assign a responsibility signal to each of the system's possible behaviors. A new model is added if a sudden change of system behavior is detected from the measured data and the behavior has not been previously observed. A value function is represented by partially weighted value functions. Then, the off-policy iteration algorithm is generalized to multiple-model learning to find a solution without any knowledge of the system dynamics or the reference trajectory dynamics. The off-policy approach helps to increase data efficiency and tuning speed, since a stream of experiences obtained from executing a behavior policy is reused to sequentially update several value functions corresponding to different learning policies. Two numerical examples serve as a demonstration of the off-policy algorithm's performance.
Collapse
|
45
|
Practical adaptive fuzzy tracking control for a class of perturbed nonlinear systems with backlash nonlinearity. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.08.085] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
46
|
Luo B, Liu D, Wu HN, Wang D, Lewis FL. Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:3341-3354. [PMID: 27893404 DOI: 10.1109/tcyb.2016.2623859] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
The model-free optimal control problem of general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal control method. By using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, the adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, where the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
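The gradient-descent policy improvement in PGADP-style schemes moves the action along the negative gradient of the learned Q-function. With a known quadratic Q this reduces to a few lines; the quadratic form, minimiser, and step size below are illustrative assumptions, not the paper's learned Q.

```python
def policy_gradient_step(u, grad_q, alpha=0.1):
    """One policy-improvement step: move the action against dQ/du."""
    return u - alpha * grad_q(u)

# Illustrative quadratic Q(s, u) = (u - 2)^2, so dQ/du = 2 * (u - 2)
grad_q = lambda u: 2.0 * (u - 2.0)

u = 0.0
for _ in range(100):
    u = policy_gradient_step(u, grad_q)
# u converges toward the greedy action argmin_u Q, here u = 2
```

Each step is a contraction for this step size (u maps to 0.8 u + 0.4), so the iterated action approaches the greedy minimiser geometrically.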
Collapse
|
47
|
Luo B, Liu D, Huang T, Yang X, Ma H. Multi-step heuristic dynamic programming for optimal control of nonlinear discrete-time systems. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.05.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
48
|
Chen CLP. Neural Approximation-Based Adaptive Control for a Class of Nonlinear Nonstrict Feedback Discrete-Time Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1531-1541. [PMID: 28113479 DOI: 10.1109/tnnls.2016.2531089] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
In this paper, an adaptive control approach based on neural approximation is developed for a class of uncertain nonlinear discrete-time (DT) systems. The main characteristic of the considered systems is that they can be viewed as a class of multi-input multioutput systems in nonstrict feedback structure. A similar control problem for this class of systems has been addressed in the past, but only for continuous-time systems. Due to the complexity of the system structure, the controller design and stability analysis become more difficult. To stabilize this class of systems, a new recursive procedure is developed, and the effect caused by the noncausal problem in the nonstrict feedback DT structure is resolved using a semirecurrent neural approximation. Based on the Lyapunov difference approach, it is proved that all signals of the closed-loop system are semiglobally uniformly ultimately bounded and that good tracking performance can be guaranteed. The feasibility of the proposed controllers is validated through a simulation example.
Collapse
|
49
|
Wang T, Sui S, Tong S. Data-based adaptive neural network optimal output feedback control for nonlinear systems with actuator saturation. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.03.053] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
50
|
Song R, Wei Q, Song B. Neural-network-based synchronous iteration learning method for multi-player zero-sum games. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.02.051] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|