1
Qin ZC, Zhu HT, Wang SJ, Xin Y, Sun JQ. A reinforcement learning-based near-optimal hierarchical approach for motion control: Design and experiment. ISA Transactions 2022; 129:673-683. [PMID: 35279310] [DOI: 10.1016/j.isatra.2022.02.034]
Abstract
As a data-driven design method, model-free optimal control based on reinforcement learning provides an effective way to find optimal control strategies. Because it relies on data rather than detailed dynamic models, its design is sensitive to the system data. A prerequisite for generating applicable data is that the system be open-loop stable (i.e., have a stable equilibrium point), which restricts data-based control design in practical control problems and explains why experimental studies or verification are rare in the literature. To improve this situation and broaden its applications, we propose a pre-stabilization mechanism and apply it to the motion control of a mechanical system together with a reinforcement learning-based model-free optimal control method, constituting a so-called hierarchical control structure. We design two real-time control experiments on an underactuated system to verify its effectiveness. The control results show that the proposed hierarchical control is quite promising for this mechanical system, even though it is open-loop unstable with unknown dynamics.
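The pre-stabilization idea in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions: the plant matrices A, B and the inner gain K are illustrative numbers, not the paper's underactuated testbed; the point is only that a stabilizing inner loop makes the closed loop safe to probe for data, with the residual input v left free for the data-driven outer layer.

```python
import numpy as np

# Hypothetical open-loop-unstable plant x[k+1] = A x[k] + B u[k]
# (illustrative numbers, not the paper's experimental system).
A = np.array([[1.1, 0.1],
              [0.0, 1.2]])            # eigenvalues 1.1 and 1.2 -> unstable
B = np.array([[0.0],
              [1.0]])

# Inner pre-stabilizing feedback u = -K x + v, with K placed by hand so
# that A - B K has eigenvalues 0.5 and 0.6; v is the input the outer
# reinforcement-learning layer would optimize from closed-loop data.
K = np.array([[3.0, 1.2]])

def rollout(x0, controller, steps=50):
    """Simulate the plant under a given control law for a fixed horizon."""
    x = np.array(x0, dtype=float).reshape(-1, 1)
    for _ in range(steps):
        x = A @ x + B @ controller(x)
    return x

diverged = rollout([1.0, 1.0], lambda x: np.zeros((1, 1)))  # open loop blows up
settled = rollout([1.0, 1.0], lambda x: -K @ x)             # inner loop settles
```

Only the pre-stabilized rollout produces bounded trajectories, i.e., data the model-free outer layer can actually use.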
Affiliation(s)
- Zhi-Chang Qin: Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China; Key Laboratory of Earthquake Engineering Simulation and Seismic Resilience of China Earthquake Administration (Tianjin University), Tianjin 300350, China
- Hai-Tao Zhu: State Key Laboratory of Hydraulic Engineering Simulation and Safety (Tianjin University), Tianjin 300072, China
- Shou-Jun Wang: Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China
- Ying Xin: Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China
- Jian-Qiao Sun: School of Engineering, University of California, Merced, CA 95343, USA
2
Bessa JA, Barreto GA, Rocha-Neto AR. An Outlier-Robust Growing Local Model Network for Recursive System Identification. Neural Processing Letters 2022. [DOI: 10.1007/s11063-022-11040-z]
3
Neural network based asynchronous synchronization for fuzzy hidden Markov jump complex dynamical networks. Complex & Intelligent Systems 2022. [DOI: 10.1007/s40747-021-00370-5]
Abstract
This paper investigates the drive-response synchronization problem of Takagi–Sugeno fuzzy hidden Markov jump complex dynamical networks. More precisely, a novel asynchronous synchronization control strategy is developed to cope with mismatched hidden jumping modes. Furthermore, a neural network with online learning laws is adopted for unknown function approximation. By taking advantage of the Lyapunov method, sufficient conditions are established to ensure mean-square synchronization performance under disturbances. Based on the synchronization criterion, asynchronous controller gains are designed in terms of linear matrix inequalities. An illustrative example is finally given to validate the effectiveness of the proposed synchronization techniques.
4
Köpf F, Westermann J, Flad M, Hohmann S. Adaptive optimal control for reference tracking independent of exo-system dynamics. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.140]
5

6
Ding D, Wang Z, Han QL, Wei G. Neural-Network-Based Output-Feedback Control Under Round-Robin Scheduling Protocols. IEEE Transactions on Cybernetics 2019; 49:2372-2384. [PMID: 29994553] [DOI: 10.1109/tcyb.2018.2827037]
Abstract
The neural-network (NN)-based output-feedback control is considered for a class of stochastic nonlinear systems under round-robin (RR) scheduling protocols. For the purpose of effectively mitigating data congestion and saving energy, the RR protocols are implemented and the resulting nonlinear systems become so-called protocol-induced periodic ones. Taking this periodic characteristic into account, an NN-based observer is first proposed to reconstruct the system states, where a novel adaptive tuning law on the NN weights is adopted to cater to the requirements of performance analysis. In addition, with the established boundedness of the periodic systems in the mean-square sense, the desired observer gain is obtained by solving a set of matrix inequalities. Then, an actor-critic NN scheme with a time-varying step length in the adaptive law is developed to handle the considered control problem with terminal constraints over a finite horizon. Some sufficient conditions are derived to guarantee the boundedness of the estimation errors of the critic and actor NN weights. In view of these conditions, some key parameters in the adaptive tuning laws are easily determined via elementary algebraic operations. Furthermore, stability in the mean-square sense is investigated over an infinite horizon. Finally, a simulation example is utilized to illustrate the applicability of the proposed control scheme.
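The round-robin mechanism behind the "protocol-induced periodic" systems mentioned above can be sketched in a few lines. This is a generic sketch, not the paper's formulation: the node count and measurement values are illustrative; only one node transmits per time step and the rest are held by a zero-order hold.

```python
import numpy as np

def rr_update(y_fresh, y_held, k):
    """Round-robin networked measurement update: at time k only node
    (k mod N) transmits its fresh value; all other nodes' network-side
    copies are held at their last transmitted value (zero-order hold)."""
    y_next = y_held.copy()
    y_next[k % len(y_fresh)] = y_fresh[k % len(y_fresh)]
    return y_next

# Toy run with N = 3 sensor nodes: after one full period (N steps),
# every node has transmitted exactly once -- this period-N transmission
# pattern is what makes the closed-loop system periodic.
y_held = np.zeros(3)
for k in range(3):
    y_held = rr_update(np.array([1.0, 2.0, 3.0]), y_held, k)
```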
7
Wang JS, Yang GH. Output-Feedback Control of Unknown Linear Discrete-Time Systems With Stochastic Measurement and Process Noise via Approximate Dynamic Programming. IEEE Transactions on Cybernetics 2018; 48:1977-1988. [PMID: 28749361] [DOI: 10.1109/tcyb.2017.2726004]
Abstract
This paper studies the optimal output-feedback control problem for unknown linear discrete-time systems with stochastic measurement and process noise. A dithered Bellman equation with the innovation covariance matrix is constructed via the expectation operator given in the form of a finite summation. On this basis, an output-feedback-based approximate dynamic programming method is developed, where the terms depending on the innovation covariance matrix are computed from a covariance matrix identified beforehand. Therefore, by iterating the Bellman equation, the resulting value function converges to the optimal one in the presence of the aforementioned noise, and nearly optimal control laws are delivered. To show the effectiveness and advantages of the proposed approach, a simulation example and a velocity control experiment on a DC machine are employed.
8
Yang F, Wang C. Pattern-Based NN Control of a Class of Uncertain Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1108-1119. [PMID: 28186912] [DOI: 10.1109/tnnls.2017.2655503]
Abstract
This paper presents a pattern-based neural network (NN) control approach for a class of uncertain nonlinear systems. The approach consists of two identification phases followed by recognition and control phases. First, in identification phase (i), adaptive NN controllers are designed to achieve closed-loop stability and tracking performance of nonlinear systems for different control situations, and the corresponding closed-loop control system dynamics are identified via deterministic learning. The identified control system dynamics are stored in constant radial basis function (RBF) NNs, and a set of constant NN controllers is constructed from the obtained constant RBF networks. Second, in identification phase (ii), when the plant is operated under different or abnormal conditions, the system dynamics under normal control are identified via deterministic learning. A bank of dynamical estimators is constructed for all the abnormal conditions, with the learned knowledge embedded in the estimators. Third, in the recognition phase, when an identified control situation recurs, it is rapidly recognized by the constructed estimators. Finally, in the pattern-based control phase, based on this rapid recognition, the constant NN controller corresponding to the current control situation is selected, achieving both closed-loop stability and improved control performance. The results show that pattern-based control realizes a humanlike control process and provides a new framework for fast decision and control in dynamic environments. A simulation example is included to demonstrate the effectiveness of the approach.
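The recognition-and-selection step described above can be reduced to a small dispatch sketch. All names, gains, and residual values here are hypothetical stand-ins: stored constant controllers are indexed by control situation, and the situation whose estimator residual is smallest is recognized and its controller reused.

```python
# Stored constant controllers learned earlier, one per control
# situation (lambdas stand in for the paper's constant RBF NN
# controllers; gains are illustrative).
controllers = {
    "nominal":  lambda x: -2.0 * x,
    "abnormal": lambda x: -4.0 * x,
}

def recognize(residuals):
    """Pick the situation whose dynamical estimator fits the current
    closed-loop dynamics best (smallest residual)."""
    return min(residuals, key=residuals.get)

# Hypothetical residuals from the estimator bank at run time:
situation = recognize({"nominal": 0.8, "abnormal": 0.1})
u = controllers[situation](1.0)   # reuse the matching stored controller
```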
9
Fan QY, Yang GH, Ye D. Quantization-Based Adaptive Actor-Critic Tracking Control With Tracking Error Constraints. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:970-980. [PMID: 28166508] [DOI: 10.1109/tnnls.2017.2651104]
Abstract
In this paper, the problem of adaptive actor-critic (AC) tracking control is investigated for a class of continuous-time nonlinear systems with unknown nonlinearities and quantized inputs. Different from existing results based on reinforcement learning, tracking error constraints are considered and new critic functions are constructed to further improve performance. To ensure that the tracking errors remain within predefined time-varying boundaries, a tracking error transformation technique is used to constitute an augmented error system. Specific critic functions, rather than the long-term cost function, are introduced to supervise the tracking performance and tune the weights of the AC neural networks (NNs). A novel adaptive controller with a special structure is designed to reduce the effect of the NN reconstruction errors, input quantization, and disturbances. Based on Lyapunov stability theory, the boundedness of the closed-loop signals and the desired tracking performance can be guaranteed. Finally, simulations on two connected inverted pendulums are given to illustrate the effectiveness of the proposed method.
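The abstract does not specify its input quantizer, so as an assumption the sketch below uses the static logarithmic quantizer commonly adopted in quantized-control studies: levels are ±u0·ρ^i, and the quantization error satisfies the sector bound |q(u) − u| ≤ δ|u| with δ = (1 − ρ)/(1 + ρ).

```python
import math

def log_quantizer(u, u0=1.0, rho=0.5):
    """Static logarithmic quantizer (an assumption, not the paper's
    definition).  Output levels are +/- u0 * rho**i; the level index i
    is the unique integer with
        u0*rho**i / (1 + delta) < |u| <= u0*rho**i / (1 - delta),
    which gives the sector-bounded error |q(u) - u| <= delta * |u|
    with delta = (1 - rho) / (1 + rho)."""
    if u == 0.0:
        return 0.0
    delta = (1.0 - rho) / (1.0 + rho)
    i = math.floor(math.log(abs(u) * (1.0 - delta) / u0, rho))
    return math.copysign(u0 * rho ** i, u)
```

The sector bound is what lets the adaptive controller treat quantization as a bounded multiplicative uncertainty on the input.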
10
Skach J, Kiumarsi B, Lewis FL, Straka O. Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems. IEEE Transactions on Cybernetics 2018; 48:29-40. [PMID: 27831897] [DOI: 10.1109/tcyb.2016.2618926]
Abstract
In this paper, motivated by human neurocognitive experiments, a model-free off-policy reinforcement learning algorithm is developed to solve the optimal tracking control problem for multiple-model linear discrete-time systems. First, an adaptive self-organizing map neural network is used to determine the system behavior from measured data and to assign a responsibility signal to each of the system's possible behaviors. A new model is added if a sudden change of system behavior is detected from the measured data and the behavior has not been previously encountered. The value function is represented by partially weighted value functions. Then, the off-policy iteration algorithm is generalized to multiple-model learning to find a solution without any knowledge of the system dynamics or reference trajectory dynamics. The off-policy approach helps increase data efficiency and tuning speed, since a stream of experiences obtained from executing a behavior policy is reused to sequentially update several value functions corresponding to different learning policies. Two numerical examples serve as a demonstration of the off-policy algorithm's performance.
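The responsibility-signal idea can be sketched with a simple stand-in for the paper's self-organizing map: each candidate behavior model is scored by its one-step prediction error, and a softmax turns the errors into responsibility weights. The scalar models, data, and temperature below are illustrative assumptions.

```python
import numpy as np

def responsibilities(x, u, x_next, models, beta=5.0):
    """Soft responsibility signal over candidate behaviors: each scalar
    model x+ = a*x + b*u is scored by its one-step prediction error,
    and a softmax over the (negated) errors yields weights summing to
    one.  A simple stand-in for the paper's self-organizing-map
    mechanism; models and data are hypothetical."""
    errs = np.array([abs(x_next - (a * x + b * u)) for a, b in models])
    w = np.exp(-beta * errs)
    return w / w.sum()

models = [(0.5, 1.0), (1.5, 1.0)]            # two hypothetical behaviors
w = responsibilities(1.0, 0.0, 1.5, models)  # transition generated by model 2
```

A behavior whose responsibility stays low across a window of data would trigger the "add a new model" step described in the abstract.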
11
Li K, Wu Y, Nan Y, Li P, Li Y. Hierarchical multi-class classification in multimodal spacecraft data using DNN and weighted support vector machine. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.08.131]
12
Rizzi C, Johnson CG, Fabris F, Vargas PA. A Situation-Aware Fear Learning (SAFEL) model for robots. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.09.035]
13
Luo B, Liu D, Huang T, Wang D. Model-Free Optimal Tracking Control via Critic-Only Q-Learning. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:2134-2144. [PMID: 27416608] [DOI: 10.1109/tnnls.2016.2585520]
Abstract
Model-free control is an important and promising topic in the control field, which has attracted extensive attention in the past few years. In this paper, we aim to solve the model-free optimal tracking control problem for nonaffine nonlinear discrete-time systems. A critic-only Q-learning (CoQL) method is developed, which learns the optimal tracking control from real system data and thus avoids solving the tracking Hamilton-Jacobi-Bellman equation. First, the Q-learning algorithm is proposed based on the augmented system, and its convergence is established. Using only one neural network to approximate the Q-function, the CoQL method is developed to implement the Q-learning algorithm. Furthermore, the convergence of the CoQL method is proved with consideration of the neural network approximation error. With the convergent Q-function obtained from the CoQL method, the adaptive optimal tracking control is designed based on a gradient descent scheme. The developed CoQL method learns from off-policy data and is implemented with a critic-only structure, making it easy to realize while overcoming the inadequate-exploration problem. Finally, the effectiveness of the developed CoQL method is demonstrated through simulation studies.
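A critic-only Q-learning loop can be illustrated on a problem simple enough to check by hand. This is a sketch under stated assumptions, not the paper's algorithm: a scalar LQR problem stands in for the nonaffine tracking setting, a quadratic basis stands in for the critic NN, and the plant (a, b) is used only to generate data — the learner itself never touches it.

```python
import numpy as np

# Hypothetical scalar plant x+ = a*x + b*u with stage cost x^2 + u^2
# (open-loop unstable, so an initial stabilizing gain is assumed).
a, b = 1.2, 1.0

def step(x, u):
    return a * x + b * u              # "real system" data generator only

def phi(x, u):
    # Quadratic critic basis: Q(x,u) = Hxx*x^2 + 2*Hxu*x*u + Huu*u^2
    return np.array([x * x, 2.0 * x * u, u * u])

K = 1.0                               # initial stabilizing gain (|a - b*K| < 1)
samples = [(x, u) for x in (0.5, 1.0, 2.0) for u in (-1.0, 0.0, 1.0)]
for _ in range(15):                   # policy iteration on the critic only
    Phi, c = [], []
    for x, u in samples:
        xn = step(x, u)
        un = -K * xn                  # next action follows the current policy
        Phi.append(phi(x, u) - phi(xn, un))   # LSTD-Q regressor row
        c.append(x * x + u * u)               # observed stage cost
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    K = theta[1] / theta[2]           # greedy policy: u = -(Hxu/Huu) * x
```

For this scalar problem the learned gain can be checked against the Riccati solution p² − 1.44p − 1 = 0, K* = a·p/(1 + p); the loop converges to it without ever using (a, b) in the update equations.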
14
Ding C, Sun Y, Zhu Y. A NN-Based Hybrid Intelligent Algorithm for a Discrete Nonlinear Uncertain Optimal Control Problem. Neural Processing Letters 2016. [DOI: 10.1007/s11063-016-9536-8]
15
Wang JS, Yang GH. Data-driven output-feedback fault-tolerant L2 control of unknown dynamic systems. ISA Transactions 2016; 63:182-195. [PMID: 27178710] [DOI: 10.1016/j.isatra.2016.04.014]
Abstract
This paper studies the data-driven output-feedback fault-tolerant L2-control problem for unknown dynamic systems. In a framework of active fault-tolerant control (FTC), three issues are addressed, including fault detection, controller reconfiguration for optimal guaranteed cost control, and tracking control. According to the data-driven form of observer-based residual generators, the system state is expressed in the form of the measured input-output data. On this basis, a model-free approach to L2 control of unknown linear time-invariant (LTI) discrete-time plants is given. To achieve tracking control, a design method for a pre-filter is also presented. With the aid of the aforementioned results and the input-output data-based time-varying value function approximation structure, a data-driven FTC scheme ensuring L2-gain properties is developed. To illustrate the effectiveness of the proposed methodology, two simulation examples are employed.
Affiliation(s)
- Jun-Sheng Wang: College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110819, PR China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, Liaoning 110819, PR China
- Guang-Hong Yang: College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110819, PR China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, Liaoning 110819, PR China
16
Fuzzy Counter Propagation Neural Network Control for a Class of Nonlinear Dynamical Systems. Computational Intelligence and Neuroscience 2015; 2015:719620. [PMID: 26366169] [PMCID: PMC4558459] [DOI: 10.1155/2015/719620]
Abstract
A Fuzzy Counter Propagation Neural Network (FCPN) controller design is developed for a class of nonlinear dynamical systems. In this design, the weights connecting the instar and outstar (the input-hidden and hidden-output layers, respectively) are adjusted using Fuzzy Competitive Learning (FCL). The FCL paradigm adopts a competitive learning principle to compute the proposed Best Matched Node (BMN). This strategy offers robust control of nonlinear dynamical systems. FCPN is compared with existing networks such as the Dynamic Network (DN) and Back Propagation Network (BPN) on the basis of Mean Absolute Error (MAE), Mean Square Error (MSE), Best Fit Rate (BFR), and so forth, and gives better results than DN and BPN. The effectiveness of the proposed FCPN algorithms is demonstrated through simulations of four nonlinear dynamical systems, multiple-input single-output (MISO) systems, and the single-input single-output (SISO) gas furnace Box–Jenkins time series data.
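The Best Matched Node step is, at its core, a winner-take-all competitive learning update. The sketch below is a generic version of that step, not the paper's exact fuzzy competitive learning rule; the weight vectors, input, and learning rate are illustrative.

```python
import numpy as np

def best_matched_node(x, W):
    """Winner-take-all step of competitive learning: the Best Matched
    Node (BMN) is the hidden (instar) unit whose weight vector lies
    closest to the input."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

def competitive_update(x, W, lr=0.1):
    """Move only the winning node's weight vector toward the input;
    all other nodes are left unchanged."""
    j = best_matched_node(x, W)
    W = W.copy()
    W[j] += lr * (x - W[j])
    return j, W

W = np.array([[0.0, 0.0],     # instar weight vectors of two hidden nodes
              [1.0, 1.0]])
j, W = competitive_update(np.array([0.9, 1.1]), W)
```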