1. Wu J, Wang J, Shen H, Basin MV. Multiplayer Differential Games of Markov Jump Systems via Reinforcement Learning. IEEE Transactions on Cybernetics 2025; 55:1860-1872. PMID: 40031853. DOI: 10.1109/tcyb.2025.3538787.
Abstract
In this article, we focus on solving online multiplayer differential games (MDGs) of Markov jump systems (MJSs) using a reinforcement learning (RL) method. We consider MDGs of MJSs in two scenarios. In the first, we propose a distributed minmax strategy in which each player derives its optimal control policy from distributed game algebraic Riccati equations (DGAREs) without prior knowledge of the other players' policies, distinguishing it from existing RL algorithms. We design a novel online distributed RL algorithm that approximates the solution of the DGAREs without complete knowledge of the system dynamics or an initial admissible control policy. The second scenario applies the Nash strategy to MDGs of MJSs. Unlike existing synchronous RL algorithms, we propose a novel online asynchronous RL algorithm that performs policy evaluation and policy improvement asynchronously, incorporating the latest information into the iterative process. The convergence of the designed RL algorithms is rigorously analyzed. Finally, two inverted-pendulum applications validate the effectiveness of the proposed methods.
2. Yang F, Gong Z, Wei Q, Lei Y. Secure Containment Control for Multi-UAV Systems by Fixed-Time Convergent Reinforcement Learning. IEEE Transactions on Cybernetics 2025; 55:1981-1994. PMID: 40031616. DOI: 10.1109/tcyb.2025.3534463.
Abstract
This article concerns the secure containment control problem for multiple autonomous aerial vehicles. A cyber attacker can manipulate control commands, resulting in containment failure in the position loop. Within a zero-sum graphical game framework, the secure containment controllers and malicious attackers are regarded as game players, and the attack-defense process is recast as a min-max optimization problem. Acquiring optimal distributed secure control policies requires solving the game-related Hamilton-Jacobi-Isaacs (HJI) equations. Based on a critic-only neural network (NN) structure, the reinforcement learning (RL) method is employed to solve the coupled HJI equations. A fixed-time convergence technique is introduced to improve the convergence rate of RL, and an experience replay mechanism is utilized to relax the persistence-of-excitation condition. The associated NN convergence and closed-loop stability are analyzed. In the attitude loop, the optimal feedback control law is obtained by solving Hamilton-Jacobi-Bellman equations using the fixed-time convergent RL method. A simulation example and a quadrotor experiment are given to show the effectiveness of the proposed scheme.
3. You S, Byeon K, Seo J, Kim W, Tomizuka M. Policy-Iteration-Based Active Disturbance Rejection Control for Uncertain Nonlinear Systems With Unknown Relative Degree. IEEE Transactions on Cybernetics 2025; 55:1347-1358. PMID: 40031680. DOI: 10.1109/tcyb.2025.3532518.
Abstract
In this article, a policy-iteration-based active disturbance rejection control (ADRC) is proposed for uncertain nonlinear systems to achieve real-time output tracking regardless of the system's specific relative degree. The approach integrates a partial control input generator with a policy-iteration-based reinforcement learning (RL) agent for degree-weight adjustment. The partial control input generator includes an ith-order partial control input for each i, constructed following the ADRC design framework for an ith-order system. Through iterative policy refinement, the RL agent adjusts the degree weights (its actions) to enhance the dominance of the partial control input corresponding to the unknown relative degree. The RL agent is designed to minimize a quadratic reward as the performance index function while enhancing the influence of the partial control input associated with the correct relative degree via the policy-iteration procedure. All signals in the closed-loop system (including the time-varying degree weights) are shown to be semi-globally uniformly ultimately bounded using the Lyapunov stability theorem and the affinely quadratically stable property. Consequently, the degree-weight adjustments by the RL agent do not affect closed-loop stability. The proposed method requires no system dynamics, no specific relative degree, no external disturbance model, and no state sensing beyond the output. The performance of the proposed method was validated via simulations for two uncertain nonlinear systems of different orders and experiments on a permanent magnet synchronous motor testbed.
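For readers unfamiliar with the ADRC building block such a generator reuses at each order, the sketch below shows a standard linear extended state observer (ESO) driving disturbance-canceling feedback on a toy second-order plant. This is only the generic ADRC skeleton under assumed gains, bandwidth, and plant, not the authors' policy-iteration scheme for degree-weight adjustment.

```python
import numpy as np

# Minimal linear extended state observer (ESO) with disturbance-canceling
# feedback: the generic ADRC skeleton. Plant, gains, and bandwidth are
# illustrative assumptions, not the paper's policy-iteration design.
def eso_step(z, y, u, b0, wo, dt):
    """One Euler step of a third-order linear ESO with poles at -wo.
    z = [y_hat, ydot_hat, f_hat], where f_hat estimates the total disturbance."""
    l1, l2, l3 = 3 * wo, 3 * wo**2, wo**3     # gains from pole placement
    e = y - z[0]
    dz = np.array([z[1] + l1 * e,
                   z[2] + b0 * u + l2 * e,
                   l3 * e])
    return z + dt * dz

dt, b0, wo = 1e-3, 1.0, 50.0
x = np.array([1.0, 0.0])                      # true plant state [y, ydot]
z = np.zeros(3)                               # observer state
for k in range(20000):
    t = k * dt
    u = (10.0 * (0.0 - z[0]) - 5.0 * z[1] - z[2]) / b0  # PD on estimates, cancel f_hat
    f = -x[0] - 0.5 * x[1] + np.sin(t)        # unknown "total disturbance"
    x = x + dt * np.array([x[1], f + b0 * u])
    z = eso_step(z, x[0], u, b0, wo, dt)
print("output y =", x[0], " disturbance estimate error =", z[2] - f)
```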
4. Jiao S, Wei Q, Chen W, Wang FY. Parallel Control With Adaptive Critic-Actor Learning Implementation for State and Input Time-Delayed Nonlinear Continuous-Time Systems. IEEE Transactions on Cybernetics 2025; 55:996-1007. PMID: 40030806. DOI: 10.1109/tcyb.2024.3519140.
Abstract
This study develops a constructive approach to the optimal control problem for nonlinear systems with known time delays. A feedback system depending on the state and control input is built to identify the actual control rule using the backstepping integral technique. Optimal control of the augmented system, established through parallel control, delivers a solution for nonlinear time-delayed systems. Under a modified gain condition, the value function is specified in terms of the state and input delays, transforming the optimal control problem into a minimax task. The critic-actor framework is then employed to reconstruct the cost function and control rule while maintaining the persistently exciting (PE) condition, yielding an online optimal control algorithm. In addition, the system's stability is established via a Lyapunov proof. Finally, the notable properties are demonstrated through experimental findings.
5. Song S, Gong D, Zhu M, Zhao Y, Huang C. Data-Driven Optimal Tracking Control for Discrete-Time Nonlinear Systems With Unknown Dynamics Using Deterministic ADP. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:1184-1198. PMID: 37847626. DOI: 10.1109/tnnls.2023.3323142.
Abstract
This article aims to solve the optimal tracking problem (OTP) for a class of discrete-time (DT) nonlinear systems with completely unknown dynamics. A novel data-driven deterministic approximate dynamic programming (ADP) algorithm is proposed to solve this kind of problem using only input-output (I/O) data. The proposed algorithm has two advantages over existing data-driven deterministic ADP algorithms for the OTP. First, it guarantees optimality while achieving better performance in terms of time savings and robustness to data. Second, the near-optimal control policy learned by the algorithm can be implemented without considering the expected control and enables the system states to track user-specified reference signals. Therefore, the tracking performance is guaranteed while the algorithm implementation is simplified. Furthermore, the convergence and stability of the proposed algorithm are strictly proved through theoretical analysis, in which the errors caused by neural networks (NNs) are considered. Finally, the developed algorithm is compared with two representative deterministic ADP algorithms through a numerical example and applied to the tracking problem for a two-link robotic manipulator. The simulation results demonstrate the effectiveness and advantages of the developed algorithm.
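For intuition about what such deterministic ADP schemes approximate, the linear-quadratic special case reduces to value iteration on the discrete-time Riccati recursion. A minimal model-based sketch with illustrative system matrices follows; the paper's algorithm itself works from I/O data and NNs rather than a known model.

```python
import numpy as np

# Value iteration on the discrete-time Riccati recursion: the LQ special case
# that deterministic ADP tracking schemes approximate with NNs. Model-based
# sketch for intuition only; the matrices below are illustrative assumptions.
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

P = np.zeros((2, 2))                                    # V_0(x) = 0
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # greedy policy w.r.t. P
    P_next = Q + A.T @ P @ (A - B @ K)                  # Bellman backup
    if np.max(np.abs(P_next - P)) < 1e-10:
        break
    P = P_next
print("P* =\n", P, "\nK* =", K)
```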
6. Wen G, Niu B. Optimized distributed formation control using identifier-critic-actor reinforcement learning for a class of stochastic nonlinear multi-agent systems. ISA Transactions 2024; 155:1-10. PMID: 39472256. DOI: 10.1016/j.isatra.2024.10.004.
Abstract
This article proposes an adaptive reinforcement learning (RL)-based optimized distributed formation control for unknown stochastic nonlinear single-integrator multi-agent systems (MASs). To handle the unknown dynamics, an adaptive identifier neural network (NN) is developed to identify the stochastic MAS in the expectation sense. Then, to derive the optimized formation control, RL is implemented by constructing a pair of critic and actor NNs. Traditional RL optimal control methods suffer from inherent algorithmic complexity because their adaptive laws are derived from the negative gradient of the square of the Hamilton-Jacobi-Bellman (HJB) equation, which makes them difficult to extend to stochastic dynamical systems. Since the adaptive RL laws here are instead derived from a simple positive function rather than the square of the HJB equation, optimal control is achieved with a simple algorithm, and the optimized formation scheme can be smoothly applied to the stochastic MAS. Finally, theoretical proof and computer simulation show that the optimized method achieves the required control objective.
Affiliation(s)
- Guoxing Wen: Shandong University of Aeronautics, Binzhou, 256600, Shandong, China.
- Ben Niu: Dalian University of Technology, Dalian, Liaoning, 116024, China.

7. Liu Q, Yan H, Zhang H, Wang M, Tian Y. Data-Driven H∞ Output Consensus for Heterogeneous Multiagent Systems Under Switching Topology via Reinforcement Learning. IEEE Transactions on Cybernetics 2024; 54:7865-7876. PMID: 39120994. DOI: 10.1109/tcyb.2024.3419056.
Abstract
In this article, a novel model-free policy gradient reinforcement learning algorithm is proposed to solve the tracking problem for discrete-time heterogeneous multiagent systems with external disturbances over a switching topology. The dynamics of the followers and the leader are unknown, and the leader's information is missing for each agent due to the switching topology. Therefore, a distributed adaptive observer is introduced to learn the leader's dynamic model and estimate its state for each agent. For the tracking problem, an exponential-discount value function is established and the related discrete-time game algebraic Riccati equation (DTGARE) is derived, which is the key to obtaining the control strategy. Furthermore, a data-based policy gradient algorithm is proposed to approximate the solution of the DTGARE online, avoiding the need for accurate knowledge of the agents' dynamics. To improve data efficiency, an offline dataset and an experience replay scheme are used. In addition, a lower bound on the exponential discount factor is derived to ensure the stability of the systems. Finally, a simulation is provided to show the validity of the proposed method.
8. Li H, Wang X, Guo Z, Zhang J, Qi S. D2CFR: Minimize Counterfactual Regret With Deep Dueling Neural Network. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:18343-18356. PMID: 37801386. DOI: 10.1109/tnnls.2023.3314638.
Abstract
Counterfactual regret minimization (CFR) is a popular method for finding approximate Nash equilibria in two-player zero-sum games with imperfect information. Solving large-scale games with CFR requires a combination of abstraction techniques and certain expert knowledge, which constrains its scalability. Recent neural-based CFR methods mitigate the need for abstraction and expert knowledge by training an efficient network to obtain counterfactual regrets directly. However, these methods only estimate regret values for individual actions, neglecting the evaluation of state values, which are significant for decision-making. In this article, we introduce deep dueling CFR (D2CFR), which emphasizes state value estimation by employing a novel value network with a dueling structure. Moreover, a rectification module based on a time-shifted Monte Carlo simulation is designed to rectify inaccurate state value estimates. Extensive experiments show that D2CFR converges faster and outperforms comparison methods on test games.
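At the heart of every CFR variant, including neural ones such as D2CFR, is the regret-matching update: play each action with probability proportional to its accumulated positive regret, and report the time-averaged strategy. A minimal self-play sketch on rock-paper-scissors (a one-shot stand-in for the sequential imperfect-information games above) illustrates the update; the payoff matrix and iteration count are illustrative.

```python
import numpy as np

# Regret matching in self-play on rock-paper-scissors: the update rule at the
# heart of CFR, shown on a one-shot zero-sum game rather than a sequential one.
U = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # row player's payoffs

def rm_strategy(regret):
    pos = np.maximum(regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(3, 1 / 3)

regret = [np.zeros(3), np.zeros(3)]
strat_sum = [np.zeros(3), np.zeros(3)]
for _ in range(100000):
    s = [rm_strategy(regret[0]), rm_strategy(regret[1])]
    for p in range(2):
        strat_sum[p] += s[p]
    # expected payoff of each pure action vs. the opponent's current mix
    u0, u1 = U @ s[1], -U.T @ s[0]
    regret[0] += u0 - s[0] @ u0
    regret[1] += u1 - s[1] @ u1
avg = [ss / ss.sum() for ss in strat_sum]
print("average strategies ->", avg)  # both approach the Nash mix (1/3, 1/3, 1/3)
```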
9. Wang H, Zheng J, Fu J, Wang Y. Bounded Containment Maneuvering Protocols for Marine Surface Vehicles With Quantized Communications and Tracking Errors Constrained Guidance: Theory and Experiment. IEEE Transactions on Cybernetics 2024; 54:7691-7702. PMID: 39361463. DOI: 10.1109/tcyb.2024.3462757.
Abstract
In this work, a new type of containment maneuvering protocol for multiple marine surface vehicles (MSVs) is developed to follow a parameterized path, where the tracking errors are constrained within finite time and the information transmitted during coordination is quantized. To achieve containment maneuvering of multiple MSVs, a two-objective coordinated control framework is proposed. For the geometric objective, by developing tan-type barrier Lyapunov functions (BLFs) and extended Lyapunov condition-based finite-time guidance laws, the performance of the parameterized line-of-sight guidance framework, including convergence speed and tracking-error constraints, is improved. For the dynamic objective, based on a quantized control strategy and smooth saturation functions, novel bounded containment maneuvering protocols are proposed to dramatically alleviate the communication burden among MSVs and ensure faster dynamic behavior in tracking the path-updating speed. Both theoretical analysis and experimental tests with comparative studies illustrate the validity of the proposed containment maneuvering strategy.
10. Wang R, Wang Z, Liu S, Li T, Li F, Qin B, Wei Q. Optimal Spin Polarization Control for the Spin-Exchange Relaxation-Free System Using Adaptive Dynamic Programming. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:5835-5847. PMID: 37015668. DOI: 10.1109/tnnls.2022.3230200.
Abstract
This work is the first to solve the 3-D spin polarization control (3DSPC) problem of atomic ensembles, which steers the spin polarization to arbitrary states through the cooperation of multiphysics fields. First, a novel adaptive dynamic programming (ADP) structure is proposed based on the developed multicritic multiaction neural network (MCMANN) structure with nonquadratic performance functions, as a way to solve the multiplayer nonzero-sum game (MP-NZSG) problem in 3DSPC under asymmetric saturation input constraints. Then, the MCMANNs are utilized to implement the multicritic multiaction ADP (MCMA-ADP) algorithm, whose convergence is proven by the contraction mapping principle. Finally, MCMA-ADP is deployed in the spin-exchange relaxation-free (SERF) system to provide a set of 3DSPC control laws that fully exploit the multiphysics fields to achieve arbitrary spin polarization states. Numerical simulations support the theoretical results.
11. Lin Z, Duan J, Li SE, Ma H, Li J, Chen J, Cheng B, Ma J. Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5255-5267. PMID: 37015565. DOI: 10.1109/tnnls.2022.3225090.
Abstract
The Hamilton-Jacobi-Bellman (HJB) equation serves as the necessary and sufficient condition for the optimal solution to the continuous-time (CT) optimal control problem (OCP). Compared with the infinite-horizon HJB equation, solving the finite-horizon (FH) HJB equation has been a long-standing challenge, because the partial time derivative of the value function appears as an additional unknown term. To address this problem, this study is the first to bridge the link between the partial time derivative and the terminal-time utility function, thereby facilitating the use of the policy iteration (PI) technique to solve CT FH OCPs. Based on this key finding, an FH approximate dynamic programming (ADP) algorithm is proposed leveraging an actor-critic framework. The algorithm is shown to exhibit important convergence and optimality properties. Importantly, with the use of multilayer neural networks (NNs) in the actor-critic architecture, the algorithm is suitable for CT FH OCPs for more general nonlinear and complex systems. Finally, the effectiveness of the proposed algorithm is demonstrated through a series of simulations on both a linear quadratic regulator (LQR) problem and a nonlinear vehicle tracking problem.
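In the linear-quadratic special case, continuous-time policy iteration reduces to Kleinman's algorithm: evaluate the current stabilizing gain through a Lyapunov equation, then improve it. A minimal infinite-horizon sketch with assumed system matrices follows; the paper's finite-horizon scheme additionally handles the value function's partial time derivative with NNs.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Kleinman's policy iteration for continuous-time LQR: the LQ skeleton of
# CT policy-iteration ADP. Infinite-horizon sketch with illustrative matrices;
# the finite-horizon algorithm above also treats the time-varying terms.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.zeros((1, 2))          # initial stabilizing gain (A itself is Hurwitz here)
for i in range(20):
    Ak = A - B @ K
    # policy evaluation: solve Ak^T P + P Ak = -(Q + K^T R K)
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    K_new = np.linalg.solve(R, B.T @ P)   # policy improvement
    if np.max(np.abs(K_new - K)) < 1e-10:
        break
    K = K_new
print("P* =\n", P, "\nK* =", K)
```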
12. Wang J, Wu J, Cao J, Chadli M, Shen H. Nonfragile Output Feedback Tracking Control for Markov Jump Fuzzy Systems Based on Integral Reinforcement Learning Scheme. IEEE Transactions on Cybernetics 2023; 53:4521-4530. PMID: 36194715. DOI: 10.1109/tcyb.2022.3203795.
Abstract
In this article, a novel integral reinforcement learning (RL)-based nonfragile output feedback tracking control algorithm is proposed for uncertain Markov jump nonlinear systems represented by the Takagi-Sugeno fuzzy model. The nonfragile control problem is converted into solving a zero-sum game, in which the control input and the uncertain disturbance input are regarded as two rival players. Based on the RL architecture, an offline parallel output feedback tracking learning algorithm is first designed to solve the fuzzy stochastic coupled algebraic Riccati equations for Markov jump fuzzy systems. Furthermore, to remove the requirement of precise system information and transition probabilities, an online parallel integral RL-based algorithm is designed. The tracking objective is achieved, and stochastic asymptotic stability and the expected H∞ performance of the considered systems are ensured via Lyapunov stability theory and stochastic analysis. Finally, the effectiveness of the proposed control algorithm is verified on a robot-arm system.
13. Sun J, Dai J, Zhang H, Yu S, Xu S, Wang J. Neural-Network-Based Immune Optimization Regulation Using Adaptive Dynamic Programming. IEEE Transactions on Cybernetics 2023; 53:1944-1953. PMID: 35767503. DOI: 10.1109/tcyb.2022.3179302.
Abstract
This article investigates an optimal regulation scheme between tumor and immune cells based on the adaptive dynamic programming (ADP) approach. The therapeutic goal is to inhibit the growth of tumor cells to an allowable injury degree while maximizing the number of immune cells. A reliable controller is derived through the ADP approach so that the cell populations reach the specified ideal states. First, the main objective is to weaken the negative effects of chemotherapy and immunotherapy, meaning that minimal doses of chemotherapeutic and immunotherapeutic drugs can be used in the treatment process. Second, according to the nonlinear dynamical model of tumor growth, chemotherapeutic and immunotherapeutic drugs act as powerful regulatory measures in a closed-loop control scheme. Finally, the system states and critic weight errors are proved to be uniformly ultimately bounded under the appropriate optimization control strategy, and simulation results demonstrate the effectiveness of the cybernetics methodology.
14. Zhao S, Wang J, Xu H, Wang B. Composite Observer-Based Optimal Attitude-Tracking Control With Reinforcement Learning for Hypersonic Vehicles. IEEE Transactions on Cybernetics 2023; 53:913-926. PMID: 35969557. DOI: 10.1109/tcyb.2022.3192871.
Abstract
This article proposes an observer-based reinforcement learning (RL) control approach to address the optimal attitude-tracking problem for hypersonic vehicles in the reentry phase. Due to the unknown uncertainty and nonlinearity caused by parameter perturbation and external disturbance, accurate model information for hypersonic vehicles in the reentry phase is generally unavailable. For this reason, a novel synchronous estimation scheme is proposed to construct a composite observer for hypersonic vehicles, which consists of a neural-network (NN)-based Luenberger-type observer and a synchronous disturbance observer. This solves the identification problem of nonlinear dynamics in the reference control and realizes state estimation when unknown nonlinear dynamics and unknown disturbances exist simultaneously. By synthesizing the information from the composite observer, an RL tracking controller is developed to solve the optimal attitude-tracking control problem. To improve the convergence of the critic network weights, concurrent learning is employed to replace the traditional persistent-excitation condition with historical experience replay. In addition, this article proves that the weight estimation error is bounded when the learning rate satisfies the given sufficient condition. Finally, numerical simulation demonstrates the effectiveness and superiority of the proposed approaches for attitude-tracking control of hypersonic vehicles.
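The concurrent-learning idea referenced above can be shown on a toy parameter-estimation problem: a recorded history stack whose regressors span the parameter space replaces the persistent-excitation requirement. The dynamics, regressor, and gains below are illustrative assumptions, not the paper's critic network.

```python
import numpy as np

# Concurrent-learning estimation of W in y = W^T phi(x): a recorded history
# stack replaces the persistent-excitation requirement. Toy sketch only; the
# regressor, gains, and excitation signal are illustrative assumptions.
W_true = np.array([1.5, -2.0, 0.7])
phi = lambda x: np.array([x, x**2, np.sin(x)])

# history stack recorded in advance; its regressors span R^3 (CL rank condition)
stack = [phi(x) for x in (-1.0, 0.3, 2.0)]
y_stack = [W_true @ p for p in stack]

W_hat, gamma, dt = np.zeros(3), 20.0, 1e-3
for k in range(50000):
    x = np.sin(0.1 * k * dt)              # weak excitation: one slow sinusoid
    p = phi(x)
    grad = p * (W_true @ p - W_hat @ p)   # instantaneous gradient term
    grad += sum(pj * (yj - W_hat @ pj) for pj, yj in zip(stack, y_stack))
    W_hat += dt * gamma * grad            # Euler step of the CL adaptive law
print("W_hat =", W_hat, " error =", np.linalg.norm(W_hat - W_true))
```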
15. Tutsoy O, Barkana DE, Balikci K. A Novel Exploration-Exploitation-Based Adaptive Law for Intelligent Model-Free Control Approaches. IEEE Transactions on Cybernetics 2023; 53:329-337. PMID: 34398780. DOI: 10.1109/tcyb.2021.3091680.
Abstract
Model-free control approaches require advanced exploration-exploitation policies to achieve practical tasks such as learning bipedal robot walking in unstructured environments. In this article, we first construct a comprehensive exploration-exploitation policy that carries rich knowledge about the long-term predictor, the control policy, and the control signal of the model-free algorithm. The developed model-free algorithm therefore continues exploring by adjusting its unknown parameters until the desired learning and control are accomplished. Second, we provide a fully model-free adaptive law enriched with the exploration-exploitation policy and derived step by step using the exact analogy of the model-based solution. The obtained adaptive control law accounts for control signal saturation and control signal (input) delay. A Lyapunov stability analysis ensures the convergence of the adaptive law, which can also be used for intelligent control approaches. Third, we implement the adaptive algorithm in real time on a challenging benchmark system: a fourth-order, coupled-dynamics, input-saturated, and time-delayed underactuated manipulator. The results show that the proposed adaptive algorithm explores larger state-action spaces and treats the vanishing gradient problem in both learning and control. We also observe from the results that the learning and control properties of the adaptive algorithm are optimized as required.
16. Oh K, Seo J. Development of a Sliding-Mode-Control-Based Path-Tracking Algorithm with Model-Free Adaptive Feedback Action for Autonomous Vehicles. Sensors (Basel, Switzerland) 2022; 23:405. PMID: 36617002. PMCID: PMC9824019. DOI: 10.3390/s23010405.
Abstract
This paper presents a sliding mode control (SMC)-based path-tracking algorithm for autonomous vehicles that incorporates model-free adaptive feedback action. In autonomous vehicles, safe path tracking requires adaptive and robust control algorithms because the driving environment and vehicle conditions vary in real time. In this study, SMC was adopted as a robust control method to adjust the switching gain, taking into account the sliding surface and unknown uncertainty, so as to drive the control error to zero. The sliding surface can be designed mathematically, but the unknown uncertainty is difficult to express mathematically. Information on a priori bounded uncertainties is needed to obtain closed-loop stability of the control system, and the unknown uncertainty can vary with internal and external factors. In the literature, ongoing efforts have been made to overcome the loss of control stability due to unknown uncertainty. This study proposes an integrated method of adaptive feedback control (AFC) and SMC that can adjust a bounded uncertainty. Illustrative and representative examples, such as autonomous driving scenarios, are provided to show the main properties of the designed integrated controller; a code sketch of the underlying adaptive-gain mechanism follows the affiliations below. The examples show superior control performance, and the integrated controller is expected to be widely applicable to path-tracking algorithms for autonomous vehicles.
Affiliation(s)
- Kwangseok Oh: School of ICT, Robotics & Mechanical Engineering, Hankyong National University, Anseong-si 17579, Republic of Korea
- Jaho Seo: Department of Automotive and Mechatronics Engineering, Ontario Tech University, Oshawa, ON L1G 0C5, Canada
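The mechanism the paper combines with AFC, a switching gain that grows with the sliding variable until it dominates the unknown uncertainty, can be sketched on a single-state plant. The plant, disturbance, and gains below are illustrative assumptions, not the vehicle model used in the paper.

```python
import numpy as np

# Adaptive-gain sliding mode control on x_dot = u + d(t) with unknown bounded d.
# The gain k adapts with |s| until it dominates the disturbance; a boundary
# layer (tanh) replaces sign() to soften chattering. Illustrative plant only.
dt, gamma, bl = 1e-3, 5.0, 0.01
x, k = 0.0, 0.0
x_ref = lambda t: np.sin(t)
x_ref_dot = lambda t: np.cos(t)
d = lambda t: 0.8 * np.sin(3 * t) + 0.3       # unknown disturbance, |d| <= 1.1

for i in range(20000):
    t = i * dt
    s = x - x_ref(t)                          # sliding variable = tracking error
    u = x_ref_dot(t) - k * np.tanh(s / bl)    # equivalent control + switching term
    k += dt * gamma * abs(s)                  # adaptive law: raise gain while |s| > 0
    x += dt * (u + d(t))                      # plant step (Euler)
print(f"tracking error = {x - x_ref(20000 * dt):.4f}, adapted gain = {k:.2f}")
```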
17. Neural network-based event-triggered data-driven control of disturbed nonlinear systems with quantized input. Neural Networks 2022; 156:152-159. DOI: 10.1016/j.neunet.2022.09.021.
18. Xu S, Liu J, Yang C, Wu X, Xu T. A Learning-Based Stable Servo Control Strategy Using Broad Learning System Applied for Microrobotic Control. IEEE Transactions on Cybernetics 2022; 52:13727-13737. PMID: 34714762. DOI: 10.1109/tcyb.2021.3121080.
Abstract
As the controller parameter adjustment process is simplified significantly by learning algorithms, studies on learning-based control have attracted considerable interest in recent years. This article focuses on the intelligent servo control problem using learning from desired demonstrations. In contrast to previous studies on learning-based servo control, a control policy using the broad learning system (BLS) is developed and applied to a microrobotic system for the first time, owing to the advantages of the BLS such as its simple structure and the absence of retraining when new demonstration data are provided. The Lyapunov theory is then combined with the learning algorithm to derive constraints on the controller parameters. Thus, the final control policy not only captures the movement skills of the desired demonstrations but also has strong generalization ability and error convergence. Finally, simulation and experimental examples verify the effectiveness of the proposed strategy using MATLAB and a microswimmer trajectory-tracking system.
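The BLS property the abstract highlights (no retraining when new data arrive) comes from its structure: inputs pass through random mapped-feature and enhancement nodes, and only the output weights are solved, in closed form by ridge regression. A minimal regression sketch under assumed layer sizes and a toy target:

```python
import numpy as np

# Minimal broad learning system (BLS) for regression: random mapped-feature and
# enhancement nodes, output weights solved in closed form by ridge regression.
# Layer sizes and the toy target are illustrative assumptions; biases omitted.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))
y = np.sin(np.pi * X[:, 0]) * X[:, 1]        # toy target function

n_feat, n_enh, lam = 20, 40, 1e-3
We = rng.normal(size=(2, n_feat))            # random feature-node weights (fixed)
Z = np.tanh(X @ We)                          # mapped features
Wh = rng.normal(size=(n_feat, n_enh))        # random enhancement weights (fixed)
H = np.tanh(Z @ Wh)                          # enhancement nodes
A = np.hstack([Z, H])

# ridge-regularized pseudoinverse: W = (A^T A + lam I)^-1 A^T y
W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
print("train RMSE =", np.sqrt(np.mean((A @ W - y) ** 2)))
```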
19. Adaptive dynamic event-triggered control for constrained modular reconfigurable robot. Knowledge-Based Systems 2022. DOI: 10.1016/j.knosys.2022.109624.

20. Leader-following consensus of second-order multi-agent systems with intermittent communication via persistent-hold control. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.10.111.