1
Lin M, Zhao B, Liu D. Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:5574-5585. [PMID: 38530722] [DOI: 10.1109/tnnls.2024.3379207]
Abstract
Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and Q-learning, direct policy learning, another popular RL idea with an easy-to-implement feature, has rarely been considered. To fill this gap, this article develops a novel model-free policy optimization (PO) algorithm to achieve OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized to directly improve the discounted value function of the augmented system via a gradient-based method. To implement this algorithm in a model-free manner, a two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function from sampled states and reference trajectories. Global convergence of the model-free PO algorithm to the optimal value function is demonstrated given a sufficient number of samples and proper conditions. Finally, numerical simulation results are provided to validate the effectiveness of the proposed method.
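The two-point gradient estimate the abstract refers to can be illustrated with a generic zeroth-order sketch: perturb the feedback gain in a random direction, evaluate the discounted cost at both perturbed gains, and difference the two rollouts. This is a minimal toy example only, not the paper's implementation; the system matrices, cost, and all names (`simulate_cost`, `two_point_gradient`, the radius `r`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cost(K, x0=np.array([1.0, -1.0]), steps=50, gamma=0.95):
    """Discounted quadratic cost of the policy u = -K x on a toy linear DT system."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed plant, for illustration only
    B = np.array([[0.0], [0.1]])
    x, cost = x0.copy(), 0.0
    for t in range(steps):
        u = -K @ x
        cost += gamma**t * (x @ x + float(u @ u))
        x = A @ x + (B @ u).ravel()
    return cost

def two_point_gradient(K, r=0.05, samples=100):
    """Two-point zeroth-order estimate: average (J(K+rU) - J(K-rU)) / (2r) * U
    over random unit directions U, scaled by the parameter dimension."""
    d = K.size
    grad = np.zeros_like(K)
    for _ in range(samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        grad += (simulate_cost(K + r * U) - simulate_cost(K - r * U)) / (2 * r) * U
    return d * grad / samples

K = np.array([[0.5, 1.0]])
for _ in range(20):                  # plain gradient descent on the sampled cost
    K -= 1e-3 * two_point_gradient(K)
```

The estimator only needs cost evaluations from rollouts (sampled states and references), never the system matrices, which is what makes the scheme model-free.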
2
Hu L, Wang D, Wang G, Qiao J. Self-triggered neural tracking control for discrete-time nonlinear systems via adaptive critic learning. Neural Networks 2025; 186:107280. [PMID: 40015032] [DOI: 10.1016/j.neunet.2025.107280]
Abstract
In this paper, a novel self-triggered optimal tracking control method is developed based on the online action-critic technique for discrete-time nonlinear systems. First, an augmented plant is constructed by integrating the system state with the reference trajectory, which recasts the optimal tracking control design as an optimal regulation problem for the reconstructed nonlinear error system. Then, while guaranteeing stability of the controlled system, a self-sampling function that depends solely on the sampled tracking error is devised to determine the next triggering instant. This approach not only reduces the computational burden but also eliminates the continuous evaluation of the triggering condition required by traditional event-based methods. Furthermore, the developed control method exhibits favorable triggering performance. Model, critic, and action neural networks are constructed to implement the online critic learning algorithm, enabling real-time adjustment of the tracking control policy toward optimal performance. Finally, an experimental plant with nonlinear characteristics is presented to illustrate the overall performance of the proposed online self-triggered tracking control strategy.
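The self-triggered pattern described above, where the next triggering instant is computed from the sampled tracking error rather than by checking a condition every step, can be sketched generically. This is a hedged illustration, not the paper's algorithm: the scalar plant, the reference, the control law, and the inter-event function `tau` are all assumptions chosen to show the mechanism.

```python
import numpy as np

def tau(error, t_min=1, t_max=8):
    """Assumed inter-event rule: a larger sampled tracking error
    schedules the next sample sooner (the paper derives its own function)."""
    return int(np.clip(t_max - 4.0 * abs(error), t_min, t_max))

x, u, next_trigger = 0.0, 0.0, 0
triggers = []
for k in range(200):
    r = np.sin(0.1 * k)                  # reference trajectory (assumed)
    if k == next_trigger:                # controller evaluated only at trigger instants
        e = x - r
        u = -0.9 * x + r                 # tracking control law (assumed, for the toy plant)
        next_trigger = k + tau(e)        # next instant decided now, no per-step check
        triggers.append(k)
    x = 0.9 * x + u                      # toy plant x_{k+1} = 0.9 x_k + u_k; u held between triggers
```

The control input is updated at far fewer than 200 instants, and between triggers no triggering condition is evaluated at all, which is the computational saving over event-triggered schemes.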
Affiliation(s)
- Lingzhi Hu, Ding Wang, Gongming Wang, Junfei Qiao: School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing 100124, China.
3
Song S, Gong D, Zhu M, Zhao Y, Huang C. Data-Driven Optimal Tracking Control for Discrete-Time Nonlinear Systems With Unknown Dynamics Using Deterministic ADP. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:1184-1198. [PMID: 37847626] [DOI: 10.1109/tnnls.2023.3323142]
Abstract
This article solves the optimal tracking problem (OTP) for a class of discrete-time (DT) nonlinear systems with completely unknown dynamics. A novel data-driven deterministic approximate dynamic programming (ADP) algorithm is proposed to solve this kind of problem with only input-output (I/O) data. The proposed algorithm has two advantages over existing data-driven deterministic ADP algorithms for the OTP. First, it guarantees optimality while achieving better performance in terms of computation time and robustness to data. Second, the near-optimal control policy it learns can be implemented without considering the expected control and enables the system states to track user-specified reference signals, so tracking performance is guaranteed while the implementation is simplified. Furthermore, the convergence and stability of the proposed algorithm are rigorously proved through theoretical analysis, in which the errors caused by neural networks (NNs) are considered. Finally, the developed algorithm is compared with two representative deterministic ADP algorithms on a numerical example and applied to the tracking problem for a two-link robotic manipulator. The simulation results demonstrate the effectiveness and advantages of the developed algorithm.
4
Lin M, Zhao B, Liu D. Policy gradient adaptive dynamic programming for nonlinear discrete-time zero-sum games with unknown dynamics. Soft Computing 2023. [DOI: 10.1007/s00500-023-07817-6]