1
Zhao M, Wang D, Qiao J. Neural-network-based accelerated safe Q-learning for optimal control of discrete-time nonlinear systems with state constraints. Neural Netw 2025; 186:107249. [PMID: 39955957] [DOI: 10.1016/j.neunet.2025.107249]
Abstract
For unknown nonlinear systems with state constraints, it is difficult to achieve safe optimal control using Q-learning methods based on traditional quadratic utility functions. To solve this problem, this article proposes an accelerated safe Q-learning (SQL) technique that addresses the concurrent requirements of safety and optimality for discrete-time nonlinear systems within an integrated framework. First, an adjustable control barrier function is designed and integrated into the cost function, transforming the constrained optimal control problem into an unconstrained one. The augmented cost function depends on the next state, which drives the state away from the constraint boundaries more quickly. Second, leveraging offline data that adhere to the safety constraints, we introduce an off-policy value-iteration SQL approach for searching for a safe optimal policy, thus mitigating the risk of unsafe interactions caused by suboptimal iterative policies. Third, the vast amount of offline data and the complex augmented cost function can slow the learning speed of the algorithm. To address this issue, we integrate historical iteration information into the current iteration step to accelerate policy evaluation, and we introduce the Nesterov momentum technique to expedite policy improvement. Additionally, theoretical analysis establishes the convergence, optimality, and safety of the SQL algorithm. Finally, under different parameter settings, simulation results on two nonlinear systems with state constraints demonstrate the efficacy and advantages of the accelerated SQL approach. The proposed method requires fewer iterations while enabling the system state to converge to the equilibrium point more rapidly.
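The abstract names two concrete mechanisms: a barrier term on the next state folded into the utility, and a Nesterov-style momentum step in the iteration. The sketch below illustrates both on a scalar toy system. The dynamics, grids, barrier form, and all parameters (MU, GAMMA, BETA) are assumptions made for illustration only; the paper's method uses neural networks rather than a tabular grid.

```python
import numpy as np

# Illustrative sketch only: barrier-augmented utility plus momentum-accelerated
# Q-iteration on a toy scalar system. Not the authors' implementation.
X = np.linspace(-0.95, 0.95, 41)      # state grid inside the constraint set
U = np.linspace(-0.5, 0.5, 21)        # action grid
X_MIN, X_MAX = -1.0, 1.0              # assumed state constraints
MU, GAMMA, BETA = 0.05, 0.95, 0.3     # barrier weight, discount, momentum

def step(x, u):
    # Toy scalar dynamics, clipped to stay strictly inside the constraint set.
    return np.clip(0.9 * x + u, X_MIN + 1e-3, X_MAX - 1e-3)

def cost(x, u, x_next):
    # Quadratic utility plus an adjustable log-barrier on the NEXT state,
    # which grows as x_next approaches a constraint boundary.
    barrier = -np.log((X_MAX - x_next) * (x_next - X_MIN))
    return x**2 + u**2 + MU * barrier

Q_prev = np.zeros((len(X), len(U)))
Q = np.zeros_like(Q_prev)
for _ in range(200):
    # Nesterov-style extrapolation: apply the Bellman operator at a point
    # pushed along the previous update direction (momentum weight BETA).
    Q_look = Q + BETA * (Q - Q_prev)
    Q_new = np.empty_like(Q)
    for i, x in enumerate(X):
        for j, u in enumerate(U):
            x_next = step(x, u)
            k = np.abs(X - x_next).argmin()        # nearest grid state
            Q_new[i, j] = cost(x, u, x_next) + GAMMA * Q_look[k].min()
    Q_prev, Q = Q, Q_new

safe_policy = U[Q.argmin(axis=1)]                  # greedy policy on the grid
```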
Affiliation(s)
- Mingming Zhao
- School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang
- School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao
- School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
2
Wang Y, Wang D, Zhao M, Liu N, Qiao J. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate. Neural Netw 2024; 175:106274. [PMID: 38583264] [DOI: 10.1016/j.neunet.2024.106274]
Abstract
In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem, which accelerates the convergence rate of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under certain conditions. Moreover, by employing neural networks, the model-free tracking control problem for zero-sum games can be addressed. Second, two practical algorithms are designed to guarantee convergence with accelerated learning. In the first algorithm, an adjustable acceleration phase is added to the Q-learning iteration, which can be terminated adaptively with a convergence guarantee. In the second algorithm, a novel acceleration function is developed that adjusts the relaxation factor to ensure convergence. Finally, through a simulation example with a practical physical background, the excellent performance of the developed algorithms is demonstrated with neural networks.
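The relaxation-factor idea can be illustrated with a tabular Q-iteration for a simple zero-sum game, where the control input minimizes and the disturbance maximizes. A minimal sketch follows; the scalar dynamics, the disturbance penalty, and the over-relaxation weight ETA are illustrative assumptions, not the paper's acceleration function or its neural-network implementation. Keeping ETA below 2/(1+GAMMA) keeps the relaxed Bellman operator contractive.

```python
import numpy as np

# Illustrative sketch only: relaxation-factor-adjusted Q-iteration for a toy
# discrete-time zero-sum game. All models and parameters are assumptions.
X = np.linspace(-1.0, 1.0, 41)        # state grid
U = np.linspace(-0.5, 0.5, 11)        # control grid (minimizing player)
W = np.linspace(-0.2, 0.2, 5)         # disturbance grid (maximizing player)
GAMMA, ETA = 0.6, 1.2                 # discount and adjustable relaxation factor

def step(x, u, w):
    return np.clip(0.8 * x + u + w, X[0], X[-1])   # toy scalar dynamics

def utility(x, u, w):
    return x**2 + u**2 - 4.0 * w**2   # disturbance enters with a negative weight

Q = np.zeros((len(X), len(U), len(W)))
for _ in range(150):
    TQ = np.empty_like(Q)
    for i, x in enumerate(X):
        for j, u in enumerate(U):
            for k, w in enumerate(W):
                n = np.abs(X - step(x, u, w)).argmin()   # nearest grid state
                # Saddle-point value at the successor: min over u, then max over w.
                TQ[i, j, k] = utility(x, u, w) + GAMMA * Q[n].min(axis=0).max()
    # ETA = 1 recovers plain value iteration; ETA > 1 over-relaxes each sweep.
    Q = Q + ETA * (TQ - Q)
```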
Affiliation(s)
- Yuan Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Mingming Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Nan Liu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
3
Wang J, Wang D, Li X, Qiao J. Dichotomy value iteration with parallel learning design towards discrete-time zero-sum games. Neural Netw 2023; 167:751-762. [PMID: 37729789] [DOI: 10.1016/j.neunet.2023.09.009]
Abstract
In this paper, a novel parallel learning framework is developed to solve zero-sum games for discrete-time nonlinear systems. Briefly, the purpose of this study is to determine a tentative function according to prior knowledge of the value iteration (VI) algorithm. This tentative function guides the learning process of the parallel controllers; that is, the neighborhood of the optimal cost function can be compressed into a small range via two typical exploration policies. Based on the parallel learning framework, a novel dichotomy VI algorithm is established to accelerate the learning speed. It is shown that the parallel controllers converge to the optimal policy from contrary initial policies. Finally, two typical systems are used to demonstrate the learning performance of the constructed dichotomy VI algorithm.
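The bracketing behavior described here, with two iterations approaching the optimal cost from opposite sides, can be sketched with plain value iteration started from an under-estimate and an over-estimate. For brevity the sketch below uses a single-controller cost rather than the game setting, and the dynamics, grids, and stopping tolerance are illustrative assumptions rather than the paper's dichotomy VI algorithm.

```python
import numpy as np

# Illustrative sketch only: parallel value iteration from "contrary" initial
# value functions, using their gap to certify convergence. Toy problem.
X = np.linspace(-1.0, 1.0, 41)        # state grid
U = np.linspace(-0.5, 0.5, 11)        # action grid
GAMMA = 0.9                           # discount factor

def bellman(V):
    # One sweep of value iteration on the grid (nearest-neighbor transitions).
    V_new = np.empty_like(V)
    for i, x in enumerate(X):
        costs = [x**2 + u**2 +
                 GAMMA * V[np.abs(X - np.clip(0.8 * x + u, X[0], X[-1])).argmin()]
                 for u in U]
        V_new[i] = min(costs)
    return V_new

V_low = np.zeros(len(X))              # under-estimate: iterates increase toward V*
V_high = 50.0 * np.ones(len(X))       # over-estimate: iterates decrease toward V*
for it in range(500):
    V_low, V_high = bellman(V_low), bellman(V_high)
    if np.max(V_high - V_low) < 1e-4: # optimal cost bracketed within a small range
        print(f"optimal cost pinned down after {it + 1} parallel sweeps")
        break
```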
Affiliation(s)
- Jiangyu Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Ding Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Xin Li
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.
- Junfei Qiao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.