Zhao M, Wang D, Qiao J. Neural-network-based accelerated safe Q-learning for optimal control of discrete-time nonlinear systems with state constraints.
Neural Netw 2025;
186:107249. [PMID:
39955957 DOI:
10.1016/j.neunet.2025.107249]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2024] [Revised: 01/11/2025] [Accepted: 02/02/2025] [Indexed: 02/18/2025]
Abstract
For unknown nonlinear systems with state constraints, it is difficult to achieve the safe optimal control by using Q-learning methods based on traditional quadratic utility functions. To solve this problem, this article proposes an accelerated safe Q-learning (SQL) technique that addresses the concurrent requirements of safety and optimality for discrete-time nonlinear systems within an integrated framework. First, an adjustable control barrier function is designed and integrated into the cost function, aiming to facilitate the transformation of constrained optimal control problems into unconstrained cases. The augmented cost function is closely linked to the next state, enabling quicker deviation of the state from constraint boundaries. Second, leveraging offline data that adheres to safety constraints, we introduce an off-policy value iteration SQL approach for searching a safe optimal policy, thus mitigating the risk of unsafe interactions that may result from suboptimal iterative policies. Third, the vast amounts of offline data and the complex augmented cost function can hinder the learning speed of the algorithm. To address this issue, we integrate historical iteration information into the current iteration step to accelerate policy evaluation, and introduce the Nesterov Momentum technique to expedite policy improvement. Additionally, the theoretical analysis demonstrates the convergence, optimality, and safety of the SQL algorithm. Finally, under the influence of different parameters, simulation outcomes of two nonlinear systems with state constraints reveal the efficacy and advantages of the accelerated SQL approach. The proposed method requires fewer iterations while enabling the system state to converge to the equilibrium point more rapidly.
Collapse