1
Lin M, Sun Z, Xia Y, Zhang J. Reinforcement Learning-Based Model Predictive Control for Discrete-Time Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:3312-3324. [PMID: 37204957 DOI: 10.1109/tnnls.2023.3273590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
This article proposes a novel reinforcement learning-based model predictive control (RLMPC) scheme for discrete-time systems. The scheme integrates model predictive control (MPC) and reinforcement learning (RL) through policy iteration (PI), where MPC is a policy generator and the RL technique is employed to evaluate the policy. Then the obtained value function is taken as the terminal cost of MPC, thus improving the generated policy. The advantage of doing so is that it rules out the need for the offline design paradigm of the terminal cost, the auxiliary controller, and the terminal constraint in traditional MPC. Moreover, RLMPC proposed in this article enables a more flexible choice of prediction horizon due to the elimination of the terminal constraint, which has great potential in reducing the computational burden. We provide a rigorous analysis of the convergence, feasibility, and stability properties of RLMPC. Simulation results show that RLMPC achieves nearly the same performance as traditional MPC in the control of linear systems and exhibits superiority over traditional MPC for nonlinear ones.
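The loop described in this abstract (MPC generates a policy, policy evaluation yields a value function, and that value function becomes the next terminal cost) can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it assumes a scalar linear system x⁺ = a·x + b·u with stage cost q·x² + r·u², a one-step prediction horizon, and a quadratic value function p·x², so that both steps have closed forms. All names and parameter values are hypothetical.

```python
import math


def mpc_gain(a, b, r, p):
    """One-step MPC with terminal cost p*x^2 (illustrative assumption).

    Minimizing q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u gives u = -k*x.
    """
    return a * b * p / (r + b * b * p)


def evaluate_policy(a, b, q, r, k):
    """Policy evaluation: value p*x^2 of the fixed policy u = -k*x.

    Solves p = q + r*k^2 + (a - b*k)^2 * p, which needs |a - b*k| < 1.
    """
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:
        raise ValueError("policy is not stabilizing; its value is unbounded")
    return (q + r * k * k) / (1.0 - a_cl * a_cl)


def policy_iteration_mpc(a, b, q, r, p0, iterations=20):
    """Alternate policy generation (MPC) and policy evaluation."""
    p = p0
    for _ in range(iterations):
        k = mpc_gain(a, b, r, p)            # MPC acts as policy generator
        p = evaluate_policy(a, b, q, r, k)  # value becomes new terminal cost
    return p, mpc_gain(a, b, r, p)


def riccati_solution(a, b, q, r):
    """Closed-form optimal p for the scalar case, used only as a check.

    Positive root of b^2*p^2 + (r - a^2*r - b^2*q)*p - q*r = 0.
    """
    c1 = r - a * a * r - b * b * q
    return (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)


if __name__ == "__main__":
    p, k = policy_iteration_mpc(a=1.2, b=1.0, q=1.0, r=1.0, p0=1.0)
    print("learned terminal weight:", p)
    print("Riccati reference:      ", riccati_solution(1.2, 1.0, 1.0, 1.0))
```

In this toy setting the learned terminal weight approaches the Riccati solution, which mirrors the abstract's point that the terminal cost need not be designed offline. The initial weight `p0` is assumed to produce a stabilizing first policy.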
2
Wang P, Zhang X. Inverse optimal missile guidance law under constraints based on prescribed-time explicit reference governor. ISA TRANSACTIONS 2022; 129:395-404. [PMID: 34973690 DOI: 10.1016/j.isatra.2021.12.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2020] [Revised: 12/11/2021] [Accepted: 12/11/2021] [Indexed: 06/14/2023]
Abstract
In this paper, the utilization of inverse-optimality-based prescribed-time explicit reference governor is investigated for missile intercepting against unknown maneuvering targets under performance and control input constraints. With an arctangent-based disturbance observer equipped for disturbance elimination, incorporating the inverse optimality approach into the missile interception guarantees the minimization of a performance index. In the framework of prescribed-time explicit reference governor, the control constraint is translated into a restriction of time-varying invariant set, and it follows the prescribed-time regulation of the applied reference along with the restriction satisfaction. The combined prescribed-time explicit reference governor approach could be transformed into a linear matrix inequality optimization problem, and its online solution over a receding horizon gives a Lyapunov function value for reference regulation and then control decisions in the continuous time. Simulation studies are performed to illustrate the performance of the proposed guidance control law.
Affiliation(s)
- Peng Wang
  - School of Energy and Power Engineering, Nanjing University of Science and Technology, Nanjing, 210094, PR China.
- Xiaobing Zhang
  - School of Energy and Power Engineering, Nanjing University of Science and Technology, Nanjing, 210094, PR China.
3
Hmede R, Chapelle F, Lapusta Y. Review of Neural Network Modeling of Shape Memory Alloys. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22155610. [PMID: 35957170 PMCID: PMC9370891 DOI: 10.3390/s22155610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/28/2022] [Revised: 07/23/2022] [Accepted: 07/25/2022] [Indexed: 05/27/2023]
Abstract
Shape memory materials are smart materials that stand out because of several remarkable properties, including their shape memory effect. Shape memory alloys (SMAs) are largely used members of this family and have been innovatively employed in various fields, such as sensors, actuators, robotics, aerospace, civil engineering, and medicine. Many conventional, unconventional, experimental, and numerical methods have been used to study the properties of SMAs, their models, and their different applications. These materials exhibit nonlinear behavior. This fact complicates the use of traditional methods, such as the finite element method, and increases the computing time necessary to adequately model their different possible shapes and usages. Therefore, a promising solution is to develop new methodological approaches based on artificial intelligence (AI) that aims at efficient computation time and accurate results. AI has recently demonstrated some success in efficiently modeling SMA features with machine- and deep-learning methods. Notably, artificial neural networks (ANNs), a subsection of deep learning, have been applied to characterize SMAs. The present review highlights the importance of AI in SMA modeling and introduces the deep connection between ANNs and SMAs in the medical, robotic, engineering, and automation fields. After summarizing the general characteristics of ANNs and SMAs, we analyze various ANN types used for modeling the properties of SMAs according to their shapes, e.g., a wire as an actuator, a wire with a spring bias, wire systems, magnetic and porous materials, bars and rings, and reinforced concrete beams. The description focuses on the techniques used for NN architectures and learning.
4
Li M, Cao Z, Li Z. A Reinforcement Learning-Based Vehicle Platoon Control Strategy for Reducing Energy Consumption in Traffic Oscillations. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:5309-5322. [PMID: 33882007 DOI: 10.1109/tnnls.2021.3071959] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
The vehicle platoon will be the most dominant driving mode on future roads. To the best of our knowledge, few reinforcement learning (RL) algorithms have been applied in vehicle platoon control, which has large-scale action and state spaces. Some RL-based methods were applied to solve single-agent problems. If we need to tackle multiagent problems, we will use multiagent RL algorithms since the parameters space grows exponentially with the increasing number of agents involved. Previous multiagent RL algorithms generally may provide redundant information to agents, indicating a large amount of useless or unrelated information, which may cause to be difficult for convergence training and pattern extractions from shared information. Also, random actions usually contribute to crashes, especially at the beginning of training. In this study, a communication proximal policy optimization (CommPPO) algorithm was proposed to tackle the above issues. In specific, the CommPPO model adopts a parameter-sharing structure to allow the dynamic variation of agent numbers, which can well handle various platoon dynamics, including splitting and merging. The communication protocol of the CommPPO consists of two parts. In the state part, the widely used predecessor-leader follower typology in the platoon is adopted to transmit global and local state information to agents. In the reward part, a new reward communication channel is proposed to solve the spurious reward and "lazy agent" problems in some existing multiagent RLs. Moreover, a curriculum learning approach is adopted to reduce crashes and speed up training. To validate the proposed strategy for platoon control, two existing multiagent RLs and a traditional platoon control strategy were applied in the same scenarios for comparison. Results showed that the CommPPO algorithm gained more rewards and achieved the largest fuel consumption reduction (11.6%).
5
Zhang Z, Zheng L, Qiu T. A gain-adjustment neural network based time-varying underdetermined linear equation solving method. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
6
Kong Y, Jiang Y, Han R, Wu H. A generalized varying-parameter recurrent neural network for super solution of quadratic programming problem. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.01.084] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
7
Banerjee A, Mukherjee J, Un Nabi M, Kar IN. An artificial delay based robust guidance strategy for an interceptor with input saturation. ISA TRANSACTIONS 2021; 109:34-48. [PMID: 33012535 DOI: 10.1016/j.isatra.2020.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2019] [Revised: 06/25/2020] [Accepted: 09/24/2020] [Indexed: 06/11/2023]
Abstract
This paper proposes a time-energy efficient, artificial time delay based robust guidance strategy with input saturation for a two-dimensional interceptor problem. A reference near optimal heading trajectory is generated by applying Differential Evolution (DE) to the interceptor problem. By following the reference heading trajectory, the robust control law guides the missile to intercept the target in a time-energy efficient manner, while tackling the disturbances and uncertainties that it might encounter. The near optimal trajectory is obtained offline, whereas the robust guidance strategy has been applied online which further increases the appeal of the proposed guidance scheme. Uniformly ultimately bounded (UUB) stability has been affirmed for the closed loop system employing Lyapunov's method. Also, the proposed guidance law has been tested through simulation on both non-maneuvering and maneuvering targets performing bank to bank as well as step maneuver in the presence of uncertainties.
Affiliation(s)
- Arunava Banerjee
  - Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India.
- Joyjit Mukherjee
  - Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India.
- Mashuq Un Nabi
  - Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India.
- Indra Narayan Kar
  - Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India.
8
Kang E, Qiao H, Gao J, Yang W. Neural network-based model predictive tracking control of an uncertain robotic manipulator with input constraints. ISA TRANSACTIONS 2021; 109:89-101. [PMID: 33616059 DOI: 10.1016/j.isatra.2020.10.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/04/2019] [Revised: 07/09/2020] [Accepted: 10/03/2020] [Indexed: 06/12/2023]
Abstract
This paper proposes a neural network-based model predictive control (MPC) method for robotic manipulators with model uncertainty and input constraints. In the presented NN-based MPC structure, two groups of radial basis function neural networks (RBFNNs) are considered for online model estimation and effective optimization. The first group of RBFNNs is introduced as a predictive model for the robotic system with online learning strategies for handling the system uncertainty and improving the model estimation accuracy. The second one is developed for solving the optimization problem. By taking into account an actor-critic scheme with different weights and the same activation function, adaptive learning strategies are established for balancing between optimal tracking performance and predictive system stability. In addition, aiming at guaranteeing the input constraints, a nonquadratic cost function is adopted for the NN-based MPC. The ultimately uniformly boundedness (UUB) of all variables is verified through the Lyapunov approach. Simulation studies are conducted to explain the effectiveness of the proposed method.
Affiliation(s)
- Erlong Kang
  - The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Beijing Key Laboratory of Research and Application for Robotic Intelligence of Hand-Eye-Brain Interaction, Beijing 100190, China.
- Hong Qiao
  - The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China.
- Jie Gao
  - The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Beijing Key Laboratory of Research and Application for Robotic Intelligence of Hand-Eye-Brain Interaction, Beijing 100190, China.
- Wenjing Yang
  - State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China.
9
Kong Y, Jiang Y, Zhou J, Wu H. A time controlling neural network for time‐varying QP solving with application to kinematics of mobile manipulators. INT J INTELL SYST 2021. [DOI: 10.1002/int.22304] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Affiliation(s)
- Ying Kong
  - Department of Information and Electronic Engineering, Zhejiang University of Science and Technology, Zhejiang, China.
- Yunliang Jiang
  - Department of Information Engineering, Huzhou University, Huzhou, China.
- Junwen Zhou
  - Department of Information and Electronic Engineering, Zhejiang University of Science and Technology, Zhejiang, China.
- Huifeng Wu
  - Department of Intelligent and Software Technology, Hangzhou Dianzi University, Hangzhou, China.
10
Integrated Guidance and Control Using Model Predictive Control with Flight Path Angle Prediction against Pull-Up Maneuvering Target. SENSORS 2020; 20:s20113143. [PMID: 32498281 PMCID: PMC7313701 DOI: 10.3390/s20113143] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/25/2020] [Revised: 05/28/2020] [Accepted: 05/29/2020] [Indexed: 11/25/2022]
Abstract
Integrated guidance and control using model predictive control against a maneuvering target is proposed. Equations of motion for terminal homing are developed with the consideration of short-period dynamics as well as actuator dynamics of a missile. The convex optimization problem is solved considering inequality constraints that consist of acceleration and look angle limits. A discrete-time extended Kalman filter is used to estimate the position of the target with a look angle as a measurement. This is utilized to form a flight-path angle of the target, and polynomial fitting is applied for prediction. Numerical simulation including a Monte Carlo simulation is performed to verify the performance of the proposed algorithm.
11
Wang S, Yu H, Yu J, Na J, Ren X. Neural-Network-Based Adaptive Funnel Control for Servo Mechanisms With Unknown Dead-Zone. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:1383-1394. [PMID: 30387759 DOI: 10.1109/tcyb.2018.2875134] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 06/08/2023]
Abstract
This paper proposes an adaptive funnel control (FC) scheme for servo mechanisms with an unknown dead-zone. To improve the transient and steady-state performance, a modified funnel variable, which relaxes the limitation of the original FC (e.g., systems with relative degree 1 or 2), is developed using the tracking error to replace the scaling factor. Then, by applying the error transformation method, the original error is transformed into a new error variable which is used in the controller design. By using an improved funnel function in a dynamic surface control procedure, an adaptive funnel controller is proposed to guarantee that the output error remains within a predefined funnel boundary. A novel command filter technique is introduced by using the Levant differentiator to eliminate the "explosion of complexity" problem in the conventional backstepping procedure. Neural networks are used to approximate the unknown dead-zone and unknown nonlinear functions. Comparative experiments on a turntable servo mechanism confirm the effectiveness of the devised control method.
12
A Neural Network-Based Model Reference Control Architecture for Oscillation Damping in Interconnected Power System. ENERGIES 2019. [DOI: 10.3390/en12193653] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
In this paper, a model reference controller (MRC) based on a neural network (NN) is proposed for damping oscillations in electric power systems. Variation in reactive load, internal or external perturbation/faults, and asynchronization of the connected machine cause oscillations in power systems. If the oscillation is not damped properly, it will lead to a complete collapse of the power system. An MRC base unified power flow controller (UPFC) is proposed to mitigate the oscillations in 2-area, 4-machine interconnected power systems. The MRC controller is using the NN for training, as well as for plant identification. The proposed NN-based MRC controller is capable of damping power oscillations; hence, the system acquires a stable condition. The response of the proposed MRC is compared with the traditionally used proportional integral (PI) controller to validate its performance. The key performance indicator integral square error (ISE) and integral absolute error (IAE) of both controllers is calculated for single phase, two phase, and three phase faults. MATLAB/Simulink is used to implement and simulate the 2-area, 4-machine power system.
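The two performance indicators named in this abstract have standard definitions: ISE is the time integral of the squared error and IAE is the time integral of the absolute error. The sketch below shows one way to compute them from a sampled error signal. The uniform sampling step `dt`, the trapezoidal integration rule, and the function names are assumptions for illustration, not details taken from the paper.

```python
def trapezoid(values, dt):
    """Trapezoidal rule for uniformly spaced samples (assumed spacing dt)."""
    if len(values) < 2:
        return 0.0
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def ise(errors, dt):
    """Integral square error: integral of e(t)^2 over the record."""
    return trapezoid([e * e for e in errors], dt)


def iae(errors, dt):
    """Integral absolute error: integral of |e(t)| over the record."""
    return trapezoid([abs(e) for e in errors], dt)


if __name__ == "__main__":
    # Hypothetical error samples from a decaying oscillation, 0.1 s apart.
    errors = [1.0, 0.5, 0.0, -0.5, 0.0]
    print("ISE:", ise(errors, 0.1))
    print("IAE:", iae(errors, 0.1))
```

ISE weights large excursions more heavily than IAE, so comparing two controllers on both indices gives a view of peak deviation as well as accumulated deviation.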
13
Xu Q, Xu K, Li L, Yao X. Optimization of sand casting performance parameters and missing data prediction. ROYAL SOCIETY OPEN SCIENCE 2019; 6:181860. [PMID: 31598220 PMCID: PMC6731703 DOI: 10.1098/rsos.181860] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/01/2018] [Accepted: 07/15/2019] [Indexed: 06/10/2023]
Abstract
Due to a wide range of applications, sand casting occupies an important position in modern casting practice. The main purpose of this study was to optimize the performance parameters of sand casting based on grey relational analysis and predict the missing data using back propagation (BP) neural network. First, the influence of human factors was eliminated by adopting the objective entropy weight method, which also saved manpower. The larger variation degree in the evaluation indicators, indicating that the evaluated projects had good discrimination in this regard, the larger weight should be given to these evaluation indicators. Second, the performance parameters of sand casting were optimized based on grey relational analysis, providing a reference for sand milling. The larger the grey relational degree, the closer the evaluated project was to the ideal project. Third, this paper provided a new method for determining the number of hidden neurons in a network according to the mean square error of training samples, and venting quality was predicted based on BP neural network. The relevant theory was deduced before predicting missing data, such that there will be a general understanding regarding the prediction principle of BP neural network. Fourth, to demonstrate the validity of BP neural network adopted in the process of missing data prediction, grey system theory was applied to compare the result of missing data prediction.
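The first two steps of this abstract (indicators with larger variation receive larger weights, and a larger grey relational degree means a project is closer to the ideal) can be sketched with the textbook forms of the entropy weight method and grey relational analysis. The code below is only an illustration under stated assumptions: scores are positive and already normalized to comparable scales, the distinguishing coefficient `rho` is 0.5, and the data, the ideal reference, and all function names are hypothetical rather than taken from the paper.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for matrix[i][j] = score of project i on indicator j.

    Assumes positive scores and at least two projects. Indicators whose
    scores vary more across projects receive larger weights.
    """
    n = len(matrix)
    divergences = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [x / total for x in column]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(max(0.0, 1.0 - entropy))  # clamp rounding noise
    s = sum(divergences)
    if s <= 1e-12:  # no indicator discriminates: fall back to equal weights
        return [1.0 / len(divergences)] * len(divergences)
    return [d / s for d in divergences]


def grey_relational_degrees(matrix, ideal, weights, rho=0.5):
    """Weighted grey relational degree of each project to the ideal project."""
    deltas = [[abs(x - ref) for x, ref in zip(row, ideal)] for row in matrix]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:  # every project coincides with the ideal
        return [1.0] * len(matrix)
    degrees = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(w * c for w, c in zip(weights, coeffs)))
    return degrees


if __name__ == "__main__":
    # Hypothetical normalized scores: 3 projects, 2 indicators.
    scores = [[0.9, 0.5], [0.6, 0.5], [0.3, 0.5]]
    ideal = [0.9, 0.5]
    w = entropy_weights(scores)
    print("weights:", w)
    print("degrees:", grey_relational_degrees(scores, ideal, w))
```

In this made-up data the second indicator is identical for all projects, so it carries essentially no weight, and the projects are ranked by how close their first indicator is to the ideal value.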
Affiliation(s)
- Kaili Xu
  - Author for correspondence
- Xiwen Yao
  - Author for correspondence
14
Li Z, Yuan W, Zhao S, Yu Z, Kang Y, Chen CLP. Brain-Actuated Control of Dual-Arm Robot Manipulation With Relative Motion. IEEE Trans Cogn Dev Syst 2019. [DOI: 10.1109/tcds.2017.2770168] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 11/07/2022]
15
Patan K. Two stage neural network modelling for robust model predictive control. ISA TRANSACTIONS 2018; 72:56-65. [PMID: 29103594 DOI: 10.1016/j.isatra.2017.10.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/14/2017] [Revised: 08/17/2017] [Accepted: 10/19/2017] [Indexed: 06/07/2023]
Abstract
The paper proposes a novel robust model predictive control scheme realized by means of artificial neural networks. The neural networks are used twofold: to design the so-called fundamental model of a plant and to catch uncertainty associated with the plant model. In order to simplify the optimization process carried out within the framework of predictive control an instantaneous linearization is applied which renders it possible to define the optimization problem in the form of constrained quadratic programming. Stability of the proposed control system is also investigated by showing that a cost function is monotonically decreasing with respect to time. Derived robust model predictive control is tested and validated on the example of a pneumatic servomechanism working at different operating regimes.
Affiliation(s)
- Krzysztof Patan
  - Institute of Control and Computation Engineering, University of Zielona Góra, ul. Szafrana 2, 65-516 Zielona Góra, Poland.
16
Wang L, Ge Y, Chen M, Fan Y. Dynamical balance optimization and control of biped robots in double-support phase under perturbing external forces. Neural Comput Appl 2017. [DOI: 10.1007/s00521-016-2316-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/21/2022]
17
Neural-approximation-based robust adaptive control of flexible air-breathing hypersonic vehicles with parametric uncertainties and control input constraints. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.01.093] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Indexed: 11/18/2022]