1
Wang X, Ma Z, Cao L, Ran D, Ji M, Sun K, Han Y, Li J. A planar tracking strategy based on multiple-interpretable improved PPO algorithm with few-shot technique. Sci Rep 2024; 14:3910. [PMID: 38365944 PMCID: PMC11315912 DOI: 10.1038/s41598-024-54268-6]
Abstract
For a planar tracking problem, a multiple-interpretable improved Proximal Policy Optimization (PPO) algorithm with a few-shot technique, named F-GBQ-PPO, is proposed. Compared with standard PPO, the main improvements of F-GBQ-PPO are increased interpretability and reduced consumption of real interaction samples. To increase the interpretability of the tracking policy, three levels of interpretability are studied: perceptual, logical, and mathematical. Specifically, these are realized by introducing a guided policy based on the Apollonius circle, a hybrid exploration policy based on biological motions, and an update of external parameters based on a quantum genetic algorithm. In addition, to cope with the potential scarcity of real interaction samples in practical applications, a few-shot technique is built into the algorithm, which generates fake samples through a multi-dimensional Gaussian process. By mixing fake samples with real ones in a certain proportion, the demand for real samples is reduced.
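As a rough illustration of the few-shot sample-mixing idea (a sketch only; the function names, kernel choice, perturbation scale, and mixing ratio below are our assumptions, not the paper's):

```python
# Fit a Gaussian process to real transitions and mix GP-generated "fake"
# samples with real ones, reducing the demand for real interaction data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def make_mixed_batch(real_s, real_s_next, n_fake, rng=np.random.default_rng(0)):
    """Fit a GP to real (state -> next state) pairs and draw fake transitions."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
    gp.fit(real_s, real_s_next)
    # Perturb real states to get fake query points, then sample the GP posterior.
    idx = rng.integers(len(real_s), size=n_fake)
    fake_s = real_s[idx] + 0.05 * rng.standard_normal((n_fake, real_s.shape[1]))
    fake_s_next = gp.sample_y(fake_s, n_samples=1, random_state=0).squeeze(-1)
    # Mix fake transitions with the real ones in a fixed proportion.
    return np.vstack([real_s, fake_s]), np.vstack([real_s_next, fake_s_next])

# Example: 20 real transitions of a 2-D system, augmented with 40 fake ones.
rng = np.random.default_rng(1)
real_s = rng.standard_normal((20, 2))
real_s_next = 0.9 * real_s + 0.01 * rng.standard_normal((20, 2))
s, s_next = make_mixed_batch(real_s, real_s_next, n_fake=40)
print(s.shape)  # (60, 2)
```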
Affiliation(s)
- Xiao Wang: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Zhe Ma: Intelligent Science & Technology, Academy Limited of CASIC, Beijing 100043, China; Key Lab of Aerospace Defense Intelligent System and Technology, Beijing 100043, China
- Lu Cao: National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China
- Dechao Ran: National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China
- Mingjiang Ji: National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China
- Kewu Sun: Intelligent Science & Technology, Academy Limited of CASIC, Beijing 100043, China; Key Lab of Aerospace Defense Intelligent System and Technology, Beijing 100043, China
- Yuying Han: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Jiake Li: Intelligent Science & Technology, Academy Limited of CASIC, Beijing 100043, China; Key Lab of Aerospace Defense Intelligent System and Technology, Beijing 100043, China; National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China
2
Zhang J, Zhang H, Ming Z, Mu Y. Adaptive Event-Triggered Time-Varying Output Bipartite Formation Containment of Multiagent Systems Under Directed Graphs. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:8909-8922. [PMID: 35436196 DOI: 10.1109/tnnls.2022.3154028]
Abstract
The time-varying output bipartite formation containment (TVOBFC) problem for linear multiagent systems (MASs) under directed graphs is an important problem, but existing methods rely on global information about the MAS or do not use event-triggered communication. This article investigates two kinds of TVOBFC problems for heterogeneous linear MASs under signed digraphs via event-triggered communication. For the first case, where the leaders have identical dynamics, a fully distributed event-triggered protocol for the followers is proposed, under which the followers form the preset formation shape. For the second case, where the leaders have different dynamics, the leaders are divided into two groups: one group can directly obtain the output of the virtual leader, while the other cannot. To make the leaders achieve the formation shape and track the virtual leader, two kinds of observers are designed for the two groups of leaders to estimate the state of the virtual leader, and a control protocol is designed for each leader based on these observers. A control law is then designed for each follower to solve the formation containment problem. Finally, two examples illustrate the main results.
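The paper's adaptive triggering condition is more involved, but the basic event-triggered communication pattern it builds on can be sketched as follows (the threshold parameters here are illustrative assumptions):

```python
# An agent broadcasts its state only when the error between the current state
# and the last broadcast value exceeds a state-dependent threshold, so
# communication happens at discrete events rather than continuously.
import numpy as np

def should_broadcast(x, x_last_sent, sigma=0.2, eps=1e-3):
    """Relative-threshold triggering rule (a common form in the literature)."""
    return np.linalg.norm(x - x_last_sent) > sigma * np.linalg.norm(x) + eps

x_sent = np.array([1.0, 0.0])
for k in range(5):
    x = np.array([1.0 - 0.3 * k, 0.1 * k])   # some evolving agent state
    if should_broadcast(x, x_sent):
        x_sent = x.copy()                     # update the broadcast value
        print(f"step {k}: broadcast {x_sent}")
```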
3
Zhang H, Ming Z, Yan Y, Wang W. Data-Driven Finite-Horizon H∞ Tracking Control With Event-Triggered Mechanism for the Continuous-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:4687-4701. [PMID: 34633936 DOI: 10.1109/tnnls.2021.3116464]
Abstract
In this article, a neural network (NN)-based adaptive dynamic programming (ADP) event-triggered control method is presented to obtain a near-optimal control policy for the model-free finite-horizon H∞ optimal tracking control problem with constrained control input. First, using available input-output data, a data-driven model is established by a recurrent NN (RNN) to reconstruct the unknown system. Then, an augmented system with an event-triggered mechanism is obtained from a tracking error system and a command generator, and a novel event-triggering condition free of Zeno behavior is presented. On this basis, the relationship between the event-triggered Hamilton-Jacobi-Isaacs (HJI) equation and the time-triggered HJI equation is given in Theorem 3. Since the solution of the HJI equation is time-dependent for the augmented system, time-dependent activation functions are adopted for the NNs. Moreover, an extra error term is incorporated to satisfy the terminal constraint of the cost function. This adaptive control pattern finds, in real time, approximations of the optimal value while also ensuring uniform ultimate boundedness of the closed-loop system. Finally, the effectiveness of the proposed near-optimal control pattern is verified by two simulation examples.
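For reference, the generic finite-horizon H∞ HJI equation that such designs approximate, in standard textbook form (our notation; the paper's augmented-system version differs in detail):

```latex
% Augmented dynamics \dot{X} = F(X) + G(X)u + K(X)w on [0, T], value V(X, t):
\begin{aligned}
0 &= \frac{\partial V}{\partial t}
     + \nabla V^{\top}\bigl(F(X) + G(X)u^{*} + K(X)w^{*}\bigr)
     + X^{\top} Q X + U(u^{*}) - \gamma^{2}\,\lVert w^{*}\rVert^{2},\\
V(X, T) &= \psi(X).
\end{aligned}
```

With constrained input, $U(u)$ is typically the nonquadratic penalty $U(u)=2\int_{0}^{u}\lambda\tanh^{-1}(v/\lambda)^{\top}R\,dv$, which enforces $|u|\le\lambda$; the terminal condition $V(X,T)=\psi(X)$ is why the value, and hence the NN activations, must be time-dependent.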
4
Zhang J, Ding DW, Ren Y, Sun X. Distributed robust group output synchronization control for heterogeneous uncertain linear multi-agent systems. ISA Transactions 2023; 134:108-121. [PMID: 36058719 DOI: 10.1016/j.isatra.2022.08.010]
Abstract
This paper investigates the distributed robust group output synchronization problem of heterogeneous uncertain linear leader-follower multi-agent systems (MASs), whose followers have nonidentical, parameter-uncertain dynamics. To achieve cooperative tracking of multiple targets, a new group synchronization framework based on the output regulation technique is established. In the underlying directed communication topology, the nonidentical followers are divided into several subgroups, and each subgroup has an output tracking objective generated by an autonomous exosystem that serves as the leader of that subgroup. Since not all followers can access their exosystems directly, a distributed exosystem observer based on an algebraic Riccati inequality (ARI) is designed to obtain the exosystem information. Moreover, to compensate for parameter uncertainties under different group topologies, the p-copy internal model is synthesized into the distributed control laws: a dynamic state feedback protocol under an acyclic directed graph and a dynamic output feedback protocol under a general directed graph. It is shown that group synchronization is achieved with these controllers under acyclic and general partitions, respectively, regardless of parameter uncertainties. Finally, several examples verify the validity of the analytic results.
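For reference, the standard linear output-regulation (regulator) equations that underpin this kind of design, written in our notation rather than necessarily the paper's:

```latex
% Follower i in subgroup k, exosystem \dot{v}_k = S_k v_k,
% regulated error e_i = y_i - F_k v_k. A solution pair (X_i, U_i) of
\begin{aligned}
X_i S_k &= A_i X_i + B_i U_i,\\
0       &= C_i X_i - F_k,
\end{aligned}
% yields the feedforward term U_i \hat{v}_k, where \hat{v}_k comes from the
% distributed ARI-based observer in place of the unavailable v_k.
```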
Affiliation(s)
- Jie Zhang
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; School of Electronics and Information Engineering, Taiyuan University of Science and Technology, Taiyuan, 030024, China
| | - Da-Wei Ding
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing 100083, China.
| | - Yingying Ren
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing 100083, China
| | - Xinmiao Sun
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing 100083, China
| |
5
Zhang J, Zhou C, Xiao X, Chen W, Jiang Y, Zhu R, Xin T. Magnetic resonance imaging image analysis of the therapeutic effect and neuroprotective effect of deep brain stimulation in Parkinson's disease based on a deep learning algorithm. International Journal for Numerical Methods in Biomedical Engineering 2022; 38:e3642. [PMID: 36054274 PMCID: PMC9786712 DOI: 10.1002/cnm.3642]
Abstract
To study the therapeutic and neuroprotective effects of deep brain stimulation (DBS) in Parkinson's disease (PD), this study combines a deep learning algorithm with magnetic resonance imaging (MRI) image analysis to examine the clinical efficacy of DBS in the surgical treatment of PD and the neuroprotective and neurological-recovery effects after surgery. A deep learning model based on MRI image analysis is established, UPDRS motor assessments and improvements in activities of daily living before and after DBS surgery are compared, and the accuracy and detection speed of the model are evaluated. The models constructed in this study achieve an accuracy of more than 90% in the PD detection test, and the detection speed of the algorithm under big-data conditions is between 60 and 200 ms. DBS significantly improves a range of clinical symptoms in patients with PD. The deep learning model based on MRI image analysis is shown to be effective; DBS surgery can improve the symptoms of PD and has neuroprotective and neurological-recovery effects.
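The paper does not publish its network, so the following is only a generic minimal sketch of a slice-level MRI classifier of the kind such studies use (architecture, input sizes, and names are our assumptions):

```python
# A tiny two-class CNN: grayscale MRI slice in, PD-vs-control logits out.
import torch
import torch.nn as nn

class TinyMRIClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
        )

    def forward(self, x):          # x: (batch, 1, H, W) grayscale slice
        return self.head(self.features(x))

logits = TinyMRIClassifier()(torch.randn(4, 1, 128, 128))
print(logits.shape)  # torch.Size([4, 2])
```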
Affiliation(s)
- Jianzhong Zhang: Department of Neurosurgery, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
- Chaoyang Zhou: Department of Neurosurgery, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
- Xiang Xiao: Department of Neurosurgery, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
- Weihua Chen: Department of Imaging, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
- Yi Jiang: Network Information Center, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
- Ronglan Zhu: Department of Neurosurgery, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
- Tao Xin: Department of Neurosurgery, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
6
Goal representation adaptive critic design for discrete-time uncertain systems subjected to input constraints: The event-triggered case. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.12.057]
7
Yuan L, Li T, Tong S, Xiao Y, Gao X. NN adaptive optimal tracking control for a class of uncertain nonstrict feedback nonlinear systems. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.03.049]
8
Wei Q, Han L, Zhang T. Spiking Adaptive Dynamic Programming Based on Poisson Process for Discrete-Time Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1846-1856. [PMID: 34143743 DOI: 10.1109/tnnls.2021.3085781]
Abstract
In this article, a new iterative spiking adaptive dynamic programming (SADP) method based on the Poisson process is developed to solve optimal impulsive control problems. For a fixed time interval, by combining the Poisson process with maximum likelihood estimation (MLE), the three-tuple of state, spiking interval, and Poisson probability can be computed, from which the iterative value functions and iterative control laws are obtained. A property analysis shows that the value functions converge to the optimal performance index function as the iteration index increases from zero to infinity. Finally, two simulation examples verify the effectiveness of the developed algorithm.
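The Poisson/MLE building block mentioned in the abstract can be sketched as follows (standard results only, not the paper's full SADP iteration):

```python
# MLE for a homogeneous Poisson process and the resulting spike-count
# probabilities over a fixed interval.
from math import exp, factorial

def poisson_mle_rate(spike_times, horizon):
    """MLE of the rate on [0, horizon]: lambda_hat = events / observation time."""
    return len(spike_times) / horizon

def prob_k_spikes(lam, interval, k):
    """P(N = k) over a window of given length for a Poisson process."""
    mu = lam * interval
    return exp(-mu) * mu**k / factorial(k)

lam = poisson_mle_rate(spike_times=[0.3, 0.9, 1.4, 2.2], horizon=4.0)  # rate 1.0
print(lam, prob_k_spikes(lam, interval=1.0, k=1))  # 1.0, ~0.368
```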
9
Shahid AA, Piga D, Braghin F, Roveda L. Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning. Auton Robots 2022. [DOI: 10.1007/s10514-022-10034-z]
Abstract
This paper presents a learning-based method that uses simulation data to learn an object manipulation task with two model-free reinforcement learning (RL) algorithms. Learning performance is compared across an on-policy and an off-policy algorithm: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). To accelerate the learning process, a fine-tuning procedure is proposed that demonstrates the continuous adaptation of on-policy RL to new environments, allowing the learned policy to adapt to and execute a (partially) modified task. A dense reward function is designed to enable efficient learning of the agent. A grasping task involving a Franka Emika Panda manipulator is considered as the reference task to be learned. The learned control policy is shown to generalize across multiple object geometries and initial robot/parts configurations. The approach is finally tested on a real Franka Emika Panda robot, showing the possibility of transferring the learned behavior from simulation. Experimental results show a 100% grasping success rate, making the proposed approach applicable to real applications.
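A minimal sketch of the described train-then-fine-tune pattern using Stable-Baselines3 PPO; the environments below are stand-ins (the paper uses a simulated Franka Emika Panda grasping task, not these gym IDs):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# 1) Train on the original task.
env = gym.make("Pendulum-v1")                 # stand-in for the grasping sim
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)
model.save("ppo_task")

# 2) Fine-tune the same policy on a (partially) modified task instead of
#    retraining from scratch: reload the weights and continue learning.
modified_env = gym.make("Pendulum-v1", g=5.0)  # stand-in for the changed task
model = PPO.load("ppo_task", env=modified_env)
model.learn(total_timesteps=10_000)
```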
10
Zhang Y, Pan X, Wang Y. Category learning in a recurrent neural network with reinforcement learning. Front Psychiatry 2022; 13:1008011. [PMID: 36387007 PMCID: PMC9640766 DOI: 10.3389/fpsyt.2022.1008011]
Abstract
Humans and animals can learn and use category information quickly and efficiently to adapt to changing environments, and several brain areas are involved in learning and encoding category information. However, it remains unclear how the brain learns and forms categorical representations at the level of neural circuits. To investigate this issue at the network level, we combine a recurrent neural network with reinforcement learning to construct a deep reinforcement learning model and demonstrate how categories are learned and represented in the network. The model consists of a policy network and a value network: the policy network is responsible for updating the policy to choose actions, while the value network evaluates the action to predict reward, and the agent learns through the interaction between the two networks. The model was trained on six stimulus-stimulus associative chains in a sequential paired-association task previously learned by monkeys. The simulation results demonstrate that the model learned the stimulus-stimulus associative chains and reproduced behavior similar to that of monkeys performing the same task. Two types of neurons were found in the model: one primarily encoded identity information about individual stimuli; the other mainly encoded category information of the associated stimuli in a chain. Both activity patterns have also been observed in the primate prefrontal cortex after monkeys learned the same task. Furthermore, the ability of these two types of neurons to encode stimulus or category information was enhanced as the model learned the task. Our results suggest that neurons in a recurrent neural network can form categorical representations through deep reinforcement learning of stimulus-stimulus associations, which may provide a new approach for understanding how the prefrontal cortex learns and encodes category information.
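A minimal sketch of the model family described above, with a recurrent encoder feeding a policy head and a value head (layer sizes and names are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    def __init__(self, n_stimuli=12, hidden=64, n_actions=3):
        super().__init__()
        self.rnn = nn.GRU(n_stimuli, hidden, batch_first=True)
        self.policy = nn.Linear(hidden, n_actions)  # chooses actions
        self.value = nn.Linear(hidden, 1)           # predicts reward

    def forward(self, stim_seq):                    # (batch, time, n_stimuli)
        h, _ = self.rnn(stim_seq)
        last = h[:, -1]                             # state after the sequence
        return torch.softmax(self.policy(last), -1), self.value(last)

pi, v = RecurrentActorCritic()(torch.randn(2, 5, 12))
print(pi.shape, v.shape)  # torch.Size([2, 3]) torch.Size([2, 1])
```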
Affiliation(s)
- Ying Zhang: Institute for Cognitive Neurodynamics, East China University of Science and Technology, Shanghai, China
- Xiaochuan Pan: Institute for Cognitive Neurodynamics, East China University of Science and Technology, Shanghai, China
- Yihong Wang: Institute for Cognitive Neurodynamics, East China University of Science and Technology, Shanghai, China
11
Wang N, Gao Y, Zhang X. Data-Driven Performance-Prescribed Reinforcement Learning Control of an Unmanned Surface Vehicle. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:5456-5467. [PMID: 33606641 DOI: 10.1109/tnnls.2021.3056444]
Abstract
An unmanned surface vehicle (USV) in complicated marine environments can hardly be modeled accurately, so model-based optimal control approaches become infeasible. In this article, a self-learning model-free solution using only the input-output signals of the USV is provided. To this end, a data-driven performance-prescribed reinforcement learning control (DPRLC) scheme is created to pursue control optimality and prescribed tracking accuracy simultaneously. By devising a state transformation with prescribed performance, constrained tracking errors are converted into the constraint-free stabilization of tracking errors with unknown dynamics. A reinforcement learning paradigm with a neural-network-based actor-critic framework is then deployed to directly optimize the controller synthesis deduced from the Bellman error formulation, such that the transformed tracking errors yield a data-driven optimal controller. Theoretical analysis ensures that the entire DPRLC scheme guarantees the prescribed tracking accuracy subject to optimal cost. Both simulations and virtual-reality experiments demonstrate the effectiveness and superiority of the proposed DPRLC scheme.
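The "state transformation with prescribed performance" step can be sketched with the standard prescribed-performance construction (a sketch under our assumed funnel parameters, not the paper's exact DPRLC form):

```python
# A decaying performance funnel rho(t) bounds the tracking error, and an
# inverse-tanh map turns the constrained error into an unconstrained one:
# keeping the transformed error bounded enforces the prescribed accuracy.
import numpy as np

def rho(t, rho0=1.0, rho_inf=0.05, decay=1.0):
    """Performance funnel: |e(t)| must stay inside (-rho(t), rho(t))."""
    return (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf

def transformed_error(e, t):
    """Map the constrained error e into an unconstrained variable."""
    z = np.clip(e / rho(t), -0.999, 0.999)   # guard against the funnel edge
    return np.arctanh(z)

print(transformed_error(0.5, t=0.0), transformed_error(0.04, t=5.0))
```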
12
Wei Q, Li H, Yang X, He H. Continuous-Time Distributed Policy Iteration for Multicontroller Nonlinear Systems. IEEE Transactions on Cybernetics 2021; 51:2372-2383. [PMID: 32248139 DOI: 10.1109/tcyb.2020.2979614]
Abstract
In this article, a novel distributed policy iteration algorithm is established for infinite-horizon optimal control problems of continuous-time nonlinear systems. In each iteration of the developed algorithm, only one controller's control law is updated while the other controllers' control laws remain unchanged. The main contribution is to improve the iterative control laws one by one, instead of updating all of them in each iteration as in traditional policy iteration, which effectively reduces the computational burden of each iteration. The properties of the distributed policy iteration algorithm for continuous-time nonlinear systems are analyzed, and the admissibility of the method is established. Monotonicity, convergence, and optimality are discussed, showing that the iterative value function is nonincreasingly convergent to the solution of the Hamilton-Jacobi-Bellman equation. Finally, numerical simulations illustrate the effectiveness of the proposed method.
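To illustrate only the one-controller-at-a-time update pattern, here is a sketch specialized to the linear-quadratic case (the paper treats general nonlinear systems with neural approximators; matrices and iteration counts below are our assumptions):

```python
# Round-robin policy iteration: each step re-evaluates the joint cost via a
# Lyapunov equation, then improves exactly one controller's feedback gain.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -2.0]])      # stable, so K = 0 is admissible
Bs = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
Q = np.eye(2)
Rs = [np.eye(1), np.eye(1)]
Ks = [np.zeros((1, 2)), np.zeros((1, 2))]

for it in range(40):
    Acl = A - sum(B @ K for B, K in zip(Bs, Ks))
    Qcl = Q + sum(K.T @ R @ K for K, R in zip(Ks, Rs))
    P = solve_continuous_lyapunov(Acl.T, -Qcl)  # evaluate all current laws
    i = it % 2                                  # improve one controller only
    Ks[i] = np.linalg.solve(Rs[i], Bs[i].T @ P)

print(Ks[0], Ks[1])  # gains after round-robin improvement
```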
13
Liang M, Wei Q. A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.11.014]