1. Kang L, Liu Y, Luo Y, Yang JZ, Yuan H, Zhu C. Approximate Policy Iteration With Deep Minimax Average Bellman Error Minimization. IEEE Trans Neural Netw Learn Syst 2025; 36:2288-2299. [PMID: 38194389] [DOI: 10.1109/tnnls.2023.3346992]
Abstract
In this work, we investigate deep approximate policy iteration (DAPI) for estimating the optimal action-value function in reinforcement learning, using rectified linear unit (ReLU) ResNet as the underlying function class. Each DAPI iteration applies the minimax average Bellman error minimization principle, employing a ReLU ResNet to estimate the fixed point of the Bellman equation aligned with the estimated greedy policy. Through error propagation, we derive nonasymptotic error bounds between the optimal action-value function and the estimated action-value function induced by the output greedy policy of DAPI. To control the Bellman residual error, we address both the statistical and the approximation errors associated with the β-mixing dependent data derived from Markov decision processes, using techniques from empirical process theory and deep approximation theory, respectively. Furthermore, we present a novel generalization bound for ReLU ResNet under dependent data, as well as an approximation bound for ReLU ResNet over the Hölder class. Notably, this approximation bound improves the dependence on the ambient dimension from exponential to polynomial. The derived nonasymptotic error bounds depend explicitly on the sample size, the ambient dimension (polynomially), and the width and depth of the neural networks, and therefore serve as theoretical guidelines for setting these hyperparameters to achieve the desired convergence rate when training DAPI.
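For readers wanting a concrete picture of the minimax principle this abstract names, the sketch below is a loose reconstruction under stated assumptions, not the authors' implementation: a small MLP stands in for the ReLU ResNet, and minimax_step, the network sizes, and the bounded test function are all illustrative choices. A Q-network is trained to shrink the average Bellman residual while an adversarial test function is trained to expose it.

```python
# Illustrative sketch of minimax average Bellman error minimization
# (a reconstruction, not the paper's code).
import torch
import torch.nn as nn

gamma = 0.99
obs_dim, n_actions = 8, 4  # hypothetical dimensions

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                         nn.Linear(128, out_dim))

q_net = mlp(obs_dim, n_actions)    # estimates Q(s, .)
test_fn = mlp(obs_dim + 1, 1)      # adversarial test function u(s, a)
opt_q = torch.optim.Adam(q_net.parameters(), lr=1e-3)
opt_u = torch.optim.Adam(test_fn.parameters(), lr=1e-3)

def minimax_step(s, a, r, s_next, done):
    """One adversarial update on a batch of transitions."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values
    residual = q_sa - target                  # Bellman residual
    u = test_fn(torch.cat([s, a.float().unsqueeze(1)], dim=1)).squeeze(1)
    u = torch.tanh(u)                         # keep the test function bounded
    # Inner max: push u to correlate with the residual, exposing the error.
    loss_u = -(u * residual.detach()).mean()
    opt_u.zero_grad(); loss_u.backward(); opt_u.step()
    # Outer min: push Q to shrink the u-weighted average Bellman error.
    loss_q = (u.detach() * residual).mean()
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()
    return loss_q.item()

# Hypothetical usage with a random batch of transitions:
s = torch.randn(32, obs_dim); a = torch.randint(n_actions, (32,))
r = torch.randn(32); s2 = torch.randn(32, obs_dim); d = torch.zeros(32)
minimax_step(s, a, r, s2, d)
```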
2. Li Y, Wang Y, Tan X. Highly valued subgoal generation for efficient goal-conditioned reinforcement learning. Neural Netw 2025; 181:106825. [PMID: 39488112] [DOI: 10.1016/j.neunet.2024.106825]
Abstract
Goal-conditioned reinforcement learning is widely used in robot control, driving a robot to accomplish specific tasks by maximizing accumulated rewards. However, a useful reward signal is only received when the desired goal is reached, leading to sparse rewards that hamper the efficiency of policy learning. In this paper, we propose a method that generates highly valued subgoals for efficient goal-conditioned policy learning, a capability relevant to applications such as home-service robots and autonomous pilots. The subgoals are conditioned on the context of the specific task and have suitable complexity for efficient goal-conditioned action-value learning. The context variable captures a latent representation of the task, enabling efficient subgoal generation, while goal-conditioned action values regularized by self-adaptive ranges yield subgoals of suitable complexity. Compared to Hindsight Experience Replay, which uniformly samples subgoals from visited trajectories, our method generates subgoals of suitable difficulty based on the task context, leading to more efficient policy training. Experimental results show that our method achieves stable performance in robotic environments compared to baseline methods.
Affiliations
- Yao Li: School of Computer and Information Technology, Shanxi University, China
- YuHui Wang: Center of Excellence in GenAI, King Abdullah University of Science and Technology, Saudi Arabia
- XiaoYang Tan: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, China
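For context on the baseline this abstract contrasts against, here is a minimal sketch of Hindsight Experience Replay's uniform "future" relabeling; the function names and the sparse reward are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of the HER baseline: failed transitions are relabeled
# with goals sampled uniformly from later states of the same trajectory.
import random
import numpy as np

def her_relabel(trajectory, reward_fn, k=4):
    """trajectory: list of (obs, action, achieved_goal, desired_goal) tuples.
    Returns the original transitions plus k relabeled copies of each."""
    out = []
    for t, (obs, act, ach, des) in enumerate(trajectory):
        out.append((obs, act, des, reward_fn(ach, des)))
        # Uniformly sample k substitute goals from the rest of the trajectory.
        for _ in range(k):
            future = random.randint(t, len(trajectory) - 1)
            new_goal = trajectory[future][2]     # an achieved goal
            out.append((obs, act, new_goal, reward_fn(ach, new_goal)))
    return out

# Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise.
def sparse_reward(achieved, desired, tol=0.05):
    gap = np.linalg.norm(np.asarray(achieved) - np.asarray(desired))
    return 0.0 if gap < tol else -1.0
```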
3. Bai Y, Shao S, Zhang J, Zhao X, Fang C, Wang T, Wang Y, Zhao H. A Review of Brain-Inspired Cognition and Navigation Technology for Mobile Robots. Cyborg Bionic Syst 2024; 5:0128. [PMID: 38938902] [PMCID: PMC11210290] [DOI: 10.34133/cbsystems.0128]
Abstract
Brain-inspired navigation technologies combine environmental perception, spatial cognition, and target navigation into a comprehensive navigation research system. Researchers have used various sensors to gather environmental data and have enhanced environmental perception through multimodal information fusion. In spatial cognition, neural network models simulate the navigation mechanisms of the animal brain and construct cognitive maps of the environment. However, existing models still struggle to achieve high navigation success rates and efficiency, and the limited range of navigation mechanisms borrowed from animal brains calls for further exploration. Following the brain-inspired navigation pipeline, this paper presents a systematic study of brain-inspired environment perception, brain-inspired spatial cognition, and goal-based navigation, providing a new classification of brain-inspired cognition and navigation techniques and a theoretical basis for subsequent experimental studies. In the future, brain-inspired navigation technology should draw on more complete brain-inspired mechanisms to improve its generalization ability and should be applied to large-scale distributed multi-agent swarm navigation. The multidisciplinary nature of brain-inspired navigation presents challenges, and scholars from multiple disciplines must cooperate to advance this technology.
Affiliations
- Yanan Bai: School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- Shiliang Shao: State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- Jin Zhang: School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- Xianzhe Zhao: School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- Chuxi Fang: School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- Ting Wang: State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- Yongliang Wang: Department of Artificial Intelligence, University of Groningen, Groningen 9747 AG, Netherlands
- Hai Zhao: School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
4. Wang R, Wang M, Zhao Q, Gong Y, Zuo L, Zheng X, Gao H. A Novel Obstacle Traversal Method for Multiple Robotic Fish Based on Cross-Modal Variational Autoencoders and Imitation Learning. Biomimetics (Basel) 2024; 9:221. [PMID: 38667232] [PMCID: PMC11048022] [DOI: 10.3390/biomimetics9040221]
Abstract
Precise visual-navigation control of multiple robotic fish in complex underwater environments has long been a challenging problem in underwater robotics. To address it, this paper proposes an obstacle traversal technique for multiple robotic fish that combines a cross-modal variational autoencoder (CM-VAE) with imitation learning. First, the overall framework of the robotic fish control system is introduced: the first-person view of the robotic fish is encoded into a low-dimensional latent space using the CM-VAE, and the latent features are then mapped to velocity commands for the robotic fish through imitation learning. Finally, to validate the effectiveness of the proposed method, experiments are conducted on linear, S-shaped, and circular gate-frame trajectories with both single and multiple robotic fish. The results show that the proposed visual navigation method stably traverses all of these gate-frame trajectories. Compared to end-to-end learning and purely unsupervised image reconstruction, the proposed control strategy performs better, offering a new solution for the intelligent navigation of robotic fish in complex environments.
Affiliations
- Ruilong Wang: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
- Ming Wang: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
- Qianchuan Zhao: Department of Automation, Tsinghua University, Beijing 100084, China
- Yanling Gong: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
- Lingchen Zuo: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
- Xuehan Zheng: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
- He Gao: School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China; Shandong Zhengchen Technology Co., Ltd., Jinan 250101, China
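The two-stage pipeline this abstract describes (a latent encoding of the camera view followed by an imitation-learned velocity mapping) can be sketched roughly as below. The architecture sizes, class names, and three-component command are assumptions for illustration; the paper's cross-modal decoder and its training losses are omitted.

```python
# Illustrative sketch: encode the first-person view into a low-dimensional
# latent, then map the latent to a velocity command with a cloned policy head.
import torch
import torch.nn as nn

class Encoder(nn.Module):          # stands in for the CM-VAE image encoder
    def __init__(self, latent_dim=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten())
        self.mu = nn.LazyLinear(latent_dim)      # mean of q(z|image)
        self.logvar = nn.LazyLinear(latent_dim)  # log-variance of q(z|image)

    def forward(self, img):
        h = self.conv(img)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z, mu, logvar

class VelocityHead(nn.Module):
    """Imitation-learned mapping z -> command, trained by regressing
    expert velocity commands (e.g., with an MSE loss)."""
    def __init__(self, latent_dim=10, cmd_dim=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, cmd_dim))

    def forward(self, z):
        return self.net(z)

enc, head = Encoder(), VelocityHead()
img = torch.randn(1, 3, 64, 64)     # hypothetical camera frame
z, _, _ = enc(img)
cmd = head(z)                       # e.g., surge, yaw, and pitch rates
```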
5. Liu W, Niu H, Jang I, Herrmann G, Carrasco J. Distributed Neural Networks Training for Robotic Manipulation With Consensus Algorithm. IEEE Trans Neural Netw Learn Syst 2024; 35:2732-2746. [PMID: 35853061] [DOI: 10.1109/tnnls.2022.3191021]
Abstract
In this article, we propose an algorithm that combines an actor-critic-based off-policy method with consensus-based distributed training to address multiagent deep reinforcement learning problems. Specifically, we develop a Lyapunov-based convergence analysis of a consensus algorithm for a class of nonlinear systems and use this result to analyze the convergence of the actor and critic training parameters in our algorithm. The analysis verifies that all agents converge to the same optimal model as training time goes to infinity. To validate the implementation, a multiagent training framework is proposed in which each Universal Robot 5 (UR5) robot arm is trained to reach a random target position. Finally, experiments demonstrate the effectiveness and feasibility of the proposed algorithm.
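As a rough illustration of how consensus-based distributed training keeps the agents' copies of a network in agreement, the sketch below applies a standard consensus averaging step after each local gradient step; the step size, ring topology, and function names are illustrative assumptions, not the authors' algorithm.

```python
# Illustrative consensus update over network weights: each agent mixes its
# parameters with its neighbors', so all copies drift toward a common model.
import numpy as np

def consensus_step(params, adjacency, local_grads, lr=1e-2, eps=0.2):
    """params: list of parameter vectors, one per agent.
    adjacency: symmetric 0/1 matrix of communication links.
    local_grads: per-agent gradients from each agent's own experience."""
    n = len(params)
    new_params = []
    for i in range(n):
        # Mix: theta_i <- theta_i + eps * sum_j a_ij (theta_j - theta_i)
        mix = params[i] + eps * sum(
            adjacency[i][j] * (params[j] - params[i]) for j in range(n))
        # Then take the agent's own gradient step.
        new_params.append(mix - lr * local_grads[i])
    return new_params

# Hypothetical fully connected 3-agent network: with eps small enough,
# repeated updates drive the parameter copies to agreement.
theta = [np.random.randn(5) for _ in range(3)]
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
grads = [np.zeros(5)] * 3
for _ in range(100):
    theta = consensus_step(theta, A, grads)
print(np.allclose(theta[0], theta[1], atol=1e-3))  # True: agents agree
```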
6. Wu J, Zhou Y, Yang H, Huang Z, Lv C. Human-Guided Reinforcement Learning With Sim-to-Real Transfer for Autonomous Navigation. IEEE Trans Pattern Anal Mach Intell 2023; 45:14745-14759. [PMID: 37703148] [DOI: 10.1109/tpami.2023.3314762]
Abstract
Reinforcement learning (RL) is a promising approach for unmanned ground vehicle (UGV) applications, but limited computing resources make it challenging to deploy a well-behaved RL strategy with sophisticated neural networks. Meanwhile, training RL for navigation tasks is difficult: it requires a carefully designed reward function and a large number of interactions, and RL navigation can still fail in many corner cases. These limitations of current RL methods prompt us to rethink combining RL with human intelligence. In this paper, a human-guided RL framework is proposed to improve RL performance both during learning in the simulator and during deployment in the real world. The framework allows humans to intervene in RL's control process and provide demonstrations as needed, thereby improving RL's capabilities. An innovative human-guided RL algorithm is proposed that uses a series of mechanisms to improve the effectiveness of human guidance, including a human-guided learning objective, prioritized human experience replay, and human intervention-based reward shaping. Our RL method is trained in simulation and then transferred to the real world, with a denoised representation developed for domain adaptation to mitigate the simulation-to-real gap. The method is validated through simulations and real-world experiments navigating UGVs in diverse and dynamic environments using only tiny neural networks and image inputs. It performs better in goal-reaching and safety than existing learning- and model-based navigation approaches and is robust to changes in input features and ego kinetics. Furthermore, small-scale human demonstrations can be used to improve the trained RL agent and learn expected behaviors online.
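Two of the named mechanisms, prioritized human experience replay and intervention-based reward shaping, can be sketched as follows; the priority bonus, penalty value, and class names are illustrative guesses rather than the paper's actual design.

```python
# Illustrative sketch: human transitions get a fixed priority bonus in the
# replay buffer, and each human intervention adds a shaping penalty so the
# agent learns to avoid the states that forced a takeover.
import random

class PrioritizedHumanReplay:
    def __init__(self, human_bonus=2.0):
        self.data, self.prio = [], []
        self.human_bonus = human_bonus

    def add(self, transition, td_error, from_human):
        p = abs(td_error) + 1e-3          # standard TD-error priority
        if from_human:
            p *= self.human_bonus         # oversample human guidance
        self.data.append(transition)
        self.prio.append(p)

    def sample(self, k):
        return random.choices(self.data, weights=self.prio, k=k)

def shaped_reward(env_reward, human_intervened, penalty=1.0):
    # Intervention-based shaping: a takeover signals the policy went wrong.
    return env_reward - (penalty if human_intervened else 0.0)
```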
7. Hu T, Luo B, Yang C, Huang T. MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning. IEEE Trans Pattern Anal Mach Intell 2023; 45:12098-12112. [PMID: 37285257] [DOI: 10.1109/tpami.2023.3283537]
Abstract
Deep reinforcement learning (RL) has been applied extensively to complex decision-making problems. In many real-world scenarios, tasks have several conflicting objectives and may require multiple agents to cooperate; these are multi-objective multi-agent decision-making problems. However, only a few works address this intersection: existing approaches are limited to separate fields and can handle only multi-agent decision-making with a single objective, or multi-objective decision-making with a single agent. In this paper, we propose MO-MIX to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem. Our approach is based on the centralized training with decentralized execution (CTDE) framework: a weight vector representing the preference over objectives is fed into the decentralized agent network as a condition for local action-value estimation, while a mixing network with a parallel architecture estimates the joint action-value function. In addition, an exploration guide is applied to improve the uniformity of the final non-dominated solutions. Experiments demonstrate that the proposed method effectively solves the multi-objective multi-agent cooperative decision-making problem and generates an approximation of the Pareto set. It not only significantly outperforms the baseline method on all four evaluation metrics but also requires less computational cost.
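The preference conditioning at the heart of the CTDE setup described here can be sketched as below; this is a simplified stand-in (a monotonic sum replaces the learned mixing network), with all sizes and names assumed for illustration.

```python
# Illustrative sketch: each agent's utility network takes its observation
# concatenated with a preference weight vector over the objectives.
import torch
import torch.nn as nn

n_agents, obs_dim, n_actions, n_objectives = 3, 10, 5, 2

class PreferenceConditionedAgent(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_objectives, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, obs, w):
        # w is a preference over objectives, e.g. [0.7, 0.3], summing to 1.
        return self.net(torch.cat([obs, w], dim=-1))  # local action values

agents = [PreferenceConditionedAgent() for _ in range(n_agents)]
w = torch.tensor([[0.7, 0.3]])                  # hypothetical preference
obs = torch.randn(n_agents, 1, obs_dim)
local_q = [agent(o, w) for agent, o in zip(agents, obs)]
# A mixing network would combine the chosen local values into a joint
# action value; here a monotonic sum stands in for that mixer.
joint_q = torch.stack([q.max(dim=-1).values for q in local_q]).sum(dim=0)
```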
8. Li H, Luo B, Song W, Yang C. Predictive hierarchical reinforcement learning for path-efficient mapless navigation with moving target. Neural Netw 2023; 165:677-688. [PMID: 37385022] [DOI: 10.1016/j.neunet.2023.06.007]
Abstract
Deep reinforcement learning (DRL) has proven to be a powerful approach for robot navigation over the past few years. DRL-based navigation does not require pre-constructing a map; instead, high-performance navigation skills can be learned from trial-and-error experience. However, recent DRL-based approaches mostly focus on a fixed navigation target, and when navigating to a moving target without a map, the performance of the standard RL structure drops dramatically in both success rate and path efficiency. To address mapless navigation with a moving target, the predictive hierarchical DRL (pH-DRL) framework is proposed, integrating long-term trajectory prediction to provide a cost-effective solution. In the proposed framework, the lower-level policy of the RL agent learns robot control actions toward a specified goal, while the higher-level policy learns long-range planning of shorter navigation routes by exploiting the predicted trajectories. By making decisions over two levels of policies, the pH-DRL framework is robust to the unavoidable errors in long-term predictions. Using the deep deterministic policy gradient (DDPG) for policy optimization, the pH-DDPG algorithm is developed on top of the pH-DRL structure. Finally, comparative experiments on the Gazebo simulator against several variants of the DDPG algorithm demonstrate that pH-DDPG outperforms the other algorithms and achieves a high success rate and efficiency even when the target moves quickly and randomly.
Affiliations
- Hanxiao Li: School of Automation, Central South University, Changsha 410083, China
- Biao Luo: School of Automation, Central South University, Changsha 410083, China
- Wei Song: Research Center for Intelligent Robotics, Research Institute of Interdisciplinary Innovation, Zhejiang Laboratory, Hangzhou 311100, China
- Chunhua Yang: School of Automation, Central South University, Changsha 410083, China
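The two-level decision structure of pH-DRL can be caricatured as follows; the constant-velocity predictor and the geometric controller are deliberate simplifications standing in for the paper's learned trajectory-prediction module and DDPG policy.

```python
# Illustrative sketch: the higher level picks an interception point on the
# target's predicted trajectory; the lower level treats it as a fixed goal.
import numpy as np

def predict_trajectory(target_pos, target_vel, horizon=20, dt=0.1):
    """Naive constant-velocity predictor standing in for the paper's
    long-term trajectory prediction module."""
    return [target_pos + target_vel * dt * k for k in range(horizon)]

def high_level_goal(robot_pos, robot_speed, predicted, dt=0.1):
    """Pick the earliest predicted point the robot can reach in time,
    which yields a shorter route than chasing the current position."""
    for k, p in enumerate(predicted):
        if np.linalg.norm(p - robot_pos) <= robot_speed * dt * (k + 1):
            return p
    return predicted[-1]          # fall back to the last prediction

def low_level_action(robot_pos, goal, max_speed=1.0):
    """Stand-in for the learned DDPG controller: head toward the subgoal."""
    d = goal - robot_pos
    dist = np.linalg.norm(d)
    return d / dist * min(max_speed, dist) if dist > 1e-8 else np.zeros_like(d)

robot = np.array([0.0, 0.0])
target, vel = np.array([2.0, 1.0]), np.array([0.5, 0.0])
goal = high_level_goal(robot, 1.0, predict_trajectory(target, vel))
cmd = low_level_action(robot, goal)
```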
9. Wiyatno RR, Xu A, Paull L. Lifelong Topological Visual Navigation. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3189164]
Affiliations
- Rey Reza Wiyatno: Montréal Robotics and Embodied AI Lab (REAL) and DIRO, University of Montréal, Montreal, QC, Canada
- Anqi Xu: Element AI, Montreal, QC, Canada
- Liam Paull: Montréal Robotics and Embodied AI Lab (REAL) and DIRO, University of Montréal, Montreal, QC, Canada
10. Zhang H, Cheng J, Zhang L, Li Y, Zhang W. H2GNN: Hierarchical-Hops Graph Neural Networks for Multi-Robot Exploration in Unknown Environments. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3146912]
11. Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10796-8]