1
Wang Y, Xie Y, Xu D, Shi J, Fang S, Gui W. Heuristic dense reward shaping for learning-based map-free navigation of industrial automatic mobile robots. ISA Transactions 2025; 156:579-596. [PMID: 39542762] [DOI: 10.1016/j.isatra.2024.10.026]
Abstract
This paper presents a map-free navigation approach for industrial automatic mobile robots (AMRs), designed to ensure computational efficiency, cost-effectiveness, and adaptability. Utilizing deep reinforcement learning (DRL), the system enables real-time decision-making without fixed markers or frequent map updates. The central contribution is Heuristic Dense Reward Shaping (HDRS), inspired by potential field methods, which integrates domain knowledge to improve learning efficiency and minimize suboptimal actions. To address the simulation-to-reality gap, data augmentation with controlled sensor noise is applied during training, ensuring robustness and generalization for real-world deployment without fine-tuning. Training results underscore HDRS's superior convergence speed, training stability, and policy learning efficiency compared to baselines. Simulation and real-world evaluations establish HDRS-DRL as a competitive alternative that outperforms traditional approaches and offers practical applicability in industrial settings.
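For readers unfamiliar with potential-field-inspired reward shaping, the Python sketch below illustrates the general idea: a classical potential-based shaping term (in the style of Ng et al., 1999) layered on top of the environment reward. It is a minimal sketch under assumed attraction/repulsion gains and influence radius, not the authors' HDRS implementation.

```python
import numpy as np

def potential(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
    """Classical potential field: goal attraction plus obstacle repulsion."""
    u = k_att * np.linalg.norm(goal - pos)            # attraction grows with distance to goal
    for obs in obstacles:
        d = np.linalg.norm(obs - pos)
        if d < d0:                                    # repulsion only inside influence radius d0
            u += k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u

def shaped_reward(r_env, pos, next_pos, goal, obstacles, gamma=0.99):
    """Potential-based shaping r' = r + gamma * Phi(s') - Phi(s), which is known
    to leave the optimal policy unchanged while densifying the learning signal."""
    phi_s = -potential(pos, goal, obstacles)          # higher potential = worse state
    phi_s_next = -potential(next_pos, goal, obstacles)
    return r_env + gamma * phi_s_next - phi_s
```

A dense, smooth signal of this kind, rather than a sparse goal-reached reward, is what drives the faster convergence and training stability reported above.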
Affiliation(s)
- Yizhi Wang
- School of Automation, Central South University, Changsha, 410083, China.
- Yongfang Xie
- School of Automation, Central South University, Changsha, 410083, China.
- Degang Xu
- School of Automation, Central South University, Changsha, 410083, China.
- Jiahui Shi
- School of Automation, Central South University, Changsha, 410083, China.
- Shiyu Fang
- School of Automation, Central South University, Changsha, 410083, China.
- Weihua Gui
- School of Automation, Central South University, Changsha, 410083, China.
2
Zhang T, Lin Z, Wang Y, Ye D, Fu Q, Yang W, Wang X, Liang B, Yuan B, Li X. Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:14588-14602. [PMID: 37285252] [DOI: 10.1109/tnnls.2023.3280085]
Abstract
A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the reinforcement learning (RL) agent's behavior as the environment changes over its lifetime while minimizing catastrophic forgetting of the learned information. To address this challenge, we propose DaCoRL (dynamics-adaptive continual RL). DaCoRL learns a context-conditioned policy using progressive contextualization, which incrementally clusters a stream of stationary tasks in the dynamic environment into a series of contexts and uses an expandable multihead neural network to approximate the policy. Specifically, we define a set of tasks with similar dynamics as an environmental context and formalize context inference as online Bayesian infinite Gaussian mixture clustering on environment features, resorting to online Bayesian inference to infer the posterior distribution over contexts. Under the assumption of a Chinese restaurant process (CRP) prior, this technique can accurately classify the current task as a previously seen context or instantiate a new context as needed, without relying on any external indicator to signal environmental changes in advance. Furthermore, we employ an expandable multihead neural network whose output layer is synchronously expanded with each newly instantiated context, together with a knowledge distillation regularization term for retaining performance on learned tasks. As a general framework that can be coupled with various deep RL algorithms, DaCoRL shows consistent superiority over existing methods in terms of stability, overall performance, and generalization ability, as verified by extensive experiments on several robot navigation and MuJoCo locomotion tasks.
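As a rough illustration of CRP-based context inference, the sketch below assigns an incoming environment feature either to an existing context or to a new one. The isotropic Gaussian likelihood and MAP assignment are simplifying assumptions for exposition, not the paper's exact inference procedure.

```python
import numpy as np

def crp_assign(context_counts, context_means, feature, alpha=1.0, sigma=1.0):
    """One step of CRP-style context inference: score each existing context by
    popularity x likelihood; score a new context by the concentration alpha.
    Returns the MAP context index; index == len(context_counts) means 'new'."""
    n = sum(context_counts)
    scores = []
    for count, mean in zip(context_counts, context_means):
        lik = np.exp(-np.sum((feature - mean) ** 2) / (2 * sigma ** 2))
        scores.append(count / (n + alpha) * lik)      # seat at an occupied table
    scores.append(alpha / (n + alpha))                # open a new table (new context)
    probs = np.array(scores) / np.sum(scores)
    return int(np.argmax(probs))
```

In the paper, a new context triggers expansion of the multihead network's output layer; here that step is left to the caller.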
3
Ou W, Luo B, Wang B, Zhao Y. Modular hierarchical reinforcement learning for multi-destination navigation in hybrid crowds. Neural Networks 2024; 171:474-484. [PMID: 38154229] [DOI: 10.1016/j.neunet.2023.12.032]
Abstract
Real-world robot applications usually require navigating agents to handle multiple destinations. Moreover, real-world crowded environments usually contain dynamic and static crowds that implicitly interact with each other during navigation. To address this challenging task, a novel modular hierarchical reinforcement learning (MHRL) method is developed in this paper. MHRL is composed of three modules, i.e., destination evaluation, policy switch, and motion network, which correspond directly to the three phases of solving the original navigation problem. First, the destination evaluation module rates all destinations and selects the one with the lowest cost. Subsequently, the policy switch module decides which motion network to use according to the selected destination and the obstacle state. Finally, the selected motion network outputs the robot action. Owing to the complementary strengths of a variety of motion networks and the cooperation of the modules in each layer, MHRL is able to deal with hybrid crowds effectively. Extensive simulation experiments demonstrate that MHRL achieves better performance than state-of-the-art methods.
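The three-phase decomposition maps naturally onto a small dispatch routine. The sketch below is a schematic rendering of that hierarchy with hypothetical callables (cost_fn, switch_policy, motion_nets), not the authors' trained modules.

```python
def mhrl_step(robot_state, destinations, obstacle_state,
              cost_fn, switch_policy, motion_nets):
    """Hierarchical dispatch mirroring the three MHRL phases described above."""
    # Phase 1: destination evaluation — rate all destinations, pick the cheapest.
    dest = min(destinations, key=lambda d: cost_fn(robot_state, d))
    # Phase 2: policy switch — choose a motion network for this destination/obstacle context.
    net_id = switch_policy(robot_state, dest, obstacle_state)
    # Phase 3: the selected motion network outputs the robot action.
    return motion_nets[net_id](robot_state, dest, obstacle_state)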
Affiliation(s)
- Wen Ou
- School of Automation, Central South University, Changsha 410083, China.
- Biao Luo
- School of Automation, Central South University, Changsha 410083, China.
- Bingchuan Wang
- School of Automation, Central South University, Changsha 410083, China.
- Yuqian Zhao
- School of Automation, Central South University, Changsha 410083, China.
4
Rousseas P, Bechlioulis C, Kyriakopoulos K. Reactive optimal motion planning for a class of holonomic planar agents using reinforcement learning with provable guarantees. Frontiers in Robotics and AI 2024; 10:1255696. [PMID: 38234864] [PMCID: PMC10791867] [DOI: 10.3389/frobt.2023.1255696]
Abstract
In control theory, reactive methods have been widely celebrated owing to their success in providing robust, provably convergent solutions to control problems. Even though such methods have long been formulated for motion planning, optimality has largely been left untreated through reactive means, with the community focusing on discrete/graph-based solutions. Although the latter exhibit certain advantages (completeness, handling of complicated state-spaces), the recent rise of Reinforcement Learning (RL) provides novel ways to address the limitations of reactive methods. The goal of this paper is to treat the reactive optimal motion planning problem through an RL framework. A policy iteration RL scheme is formulated in a manner consistent with the control-theoretic results, utilizing the advantages of each approach in a complementary way: RL is employed to construct the optimal input without necessitating the solution of a hard, non-linear partial differential equation, while safety, convergence, and policy improvement are guaranteed through control-theoretic arguments. The proposed method is validated in simulated synthetic workspaces and compared against reactive methods as well as a PRM and an RRT⋆ approach. It outperforms or closely matches the latter methods, indicating near-global optimality, while providing a solution for planning from anywhere within the workspace to the goal position.
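The policy iteration scheme at the heart of such methods alternates policy evaluation with greedy improvement. The tabular version below conveys that structure; the paper works in continuous state space with control-theoretic safety guarantees, which this finite sketch does not capture.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Tabular policy iteration. P: [A, S, S] transition tensor, R: [S, A] rewards.
    Alternates exact evaluation and greedy improvement until the policy is stable."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(n_states), :]      # [S, S] rows under current policy
        r_pi = R[np.arange(n_states), policy]         # [S] rewards under current policy
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = R.T + gamma * P @ v                       # q[a, s]
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v                          # stable policy => optimal
        policy = new_policy
```

Each improvement step provably does not decrease the value function, which is the discrete analogue of the policy-improvement guarantee the paper establishes by control-theoretic arguments.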
Affiliation(s)
- Panagiotis Rousseas
- Control Systems Laboratory, School of Mechanical Engineering, National Technical University of Athens, Athens, Greece
- Charalampos Bechlioulis
- Division of Systems and Control, Department of Electrical and Computer Engineering, University of Patras, Patras, Greece
- Kostas Kyriakopoulos
- Center of AI & Robotics (CAIR), New York University, Abu Dhabi, United Arab Emirates
5
Tang Y, Zhao C, Wang J, Zhang C, Sun Q, Zheng WX, Du W, Qian F, Kurths J. Perception and Navigation in Autonomous Systems in the Era of Learning: A Survey. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:9604-9624. [PMID: 35482692] [DOI: 10.1109/tnnls.2022.3167688]
Abstract
Autonomous systems possess the abilities to infer their own state, understand their surroundings, and perform autonomous navigation. With the application of learning systems such as deep learning and reinforcement learning, the vision-based self-state estimation, environment perception, and navigation capabilities of autonomous systems have been addressed efficiently, and many new learning-based algorithms have surfaced for autonomous visual perception and navigation. In this review, we focus on the applications of learning-based monocular approaches to ego-motion perception, environment perception, and navigation in autonomous systems, in contrast with previous reviews that discussed traditional methods. First, we delineate the shortcomings of existing classical visual simultaneous localization and mapping (vSLAM) solutions, which demonstrate the necessity of integrating deep learning techniques. Second, we review deep learning-based methods for visual environmental perception and understanding, including monocular depth estimation, monocular ego-motion prediction, image enhancement, object detection, semantic segmentation, and their combinations with traditional vSLAM frameworks. Then, we focus on visual navigation based on learning systems, mainly reinforcement learning and deep reinforcement learning. Finally, we examine several challenges and promising directions discussed in related research on learning systems in the era of computer science and robotics.
6
Zhang T, Wang X, Liang B, Yuan B. Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Division and Knowledge Distillation. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:9925-9939. [PMID: 35439142] [DOI: 10.1109/tnnls.2022.3162241]
Abstract
The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from continuous environments. In theory, to achieve stable performance, neural networks assume independent and identically distributed (i.i.d.) inputs, which unfortunately does not hold in the general RL paradigm where the training data are temporally correlated and nonstationary. This issue may lead to the phenomenon of "catastrophic interference" and a collapse in performance. In this article, we present interference-aware deep Q-learning (IQ) to mitigate catastrophic interference in single-task deep RL. Specifically, we resort to online clustering to achieve on-the-fly context division, together with a multihead network and a knowledge distillation regularization term for preserving the policy of learned contexts. Built upon deep Q-networks (DQNs), IQ consistently boosts stability and performance compared to existing methods, as verified with extensive experiments on classic control and Atari tasks. The code is publicly available at https://github.com/Sweety-dm/Interference-aware-Deep-Q-learning.
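A distillation regularizer of the kind described can be written compactly in PyTorch. The sketch below is a generic rendition under assumed names (q_head for the active context head, q_old_head for a frozen snapshot), not the released IQ code linked above.

```python
import torch
import torch.nn.functional as F

def iq_style_loss(q_head, q_old_head, states, actions, td_targets, beta=1.0):
    """TD loss on the active head plus a distillation term that pins the head's
    outputs to a frozen snapshot, mitigating interference across contexts."""
    q = q_head(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.smooth_l1_loss(q, td_targets)         # standard DQN-style TD objective
    with torch.no_grad():
        q_teacher = q_old_head(states)                # teacher: frozen copy from earlier training
    distill_loss = F.mse_loss(q_head(states), q_teacher)  # keep earlier behavior
    return td_loss + beta * distill_loss
```

The weight beta trades plasticity on the current context against retention of previously learned ones.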
7
Liu C, Xie S, Sui X, Huang Y, Ma X, Guo N, Yang F. PRM-D* Method for Mobile Robot Path Planning. Sensors (Basel) 2023; 23:3512. [PMID: 37050570] [PMCID: PMC10098883] [DOI: 10.3390/s23073512]
Abstract
Navigation tasks involving dynamic scenarios require mobile robots to achieve a high planning success rate, fast planning, dynamic obstacle avoidance, and short paths. PRM (probabilistic roadmap method), one of the classical path planning methods, is characterized by simple principles, probabilistic completeness, fast planning, and asymptotically optimal paths, but performs poorly at dynamic obstacle avoidance. In this study, we use the idea of hierarchical planning to improve the dynamic obstacle avoidance performance of PRM by introducing D* into PRM's roadmap construction and planning process. To demonstrate the feasibility of the proposed method, we conducted simulation experiments with the proposed PRM-D* (probabilistic roadmap method and D*) method on maps of different complexity and compared the results with those of classical methods such as SPARS2 (improving sparse roadmap spanners). The experiments demonstrate that our method is non-optimal in terms of path length but second only to graph search methods; it outperforms the other methods in static planning, with an average planning time of less than 1 s; and in dynamic planning it is two orders of magnitude faster than SPARS2, with a single dynamic planning time of less than 0.02 s. Finally, we deployed the proposed PRM-D* algorithm on a real vehicle for experimental validation. The results show that the proposed method is able to perform the navigation task in a real-world scenario.
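For context, a probabilistic roadmap is built by random sampling and local connection; the sketch below shows only this classical construction step, under assumed collision-checking callbacks (sample_free, is_edge_free). The paper's contribution, running D* over such a roadmap for dynamic replanning, is not reproduced here.

```python
import numpy as np

def build_prm(sample_free, is_edge_free, n_nodes=200, k=8):
    """Minimal PRM construction: sample collision-free configurations, then
    connect each node to its k nearest neighbors when the straight-line edge
    is collision-free. Returns node coordinates and an adjacency dict."""
    nodes = np.array([sample_free() for _ in range(n_nodes)])
    edges = {i: [] for i in range(n_nodes)}
    for i, p in enumerate(nodes):
        dists = np.linalg.norm(nodes - p, axis=1)
        for j in map(int, np.argsort(dists)[1:k + 1]):  # index 0 is the node itself
            if j not in edges[i] and is_edge_free(p, nodes[j]):
                edges[i].append(j)
                edges[j].append(i)
    return nodes, edges
```

A D*-style incremental search over this graph can then repair only the affected portion of the path when an edge is invalidated by a moving obstacle, which is what yields the sub-0.02 s replanning times reported above.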
Affiliation(s)
- Chunyang Liu
- School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Longmen Laboratory, Luoyang 471000, China
- Saibao Xie
- School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Xin Sui
- School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Key Laboratory of Mechanical Design and Transmission System of Henan Province, Henan University of Science and Technology, Luoyang 471003, China
- Yan Huang
- School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Xiqiang Ma
- School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Longmen Laboratory, Luoyang 471000, China
- Nan Guo
- School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Fang Yang
- School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Longmen Laboratory, Luoyang 471000, China
8
Wei C, Chen C, Tanner HG. Navigation functions with moving destinations and obstacles. Autonomous Robots 2023. [DOI: 10.1007/s10514-023-10088-7]
9
Wu Q, Wang J, Liang J, Gong X, Manocha D. Image-Goal Navigation in Complex Environments via Modular Learning. IEEE Robotics and Automation Letters 2022. [DOI: 10.1109/lra.2022.3178810]
Affiliation(s)
- Qiaoyun Wu
- School of Artificial Intelligence, Anhui University, Hefei, Anhui, China
- Jun Wang
- College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China
- Jing Liang
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Xiaoxi Gong
- College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China
- Dinesh Manocha
- Department of Computer Science, University of Maryland, College Park, MD, USA
10
Rousseas P, Bechlioulis CP, Kyriakopoulos KJ. Optimal Motion Planning in Unknown Workspaces Using Integral Reinforcement Learning. IEEE Robotics and Automation Letters 2022. [DOI: 10.1109/lra.2022.3178788]
Affiliation(s)
- Panagiotis Rousseas
- School of Mechanical Engineering, Control Systems Laboratory, National Technical University of Athens, Athens, Greece
- Kostas J. Kyriakopoulos
- School of Mechanical Engineering, Control Systems Laboratory, National Technical University of Athens, Athens, Greece
11
Augmented Reality-Centered Position Navigation for Wearable Devices with Machine Learning Techniques. Journal of Healthcare Engineering 2022; 2022:1083978. [PMID: 35432829] [PMCID: PMC9010156] [DOI: 10.1155/2022/1083978]
Abstract
People have always relied on some form of instrument to help them reach their destination, from hand-drawn maps and compasses to technology-based navigation systems. Many individuals now carry a smartphone at all times, making it a common part of their routine. Using GPS technology, these phones offer applications such as Google Maps that let people find their way around the outdoor world. Indoor navigation, however, does not offer the same level of precision, and indoor navigation systems are still under active development. Bluetooth, Wi-Fi, RFID, and computer vision are some of the technologies used for indoor navigation in current systems. In this article, we discuss the shortcomings of current indoor navigation solutions and offer an alternative approach based on augmented reality and ARCore, which brings augmented reality to smartphones and tablets and eases navigation of indoor environments.
12
Sorokin M, Tan J, Liu CK, Ha S. Learning to Navigate Sidewalks in Outdoor Environments. IEEE Robotics and Automation Letters 2022. [DOI: 10.1109/lra.2022.3145947]
13
Social Robot Navigation Tasks: Combining Machine Learning Techniques and Social Force Model. Sensors (Basel) 2021; 21:7087. [PMID: 34770395] [PMCID: PMC8587852] [DOI: 10.3390/s21217087]
Abstract
Social robot navigation in public spaces, buildings, or private houses is a difficult problem that is not well solved, owing to environmental constraints (buildings, static objects, etc.), pedestrians, and other mobile vehicles. Moreover, robots have to move in a human-aware manner; that is, they have to navigate in such a way that people feel safe and comfortable. In this work, we present two navigation tasks, social robot navigation and robot accompaniment, which combine machine learning techniques with the Social Force Model (SFM), allowing human-aware social navigation. In both approaches, the robots use data from different sensors to capture knowledge of the environment as well as information about pedestrian motion. The two navigation tasks make use of the SFM, a general framework in which human motion behaviors are expressed through a set of functions depending on the pedestrians' relative and absolute positions and velocities. Additionally, in both tasks the robot's motion behavior is learned using machine learning techniques: supervised deep learning in the first case and Reinforcement Learning (RL) in the second. These techniques are combined with the SFM to create navigation models that behave in a social manner when the robot is navigating in an environment with pedestrians or accompanying a person. The systems were validated with a large set of simulations and real-life experiments with a new humanoid robot named IVO and with an aerial robot. The experiments show that the combination of SFM and machine learning can solve human-aware robot navigation in complex dynamic environments.
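For reference, the SFM is usually written in the Helbing-Molnár form, where each pedestrian i is driven by a goal-directed term plus repulsive forces from other pedestrians and from walls; the paper's exact variant may differ, but this is the standard formulation its "set of functions" refers to:

```latex
m_i \frac{d\mathbf{v}_i}{dt}
  = m_i \frac{v_i^0 \mathbf{e}_i - \mathbf{v}_i}{\tau_i}
  + \sum_{j \neq i} \mathbf{f}_{ij}
  + \sum_{W} \mathbf{f}_{iW},
\qquad
\mathbf{f}_{ij} = A_i \exp\!\left(\frac{r_{ij} - d_{ij}}{B_i}\right)\mathbf{n}_{ij}
```

Here v_i^0 e_i is pedestrian i's desired velocity, tau_i a relaxation time, r_ij the sum of body radii, d_ij the center distance, and n_ij the unit vector pointing from j to i; the learned components then tune or replace such hand-crafted terms.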
14
Huang X, Deng H, Zhang W, Song R, Li Y. Towards Multi-Modal Perception-Based Navigation: A Deep Reinforcement Learning Method. IEEE Robotics and Automation Letters 2021. [DOI: 10.1109/lra.2021.3064461]
15
Zhang T, Mo H. Reinforcement learning for robot research: A comprehensive review and open issues. International Journal of Advanced Robotic Systems 2021. [DOI: 10.1177/17298814211007305]
Abstract
Applying the learning mechanisms of natural living beings to endow intelligent robots with human-like perception and decision-making has become an important force driving scientific and technological progress in robotics. Advances in reinforcement learning (RL) over the past decades have made robotics highly automated and intelligent, ensuring safe operation in place of manual work and enabling more intelligent behavior for many challenging tasks. As an important branch of machine learning, RL can realize sequential decision-making under uncertainty through end-to-end learning and has made a series of significant breakthroughs in robot applications. In this review article, we cover RL algorithms from their theoretical background to advanced learning policies in different domains, which accelerates the solution of practical problems in robotics. Challenges, open issues, and our thoughts on future research directions of RL are also presented, with the aim of discovering new research areas and motivating new interest.
Affiliation(s)
- Tengteng Zhang
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Hongwei Mo
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
16
iADA*-RL: Anytime Graph-Based Path Planning with Deep Reinforcement Learning for an Autonomous UAV. Applied Sciences (Basel) 2021. [DOI: 10.3390/app11093948]
Abstract
Path planning algorithms are of paramount importance in guidance and collision avoidance systems, providing trustworthiness and safety for the operation of autonomous unmanned aerial vehicles (UAVs). Previous works presented various approaches that mostly focus on shortest-path discovery without sufficient consideration of local planning and collision avoidance. In this paper, we propose a hybrid path planning algorithm that uses an anytime graph-based algorithm for global planning and deep reinforcement learning for local planning, applied to a real-time mission planning system for an autonomous UAV. In particular, we aim to achieve a highly autonomous UAV mission planning system that adapts to real-world environments containing both static and moving obstacles. Achieving such adaptive behavior requires a simulator that can imitate real environments for learning; the simulator must therefore be sufficiently flexible to allow the UAV to learn about the environment and adapt to real-world conditions. In our scheme, the UAV first learns about the environment in simulation and only then is deployed to the real world. The proposed system is divided into two main parts: optimal flight path generation and collision avoidance. The global path planning problem is solved in the first stage using a novel anytime incremental search algorithm called improved Anytime Dynamic A* (iADA*). A reinforcement learning method then carries out local planning between waypoints, avoiding any obstacles within the environment in real time. The developed hybrid path planning system was investigated and validated in an AirSim environment; a number of simulations and experiments were performed using the AirSim platform to demonstrate the effectiveness of the proposed system for an autonomous UAV. This study helps expand the existing research area of designing efficient and safe path planning algorithms for UAVs.
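The hybrid structure, one-shot global search plus learned local control between waypoints, can be summarized as below; global_plan, rl_avoid, and the helper callables are hypothetical stand-ins for iADA* and the trained policy, not the authors' code.

```python
def hybrid_mission(start, goal, global_plan, rl_avoid, sense, reached):
    """Hybrid planner sketch: the global stage (iADA* in the paper) emits
    waypoints once; a learned local policy steps between consecutive
    waypoints, avoiding obstacles observed online."""
    waypoints = global_plan(start, goal)        # global, graph-based stage
    state = start
    for wp in waypoints:
        while not reached(state, wp):
            obs = sense(state)                  # local observation of nearby obstacles
            state = rl_avoid(state, wp, obs)    # RL policy advances toward the waypoint
    return state
```

The split keeps the expensive graph search off the control loop: only the lightweight learned policy runs at every timestep, while the graph is re-queried only when the mission changes.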
17
Gao J, Ye W, Guo J, Li Z. Deep Reinforcement Learning for Indoor Mobile Robot Path Planning. Sensors (Basel) 2020; 20:5493. [PMID: 32992750] [PMCID: PMC7582363] [DOI: 10.3390/s20195493]
Abstract
This paper proposes a novel incremental training mode to address the problem of Deep Reinforcement Learning (DRL)-based path planning for a mobile robot. First, we evaluate related graph search algorithms and Reinforcement Learning (RL) algorithms in a lightweight 2D environment. Then, we design the DRL-based algorithm, including observation states, reward function, network structure, and parameter optimization, in a 2D environment to avoid the time-consuming work a 3D environment would require. We transfer the designed algorithm to a simple 3D environment for retraining to obtain converged network parameters, including the weights and biases of the deep neural network (DNN). Using these parameters as initial values, we continue to train the model in a complex 3D environment. To improve the generalization of the model across different scenes, we propose combining the DRL algorithm Twin Delayed Deep Deterministic policy gradients (TD3) with the traditional global path planning algorithm Probabilistic Roadmap (PRM) as a novel path planner (PRM+TD3). Experimental results show that the incremental training mode notably improves development efficiency, and that the PRM+TD3 path planner effectively improves the generalization of the model.
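The incremental training mode amounts to curriculum-style weight transfer across environments of increasing fidelity. The loop below sketches this under assumed helpers (make_env, train), with the stage names purely illustrative.

```python
def incremental_training(agent, make_env, train):
    """Curriculum-style transfer: converge in a cheap environment first, then
    reuse the learned weights as initialization in progressively harder ones."""
    stages = ["2d_lightweight", "3d_simple", "3d_complex"]  # illustrative names
    for name in stages:
        env = make_env(name)
        train(agent, env)   # the agent's weights/biases carry over to the next stage
    return agent
```

Because the 2D stage is cheap, most of the design iteration on observations, reward, and network happens there, which is where the reported development-efficiency gains come from.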
Affiliation(s)
- Jing Guo
- School of Automation, Guangdong University of Technology, Guangzhou 510006, China