1. Shao M, Zhu H, Zhao D, Han K, Jiang F, Liu S, Zhang W. Progressively Learning to Reach Remote Goals by Continuously Updating Boundary Goals. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:9238-9252. PMID: 39302800. DOI: 10.1109/tnnls.2024.3428323.
Abstract
Training an effective policy on complex goal-reaching tasks with sparse rewards is an open challenge. The task of reaching remote goals (RRG) is even harder: the unavailability of the original reward signal and the large Wasserstein distance between the distributions of desired goals and initial states make existing methods for common goal-reaching tasks inefficient or even completely ineffective. In this article, we propose progressively learning to reach remote goals by continuously updating boundary goals (PLUB), which solves RRG tasks by reducing the Wasserstein distance between the distributions of boundary goals and desired goals. Specifically, the concept of the boundary goal is introduced: the set of achieved goals closest to each desired goal. In addition, to reduce the computational complexity incurred by the Wasserstein distance, the closest moving distance is introduced; it is an upper bound on the Wasserstein distance, defined as the expected distance between a desired goal and its closest boundary goal. By selecting an appropriate intermediate goal from the boundary goals and continuously updating them, both the closest moving distance and the Wasserstein distance are reduced. As a result, RRG tasks degenerate into common goal-reaching tasks that can be solved efficiently by combining hindsight relabeling with learning from demonstrations (LfD). Extensive experiments on several robotic manipulation tasks demonstrate that PLUB brings substantial improvements over existing methods.
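The closest moving distance described above can be illustrated with a few lines of NumPy; the toy goal sets and the largest-gap rule for picking an intermediate goal are assumptions made for this sketch, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
achieved_goals = rng.uniform(0.0, 0.3, size=(200, 3))   # goals reached so far
desired_goals = rng.uniform(0.7, 1.0, size=(50, 3))     # remote desired goals

# Pairwise Euclidean distances between desired and achieved goals.
dists = np.linalg.norm(desired_goals[:, None, :] - achieved_goals[None, :, :], axis=-1)

# For each desired goal, its closest achieved goal plays the role of a boundary goal.
boundary_goals = achieved_goals[dists.argmin(axis=1)]

# Closest moving distance: the expected distance between a desired goal and its
# closest boundary goal (an upper bound on the Wasserstein distance).
closest_moving_distance = dists.min(axis=1).mean()

# One simple intermediate-goal choice (illustrative only): the boundary goal of
# the desired goal that currently has the largest remaining gap.
intermediate_goal = boundary_goals[dists.min(axis=1).argmax()]
print(closest_moving_distance, intermediate_goal)
```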
2. Liu J, Sun W, Liu C, Yang H, Zhang X, Mian A. MH6D: Multi-Hypothesis Consistency Learning for Category-Level 6-D Object Pose Estimation. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4820-4833. PMID: 38356214. DOI: 10.1109/tnnls.2024.3360712.
Abstract
Six-degree-of-freedom (6DoF) object pose estimation is a crucial task for virtual reality and accurate robotic manipulation. Category-level 6DoF pose estimation has recently become popular as it improves generalization to a complete category of objects. However, current methods focus on data-driven differential learning, which makes them highly dependent on the quality of real-world labeled data and limits their ability to generalize to unseen objects. To address this problem, we propose multi-hypothesis (MH) consistency learning (MH6D) for category-level 6-D object pose estimation without using real-world training data. MH6D uses a parallel consistency learning structure, alleviating the uncertainty of single-shot feature extraction and promoting domain self-adaptation to reduce the synthetic-to-real domain gap. Specifically, three randomly sampled pose transformations are first applied in parallel to the input point cloud. An attention-guided category-level 6-D pose estimation network with channel attention (CA) and global feature cross-attention (GFCA) modules is then proposed to estimate the three hypothesized 6-D object poses by effectively extracting and fusing global and local features. Finally, we propose a novel loss function that considers both intermediate-process and final-result information, allowing MH6D to perform robust consistency learning. We conduct experiments under two different training data settings (i.e., synthetic data only, and synthetic plus real-world data) to verify the generalization ability of MH6D. Extensive experiments on benchmark datasets demonstrate that MH6D achieves state-of-the-art (SOTA) performance, outperforming most data-driven methods even without using any real-world data. The code is available at https://github.com/CNJianLiu/MH6D.
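A toy sketch of the parallel multi-hypothesis consistency idea follows; the random pose transformations are real, but the centroid "estimator" and the variance-based penalty are stand-ins assumed only for illustration, not the paper's attention-guided network or loss.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(1)
cloud = rng.normal(size=(1024, 3))                 # stand-in observed point cloud

rotations = Rotation.random(3)                     # three sampled pose transformations
translations = rng.uniform(-0.1, 0.1, size=(3, 3))

def toy_pose_estimate(points):
    # Placeholder for the attention-guided pose network: here, just the centroid.
    return points.mean(axis=0)

estimates = []
for i in range(3):
    R = rotations[i].as_matrix()
    transformed = cloud @ R.T + translations[i]    # one transformed hypothesis input
    # Map the estimate back through the known transform so all three hypotheses
    # are expressed in the original frame and can be compared.
    estimates.append((toy_pose_estimate(transformed) - translations[i]) @ R)

estimates = np.stack(estimates)
consistency_loss = np.var(estimates, axis=0).sum() # penalize disagreement across hypotheses
print(consistency_loss)
```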
3. Li Y, Wang Y, Tan X. Highly valued subgoal generation for efficient goal-conditioned reinforcement learning. Neural Netw 2025; 181:106825. PMID: 39488112. DOI: 10.1016/j.neunet.2024.106825.
Abstract
Goal-conditioned reinforcement learning is widely used in robot control, manipulating the robot to accomplish specific tasks by maximizing accumulated rewards. However, a useful reward signal is only received when the desired goal is reached, leading to the issue of sparse rewards and reducing the efficiency of policy learning. In this paper, we propose a method to generate highly valued subgoals for efficient goal-conditioned policy learning, enabling applications such as smart home robots and autopilots in daily life. The highly valued subgoals are conditioned on the context of the specific task and characterized by a complexity suitable for efficient goal-conditioned action-value learning. The context variable captures a latent representation of the particular task, allowing for efficient subgoal generation, while goal-conditioned action values regularized by self-adaptive ranges yield subgoals of suitable complexity. Compared to Hindsight Experience Replay, which uniformly samples subgoals from visited trajectories, our method generates subgoals based on the task context with a difficulty suitable for efficient policy training. Experimental results show that our method achieves stable performance in robotic environments compared to baseline methods.
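For reference, a minimal sketch of the Hindsight Experience Replay baseline that the abstract compares against (uniform "future" relabeling over a toy trajectory); the paper's context-conditioned subgoal generator is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
trajectory = rng.normal(size=(T, 2))            # achieved goals g_t along one episode
desired_goal = np.array([5.0, 5.0])

def sparse_reward(achieved, goal, eps=0.1):
    # Sparse goal-reaching reward: 0 on success, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved - goal) < eps else -1.0

relabeled = []
for t in range(T - 1):
    # HER "future" strategy: replace the desired goal with an achieved goal
    # sampled uniformly from later in the same trajectory.
    future = rng.integers(t + 1, T)
    new_goal = trajectory[future]
    relabeled.append((trajectory[t], new_goal,
                      sparse_reward(trajectory[t + 1], new_goal)))

print(len(relabeled), relabeled[0][2])
```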
Affiliation(s)
- Yao Li
- School of Computer and Information Technology, Shanxi University, China.
- YuHui Wang
- Center of Excellence in GenAI, King Abdullah University of Science and Technology, Saudi Arabia.
- XiaoYang Tan
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, China.
4. Qi Y, Jiang A, Gao Y. A Gaussian convolutional optimization algorithm with tent chaotic mapping. Sci Rep 2024; 14:31027. PMID: 39730896. DOI: 10.1038/s41598-024-82277-y.
Abstract
To address the slow convergence and tendency to fall into local optima of the traditional convolution optimization algorithm (COA), this article proposes a Gaussian mutation convolution optimization algorithm based on tent chaotic mapping (TCOA). First, a tent chaotic strategy is employed to initialize individual positions, ensuring a uniform distribution of the population across the feasible search space. Subsequently, a Gaussian convolution kernel is used for an extensive in-depth search of the search space, reducing the likelihood of individuals converging to a local optimum. The proposed approach is validated in simulation on 23 benchmark functions against six recent evolutionary algorithms. The results show that TCOA achieves superior performance on low-dimensional optimization problems and solves a practical spring-related industrial design problem. The algorithm therefore has broad applicability to optimization problems.
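A rough sketch of the two ingredients highlighted above, tent-map chaotic initialization and a Gaussian perturbation step, applied to a toy sphere objective; the parameters and the simplified update rule are illustrative assumptions, not the published TCOA.

```python
import numpy as np

def tent_map_sequence(x0, n, a=0.7):
    """Generate n values of the tent chaotic map in (0, 1)."""
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        seq[i] = x
    return seq

dim, pop_size = 5, 30
lower, upper = -10.0, 10.0

# Chaotic initialization: one tent-map sequence per individual, mapped to the bounds.
chaos = np.array([tent_map_sequence(x0, dim) for x0 in np.linspace(0.11, 0.93, pop_size)])
population = lower + chaos * (upper - lower)

def sphere(x):                       # simple benchmark objective
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
for _ in range(100):
    fitness = np.array([sphere(ind) for ind in population])
    best = population[fitness.argmin()]
    # Gaussian perturbation around the current best, clipped to the bounds.
    population = np.clip(best + rng.normal(scale=0.5, size=population.shape), lower, upper)

print(sphere(best))
```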
Affiliation(s)
- Yanying Qi
- Hangzhou Dianzi University, Baiyang Street, Hangzhou, 310018, China
- Aipeng Jiang
- Hangzhou Dianzi University, Baiyang Street, Hangzhou, 310018, China.
- Yuhang Gao
- Hangzhou Dianzi University, Baiyang Street, Hangzhou, 310018, China
5. Wang Z, Wei Z. PT-KGNN: A framework for pre-training biomedical knowledge graphs with graph neural networks. Comput Biol Med 2024; 178:108768. PMID: 38936076. DOI: 10.1016/j.compbiomed.2024.108768.
Abstract
Biomedical knowledge graphs (KGs) serve as comprehensive data repositories that contain rich information about nodes and edges, providing modeling capabilities for complex relationships among biological entities. Many approaches either learn node features through traditional machine learning methods or leverage graph neural networks (GNNs) to directly learn features of target nodes in biomedical KGs and use them for downstream tasks. Motivated by pre-training techniques in natural language processing (NLP), we propose a framework named PT-KGNN (Pre-Training the biomedical KG with GNNs) that applies GNNs to the biomedical KG to learn node embeddings in a broader context. We design several experiments to evaluate the effectiveness of the proposed framework and the impact of KG scale. Downstream task results consistently improve as the scale of the biomedical KG used for pre-training increases: pre-training on large-scale biomedical KGs significantly enhances drug-drug interaction (DDI) and drug-disease association (DDA) prediction performance on independent datasets, and embeddings derived from a larger biomedical KG outperform those obtained from a smaller one. By applying pre-training techniques to biomedical KGs, rich semantic and structural information can be learned, leading to enhanced performance on downstream tasks. It is evident that pre-training techniques hold tremendous potential for wide-ranging applications in bioinformatics.
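A minimal sketch of the pre-train-then-reuse idea on a toy graph: one graph-convolution layer trained with a link-prediction objective, whose node embeddings could then feed a downstream DDI/DDA classifier. The graph, layer, and loss are assumptions made for illustration, not the PT-KGNN architecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_nodes, feat_dim, emb_dim = 50, 16, 8

# Toy knowledge graph as a symmetric adjacency matrix with self-loops.
adj = (torch.rand(num_nodes, num_nodes) < 0.05).float()
adj = ((adj + adj.T) > 0).float()
adj.fill_diagonal_(1.0)
deg_inv_sqrt = adj.sum(1).pow(-0.5)
adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]

features = torch.randn(num_nodes, feat_dim)

class OneLayerGCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(feat_dim, emb_dim)
    def forward(self, x, a):
        return torch.relu(a @ self.lin(x))

model = OneLayerGCN()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

edges = adj.nonzero()                                   # positive edges
neg = torch.randint(0, num_nodes, edges.shape)          # random negative pairs

for _ in range(200):                                    # link-prediction pre-training
    z = model(features, adj_norm)
    pos_score = (z[edges[:, 0]] * z[edges[:, 1]]).sum(-1)
    neg_score = (z[neg[:, 0]] * z[neg[:, 1]]).sum(-1)
    loss = -torch.log(torch.sigmoid(pos_score) + 1e-9).mean() \
           - torch.log(1 - torch.sigmoid(neg_score) + 1e-9).mean()
    opt.zero_grad(); loss.backward(); opt.step()

embeddings = model(features, adj_norm).detach()         # reusable node embeddings
print(embeddings.shape)
```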
Affiliation(s)
- Zhenxing Wang
- School of Data Science, Fudan University, 220 Handan Rd., Shanghai, 200433, China.
- Zhongyu Wei
- School of Data Science, Fudan University, 220 Handan Rd., Shanghai, 200433, China.
6. Hong T, Li W, Huang K. A reinforcement learning enhanced pseudo-inverse approach to self-collision avoidance of redundant robots. Front Neurorobot 2024; 18:1375309. PMID: 38606052. PMCID: PMC11006967. DOI: 10.3389/fnbot.2024.1375309.
Abstract
Introduction: Redundant robots offer greater flexibility compared to non-redundant ones but are susceptible to increased collision risks when the end-effector approaches the robot's own links. Redundant degrees of freedom (DoFs) present an opportunity for collision avoidance; however, selecting an appropriate inverse kinematics (IK) solution remains challenging due to the infinite possible solutions. Methods: This study proposes a reinforcement learning (RL) enhanced pseudo-inverse approach to address self-collision avoidance in redundant robots. The RL agent is integrated into the redundancy resolution process of a pseudo-inverse method to determine a suitable IK solution for avoiding self-collisions during task execution. Additionally, an improved replay buffer is implemented to enhance the performance of the RL algorithm. Results: Simulations and experiments validate the effectiveness of the proposed method in reducing the risk of self-collision in redundant robots. Conclusion: The RL enhanced pseudo-inverse approach presented in this study demonstrates promising results in mitigating self-collision risks in redundant robots, highlighting its potential for enhancing safety and performance in robotic systems.
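The null-space redundancy-resolution structure such a method builds on can be sketched as follows; the toy Jacobian is assumed, and the RL policy that would supply the null-space velocity is replaced by a placeholder.

```python
import numpy as np

def redundancy_resolution(J, x_dot, q0_dot):
    """q_dot = J^+ x_dot + (I - J^+ J) q0_dot."""
    J_pinv = np.linalg.pinv(J)
    null_proj = np.eye(J.shape[1]) - J_pinv @ J
    return J_pinv @ x_dot + null_proj @ q0_dot

J = np.array([[1.0, 0.5, 0.2],          # toy 2x3 Jacobian of a redundant arm
              [0.0, 1.0, 0.4]])
x_dot = np.array([0.1, -0.05])          # desired end-effector velocity

# Placeholder for the RL policy output: a null-space velocity nudging the arm
# away from a self-collision configuration.
q0_dot = np.array([0.0, -0.2, 0.3])

q_dot = redundancy_resolution(J, x_dot, q0_dot)
print(q_dot, J @ q_dot)                 # task-space velocity is preserved
```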
Affiliation(s)
- Kai Huang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
7. Zang Y, Wang P, Zha F, Guo W, Li C, Sun L. Human skill knowledge guided global trajectory policy reinforcement learning method. Front Neurorobot 2024; 18:1368243. PMID: 38559491. PMCID: PMC10978794. DOI: 10.3389/fnbot.2024.1368243.
Abstract
Traditional trajectory learning methods based on Imitation Learning (IL) only learn existing trajectory knowledge from human demonstrations; they cannot adapt that knowledge to the task environment by interacting with it and fine-tuning the policy. To address this problem, a global trajectory learning method that combines IL with Reinforcement Learning (RL) to adapt the learned policy to the environment is proposed. In this paper, IL is first used to acquire basic trajectory skills, and the agent then explores and exploits a policy better suited to the current environment through RL. The basic trajectory skills comprise the knowledge policy and time-stage information over the whole task space, which help capture the temporal structure of the trajectory and guide the subsequent RL process. Notably, neural networks are not used to model the action policy or the Q value during the RL process; instead, both are sampled and updated over the whole task space and transferred to networks after RL through Behavior Cloning (BC) to obtain a continuous and smooth global trajectory policy. The feasibility and effectiveness of the method were validated in a custom Gym environment for a flower-drawing task, and the learned policy was then executed in a real-world robot drawing experiment.
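A minimal sketch of the final Behavior Cloning transfer step described above, fitting a small network to (state, action) samples by regression; the data are synthetic stand-ins for the sampled global policy, not the paper's task-space sampling or RL stage.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
states = torch.rand(500, 4)                       # stand-in samples from the global policy
actions = torch.sin(states[:, :2]) * 0.5          # stand-in actions of that policy

policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(300):
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, actions)  # clone the sampled policy by regression
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))
```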
Affiliation(s)
- Yajing Zang
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Pengfei Wang
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Fusheng Zha
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Wei Guo
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Chuanfeng Li
- School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, China
- Lining Sun
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
8. Liang K, Zha F, Guo W, Liu S, Wang P, Sun L. Motion planning framework based on dual-agent DDPG method for dual-arm robots guided by human joint angle constraints. Front Neurorobot 2024; 18:1362359. PMID: 38455735. PMCID: PMC10917907. DOI: 10.3389/fnbot.2024.1362359.
Abstract
Introduction: Reinforcement learning has been widely used in robot motion planning. However, for multi-step complex tasks of dual-arm robots, trajectory planning methods based on reinforcement learning still face problems such as a large exploration space, long training times, and an uncontrollable training process. Based on the dual-agent deep deterministic policy gradient (DADDPG) algorithm, this study proposes a motion planning framework constrained by human joint angles, making both what is learned and how it is learned more human-like. It quickly plans coordinated dual-arm trajectories for complex multi-step tasks. Methods: The proposed framework includes two main parts. The first is the modeling of human joint-angle constraints: joint angles are computed from human arm motion data measured by an inertial measurement unit (IMU) through a human-robot dual-arm kinematic mapping model, and joint-angle range constraints are then extracted from multiple groups of demonstration data and expressed as inequalities. The second is the design of a segmented reward function: the human joint-angle constraints guide the exploratory learning of the reinforcement learning method in the form of step rewards, so the exploration space is reduced, training is accelerated, and the learning process becomes controllable to a certain extent. Results and discussion: The effectiveness of the framework was verified in a Gym simulation environment on the Baxter robot's reach-grasp-align task. The results show that, within this framework, human experience knowledge significantly guides learning, and the method plans coordinated dual-arm trajectories for multi-step tasks more quickly.
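The segmented, constraint-guided step reward can be sketched as below; the joint-angle bounds, weights, and stage bonus are made-up values for illustration, not the paper's.

```python
import numpy as np

# Hypothetical joint-angle ranges (radians) extracted from demonstration data.
joint_lower = np.array([-0.5, -1.0, -0.3])
joint_upper = np.array([1.2, 0.8, 1.5])

def step_reward(distance_to_goal, joint_angles, reached_stage):
    # Penalize any joint angle that leaves the demonstrated range.
    violation = np.maximum(joint_angles - joint_upper, 0) \
              + np.maximum(joint_lower - joint_angles, 0)
    reward = -distance_to_goal - 2.0 * violation.sum()
    if reached_stage:                 # bonus when a task stage is completed
        reward += 5.0
    return reward

print(step_reward(0.4, np.array([0.1, 0.9, 0.2]), reached_stage=False))
```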
Affiliation(s)
- Pengfei Wang
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Lining Sun
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
9. Zang Y, Wang P, Zha F, Guo W, Zheng C, Sun L. Peg-in-hole assembly skill imitation learning method based on ProMPs under task geometric representation. Front Neurorobot 2023; 17:1320251. PMID: 38023454. PMCID: PMC10666750. DOI: 10.3389/fnbot.2023.1320251.
Abstract
Introduction: Behavioral Cloning (BC) is a common imitation learning method that uses neural networks to approximate demonstrated action samples for manipulation skill learning. However, real-world human demonstration trajectories are often sparse and imperfect, which makes it challenging to learn comprehensively from the demonstrated actions alone. In this paper, we therefore propose a streamlined imitation learning method under a terse geometric representation that makes full use of the demonstration data to learn manipulation skills for assembly tasks. Methods: We map the demonstration trajectories into a geometric feature space and align them with Dynamic Time Warping (DTW) to obtain a unified data sequence that can be segmented into several time stages. Probabilistic Movement Primitives (ProMPs) are then extracted from the demonstration trajectories so that many task trajectories can be generated as global-strategy action samples for training the neural network. Notably, the current state of the assembly task is treated as a via point of the ProMP model when generating trajectories, and the time of the via point is computed from the probability model of the different time stages. The action for the current state is obtained from the target position of the next time step. Finally, the neural network is trained by Behavioral Cloning to obtain the global assembly strategy. Results: We applied the proposed method to a peg-in-hole assembly task in a PyBullet + Gym simulation environment to test its skill learning performance, and the learned assembly strategy was also executed on a real robotic platform to further verify its feasibility. Discussion: The experimental results show that the proposed method achieves higher success rates than traditional imitation learning methods while exhibiting reasonable generalization. This indicates that ProMPs under a geometric representation help the BC method make better use of the demonstration trajectories and thus learn the task skills more effectively.
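An illustrative one-dimensional ProMP sketch follows: fit a distribution over basis-function weights from aligned demonstrations, then condition it on a via point, as the method above does with the current assembly state. The basis, synthetic demonstrations, and noise level are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_basis = 100, 10
t = np.linspace(0, 1, T)
centers = np.linspace(0, 1, n_basis)
Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * 0.05 ** 2))
Phi /= Phi.sum(axis=1, keepdims=True)                # (T, n_basis) basis matrix

# Synthetic demonstrations: noisy sine-like trajectories.
demos = np.stack([np.sin(np.pi * t) + 0.05 * rng.normal(size=T) for _ in range(20)])

# Fit per-demonstration weights by least squares, then their mean and covariance.
W = np.stack([np.linalg.lstsq(Phi, d, rcond=None)[0] for d in demos])
mu_w, Sigma_w = W.mean(axis=0), np.cov(W.T) + 1e-6 * np.eye(n_basis)

# Condition on a via point y_star at time index t_idx (e.g., the current state).
t_idx, y_star, sigma_y = 60, 0.4, 1e-4
phi = Phi[t_idx][:, None]                            # (n_basis, 1)
K = Sigma_w @ phi / (sigma_y + phi.T @ Sigma_w @ phi)
mu_w_new = mu_w + (K * (y_star - phi.T @ mu_w)).ravel()
Sigma_w_new = Sigma_w - K @ phi.T @ Sigma_w          # conditioned weight covariance

generated = Phi @ mu_w_new                           # conditioned mean trajectory
print(generated[t_idx])                              # close to y_star
```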
Affiliation(s)
- Yajing Zang
- School of Mechatronics Engineering, State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Pengfei Wang
- School of Mechatronics Engineering, State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Fusheng Zha
- School of Mechatronics Engineering, State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Wei Guo
- School of Mechatronics Engineering, State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Chao Zheng
- Wuhan Second Ship Design and Research Institute, Wuhan, China
- Lining Sun
- School of Mechatronics Engineering, State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
10. Angelidis E. A perspective on large-scale simulation as an enabler for novel biorobotics applications. Front Robot AI 2023; 10:1102286. PMID: 37692531. PMCID: PMC10485252. DOI: 10.3389/frobt.2023.1102286.
Abstract
Our understanding of the complex mechanisms that power biological intelligence has been greatly enhanced through the explosive growth of large-scale neuroscience and robotics simulation tools that are used by the research community to perform previously infeasible experiments, such as the simulation of the neocortex's circuitry. Nevertheless, simulation falls far from being directly applicable to biorobots due to the large discrepancy between the simulated and the real world. A possible solution for this problem is the further enhancement of existing simulation tools for robotics, AI and neuroscience with multi-physics capabilities. Previously infeasible or difficult to simulate scenarios, such as robots swimming on the water surface, interacting with soft materials, walking on granular materials etc., would be rendered possible within a multi-physics simulation environment designed for robotics. In combination with multi-physics simulation, large-scale simulation tools that integrate multiple simulation modules in a closed-loop manner help address fundamental questions around the organization of neural circuits and the interplay between the brain, body and environment. We analyze existing designs for large-scale simulation running on cloud and HPC infrastructure as well as their shortcomings. Based on this analysis we propose a next-gen modular architecture design based on multi-physics engines, that we believe would greatly benefit biorobotics and AI.
Affiliation(s)
- Emmanouil Angelidis
- Chair of Robotics, Artificial Intelligence and Embedded Systems, School of Informatics, Technical University of Munich, Munich, Germany
- Munich Research Center, Huawei Technologies Germany, Munich, Germany
11. Gu Y, Zheng C, Todoh M, Zha F. American Sign Language Translation Using Wearable Inertial and Electromyography Sensors for Tracking Hand Movements and Facial Expressions. Front Neurosci 2022; 16:962141. PMID: 35937881. PMCID: PMC9345758. DOI: 10.3389/fnins.2022.962141.
Abstract
A sign language translation system can break the communication barrier between hearing-impaired people and others. In this paper, a novel American sign language (ASL) translation method based on wearable sensors was proposed. We leveraged inertial sensors to capture signs and surface electromyography (EMG) sensors to detect facial expressions. We applied a convolutional neural network (CNN) to extract features from input signals. Then, long short-term memory (LSTM) and transformer models were exploited to achieve end-to-end translation from input signals to text sentences. We evaluated two models on 40 ASL sentences strictly following the rules of grammar. Word error rate (WER) and sentence error rate (SER) are utilized as the evaluation standard. The LSTM model can translate sentences in the testing dataset with a 7.74% WER and 9.17% SER. The transformer model performs much better by achieving a 4.22% WER and 4.72% SER. The encouraging results indicate that both models are suitable for sign language translation with high accuracy. With complete motion capture sensors and facial expression recognition methods, the sign language translation system has the potential to recognize more sentences.
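A minimal PyTorch sketch of the CNN-feature + LSTM sequence-model family described above; the shapes, vocabulary size, and layers are toy assumptions, not the paper's architecture or data.

```python
import torch
import torch.nn as nn

class SignTranslator(nn.Module):
    def __init__(self, channels=8, hidden=64, vocab=40):
        super().__init__()
        # 1-D convolution extracts a feature vector from each signal window.
        self.cnn = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):                     # x: (batch, windows, channels, samples)
        b, w, c, s = x.shape
        feats = self.cnn(x.view(b * w, c, s)).squeeze(-1).view(b, w, -1)
        seq, _ = self.lstm(feats)
        return self.out(seq)                  # per-window token logits

model = SignTranslator()
signals = torch.randn(2, 20, 8, 100)          # 2 sentences, 20 windows each
print(model(signals).shape)                   # torch.Size([2, 20, 40])
```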
Affiliation(s)
- Yutong Gu
- Graduate School of Engineering, Hokkaido University, Sapporo, Japan
- Correspondence: Yutong Gu
- Chao Zheng
- Wuhan Second Ship Design and Research Institute, China State Shipbuilding Corporation Limited, Wuhan, China
- Masahiro Todoh
- Faculty of Engineering, Hokkaido University, Sapporo, Japan
- Fusheng Zha
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China