1. Zhou J, Zhu Q, Wang Y, Feng M, Liu J, Huang J, Mian A. A State Space Model for Multiobject Full 3-D Information Estimation From RGB-D Images. IEEE Transactions on Cybernetics 2025; 55:2248-2260. PMID: 40106242. DOI: 10.1109/tcyb.2025.3548788.
Abstract
Visual understanding of 3-D objects is essential for robotic manipulation, autonomous navigation, and augmented reality. However, existing methods struggle to perform this task efficiently and accurately in an end-to-end manner. We propose a single-shot method based on the state space model (SSM) to predict the full 3-D information (pose, size, shape) of multiple 3-D objects from a single RGB-D image in an end-to-end manner. Our method first encodes long-range semantic information from the RGB and depth images separately and then combines them into an integrated latent representation that is processed by a modified SSM to infer the full 3-D information through two separate task heads within a unified model. A heatmap/detection head predicts object centers, and a 3-D information head predicts a matrix detailing the pose, size, and latent shape code for each detected object. We also propose an SSM-based shape autoencoder, which learns canonical shape codes from a large database of 3-D point cloud shapes. The end-to-end framework, the modified SSM block, and the SSM-based shape autoencoder form the major contributions of this work. Our design includes different scan strategies tailored to different input data representations, such as RGB-D images and point clouds. Extensive evaluations on the REAL275, CAMERA25, and Wild6D datasets show that our method achieves state-of-the-art performance. On the large-scale Wild6D dataset, our model significantly outperforms the nearest competitor, achieving 2.6% and 5.1% improvements on the IoU-50 and 5°10 cm metrics, respectively.
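The following is not from the cited work; it is a minimal NumPy sketch of the decoding step such a single-shot design implies: object centers extracted as local maxima of a predicted heatmap, with a per-object parameter vector (pose, size, latent shape code) read from a dense prediction map at each center. All shapes, thresholds, and the channel layout are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def decode_detections(heatmap, param_map, k=10, threshold=0.3):
    """Pick up to k local maxima of `heatmap` (H, W) above `threshold` and read
    the matching rows of `param_map` (H, W, C), where C would hold rotation,
    translation, size, and a latent shape code. Layout is assumed, not published."""
    H, W = heatmap.shape
    # 3x3 local-maximum suppression (a crude stand-in for max-pool NMS).
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    neighborhood = np.stack([
        padded[dy:dy + H, dx:dx + W]
        for dy in range(3) for dx in range(3)
    ], axis=0)
    is_peak = heatmap >= neighborhood.max(axis=0)
    scores = np.where(is_peak, heatmap, -np.inf)

    # Keep the top-k peaks above the confidence threshold.
    flat = scores.ravel()
    order = np.argsort(flat)[::-1][:k]
    detections = []
    for idx in order:
        if flat[idx] < threshold:
            break
        y, x = divmod(int(idx), W)
        detections.append({
            "center": (x, y),
            "score": float(flat[idx]),
            "params": param_map[y, x],   # e.g. pose + size + shape code
        })
    return detections

# Toy usage with random maps (batch/channel handling omitted).
hm = np.random.rand(64, 64)
pm = np.random.rand(64, 64, 3 + 3 + 3 + 16)  # rotation, translation, size, shape code
dets = decode_detections(hm, pm)
```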
2. Gao H, Zhao J, Hu J, Sun C. A Real-Time Grasping Detection Network Architecture for Various Grasping Scenarios. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:8215-8226. PMID: 38980779. DOI: 10.1109/tnnls.2024.3419180.
Abstract
In robot grasping detection, uncertain factors such as differing shapes, colors, materials, and poses make grasping very challenging. This article introduces an integrated robotic system designed to address the challenge of grasping numerous unknown objects within a scene from a set of α-channel images. We propose a lightweight, object-independent, pixel-level generative adaptive residual depthwise separable convolutional neural network (GARDSCN) with an inference speed of around 28 ms, which can be applied to real-time grasping detection. It can effectively handle grasping detection for unknown objects with different shapes and poses in various scenes and overcome the limitations of current robot grasping technology. The proposed network achieves 98.88% grasp detection accuracy on the Cornell dataset and 95.23% on the Jacquard dataset. To further verify its validity, grasping experiments were conducted on a physical Kinova Gen2 robot, yielding a grasp success rate of 96.67% in single-object scenes and 94.10% in multiobject cluttered scenes.
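As a hedged illustration only (not the published GARDSCN architecture): a short PyTorch sketch of a residual depthwise-separable convolution block, the kind of lightweight unit the network's name refers to. Channel counts, normalization, and activation choices are assumptions.

```python
import torch
import torch.nn as nn

class ResidualDepthwiseSeparableBlock(nn.Module):
    """Depthwise conv + pointwise conv with a skip connection.
    Layer sizes here are illustrative choices, not the paper's."""
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.bn1(self.depthwise(x)))
        out = self.bn2(self.pointwise(out))
        return self.act(out + x)  # residual connection keeps the block cheap and trainable

# Example usage on a 32-channel feature map.
block = ResidualDepthwiseSeparableBlock(channels=32)
features = block(torch.randn(1, 32, 224, 224))
```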
3. Yu S, Zhai DH, Guan Y, Xia Y. Category-Level 6-D Object Pose Estimation With Shape Deformation for Robotic Grasp Detection. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:1857-1871. PMID: 37962999. DOI: 10.1109/tnnls.2023.3330011.
Abstract
Category-level 6-D object pose estimation plays a crucial role in achieving reliable robotic grasp detection. However, the disparity between synthetic and real datasets hinders the direct transfer of models trained on synthetic data to real-world scenarios, leading to ineffective results. Additionally, creating large-scale real datasets is a time-consuming and labor-intensive task. To overcome these challenges, we propose CatDeform, a novel category-level object pose estimation network trained on synthetic data but capable of delivering good performance on real datasets. In our approach, we introduce a transformer-based fusion module that enables the network to leverage multiple sources of information and enhance prediction accuracy through feature fusion. To ensure proper deformation of the prior point cloud to align with scene objects, we propose a transformer-based attention module that deforms the prior point cloud from both geometric and feature perspectives. Building upon CatDeform, we design a two-branch network for supervised learning, bridging the gap between synthetic and real datasets and achieving high-precision pose estimation in real-world scenes using predominantly synthetic data supplemented with a small amount of real data. To minimize reliance on large-scale real datasets, we train the network in a self-supervised manner by estimating object poses in real scenes based on the synthetic dataset without manual annotation. We conduct training and testing on CAMERA25 and REAL275 datasets, and our experimental results demonstrate that the proposed method outperforms state-of-the-art (SOTA) techniques in both self-supervised and supervised training paradigms. Finally, we apply CatDeform to object pose estimation and robotic grasp experiments in real-world scenarios, showcasing a higher grasp success rate.
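Purely as an illustrative sketch of the prior-deformation idea the abstract describes (not CatDeform itself): cross-attention from prior-point features to observed-object features, followed by a small MLP that predicts a 3-D offset per prior point. All dimensions and module choices are assumptions.

```python
import torch
import torch.nn as nn

class PriorDeformer(nn.Module):
    """Cross-attention from prior-point features to observed-point features,
    then an MLP predicting a per-point 3-D displacement of the prior."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.offset_head = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 3))

    def forward(self, prior_xyz, prior_feat, obs_feat):
        # prior_xyz: (B, Np, 3), prior_feat: (B, Np, D), obs_feat: (B, No, D)
        fused, _ = self.attn(query=prior_feat, key=obs_feat, value=obs_feat)
        offsets = self.offset_head(fused)   # per-point displacement
        return prior_xyz + offsets          # deformed prior aligned to the scene object

deformer = PriorDeformer()
deformed = deformer(torch.randn(2, 1024, 3), torch.randn(2, 1024, 128),
                    torch.randn(2, 2048, 128))
```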
4. Manawadu UA, Keitaro N. Dexterous Manipulation Based on Object Recognition and Accurate Pose Estimation Using RGB-D Data. Sensors (Basel) 2024; 24:6823. PMID: 39517721. PMCID: PMC11548730. DOI: 10.3390/s24216823.
Abstract
This study presents an integrated system for object recognition, six-degrees-of-freedom pose estimation, and dexterous manipulation using a JACO robotic arm with an Intel RealSense D435 camera. The system is designed to automate the manipulation of industrial valves by capturing point clouds (PCs) from multiple perspectives to improve the accuracy of pose estimation. The object recognition module includes scene segmentation, geometric primitive recognition, model recognition, and a color-based clustering and integration approach enhanced by a dynamic cluster-merging algorithm. Pose estimation is performed with the random sample consensus (RANSAC) algorithm, which estimates position and orientation. The system was tested within a 60° field of view extending in all directions in front of the object. The experimental results show that the system performs reliably within acceptable error thresholds for both position and orientation when objects are within ±15° of the camera's direct view. However, errors increased at more extreme object orientations and distances, particularly when estimating the orientation of ball valves. A zone-based dexterous manipulation strategy was developed to overcome these challenges, in which the system adjusts the camera position for optimal conditions. This approach mitigates larger errors in difficult scenarios, enhancing overall system reliability. The key contributions of this research include a novel method for improving object recognition and pose estimation, a technique for increasing the accuracy of pose estimation, and a robot motion model for dexterous manipulation in industrial settings.
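For illustration of the pose-estimation step the abstract names (not the authors' pipeline): a self-contained NumPy sketch of RANSAC rigid-pose estimation from putative 3-D point correspondences, with minimal three-point samples scored by inlier count and the final rotation/translation refit on the best inlier set via the Kabsch/SVD solution. Thresholds, iteration counts, and the correspondence source are assumptions.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares rotation R and translation t with dst ~ R @ src + t (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

def ransac_pose(src, dst, iters=500, inlier_thresh=0.01, seed=None):
    """RANSAC over 3-point samples of corresponding points src[i] <-> dst[i]."""
    rng = np.random.default_rng(seed)
    best_R, best_t, best_inliers = None, None, np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = fit_rigid(src[idx], dst[idx])
        residuals = np.linalg.norm((src @ R.T + t) - dst, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_R, best_t, best_inliers = R, t, inliers
    if best_inliers.sum() >= 3:
        # Refit on all inliers of the best hypothesis for the final pose.
        best_R, best_t = fit_rigid(src[best_inliers], dst[best_inliers])
    return best_R, best_t, best_inliers

# Toy usage: recover a known rotation/translation from noisy correspondences.
rng = np.random.default_rng(0)
model = rng.random((200, 3))
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R_true) < 0:
    R_true = -R_true                  # make it a proper rotation
scene = model @ R_true.T + np.array([0.1, -0.2, 0.3]) + rng.normal(0, 0.002, (200, 3))
R_est, t_est, inliers = ransac_pose(model, scene)
```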
Affiliation(s)
- Udaka A. Manawadu: Graduate School of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu, Fukushima 965-0006, Japan
- Naruse Keitaro: Graduate School of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu, Fukushima 965-0006, Japan
5. Liu H, Jin F, Zeng H, Pu H, Fan B. Image Enhancement Guided Object Detection in Visually Degraded Scenes. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:14164-14177. PMID: 37220059. DOI: 10.1109/tnnls.2023.3274926.
Abstract
Object detection accuracy degrades severely in visually degraded scenes. A natural solution is to first enhance the degraded image and then perform object detection. However, this is suboptimal and does not necessarily improve object detection, because the image enhancement and object detection tasks are treated separately. To solve this problem, we propose an image enhancement guided object detection method, which refines the detection network with an additional enhancement branch in an end-to-end way. Specifically, the enhancement branch and detection branch are organized in parallel, and a feature-guided module is designed to connect the two branches, optimizing the shallow features of the input image in the detection branch to be as consistent as possible with those of the enhanced image. Because the enhancement branch is frozen during training, this design uses the features of enhanced images to guide the learning of the object detection branch, so that the learned detection branch is aware of both image quality and object detection. At test time, the enhancement branch and feature-guided module are removed, so no additional computation cost is introduced for detection. Extensive experimental results on underwater, hazy, and low-light object detection datasets demonstrate that the proposed method significantly improves the detection performance of popular detection networks (YOLO v3, Faster R-CNN, DetectoRS) in visually degraded scenes.
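A minimal PyTorch sketch of the guidance idea described above, assuming hypothetical module names: a frozen enhancement stem produces target features from the enhanced image, and the detector's shallow features are pulled toward them with an L2 term added to the usual detection loss during training; at test time only the detector stem is kept. The loss weight and stem structure are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FeatureGuidedTraining(nn.Module):
    """Illustrative training wrapper: the enhancement stem is frozen and only
    used to produce target features that the detector's shallow features should
    match, alongside the normal detection loss."""
    def __init__(self, detector_stem: nn.Module, enhance_stem: nn.Module,
                 guide_weight: float = 0.1):
        super().__init__()
        self.detector_stem = detector_stem
        self.enhance_stem = enhance_stem
        for p in self.enhance_stem.parameters():   # frozen during training
            p.requires_grad_(False)
        self.guide_weight = guide_weight

    def guidance_loss(self, degraded, enhanced):
        det_feat = self.detector_stem(degraded)          # shallow detector features
        with torch.no_grad():
            target_feat = self.enhance_stem(enhanced)    # features of the enhanced image
        return self.guide_weight * torch.mean((det_feat - target_feat) ** 2)

# Toy usage with stand-in stems; at test time only detector_stem remains.
stem = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
guide = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
wrapper = FeatureGuidedTraining(stem, guide)
loss = wrapper.guidance_loss(torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128))
```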
6. Leung B, Billeschou P, Manoonpong P. Integrated Modular Neural Control for Versatile Locomotion and Object Transportation of a Dung Beetle-Like Robot. IEEE Transactions on Cybernetics 2024; 54:2062-2075. PMID: 37028343. DOI: 10.1109/tcyb.2023.3249467.
Abstract
Dung beetles can effectively transport dung pallets of various sizes in any direction across uneven terrain. While this impressive ability can inspire new locomotion and object transportation solutions in multilegged (insect-like) robots, to date, most existing robots use their legs primarily to perform locomotion. Only a few robots can use their legs to achieve both locomotion and object transportation, although they are limited to specific object types/sizes (10%-65% of leg length) on flat terrain. Accordingly, we proposed a novel integrated neural control approach that, like dung beetles, pushes state-of-the-art insect-like robots beyond their current limits toward versatile locomotion and object transportation with different object types/sizes and terrains (flat and uneven). The control method is synthesized based on modular neural mechanisms, integrating central pattern generator (CPG)-based control, adaptive local leg control, descending modulation control, and object manipulation control. We also introduced an object transportation strategy combining walking and periodic hind leg lifting for soft object transportation. We validated our method on a dung beetle-like robot. Our results show that the robot can perform versatile locomotion and use its legs to transport hard and soft objects of various sizes (60%-70% of leg length) and weights (approximately 3%-115% of robot weight) on flat and uneven terrains. The study also suggests possible neural control mechanisms underlying the dung beetle Scarabaeus galenus' versatile locomotion and small dung pallet transportation.
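As a hedged sketch of the kind of CPG such modular neural controllers build on (not the paper's controller or parameters): a two-neuron SO(2)-style oscillator in NumPy whose coupled tanh units produce two phase-shifted rhythmic signals that could serve as leg drives.

```python
import numpy as np

def so2_cpg(steps=500, phi=0.05, alpha=1.01):
    """Two-neuron recurrent oscillator: tanh units coupled through a rotation-like
    weight matrix yield a stable rhythmic output. phi sets the oscillation
    frequency and alpha > 1 keeps the limit cycle active (illustrative values)."""
    W = alpha * np.array([[np.cos(phi), np.sin(phi)],
                          [-np.sin(phi), np.cos(phi)]])
    x = np.array([0.2, 0.0])                 # small nonzero start to kick off oscillation
    outputs = np.zeros((steps, 2))
    for t in range(steps):
        x = np.tanh(W @ x)
        outputs[t] = x
    return outputs                            # two phase-shifted rhythms, e.g. hip/knee drives

signals = so2_cpg()
hip_drive, knee_drive = signals[:, 0], signals[:, 1]
```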
7. Yao S, Lu Y, Niu K, Dai J, Dong C, Zhang P. Semantic information processing for interoperability in the Industrial Internet of Things. Fundamental Research 2024; 4:8-12. PMID: 38933836. PMCID: PMC11197609. DOI: 10.1016/j.fmre.2023.06.003.
Abstract
With the advent of the Internet of Everything (IoE), the concept of fully interconnected systems has become a reality, and the need for seamless communication and interoperability among different industrial systems is more pressing than ever. To address the challenges posed by massive data traffic, we demonstrate the potential of semantic information processing in industrial manufacturing processes and propose a framework for semantic processing and communication in industrial networks. In particular, the scheme features task orientation and collaborative processing. To illustrate its applicability, we provide examples of time series and images, as typical industrial data sources, for practical tasks such as lifecycle estimation and surface defect detection. Simulation results show that semantic information processing handles and exchanges information more efficiently than conventional methods, which is crucial for meeting the demands of future interconnected industrial networks.
Affiliation(s)
- Shengshi Yao: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Yanpeng Lu: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Kai Niu: The State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Jincheng Dai: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Chao Dong: Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Ping Zhang: The State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
8. Gorschlüter F, Rojtberg P, Pöllabauer T. A Survey of 6D Object Detection Based on 3D Models for Industrial Applications. J Imaging 2022; 8(3):53. PMID: 35324608. PMCID: PMC8952329. DOI: 10.3390/jimaging8030053.
Abstract
Six-dimensional object detection of rigid objects is a problem especially relevant for quality control and robotic manipulation in industrial contexts. This work is a survey of the state of the art of 6D object detection with these use cases in mind, specifically focusing on algorithms trained only with 3D models or renderings thereof. Our first contribution is a listing of requirements typically encountered in industrial applications. The second contribution is a collection of quantitative evaluation results for several different 6D object detection methods trained with synthetic data and the comparison and analysis thereof. We identify the top methods for individual requirements that industrial applications have for object detectors, but find that a lack of comparable data prevents large-scale comparison over multiple aspects.
Affiliation(s)
- Felix Gorschlüter (corresponding author): Fraunhofer-Institut für Graphische Datenverarbeitung, Fraunhoferstraße 5, 64283 Darmstadt, Germany; Department Graphisch-Interaktive Systeme, Technische Universität Darmstadt, Karolinenplatz 5, 64289 Darmstadt, Germany
- Pavel Rojtberg: Fraunhofer-Institut für Graphische Datenverarbeitung, Fraunhoferstraße 5, 64283 Darmstadt, Germany; Department Graphisch-Interaktive Systeme, Technische Universität Darmstadt, Karolinenplatz 5, 64289 Darmstadt, Germany
- Thomas Pöllabauer: Fraunhofer-Institut für Graphische Datenverarbeitung, Fraunhoferstraße 5, 64283 Darmstadt, Germany; Department Graphisch-Interaktive Systeme, Technische Universität Darmstadt, Karolinenplatz 5, 64289 Darmstadt, Germany