1
Gao R, Qi Y. Monocular Object-Level SLAM Enhanced by Joint Semantic Segmentation and Depth Estimation. Sensors (Basel, Switzerland) 2025; 25:2110. PMID: 40218621; PMCID: PMC11991236; DOI: 10.3390/s25072110.
Abstract
SLAM is a fundamental task for mobile robots and AR, providing localization and mapping in a given environment. However, with only RGB images as input, monocular SLAM systems suffer from scale ambiguity and tracking difficulty in dynamic scenes. Moreover, high-level semantic information can benefit the SLAM process because of its similarity to human vision. To address these problems, we propose a monocular object-level SLAM system enhanced by real-time joint depth estimation and semantic segmentation. The multi-task network, called JSDNet, predicts depth and semantic segmentation simultaneously, with four contributions: depth discretization, feature fusion, a weight-learned loss function, and semantic consistency optimization. Specifically, feature fusion facilitates the sharing of features between the two tasks, while semantic consistency aims to guarantee consistent semantic segmentation and depth across views. Based on the results of JSDNet, we design an object-level system that combines both pixel-level and object-level semantics with traditional tracking, mapping, and optimization processes. In addition, a scale recovery process is integrated into the system to estimate the true scale. Experimental results on NYU Depth v2 demonstrate state-of-the-art depth estimation and considerable segmentation precision at real-time speed, while trajectory accuracy on TUM RGB-D shows lower errors than other SLAM systems.
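The "weight-learned loss function" suggests the depth/segmentation task weights are learned rather than hand-tuned. As a minimal PyTorch sketch only, the module below shows one common way to learn such weights (homoscedastic uncertainty weighting); the class name and exact formulation are assumptions, not JSDNet's published loss:

```python
import torch
import torch.nn as nn

class WeightLearnedLoss(nn.Module):
    """Hypothetical uncertainty-weighted multi-task loss for joint
    depth estimation and semantic segmentation."""
    def __init__(self):
        super().__init__()
        # log-variances are learned alongside the network weights
        self.log_var_depth = nn.Parameter(torch.zeros(1))
        self.log_var_seg = nn.Parameter(torch.zeros(1))
        self.depth_loss = nn.L1Loss()
        self.seg_loss = nn.CrossEntropyLoss()

    def forward(self, depth_pred, depth_gt, seg_logits, seg_gt):
        l_depth = self.depth_loss(depth_pred, depth_gt)
        l_seg = self.seg_loss(seg_logits, seg_gt)
        # each task loss is scaled by exp(-log_var); the +log_var term
        # keeps the learned weights from collapsing to zero
        return (torch.exp(-self.log_var_depth) * l_depth + self.log_var_depth
                + torch.exp(-self.log_var_seg) * l_seg + self.log_var_seg)
```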
Affiliation(s)
- Ruicheng Gao
- State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
- Yue Qi
- State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
- Qingdao Research Institute of Beihang University, Qingdao 266104, China
2
Zhang Y, Feng G. Neural Radiance Field Dynamic Scene SLAM Based on Ray Segmentation and Bundle Adjustment. Sensors (Basel, Switzerland) 2025; 25:1679. PMID: 40292784; PMCID: PMC11944775; DOI: 10.3390/s25061679.
Abstract
Current neural implicit SLAM methods perform well when reconstructing ideal static 3D scenes, but handling real scenes with drastic lighting changes and dynamic environments remains a significant challenge. This paper proposes a neural implicit SLAM method that effectively deals with dynamic scenes. We employ a keyframe selection and tracking-switching approach based on Lucas-Kanade (LK) optical flow, which serves as a prior for constructing the Conditional Random Field potential function. This yields a semantics-based joint estimation of dynamic and static pixels, with corresponding loss functions that impose constraints in dynamic scenes. We conduct experiments on various dynamic and challenging scene datasets, including TUM RGB-D, OpenLORIS, and Bonn. The results demonstrate that our method significantly outperforms existing neural implicit SLAM systems in reconstruction quality and tracking accuracy.
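As a rough illustration of how LK optical flow can supply a dynamic/static prior of the kind described here, the OpenCV sketch below flags tracked points whose flow deviates strongly from the dominant (camera-induced) motion. The threshold, the median-flow heuristic, and the function name are assumptions, and the CRF stage itself is not shown:

```python
import cv2
import numpy as np

def dynamic_point_prior(prev_gray, curr_gray, flow_thresh=2.0):
    """Track corners with LK optical flow and flag those whose flow
    deviates from the median motion as likely dynamic."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    flow = (nxt - pts).reshape(-1, 2)[ok]
    pts_ok = pts.reshape(-1, 2)[ok]
    # residual against the dominant (static-scene) motion
    residual = np.linalg.norm(flow - np.median(flow, axis=0), axis=1)
    is_dynamic = residual > flow_thresh
    return pts_ok, is_dynamic  # usable as unary priors for a CRF potential
```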
Affiliation(s)
- Yuquan Zhang
- School of Traffic and Transportation, Shijiazhuang Tiedao University, Shijiazhuang 050043, China;
- Department of Automotive Engineering, Hebei Jiaotong Vocational and Technical College, Shijiazhuang 050035, China
- Guosheng Feng
- School of Traffic and Transportation, Shijiazhuang Tiedao University, Shijiazhuang 050043, China;
- School of New Energy Vehicle Engineering, Guangzhou Institute of Science and Technology, Guangzhou 510540, China
3
Lin Z, Tian Z, Zhang Q, Zhuang H, Lan J. Enhanced Visual SLAM for Collision-Free Driving with Lightweight Autonomous Cars. Sensors (Basel, Switzerland) 2024; 24:6258. PMID: 39409298; PMCID: PMC11478337; DOI: 10.3390/s24196258.
Abstract
The paper presents a vision-based obstacle avoidance strategy for lightweight self-driving cars that runs on a CPU-only device with a single RGB-D camera. The method consists of two steps: visual perception and path planning. The visual perception step uses ORB-SLAM3 enhanced with optical flow to estimate the car's poses and extract rich texture information from the scene. In the path planning step, the method combines a control Lyapunov function and a control barrier function in the form of a quadratic program (CLF-CBF-QP), together with an obstacle shape reconstruction process (SRP), to plan safe and stable trajectories. To validate the performance and robustness of the proposed method, simulation experiments were conducted with a car in various complex indoor environments in Gazebo. The method effectively avoids obstacles in these scenes and outperforms benchmark algorithms by achieving more stable and shorter trajectories across multiple simulated scenes.
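To make the barrier half of the CLF-CBF-QP idea concrete, here is a minimal cvxpy sketch of a CBF safety filter for a single-integrator robot and one circular obstacle; the dynamics, gain, and the use of a nominal command in place of an explicit CLF tracking term are simplifying assumptions, not the paper's formulation:

```python
import cvxpy as cp
import numpy as np

def safe_control(x, u_nom, x_obs, r_safe, gamma=1.0):
    """Filter a nominal velocity command through a CBF constraint.
    Barrier: h(x) = ||x - x_obs||^2 - r_safe^2 >= 0 keeps the robot
    outside the obstacle; constraint: dh/dt >= -gamma * h(x)."""
    u = cp.Variable(2)
    h = np.dot(x - x_obs, x - x_obs) - r_safe**2
    grad_h = 2.0 * (x - x_obs)  # dh/dx for single-integrator dynamics x' = u
    objective = cp.Minimize(cp.sum_squares(u - u_nom))
    constraints = [grad_h @ u >= -gamma * h]
    cp.Problem(objective, constraints).solve()
    return u.value

# e.g., nudge a straight-line command around an obstacle at the origin:
# u = safe_control(x=np.array([-2.0, 0.1]), u_nom=np.array([1.0, 0.0]),
#                  x_obs=np.zeros(2), r_safe=1.0)
```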
Affiliation(s)
- Zhihao Lin
- James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK; (Z.L.); (Z.T.)
- Zhen Tian
- James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK; (Z.L.); (Z.T.)
- Qi Zhang
- Faculty of Science, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands;
- Hanyang Zhuang
- University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China;
- Jianglin Lan
- James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK; (Z.L.); (Z.T.)
4
Yuan Y, Wu Y, Fan X, Gong M, Ma W, Miao Q. EGST: Enhanced Geometric Structure Transformer for Point Cloud Registration. IEEE Transactions on Visualization and Computer Graphics 2024; 30:6222-6234. PMID: 37971922; DOI: 10.1109/tvcg.2023.3329578.
Abstract
We explore the effect of geometric structure descriptors on extracting reliable correspondences and obtaining accurate registration in point cloud registration. The task involves estimating a rigid transformation between unorganized point clouds, so it is crucial to capture contextual features of the geometric structure. Recent coordinates-only methods ignore much of the geometric information in the point cloud, which weakens their ability to express the global context. We propose the Enhanced Geometric Structure Transformer, which learns enhanced contextual features of the geometric structure and models structural consistency between point clouds to extract reliable correspondences; it encodes three explicit enhanced geometric structures that provide significant cues for registration. More importantly, we report empirical results showing that the Enhanced Geometric Structure Transformer can learn meaningful geometric structure features using neither (i) explicit positional embeddings nor (ii) additional feature exchange modules such as cross-attention, which simplifies the network structure compared with a plain Transformer. Extensive experiments on synthetic and real-world datasets show that our method achieves competitive results.
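As a sketch of the attention-without-positional-embeddings idea, the PyTorch block below applies standard self-attention to per-point descriptors that already encode relative geometry; the dimensions, class name, and feature choices are illustrative assumptions, not EGST's architecture:

```python
import torch
import torch.nn as nn

class GeometricSelfAttention(nn.Module):
    """Self-attention over per-point geometric descriptors with no
    positional embedding added."""
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, feats):
        # feats: (B, N, d_model) descriptors built from explicit geometric
        # structure (e.g., pairwise distances, local angles); relative
        # geometry is already in the features, so no positional encoding
        # is injected before attention
        out, _ = self.attn(feats, feats, feats)
        return self.norm(feats + out)
```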
5
You H, Xie Y. Automatic driving image matching via Random Sample Consensus (RANSAC) and Spectral Clustering (SC) with monocular camera. The Review of Scientific Instruments 2024; 95:085113. PMID: 39194348; DOI: 10.1063/5.0214966.
Abstract
In today's big data era, with the development of Internet of Things (IoT) technology and the rise of autonomous driving, visual information has grown explosively, yet most image matching algorithms suffer from low accuracy and low inlier rates, yielding insufficient information. To address low matching accuracy and low inlier rates in autonomous driving, this research applies spectral clustering (SC), a data analysis technique, to image matching and proposes a new algorithm, SC-RANSAC, based on SC and Random Sample Consensus (RANSAC). The datasets were collected with the monocular cameras of autonomous driving cars. We use RANSAC to obtain an initial inlier set, apply the SC algorithm to filter RANSAC's outliers, and take the filtered inliers as the final inlier set. To verify the algorithm's effectiveness, we evaluate the matching results under three camera motions: translation, rotation, and combined rotation and translation. SC-RANSAC is also compared with RANSAC, graph-cut RANSAC, and marginalizing sample consensus on two different types of datasets. Finally, we select three representative images to test the robustness of SC-RANSAC. The experimental results show that SC-RANSAC effectively and reliably eliminates mismatches in the initial matching results, achieves a high inlier rate with real-time performance and robustness, and can be effectively applied in autonomous driving environments.
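A minimal sketch of the RANSAC-then-spectral-clustering pipeline described above, assuming a simple motion-consistency affinity between correspondences (the paper's actual affinity construction may differ):

```python
import cv2
import numpy as np
from sklearn.cluster import SpectralClustering

def sc_ransac(pts1, pts2, ransac_thresh=3.0):
    """pts1, pts2: (N, 2) matched keypoint coordinates.
    RANSAC gives an initial inlier set; spectral clustering then
    filters residual outliers from it."""
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, ransac_thresh)
    inl1, inl2 = pts1[mask.ravel() == 1], pts2[mask.ravel() == 1]
    # affinity: correspondences whose displacement vectors agree are
    # likely consistent with the same camera motion
    disp = inl2 - inl1
    d = np.linalg.norm(disp[:, None, :] - disp[None, :, :], axis=-1)
    affinity = np.exp(-d**2 / (2 * d.std() ** 2 + 1e-9))
    labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                                random_state=0).fit_predict(affinity)
    keep = labels == np.bincount(labels).argmax()  # dominant cluster
    return inl1[keep], inl2[keep]
```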
Affiliation(s)
- Hairong You
- Ministry of Information Technology, China Minsheng Bank, No. 2 Fuxingmen Inner Street, Xicheng District, 100032 Beijing, China
- Yang Xie
- Mobile Department, Xiaomi Technology Co., Ltd., No. 33 Xierqi Middle Road, Haidian District, 100085 Beijing, China
6
Lu Q, Pan Y, Hu L, He J. A Method for Reconstructing Background from RGB-D SLAM in Indoor Dynamic Environments. Sensors (Basel, Switzerland) 2023; 23:3529. PMID: 37050589; PMCID: PMC10099189; DOI: 10.3390/s23073529.
Abstract
Dynamic environments are challenging for visual Simultaneous Localization and Mapping, as dynamic elements can disrupt camera pose estimation and thus reduce reconstructed map accuracy. To solve this problem, this study proposes an approach for eliminating dynamic elements and reconstructing the static background in indoor dynamic environments. Dynamic elements are detected using the geometric residual, and the static background is obtained after removing them and repairing the images. The camera pose is estimated from the static background. Keyframes are then selected using randomized ferns, and loop closure detection and relocalization are performed against the keyframe set. Finally, the 3D scene is reconstructed. The proposed method is tested on the TUM and Bonn datasets, and the map reconstruction accuracy is experimentally demonstrated.
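The geometric residual check can be sketched as warping the previous depth map into the current frame with the estimated relative pose and thresholding the depth disagreement; the warp details, threshold, and function name below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def geometric_residual_mask(depth_prev, depth_curr, T_rel, K, thresh=0.05):
    """Warp the previous depth map into the current frame using the
    relative pose T_rel (4x4) and intrinsics K (3x3); pixels whose
    observed depth disagrees with the warped depth are flagged dynamic."""
    h, w = depth_prev.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth_prev.ravel()
    keep = z > 1e-6  # drop invalid depth readings
    # back-project previous pixels to 3D camera coordinates
    pix = np.stack([u.ravel()[keep] * z[keep],
                    v.ravel()[keep] * z[keep], z[keep]])
    pts = np.linalg.inv(K) @ pix
    # transform into the current camera frame and re-project
    pts = T_rel[:3, :3] @ pts + T_rel[:3, 3:4]
    front = pts[2] > 1e-6
    proj = K @ pts[:, front]
    u2 = (proj[0] / proj[2]).round().astype(int)
    v2 = (proj[1] / proj[2]).round().astype(int)
    inb = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    residual = np.abs(depth_curr[v2[inb], u2[inb]] - pts[2, front][inb])
    mask = np.zeros((h, w), dtype=bool)
    mask[v2[inb], u2[inb]] = residual > thresh
    return mask  # True where a pixel is likely dynamic
```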
7
Xu Z, Rong Z, Wu Y. A survey: which features are required for dynamic visual simultaneous localization and mapping? Visual Computing for Industry, Biomedicine, and Art 2021; 4:20. PMID: 34269925; PMCID: PMC8285453; DOI: 10.1186/s42492-021-00086-w.
Abstract
In recent years, simultaneous localization and mapping in dynamic environments (dynamic SLAM) has attracted significant attention from both academia and industry, and pioneering work on this technique has expanded the potential of robotic applications. Compared to standard SLAM under the static-world assumption, dynamic SLAM divides features into static and dynamic categories and leverages each type of feature properly. It can therefore provide more robust localization for intelligent robots operating in complex dynamic environments, and it can be integrated with multiple object tracking to meet the demands of high-level tasks. This article presents a survey on dynamic SLAM from the perspective of feature choices and discusses the advantages and disadvantages of different visual features.
Affiliation(s)
- Zewen Xu
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Zheng Rong
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Yihong Wu
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China