1
Gao R, Qi Y. Monocular Object-Level SLAM Enhanced by Joint Semantic Segmentation and Depth Estimation. Sensors (Basel) 2025; 25:2110. [PMID: 40218621 PMCID: PMC11991236 DOI: 10.3390/s25072110] [Received: 02/19/2025] [Revised: 03/11/2025] [Accepted: 03/21/2025] [Indexed: 04/14/2025]
Abstract
SLAM is regarded as a fundamental task for mobile robots and AR, implementing localization and mapping in a given environment. However, with only RGB images as input, monocular SLAM systems suffer from scale ambiguity and tracking difficulties in dynamic scenes. Moreover, high-level semantic information can contribute to the SLAM process due to its similarity to human vision. To address these problems, we propose a monocular object-level SLAM system enhanced by real-time joint depth estimation and semantic segmentation. The multi-task network, called JSDNet, is designed to predict depth and semantic segmentation simultaneously, with four contributions: depth discretization, feature fusion, a weight-learned loss function, and semantic consistency optimization. Specifically, feature fusion facilitates the sharing of features between the two tasks, while semantic consistency aims to guarantee segmentation and depth consistency across views. Based on the results of JSDNet, we design an object-level system that combines both pixel-level and object-level semantics with traditional tracking, mapping, and optimization processes. In addition, a scale recovery process is integrated into the system to estimate the true scale. Experimental results on NYU Depth V2 demonstrate state-of-the-art depth estimation and considerable segmentation precision at real-time speed, while the trajectory accuracy on TUM RGB-D shows lower errors than other SLAM systems.
Affiliation(s)
- Ruicheng Gao
- State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
- Yue Qi
- State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
- Qingdao Research Institute of Beihang University, Qingdao 266104, China
2
Huai S, Cao L, Zhou Y, Guo Z, Gai J. A Multi-Strategy Visual SLAM System for Motion Blur Handling in Indoor Dynamic Environments. Sensors (Basel) 2025; 25:1696. [PMID: 40292793 PMCID: PMC11944682 DOI: 10.3390/s25061696] [Received: 01/08/2025] [Revised: 02/18/2025] [Accepted: 03/07/2025] [Indexed: 04/30/2025]
Abstract
Typical SLAM systems adhere to the assumption of environment rigidity, which limits their functionality when deployed in the dynamic indoor environments commonly encountered by household robots. Prevailing methods address this issue by employing semantic information to identify and process dynamic objects in scenes. However, extracting reliable semantic information remains challenging in the presence of motion blur. In this paper, a novel visual SLAM algorithm is proposed in which several approaches are integrated to obtain more reliable semantic information, thereby reducing the impact of motion blur on visual SLAM systems. Specifically, to accurately distinguish moving objects from static objects, we introduce a missed-segmentation compensation mechanism into our SLAM system for predicting and restoring semantic information; depth and semantic information are then leveraged to generate masks of dynamic objects. Additionally, to refine keypoint filtering, a probability-based algorithm for dynamic feature detection and elimination is incorporated into our SLAM system. Evaluation experiments using the TUM and Bonn RGB-D datasets demonstrated that our SLAM system achieves lower absolute trajectory error (ATE) than existing systems in different dynamic indoor environments, particularly those with large view-angle variations. Our system can be applied to enhance the autonomous navigation and scene understanding capabilities of domestic robots.
Affiliation(s)
- Jingyao Gai
- School of Mechanical Engineering, Guangxi University, Nanning 530004, China; (S.H.)
3
Wang Y, Feng X, Li F, Xian Q, Jia ZH, Du Z, Liu C. Lightweight visual localization algorithm for UAVs. Sci Rep 2025; 15:6069. [PMID: 39971988 PMCID: PMC11840052 DOI: 10.1038/s41598-025-88089-y] [Received: 09/24/2024] [Accepted: 01/23/2025] [Indexed: 02/21/2025]
Abstract
The Lightv8nPnP model is introduced to make deep learning-based drone visual positioning algorithms more lightweight. The core objective of this research is to develop an efficient visual positioning algorithm that achieves accurate 3D positioning for drones. To enhance model performance, several optimizations are proposed. First, to reduce the complexity of the detection head, GhostConv is introduced into it, yielding the GDetect detection head module. Second, to address the imbalanced sample difficulty and uneven pixel quality in our custom dataset, which result in suboptimal detection performance, Wise-IoU is adopted as the model's bounding box regression loss function. Finally, based on the characteristics of the drone aerial dataset, the YOLOv8n network structure is modified to reduce redundant feature maps, resulting in the TrimYOLO network structure. Experimental results demonstrate that the Lightv8nPnP algorithm reduces the number of parameters and the computational load compared with benchmark algorithms, achieves a detection rate of 186 frames per second, and maintains a positioning error of less than 5.5 cm along the X, Y, and Z axes in three-dimensional space.
Affiliation(s)
- Yuhang Wang
- College of Computer Science and Technology, Xinjiang University, Urumqi, 830046, China
- Xinjiang University Signal Detection and Processing Autonomous Region Key Laboratory, Urumqi, 830046, China
- Xuefeng Feng
- Xinjiang Uygur Autonomous Region Research Institute of Measurement and Testing, Urumqi, 830000, China
- Feng Li
- Xinjiang Uygur Autonomous Region Research Institute of Measurement and Testing, Urumqi, 830000, China
- Qinglong Xian
- Xinjiang Uygur Autonomous Region Research Institute of Measurement and Testing, Urumqi, 830000, China
- Zhen-Hong Jia
- College of Computer Science and Technology, Xinjiang University, Urumqi, 830046, China
- Xinjiang University Signal Detection and Processing Autonomous Region Key Laboratory, Urumqi, 830046, China
- Zongdong Du
- College of Computer Science and Technology, Xinjiang University, Urumqi, 830046, China
- Xinjiang University Signal Detection and Processing Autonomous Region Key Laboratory, Urumqi, 830046, China
- Chang Liu
- College of Computer Science and Technology, Xinjiang University, Urumqi, 830046, China
- Xinjiang University Signal Detection and Processing Autonomous Region Key Laboratory, Urumqi, 830046, China
4
Ghadimzadeh Alamdari A, Zade FA, Ebrahimkhanlou A. A Review of Simultaneous Localization and Mapping for the Robotic-Based Nondestructive Evaluation of Infrastructures. Sensors (Basel) 2025; 25:712. [PMID: 39943350 PMCID: PMC11820643 DOI: 10.3390/s25030712] [Received: 12/15/2024] [Revised: 01/13/2025] [Accepted: 01/21/2025] [Indexed: 02/16/2025]
Abstract
The maturity of simultaneous localization and mapping (SLAM) methods has now reached a level that motivates in-depth, problem-specific reviews. The focus of this study is to investigate the evolution of vision-based and LiDAR-based methods, and combinations of the two, and to evaluate their performance in enclosed and GPS-denied (EGD) conditions for infrastructure inspection. This paper categorizes and analyzes the SLAM methods in detail, considering the sensor-fusion type and chronological order. The paper analyzes the performance of eleven open-source SLAM solutions: two visual methods (VINS-Mono, ORB-SLAM 2), eight LiDAR-based methods (LIO-SAM, Fast-LIO 2, SC-Fast-LIO 2, LeGO-LOAM, SC-LeGO-LOAM, A-LOAM, LINS, F-LOAM), and one combined LiDAR and vision-based method (LVI-SAM). The benchmarking section analyzes accuracy and computational resource consumption using our collected dataset and a test dataset. According to the results, LiDAR-based methods performed well under EGD conditions. Contrary to common presumptions, some vision-based methods demonstrate acceptable performance in EGD environments. Additionally, combining vision-based techniques with LiDAR-based methods yields superior performance compared with either class of method individually.
Affiliation(s)
- Ali Ghadimzadeh Alamdari
- Department of Mechanical Engineering and Mechanics (MEM), Drexel University, 3141 Chestnut St., Philadelphia, PA 19104, USA
- Farzad Azizi Zade
- Mechanical Engineering Department, Ferdowsi University of Mashhad, Mashhad 9177948944, Iran
- Arvin Ebrahimkhanlou
- Department of Mechanical Engineering and Mechanics (MEM), Drexel University, 3141 Chestnut St., Philadelphia, PA 19104, USA
- Department of Civil, Architectural and Environmental Engineering (CAEE), Drexel University, 3141 Chestnut St., Philadelphia, PA 19104, USA
5
Kan X, Shi G, Yang X, Hu X. YPR-SLAM: A SLAM System Combining Object Detection and Geometric Constraints for Dynamic Scenes. Sensors (Basel) 2024; 24:6576. [PMID: 39460056 PMCID: PMC11511238 DOI: 10.3390/s24206576] [Received: 07/29/2024] [Revised: 10/04/2024] [Accepted: 10/08/2024] [Indexed: 10/28/2024]
Abstract
Traditional SLAM systems assume a static environment, but moving objects break this assumption. In the real world, moving objects can greatly degrade the precision of image matching and camera pose estimation. To solve these problems, the YPR-SLAM system is proposed. First, the system includes a lightweight YOLOv5 detection network for detecting both dynamic and static objects, which provides prior information about dynamic objects to the SLAM system. Second, utilizing this prior information and the depth image, a geometric-constraint method for removing moving feature points is proposed. The Depth-PROSAC algorithm is used to differentiate dynamic from static feature points so that the dynamic ones can be removed. Finally, a dense point cloud map is constructed from the static feature points. YPR-SLAM tightly couples object detection and geometric constraints, eliminating moving feature points and minimizing their adverse effects on the SLAM system. The performance of YPR-SLAM was assessed on the public TUM RGB-D dataset, and the results show that it is well suited to dynamic scenes.
Affiliation(s)
- Xukang Kan
- School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen 518107, China; (X.K.); (X.H.)
- Gefei Shi
- Shenzhen Key Laboratory of Intelligent Microsatellite Constellation (Sun Yat-sen University), Sun Yat-sen University, Shenzhen 518107, China;
- Xuerong Yang
- Shenzhen Key Laboratory of Intelligent Microsatellite Constellation (Sun Yat-sen University), Sun Yat-sen University, Shenzhen 518107, China;
- Xinwei Hu
- School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen 518107, China; (X.K.); (X.H.)
6
Wang Q, Song J, Du C, Wang C. Online Scene Semantic Understanding Based on Sparsely Correlated Network for AR. Sensors (Basel) 2024; 24:4756. [PMID: 39066152 PMCID: PMC11281024 DOI: 10.3390/s24144756] [Received: 06/18/2024] [Revised: 07/09/2024] [Accepted: 07/18/2024] [Indexed: 07/28/2024]
Abstract
Real-world understanding serves as a medium that bridges the information world and the physical world, enabling virtual-real mapping and interaction. However, scene understanding based solely on 2D images faces problems such as a lack of geometric information and limited robustness to occlusion. Depth sensors bring new opportunities, but challenges remain in fusing depth with geometric and semantic priors. To address these concerns, our method exploits the repeatability of video-stream data and the sparsity of newly generated data. We introduce a sparsely correlated network architecture (SCN) designed explicitly for online RGB-D instance segmentation. Additionally, we leverage the power of object-level RGB-D SLAM systems, thereby transcending the limitations of conventional approaches that emphasize only geometry or semantics. We establish correlation over time and leverage it to develop rules and generate sparse data. We thoroughly evaluate the system's performance on the NYU Depth V2 and ScanNet V2 datasets, demonstrating that incorporating frame-to-frame correlation leads to significantly improved accuracy and consistency in instance segmentation compared with existing state-of-the-art alternatives. Moreover, using sparse data reduces data complexity while meeting the real-time requirement of 18 fps. Furthermore, by utilizing prior knowledge of object layout, we showcase a promising augmented reality application, demonstrating its potential and practicality.
Affiliation(s)
- Chen Wang
- The School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 102488, China; (Q.W.); (J.S.); (C.D.)
7
Du X, Zhang C, Gao K, Liu J, Yu X, Wang S. YPL-SLAM: A Simultaneous Localization and Mapping Algorithm for Point-line Fusion in Dynamic Environments. Sensors (Basel) 2024; 24:4517. [PMID: 39065920 PMCID: PMC11280596 DOI: 10.3390/s24144517] [Received: 04/29/2024] [Revised: 06/03/2024] [Accepted: 07/03/2024] [Indexed: 07/28/2024]
Abstract
Simultaneous Localization and Mapping (SLAM) is one of the key technologies for the autonomous navigation of mobile robots, utilizing environmental features to determine a robot's position and create a map of its surroundings. Current visual SLAM algorithms typically yield precise and dependable results in static environments, and many of them simply filter out the feature points in dynamic regions. However, as the number of dynamic objects within the camera's view increases, this approach can result in decreased accuracy or tracking failures. Therefore, this study proposes a solution called YPL-SLAM based on ORB-SLAM2. The solution adds a target recognition and region segmentation module to determine dynamic, potentially dynamic, and static regions; determines the state of each potentially dynamic region using RANSAC with epipolar geometric constraints; and removes the dynamic feature points. It then extracts line features from the non-dynamic regions and finally performs point-line fusion optimization using a weighted fusion strategy that considers the image dynamic score and the number of successful point-line feature matches, thus ensuring the system's robustness and accuracy. Extensive experiments were conducted on the public TUM dataset to compare YPL-SLAM with leading SLAM algorithms. The results demonstrate that the new algorithm surpasses ORB-SLAM2 in accuracy (with a maximum improvement of 96.1%) while running significantly faster than Dyna-SLAM.
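The epipolar test this abstract relies on (deciding whether a potentially dynamic point is consistent with a RANSAC-estimated fundamental matrix) can be sketched as follows. This is an illustrative sketch only, not the paper's implementation; the function names and the 1-pixel threshold are assumptions.

```python
import numpy as np

def epipolar_distances(F, pts1, pts2):
    """Distance (in pixels) of each point in image 2 to the epipolar line
    induced by its match in image 1, under the convention x2^T F x1 = 0."""
    ones = np.ones((pts1.shape[0], 1))
    p1 = np.hstack([pts1, ones])          # homogeneous coordinates, shape (N, 3)
    p2 = np.hstack([pts2, ones])
    lines = p1 @ F.T                      # epipolar lines in image 2, shape (N, 3)
    num = np.abs(np.sum(lines * p2, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1)
    return num / den

def static_mask(F, pts1, pts2, thresh=1.0):
    """True for matches consistent with the epipolar geometry (static),
    False for matches exceeding the threshold (treated as dynamic)."""
    return epipolar_distances(F, pts1, pts2) <= thresh
```

For a pure horizontal camera translation, F reduces to the skew-symmetric matrix of the translation direction: a point that keeps its row coordinate has zero epipolar error, while a point that also jumps vertically is flagged as dynamic.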
Affiliation(s)
- Xinwu Du
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China; (C.Z.); (K.G.); (J.L.); (X.Y.)
- Longmen Laboratory, Luoyang 471000, China;
- Collaborative Innovation Center of Machinery Equipment Advanced Manufacturing of Henan Province, Luoyang 471003, China
- Chenglin Zhang
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China; (C.Z.); (K.G.); (J.L.); (X.Y.)
- Kaihang Gao
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China; (C.Z.); (K.G.); (J.L.); (X.Y.)
- Jin Liu
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China; (C.Z.); (K.G.); (J.L.); (X.Y.)
- Xiufang Yu
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China; (C.Z.); (K.G.); (J.L.); (X.Y.)
8
Al-Tawil B, Hempel T, Abdelrahman A, Al-Hamadi A. A review of visual SLAM for robotics: evolution, properties, and future applications. Front Robot AI 2024; 11:1347985. [PMID: 38686339 PMCID: PMC11056647 DOI: 10.3389/frobt.2024.1347985] [Received: 12/01/2023] [Accepted: 02/20/2024] [Indexed: 05/02/2024]
Abstract
Visual simultaneous localization and mapping (V-SLAM) plays a crucial role in robotic systems, especially for interactive and collaborative mobile robots. The growing reliance on robotics has increased the complexity of task execution in real-world applications. Consequently, several types of V-SLAM methods have been developed to facilitate and streamline robot functions. This work showcases the latest V-SLAM methodologies, offering clear selection criteria for researchers and developers choosing the right approach for their robotic applications. It chronologically presents the evolution of SLAM methods, highlighting key principles and providing comparative analyses between them. The paper focuses on the integration of the robotic ecosystem with the Robot Operating System (ROS) as middleware, explores essential V-SLAM benchmark datasets, and presents demonstrative figures for each method's workflow.
Affiliation(s)
- Basheer Al-Tawil
- Institute for Information Technology and Communications, Otto-von-Guericke-University, Magdeburg, Germany
9
Cong P, Li J, Liu J, Xiao Y, Zhang X. SEG-SLAM: Dynamic Indoor RGB-D Visual SLAM Integrating Geometric and YOLOv5-Based Semantic Information. Sensors (Basel) 2024; 24:2102. [PMID: 38610313 PMCID: PMC11014023 DOI: 10.3390/s24072102] [Received: 02/22/2024] [Revised: 03/21/2024] [Accepted: 03/22/2024] [Indexed: 04/14/2024]
Abstract
Simultaneous localisation and mapping (SLAM) is crucial in mobile robotics. Most visual SLAM systems assume that the environment is static. However, in real life there are many dynamic objects, which affect the accuracy and robustness of these systems. To improve the performance of visual SLAM systems, this study proposes a dynamic visual SLAM (SEG-SLAM) system based on the oriented FAST and rotated BRIEF (ORB)-SLAM3 framework and the you only look once (YOLO)v5 deep-learning method. First, based on the ORB-SLAM3 framework, the YOLOv5 method is used to construct a fusion module for target detection and semantic segmentation. This module can effectively identify and extract prior information for obviously and potentially dynamic objects. Second, differentiated dynamic feature point rejection strategies are developed for different dynamic objects using the prior information, depth information, and the epipolar geometry method. Thus, the localisation and mapping accuracy of the SEG-SLAM system is improved. Finally, the rejection results are fused with the depth information, and a static dense 3D map without dynamic objects is constructed using the Point Cloud Library. The SEG-SLAM system is evaluated using public TUM datasets and real-world scenarios. The proposed method is more accurate and robust than current dynamic visual SLAM algorithms.
Affiliation(s)
- Peichao Cong
- School of Mechanical and Automotive Engineering, Guangxi University of Science and Technology, Liuzhou 545006, China
- Jiaxing Li
- School of Mechanical and Automotive Engineering, Guangxi University of Science and Technology, Liuzhou 545006, China
- Junjie Liu
- School of Mechanical and Automotive Engineering, Guangxi University of Science and Technology, Liuzhou 545006, China
- Yixuan Xiao
- School of Mechanical and Automotive Engineering, Guangxi University of Science and Technology, Liuzhou 545006, China
- Xin Zhang
- School of Mechanical and Automotive Engineering, Guangxi University of Science and Technology, Liuzhou 545006, China
10
Xie X, Qin Y, Zhang Z, Yan Z, Jin H, Xu M, Zhang C. GY-SLAM: A Dense Semantic SLAM System for Plant Factory Transport Robots. Sensors (Basel) 2024; 24:1374. [PMID: 38474909 DOI: 10.3390/s24051374] [Received: 01/22/2024] [Revised: 02/07/2024] [Accepted: 02/10/2024] [Indexed: 03/14/2024]
Abstract
Simultaneous Localization and Mapping (SLAM), one of the core technologies in intelligent robotics, has gained substantial attention in recent years. Addressing the limitations of SLAM systems in dynamic environments, this research proposes a system specifically designed for plant factory transportation environments, named GY-SLAM. GY-SLAM incorporates a lightweight target detection network, GY, based on YOLOv5, which uses GhostNet as the backbone. This is further enhanced with CoordConv coordinate convolution, CARAFE up-sampling operators, and the SE attention mechanism, improving detection accuracy while reducing model complexity. While mAP@0.5 increased by 0.514% to 95.364%, the model simultaneously reduced the number of parameters by 43.976%, the computational cost by 46.488%, and the model size by 41.752%. Additionally, the system constructs purely static octree maps and grid maps. Tests conducted on the TUM dataset and a proprietary dataset demonstrate that GY-SLAM significantly outperforms ORB-SLAM3 in dynamic scenarios in terms of localization accuracy and robustness. It shows a remarkable 92.59% improvement in the RMSE of the Absolute Trajectory Error (ATE), along with 93.11% and 92.89% improvements in the RMSE of the translational and rotational drift of the Relative Pose Error (RPE), respectively. Compared with YOLOv5s, the GY model brings a 41.5944% improvement in detection speed and a 17.7975% increase in SLAM operation speed, indicating strong competitiveness and real-time capability. These results validate the effectiveness of GY-SLAM in dynamic environments and provide substantial support for the automation of logistics tasks by robots in specific contexts.
Affiliation(s)
- Xiaolin Xie
- Longmen Laboratory, Luoyang 471003, China
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Yibo Qin
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Zhihong Zhang
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Zixiang Yan
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Hang Jin
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Man Xu
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China
- Cheng Zhang
- College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China
11
Zhang Y, Li Y, Chen P. TSG-SLAM: SLAM Employing Tight Coupling of Instance Segmentation and Geometric Constraints in Complex Dynamic Environments. Sensors (Basel) 2023; 23:9807. [PMID: 38139653 PMCID: PMC10747090 DOI: 10.3390/s23249807] [Received: 10/10/2023] [Revised: 12/10/2023] [Accepted: 12/11/2023] [Indexed: 12/24/2023]
Abstract
Although numerous effective Simultaneous Localization and Mapping (SLAM) systems have been developed, complex dynamic environments continue to present challenges, such as managing moving objects and enabling robots to comprehend environments. This paper focuses on a visual SLAM method specifically designed for complex dynamic environments. Our approach proposes a dynamic feature removal module based on the tight coupling of instance segmentation and multi-view geometric constraints (TSG). This method seamlessly integrates semantic information with geometric constraint data, using the fundamental matrix as a connecting element. In particular, instance segmentation is performed on frames to eliminate all dynamic and potentially dynamic features, retaining only reliable static features for sequential feature matching and acquiring a dependable fundamental matrix. Subsequently, based on this matrix, true dynamic features are identified and removed by capitalizing on multi-view geometry constraints while preserving reliable static features for further tracking and mapping. An instance-level semantic map of the global scenario is constructed to enhance the perception and understanding of complex dynamic environments. The proposed method is assessed on TUM datasets and in real-world scenarios, demonstrating that TSG-SLAM exhibits superior performance in detecting and eliminating dynamic feature points and obtains good localization accuracy in dynamic environments.
Affiliation(s)
- Yongchao Zhang
- School of Intelligent Manufacturing, Taizhou University, Taizhou 318000, China;
- Yuanming Li
- Department of Electrical Engineering, Ganzhou Polytechnic, Ganzhou 341000, China;
- School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
- Pengzhan Chen
- School of Intelligent Manufacturing, Taizhou University, Taizhou 318000, China;
- School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
12
Zhang XY, Abd Rahman AH, Qamar F. Semantic visual simultaneous localization and mapping (SLAM) using deep learning for dynamic scenes. PeerJ Comput Sci 2023; 9:e1628. [PMID: 37869467 PMCID: PMC10588701 DOI: 10.7717/peerj-cs.1628] [Received: 06/06/2023] [Accepted: 09/11/2023] [Indexed: 10/24/2023]
Abstract
Simultaneous localization and mapping (SLAM) is a fundamental problem in robotics and computer vision. It involves a robot or an autonomous system navigating an unknown environment while simultaneously creating a map of its surroundings and accurately estimating its position within that map. While significant progress has been made in SLAM over the years, challenges remain. One prominent issue is robustness and accuracy in dynamic environments, where moving objects can cause uncertainties and errors in the estimation process. Traditional methods that use temporal information to differentiate static and dynamic objects have limited accuracy and applicability. Recent research has leaned towards deep learning-based methods, which can handle dynamic objects, semantic segmentation, and motion estimation, aiming to improve accuracy and adaptability in complex scenes. This article proposes an approach to enhance the robustness and precision of monocular visual odometry in dynamic environments. The semantic segmentation algorithm DeepLabV3+ is used to identify dynamic objects in the image, and a motion consistency check is then applied to remove feature points belonging to dynamic objects. The remaining static feature points are used for feature matching and pose estimation based on ORB-SLAM2 using the Technical University of Munich (TUM) dataset. Experimental results show that our method outperforms traditional visual odometry methods in accuracy and robustness, especially in dynamic environments. By eliminating the influence of moving objects, it significantly reduces the absolute trajectory error and the relative pose error in dynamic scenes compared with the traditional ORB-SLAM2, substantially improving the accuracy and robustness of the SLAM system's pose estimation.
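The segmentation-based filtering step described in this abstract (discarding keypoints that fall inside dynamic-object regions before pose estimation) can be sketched as below. A minimal illustration under assumed conventions, not the paper's code: the mask is taken to be a boolean image with True on pixels segmented as dynamic, and the function name is hypothetical.

```python
import numpy as np

def filter_keypoints(keypoints, dynamic_mask):
    """Keep only keypoints that land outside the dynamic-object mask.
    keypoints: (N, 2) array of (x, y) pixel coordinates.
    dynamic_mask: (H, W) boolean array, True where a dynamic object was segmented."""
    xy = np.round(keypoints).astype(int)
    h, w = dynamic_mask.shape
    # Points outside the image bounds are discarded as unreliable.
    inside = (xy[:, 0] >= 0) & (xy[:, 0] < w) & (xy[:, 1] >= 0) & (xy[:, 1] < h)
    keep = inside.copy()
    keep[inside] = ~dynamic_mask[xy[inside, 1], xy[inside, 0]]  # row = y, col = x
    return keypoints[keep]
```

In a pipeline like the one described, the surviving keypoints would then be passed to feature matching and pose estimation as usual.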
Affiliation(s)
- Xiao Ya Zhang
- Center for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Abdul Hadi Abd Rahman
- Center for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Faizan Qamar
- Center for Cyber Security, Universiti Kebangsaan Malaysia, Bangi, Malaysia
13
Alsalatie M, Alquran H, Mustafa WA, Zyout A, Alqudah AM, Kaifi R, Qudsieh S. A New Weighted Deep Learning Feature Using Particle Swarm and Ant Lion Optimization for Cervical Cancer Diagnosis on Pap Smear Images. Diagnostics (Basel) 2023; 13:2762. [PMID: 37685299 PMCID: PMC10487265 DOI: 10.3390/diagnostics13172762] [Received: 07/20/2023] [Revised: 08/17/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023]
Abstract
One of the most widespread health issues affecting women is cervical cancer. Early detection of cervical cancer through improved screening strategies will reduce cervical cancer-related morbidity and mortality rates worldwide. Using Pap smear images is a novel method for detecting cervical cancer. Previous studies have focused on whole Pap smear images or extracted nuclei to detect cervical cancer. In this paper, we compared three scenarios (the entire cell, the cytoplasm region, or the nucleus region only) for classifying cervical cancer into seven classes. After applying image augmentation to address the imbalanced data problem, automated features are extracted using three pre-trained convolutional neural networks: AlexNet, DarkNet 19, and NasNet. These scenario combinations yield twenty-one features. Principal component analysis then reduces the dimensionality to the ten most important features. This study employs feature weighting to create an efficient computer-aided cervical cancer diagnosis system, with the weights optimized by the evolutionary algorithms ant lion optimization (ALO) and particle swarm optimization (PSO). Finally, two machine learning algorithms, the support vector machine (SVM) classifier and the random forest (RF) classifier, are used to perform the classification. With a 99.5% accuracy rate for seven classes using the PSO algorithm, the SVM classifier outperformed the RF classifier, which achieved a 98.9% accuracy rate on the same region. Our outcome is superior to other seven-class studies because of this focus on the tissues rather than just the nucleus. This method will aid physicians in diagnosing precancerous and early-stage cervical cancer by relying on the tissues rather than on the nucleus alone. The results could be further enhanced with a larger amount of data.
Affiliation(s)
- Mohammed Alsalatie
- King Hussein Medical Center, Royal Jordanian Medical Service, The Institute of Biomedical Technology, Amman 11855, Jordan
- Hiam Alquran
- Department of Biomedical Systems and Informatics Engineering, Yarmouk University, Irbid 21163, Jordan
- Wan Azani Mustafa
- Faculty of Electrical Engineering & Technology, Campus Pauh Putra, Universiti Malaysia Perlis, Arau 02600, Malaysia
- Advanced Computing (AdvCOMP), Centre of Excellence (CoE), Universiti Malaysia Perlis, Arau 02600, Malaysia
- Ala’a Zyout
- Department of Biomedical Systems and Informatics Engineering, Yarmouk University, Irbid 21163, Jordan
- Ali Mohammad Alqudah
- Department of Biomedical Systems and Informatics Engineering, Yarmouk University, Irbid 21163, Jordan
- Reham Kaifi
- College of Applied Medical Sciences, King Saud Bin Abdulaziz University for Health Sciences, Jeddah 21423, Saudi Arabia
- King Abdullah International Medical Research Center, Jeddah 22384, Saudi Arabia
- Suhair Qudsieh
- Department of Obstetrics and Gynecology, Faculty of Medicine, Yarmouk University, Irbid 21163, Jordan
14
Gong H, Gong L, Ma T, Sun Z, Li L. AHY-SLAM: Toward Faster and More Accurate Visual SLAM in Dynamic Scenes Using Homogenized Feature Extraction and Object Detection Method. SENSORS (BASEL, SWITZERLAND) 2023; 23:s23094241. [PMID: 37177445 PMCID: PMC10181220 DOI: 10.3390/s23094241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/25/2023] [Revised: 04/19/2023] [Accepted: 04/21/2023] [Indexed: 05/15/2023]
Abstract
At present, SLAM is widely used in all kinds of dynamic scenes, but traditional visual SLAM struggles to distinguish dynamic targets in such scenes. During matching, dynamic points are incorrectly included in the camera pose calculation, resulting in low precision and poor robustness of the pose estimation. This paper proposes a new visual SLAM algorithm for dynamic scenes, named AHY-SLAM, based on adaptive-threshold homogenized feature extraction and YOLOv5 object detection. It adds three new modules to ORB-SLAM2: a keyframe selection module, a threshold calculation module, and an object detection module. AHY-SLAM screens keyframes from the input frames with the optical flow method, extracts feature points from keyframes with an adaptive threshold, and eliminates dynamic points with YOLOv5. Compared with ORB-SLAM2, AHY-SLAM significantly improves pose estimation accuracy on multiple dynamic scene sequences of the open TUM dataset, reducing absolute pose error by up to 97%. Compared with other dynamic scene SLAM algorithms, AHY-SLAM is also significantly faster while maintaining acceptable accuracy.
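The homogenized feature extraction the abstract refers to (spreading a fixed feature budget evenly across the image instead of letting keypoints cluster) can be sketched with a simple grid filter. The grid size, per-cell budget, and `(x, y, response)` keypoint format below are illustrative assumptions, not taken from AHY-SLAM:

```python
def homogenize_features(keypoints, img_w, img_h, grid=4, per_cell=2):
    """Keep at most `per_cell` strongest keypoints in each grid cell,
    so features are spread evenly over the image."""
    cells = {}
    for (x, y, response) in keypoints:
        cx = min(int(x * grid / img_w), grid - 1)   # column of grid cell
        cy = min(int(y * grid / img_h), grid - 1)   # row of grid cell
        cells.setdefault((cx, cy), []).append((x, y, response))
    kept = []
    for pts in cells.values():
        pts.sort(key=lambda p: p[2], reverse=True)  # strongest response first
        kept.extend(pts[:per_cell])
    return kept

# Toy example: ten keypoints crowded into one corner plus two elsewhere.
kps = [(5 + i, 5 + i, i) for i in range(10)] + [(300, 300, 1.0), (600, 400, 2.0)]
kept = homogenize_features(kps, img_w=640, img_h=480, grid=4, per_cell=2)
```

Only the two strongest of the ten crowded corner points survive, while the two isolated points are kept, which is the homogenizing effect.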
Affiliation(s)
- Han Gong
- State Key Laboratory of Mining Response and Disaster Prevention and Control in Deep Coal Mines, Huainan 232001, China
- School of Mechanical Engineering, Anhui University of Science and Technology, Huainan 232001, China
- Lei Gong
- School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China
- Tianbing Ma
- State Key Laboratory of Mining Response and Disaster Prevention and Control in Deep Coal Mines, Huainan 232001, China
- School of Mechanical Engineering, Anhui University of Science and Technology, Huainan 232001, China
- Zhicheng Sun
- State Key Laboratory of Mining Response and Disaster Prevention and Control in Deep Coal Mines, Huainan 232001, China
- School of Mechanical Engineering, Anhui University of Science and Technology, Huainan 232001, China
- Liang Li
- State Key Laboratory of Mining Response and Disaster Prevention and Control in Deep Coal Mines, Huainan 232001, China
- School of Mechanical Engineering, Anhui University of Science and Technology, Huainan 232001, China
15
Jin J, Jiang X, Yu C, Zhao L, Tang Z. Dynamic visual simultaneous localization and mapping based on semantic segmentation module. APPL INTELL 2023. [DOI: 10.1007/s10489-023-04531-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/07/2023]
16
Zang Q, Zhang K, Wang L, Wu L. An Adaptive ORB-SLAM3 System for Outdoor Dynamic Environments. SENSORS (BASEL, SWITZERLAND) 2023; 23:1359. [PMID: 36772399 PMCID: PMC9918902 DOI: 10.3390/s23031359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/27/2022] [Revised: 01/19/2023] [Accepted: 01/23/2023] [Indexed: 06/18/2023]
Abstract
Recent developments in robotics have heightened the need for visual SLAM. Dynamic objects are a major problem in visual SLAM: they reduce localization accuracy because they violate the epipolar geometry assumed for a static scene. This study set out to address the low accuracy of visual SLAM in outdoor dynamic environments. We propose an adaptive feature point selection system for such environments. First, we use YOLOv5s with an attention mechanism to obtain a priori dynamic objects in the scene. Then, feature points are selected by an adaptive selector based on the number of a priori dynamic objects and the fraction of the frame they occupy. Finally, dynamic regions are determined by a geometric method based on Lucas-Kanade optical flow and the RANSAC algorithm. We evaluate the accuracy of our system on the KITTI dataset against various dynamic feature point selection strategies and DynaSLAM. Experiments show that our system reduces both absolute trajectory error and relative trajectory error, by up to 39% and 30%, respectively, compared with the other systems.
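The geometric test underlying such dynamic-region detection can be sketched as flagging matches whose second-view point lies too far from its epipolar line. The toy essential matrix (pure sideways translation), threshold, and point coordinates below are illustrative assumptions, not the paper's implementation, which combines this check with Lucas-Kanade optical flow and RANSAC:

```python
def epipolar_distance(E, p1, p2):
    """Distance from point p2 to the epipolar line E @ p1 (calibrated coords)."""
    x1 = (p1[0], p1[1], 1.0)
    a = sum(E[0][k] * x1[k] for k in range(3))
    b = sum(E[1][k] * x1[k] for k in range(3))
    c = sum(E[2][k] * x1[k] for k in range(3))
    return abs(a * p2[0] + b * p2[1] + c) / (a * a + b * b) ** 0.5

def flag_dynamic(E, matches, thresh=0.01):
    """Matches violating the epipolar constraint are flagged as dynamic."""
    return [m for m in matches if epipolar_distance(E, m[0], m[1]) > thresh]

# Essential matrix [t]_x for a pure sideways translation t = (1, 0, 0):
# static points then keep the same image row (y) between the two views.
E = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
matches = [((0.2, 0.3), (0.25, 0.3)),   # static: same y
           ((0.5, 0.1), (0.55, 0.1)),   # static: same y
           ((0.4, 0.4), (0.42, 0.55))]  # dynamic: y changed
dynamic = flag_dynamic(E, matches)
```

In practice the essential (or fundamental) matrix would itself be estimated with RANSAC so that dynamic points do not corrupt it.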
Affiliation(s)
- Qiuyu Zang
- College of Mathematics and Computer Science, Zhejiang Normal University, Yingbin Avenue, Jinhua 321005, China
- Kehua Zhang
- Key Laboratory of Urban Rail Transit Intelligent Operation and Maintenance Technology & Equipment of Zhejiang Province, Zhejiang Normal University, Yingbin Avenue, Jinhua 321005, China
- Ling Wang
- Key Laboratory of Urban Rail Transit Intelligent Operation and Maintenance Technology & Equipment of Zhejiang Province, Zhejiang Normal University, Yingbin Avenue, Jinhua 321005, China
- Lintong Wu
- College of Mathematics and Computer Science, Zhejiang Normal University, Yingbin Avenue, Jinhua 321005, China
17
Wang K, Yao X, Ma N, Jing X. Real-time motion removal based on point correlations for RGB-D SLAM in indoor dynamic environments. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07879-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]