1. Chen P, Zhao X, Zeng L, Liu L, Liu S, Sun L, Li Z, Chen H, Liu G, Qiao Z, Qu Y, Xu D, Li L, Li L. A Review of Research on SLAM Technology Based on the Fusion of LiDAR and Vision. Sensors (Basel) 2025; 25:1447. PMID: 40096278; PMCID: PMC11902412; DOI: 10.3390/s25051447.
Abstract
In recent years, simultaneous localization and mapping (SLAM) based on the fusion of LiDAR and vision has gained extensive attention in autonomous navigation and environment sensing. The limitations of single sensors in feature-scarce (low-texture, repetitive-structure) and dynamic environments have prompted researchers to combine LiDAR with other sensors, particularly vision sensors. LiDAR excels in complex and dynamic environments thanks to its ability to acquire high-precision 3D spatial information with high reliability, and fusing deep learning with adaptive algorithms has proven highly effective in handling a variety of situations. This paper analyzes the research status, including the main results and findings, of early single-sensor SLAM and of current LiDAR-vision fusion SLAM. By categorizing and summarizing the extant literature, it examines specific solutions to current problems (complexity of data fusion, computational burden and real-time performance, multi-scenario data processing, etc.), discusses the trends and limitations of current research, and looks forward to future research directions, including multi-sensor fusion, algorithm optimization, improved real-time performance, and expanded application scenarios. This review aims to provide guidelines and insights for the development of LiDAR-vision fusion SLAM, with a view to providing a reference for further SLAM research.
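A core operation in any LiDAR-vision fusion pipeline like those surveyed above is associating LiDAR returns with camera pixels through the extrinsic calibration. A minimal sketch of that projection step, with function names and calibration values invented here for illustration (not taken from any paper in this list):

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project Nx3 LiDAR points into pixel coordinates.

    T_cam_lidar: 4x4 rigid transform from the LiDAR to the camera frame
    (assumed known from extrinsic calibration). K: 3x3 pinhole intrinsics.
    Returns (pixel coords for points in front of the camera, boolean mask).
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0          # discard points behind the camera
    uv_h = (K @ pts_cam[in_front].T).T    # homogeneous pixel coordinates
    return uv_h[:, :2] / uv_h[:, 2:3], in_front

# Example: with identity extrinsics, a point 2 m straight ahead lands on the
# principal point (320, 240).
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
uv, mask = project_lidar_to_image(np.array([[0.0, 0.0, 2.0]]), np.eye(4), K)
```

Once LiDAR points are placed in the image, visual features can be given metric depth, which is the usual entry point for the fusion schemes discussed in the review.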
Affiliation(s)
- Peng Chen
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Xinyu Zhao
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Lina Zeng
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Hainan International Joint Research Center for Semiconductor Lasers, Hainan Normal University, Haikou 571158, China
- Luxinyu Liu
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Shengjie Liu
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Hainan International Joint Research Center for Semiconductor Lasers, Hainan Normal University, Haikou 571158, China
- Li Sun
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Hainan International Joint Research Center for Semiconductor Lasers, Hainan Normal University, Haikou 571158, China
- Zaijin Li
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Hainan International Joint Research Center for Semiconductor Lasers, Hainan Normal University, Haikou 571158, China
- Hao Chen
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Guojun Liu
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Hainan International Joint Research Center for Semiconductor Lasers, Hainan Normal University, Haikou 571158, China
- Zhongliang Qiao
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Hainan International Joint Research Center for Semiconductor Lasers, Hainan Normal University, Haikou 571158, China
- Yi Qu
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Hainan International Joint Research Center for Semiconductor Lasers, Hainan Normal University, Haikou 571158, China
- Dongxin Xu
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Lianhe Li
- Hainan International Joint Research Center for Semiconductor Lasers, Hainan Normal University, Haikou 571158, China
- Lin Li
- College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Laser Technology and Optoelectronic Functional Materials of Hainan Province, Hainan Normal University, Haikou 571158, China
- Hainan International Joint Research Center for Semiconductor Lasers, Hainan Normal University, Haikou 571158, China
2. Furukawa R, Kawasaki H, Sagawa R. Incremental shape integration with inter-frame shape consistency using neural SDF for a 3D endoscopic system. Healthc Technol Lett 2025; 12:e70001. PMID: 39885982; PMCID: PMC11780497; DOI: 10.1049/htl2.70001.
Abstract
3D measurement for endoscopic systems is in great demand. One promising approach is an active-stereo system using a micro-sized pattern projector attached to the head of an endoscope. Furthermore, multi-frame integration is desired to enlarge the reconstructed area. This paper proposes an incremental optimization technique for both the shape-field parameters and the positional parameters of the cameras and projectors. The method assumes that the input data are temporally sequential images, that is, endoscopic videos, and that the relative positions between the camera and the projector may vary continuously. As a solution, a differential volume rendering algorithm in conjunction with a neural signed distance field (NeuralSDF) representation is proposed to simultaneously optimize the 3D scene and the camera/projector poses. An incremental optimization strategy in which the number of optimized frames is gradually increased is also proposed. In the experiments, the proposed method is evaluated by performing 3D reconstruction on both synthetic and real images, demonstrating its effectiveness.
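The signed-distance-field idea behind this abstract can be illustrated with a deliberately tiny analogue: fitting the parameters of an analytic SDF (a sphere) to observed surface points by gradient descent, so that the zero level set passes through the data. This is only a toy under invented values — a real NeuralSDF parameterizes the field with a neural network and optimizes it jointly with camera/projector poses via differential volume rendering:

```python
import numpy as np

def fit_sphere_sdf(points, center, radius, lr=0.1, iters=300):
    """Minimize mean squared SDF over the observed points.

    sdf(p) = |p - center| - radius. Gradient descent on the loss
    L = mean(sdf^2), using d(sdf)/d(radius) = -1 and
    d(sdf)/d(center) = -(p - center)/|p - center|.
    """
    for _ in range(iters):
        d = points - center
        dist = np.linalg.norm(d, axis=1)
        sdf = dist - radius
        radius += lr * 2 * sdf.mean()                              # descent in radius
        center += lr * 2 * (sdf[:, None] * d / dist[:, None]).mean(axis=0)
    return center, radius

# Surface samples of a unit sphere (a circle of points in the z=0 plane);
# the fit recovers center ~(0,0,0) and radius ~1 from a poor initial guess.
theta = np.linspace(0, 2 * np.pi, 50, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
c, r = fit_sphere_sdf(pts, center=np.array([0.3, 0.0, 0.0]), radius=0.5)
```

The incremental strategy in the paper amounts to repeating such an optimization while gradually adding frames, so earlier geometry constrains later pose estimates.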
Affiliation(s)
- Ryo Furukawa
- Department of Informatics, Graduate School of System Engineering, Kindai University, Higashihiroshima, Japan
- Hiroshi Kawasaki
- Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
- Ryusuke Sagawa
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
3. Wang E, Liu Y, Tu P, Taylor ZA, Chen X. Video-Based Soft Tissue Deformation Tracking for Laparoscopic Augmented Reality-Based Navigation in Kidney Surgery. IEEE Trans Med Imaging 2024; 43:4161-4173. PMID: 38865220; DOI: 10.1109/tmi.2024.3413537.
Abstract
Minimally invasive surgery (MIS) remains technically demanding due to the difficulty of tracking hidden critical structures within the moving anatomy of the patient. In this study, we propose a soft tissue deformation tracking augmented reality (AR) navigation pipeline for laparoscopic surgery of the kidneys. The proposed navigation pipeline addresses two main sub-problems: initial registration and deformation tracking. Our method utilizes preoperative MR or CT data and binocular laparoscopes without any additional interventional hardware. The initial registration is resolved through a probabilistic rigid registration algorithm and elastic compensation based on dense point cloud reconstruction. For deformation tracking, the sparse feature point displacement vector field continuously provides temporal boundary conditions for the biomechanical model. To enhance the accuracy of the displacement vector field, a novel deep learning-based feature point selection strategy is proposed. Moreover, an ex-vivo experimental method for assessing the error of internal structures is presented. The ex-vivo experiments indicate an external surface reprojection error of 4.07 ± 2.17 mm and a maximum mean absolute error for internal structures of 2.98 mm. In-vivo experiments indicate mean absolute errors of 3.28 ± 0.40 mm and 1.90 ± 0.24 mm, respectively. The combined qualitative and quantitative findings indicate the potential of our AR-assisted navigation system to improve the clinical application of laparoscopic kidney surgery.
4. Wang E, Liu Y, Xu J, Chen X. Non-rigid scene reconstruction of deformable soft tissue with monocular endoscopy in minimally invasive surgery. Int J Comput Assist Radiol Surg 2024; 19:2433-2443. PMID: 38705922; DOI: 10.1007/s11548-024-03149-4.
Abstract
PURPOSE The utilization of image-guided surgery has demonstrated its ability to improve the precision and safety of minimally invasive surgery (MIS). Non-rigid scene reconstruction is a challenge for image-guided systems due to uniform texture, smoke, instrument occlusion, etc. METHODS In this paper, we introduce an algorithm for 3D reconstruction of non-rigid surgical scenes. The proposed method comprises two main components: first, the front-end performs the initial reconstruction of 3D information for deformable soft tissues using an embedded deformation graph (EDG) based on dual quaternions, enabling reconstruction without prior knowledge of the target; second, the EDG is integrated with isometric non-rigid structure from motion (Iso-NRSfM) to facilitate centralized optimization of the observed map points and camera motion across different time instances in deformable scenes. RESULTS For the quantitative evaluation of the proposed method, we conducted comparative experiments on both synthetic and publicly available datasets against the state-of-the-art 3D reconstruction method DefSLAM. The results show that our method achieved a maximum reduction of 1.6 mm in average reconstruction error compared to DefSLAM across all datasets. Additionally, qualitative experiments were performed on video datasets involving surgical instrument occlusions. CONCLUSION Our method outperformed DefSLAM on both synthetic and public datasets, demonstrating its robustness and accuracy in reconstructing soft tissues in dynamic surgical scenes. This success highlights the potential clinical application of our method in providing surgeons with critical shape and depth information for MIS.
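The embedded deformation graph (EDG) in this abstract can be sketched as blending per-node rigid transforms to deform nearby points. The sketch below uses simple linear blending for clarity — the paper uses dual quaternion blending, which avoids the collapse artifacts of the linear form — and all node positions, weights, and shapes are invented for illustration:

```python
import numpy as np

def edg_deform(point, node_pos, node_R, node_t, weights):
    """Deform one point with an embedded deformation graph.

    Each graph node j carries a rotation node_R[j] and a translation
    node_t[j] acting about its position node_pos[j]; the deformed point is
    a weighted blend of the per-node rigid motions.
    """
    out = np.zeros(3)
    for g, R, t, w in zip(node_pos, node_R, node_t, weights):
        out += w * (R @ (point - g) + g + t)
    return out

# Two nodes that both translate by +1 in z carry a point between them rigidly.
nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Rs = [np.eye(3), np.eye(3)]
ts = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])]
p_def = edg_deform(np.array([0.5, 0.0, 0.0]), nodes, Rs, ts, [0.5, 0.5])
```

In a full pipeline the node transforms (here given) are the unknowns, optimized so the deformed surface matches each new observation.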
Affiliation(s)
- Enpeng Wang
- School of Mechanical Engineering, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Shanghai, 200240, China
- Yueang Liu
- School of Mechanical Engineering, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Shanghai, 200240, China
- Jiangchang Xu
- School of Mechanical Engineering, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Shanghai, 200240, China
- Xiaojun Chen
- School of Mechanical Engineering, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Shanghai, 200240, China
5. Parashar S, Long Y, Salzmann M, Fua P. A Closed-Form, Pairwise Solution to Local Non-Rigid Structure-From-Motion. IEEE Trans Pattern Anal Mach Intell 2024; 46:7027-7040. PMID: 38578851; DOI: 10.1109/tpami.2024.3383316.
Abstract
A recent trend in Non-Rigid Structure-from-Motion (NRSfM) is to express local, differential constraints between pairs of images, from which the surface normal at any point can be obtained by solving a system of polynomial equations. While this approach is more successful than its counterparts relying on global constraints, the resulting methods face two main problems. First, most of the equation systems they formulate are of high degree and must be solved using computationally expensive polynomial solvers. Some methods use polynomial reduction strategies to simplify the system, but this adds phantom solutions. In any event, an additional mechanism is employed to pick the best solution, which adds to the computation without any guarantees on the reliability of the solution. Second, these methods formulate constraints between a pair of images. Even if there is enough motion between them, they may suffer from local degeneracies that make the resulting estimates unreliable without any warning mechanism. In this paper, we solve these problems for isometric/conformal NRSfM. We show that, under widely applicable assumptions, we can derive a new system of equations in terms of the surface normals, whose two solutions can be obtained in closed form and can easily be disambiguated locally. Our formalism also allows us to assess how reliable the estimated local normals are and to discard them if they are not. Our experiments show that our reconstructions, obtained from two or more views, are significantly more accurate than those of state-of-the-art methods, while also being faster.
6. Lee Y. Three-Dimensional Dense Reconstruction: A Review of Algorithms and Datasets. Sensors (Basel) 2024; 24:5861. PMID: 39338606; PMCID: PMC11435907; DOI: 10.3390/s24185861.
Abstract
Three-dimensional dense reconstruction involves extracting the full shape and texture details of three-dimensional objects from two-dimensional images. Although 3D reconstruction is a crucial and well-researched area, it remains an unsolved challenge in dynamic or complex environments. This work provides a comprehensive overview of classical 3D dense reconstruction techniques, including those based on geometric and optical models, as well as approaches leveraging deep learning. It also discusses the datasets used for deep learning and evaluates the performance and the strengths and limitations of deep learning methods on these datasets.
Affiliation(s)
- Yangming Lee
- RoCAL Lab, Rochester Institute of Technology, Rochester, NY 14623, USA
7. Mandel N, Kompe N, Gerwin M, Ernst F. KISS - Keep It Static SLAMMOT - The Cost of Integrating Moving Object Tracking into an EKF-SLAM Algorithm. Sensors (Basel) 2024; 24:5764. PMID: 39275675; PMCID: PMC11398253; DOI: 10.3390/s24175764.
Abstract
The treatment of moving objects in simultaneous localization and mapping (SLAM) is a key challenge in contemporary robotics. In this paper, we propose an extension of the EKF-SLAM algorithm that incorporates moving objects into the estimation process, which we term KISS. We have extended the robotic vision toolbox to analyze the influence of moving objects in simulations. Two linear motion models and one nonlinear motion model are used to represent the moving objects; the observation model remains the same for all objects. The proposed model is evaluated against an implementation of the state-of-the-art formulation for moving object tracking, DATMO. We investigate increasing numbers of static landmarks and dynamic objects to demonstrate the impact on the algorithm, and compare it with cases where a moving object is mistakenly integrated as a static landmark (false negative) and a static landmark as a moving object (false positive). In practice, distances to dynamic objects are important, and we propose the safety-distance-error metric to evaluate the difference between the true and estimated distances to a dynamic object. The results show that false positives have a negligible impact on map distortion and absolute trajectory error (ATE) with increasing static landmarks, while false negatives significantly distort maps and degrade performance metrics. Explicitly modeling dynamic objects not only performs comparably in terms of map distortion and ATE but also enables more accurate tracking of dynamic objects, with a lower safety-distance-error than DATMO. We recommend that researchers model objects with uncertain motion using a simple constant-position model; hence we name our contribution Keep It Static SLAMMOT. We hope this work will provide valuable data points and insights for future research into integrating moving objects into SLAM algorithms.
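The constant-position model recommended above is simply a Kalman filter whose prediction leaves the object where it was while inflating its uncertainty by a process noise Q, letting the update pull the estimate toward new observations. A 1D sketch with invented noise values (the paper's filter is a full EKF over poses and landmarks):

```python
import numpy as np

def kf_constant_position(z_seq, x0, P0, Q, R):
    """1D Kalman filter with a constant-position motion model.

    Predict: x stays put, covariance grows by Q (the object may have moved).
    Update: standard Kalman correction with direct position measurements z.
    """
    x, P = x0, P0
    for z in z_seq:
        P = P + Q                 # predict: identity dynamics, add process noise
        K = P / (P + R)           # Kalman gain
        x = x + K * (z - x)       # correct toward the measurement
        P = (1 - K) * P
    return x, P

# An object drifts from 0 to ~1 m; the filter tracks it (with a small lag)
# even though its motion model says "static".
measurements = np.linspace(0.0, 1.0, 20)
x_est, P_est = kf_constant_position(measurements, x0=0.0, P0=1.0, Q=0.05, R=0.1)
```

The nonzero Q is what distinguishes this from a static-landmark model: it keeps the gain from collapsing to zero, so slowly moving objects are followed rather than frozen into the map.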
Affiliation(s)
- Nicolas Mandel
- Institute of Robotics and Cognitive Systems, University of Lübeck, 23562 Lübeck, Germany
- Nils Kompe
- Institute of Robotics and Cognitive Systems, University of Lübeck, 23562 Lübeck, Germany
- Moritz Gerwin
- Institute of Robotics and Cognitive Systems, University of Lübeck, 23562 Lübeck, Germany
- Floris Ernst
- Institute of Robotics and Cognitive Systems, University of Lübeck, 23562 Lübeck, Germany
8. Veil C, Sawodny O. Intraoperative Multi-Sensor Tissue Differentiation in (Uro-)Oncology - A Short Review. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-5. PMID: 40039869; DOI: 10.1109/embc53108.2024.10782728.
Abstract
Precise differentiation of pathological tissue during surgery is crucial in oncology. The current gold standard, histopathological analysis, involves delays due to tissue processing, impacting real-time decision-making. Furthermore, it does not always give information about the extent and heterogeneity of the tumorous tissue. This paper examines our research efforts in developing novel multimodal sensors tailored for uro-oncology. These sensors measure optical, electrical, and mechanical tissue properties, aiming to provide comprehensive tissue differentiation during surgery. Along with this, a review of recent advances in the field of intraoperative tissue differentiation is given. Altered physical properties in tumorous tissues are discussed, and the suitability of various sensor modalities for detecting these changes is investigated, especially infrared and Raman spectroscopy and electrical and mechanical measurements. A digital framework for spatial localization of measurements within the organ is crucial for integrating sensor data to achieve comprehensive tissue characterization. Our focus lies in presenting these innovative sensor technologies and their potential to transform intraoperative tissue assessment. By providing real-time information, these sensors could significantly enhance diagnostic precision in urological oncology, potentially improving patient outcomes. Clinical relevance- Supporting real-time decision-making during urological surgeries.
9. Furukawa R, Sagawa R, Oka S, Kawasaki H. NeRF-based multi-frame 3D integration for 3D endoscopy using active stereo. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-5. PMID: 40040184; DOI: 10.1109/embc53108.2024.10782699.
Abstract
3D measurement for endoscopic systems has large potential not only for cancer diagnosis and computer-assisted medical systems, but also for providing ground truth for supervised training of deep neural networks. One promising approach is to implement an active-stereo system using a micro-sized pattern projector attached to the head of the endoscope. Furthermore, a multi-frame optimization algorithm for the endoscopic active-stereo system has been proposed to improve accuracy and robustness; in that approach, a differentiable rendering algorithm is used to simultaneously optimize the 3D scene, represented by triangle meshes, and the camera/projector poses. One issue with the approach is its dependency on the accuracy of the initial 3D triangle mesh; achieving sufficient accuracy is not easy for actual endoscopic systems, which reduces the practicality of the algorithm. In this paper, we adopt a neural radiance field (NeRF)-based 3D scene representation to integrate multi-frame data captured by the active-stereo system, where the 3D scene as well as the camera/projector poses are simultaneously optimized without using an initial shape. In the experiments, the proposed method is evaluated by performing 3D reconstruction using both synthetic and real images obtained by a consumer endoscopic camera fitted with a micro-pattern projector. Clinical relevance- One-shot endoscopic measurement of depth information is a practical solution for cancer diagnosis, computer-assisted interventions, and making annotations for machine learning training data.
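At the heart of any NeRF-style representation like the one described here is volume rendering along a ray: per-sample densities are converted to opacities and composited under accumulated transmittance. A minimal numeric sketch of that quadrature (the standard NeRF weights, not the paper's full system):

```python
import numpy as np

def volume_render_ray(sigmas, colors, deltas):
    """Composite per-sample densities and colors along one ray.

    alpha_i = 1 - exp(-sigma_i * delta_i); weight_i = T_i * alpha_i, where
    T_i is the transmittance accumulated before sample i.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# A single effectively opaque red sample returns (almost) pure red with
# weights summing to ~1.
rgb, w = volume_render_ray(
    sigmas=np.array([50.0]),
    colors=np.array([[1.0, 0.0, 0.0]]),
    deltas=np.array([1.0]),
)
```

Because these weights are differentiable in the densities, both the scene and the camera/projector poses can be optimized by gradient descent against observed images, which is what removes the need for an initial mesh.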
10. Tang X, Tao H, Qian Y, Yang J, Feng Z, Wang Q. Real-time deformable SLAM with geometrically adapted template for dynamic monocular laparoscopic scenes. Int J Comput Assist Radiol Surg 2024; 19:1375-1383. PMID: 38771418; DOI: 10.1007/s11548-024-03174-3.
Abstract
PURPOSE Intraoperative reconstruction of endoscopic scenes is a key technology for surgical navigation systems. The accuracy and efficiency of 3D reconstruction directly determine the effectiveness of navigation systems in a variety of clinical applications. While current deformable SLAM algorithms can meet real-time requirements, their underlying reliance on regular templates still makes it challenging to efficiently capture abrupt geometric features within scenes, such as organ contours and surgical margins. METHODS We propose a novel real-time monocular deformable SLAM algorithm with a geometrically adapted template. To ensure real-time performance, the proposed algorithm consists of two threads: a deformation mapping thread updates the template at keyframe rate, and a deformation tracking thread estimates the camera pose and the deformation at frame rate. To capture geometric features more efficiently, the algorithm first detects salient edge features using a pre-trained contour detection network and then constructs the template through triangulation guided by the salient features. RESULTS We thoroughly evaluated this method on the Mandala and Hamlyn datasets in terms of accuracy and performance. The results demonstrate that the proposed method achieves better accuracy, with a 0.75-7.95% improvement, and consistent effectiveness in data association compared with the closest method. CONCLUSION This study verified that an adaptive template improves the reconstruction of dynamic laparoscopic scenes with abrupt geometric features. However, further exploration is needed for applications in laparoscopic surgery with incisal margins caused by surgical instruments. This research serves as a crucial step toward enhanced automatic computer-assisted navigation in laparoscopic surgery. Code is available at https://github.com/Tang257/SLAM-with-geometrically-adapted-template .
Affiliation(s)
- Xuanshuang Tang
- Department of Computer Science, Sichuan University, Chengdu, 610065, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Nanshan, Shenzhen, 518055, China
- Haisu Tao
- Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China
- Yinling Qian
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Nanshan, Shenzhen, 518055, China
- Jian Yang
- Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China
- Ziliang Feng
- Department of Computer Science, Sichuan University, Chengdu, 610065, China
- Qiong Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Nanshan, Shenzhen, 518055, China
11. Yang Z, Dai J, Pan J. 3D reconstruction from endoscopy images: A survey. Comput Biol Med 2024; 175:108546. PMID: 38704902; DOI: 10.1016/j.compbiomed.2024.108546.
Abstract
Three-dimensional reconstruction of images acquired through endoscopes is playing a vital role in an increasing number of medical applications. Endoscopes used in the clinic are commonly classified as monocular endoscopes and binocular endoscopes. We have reviewed the classification of methods for depth estimation according to the type of endoscope. Fundamentally, depth estimation relies on feature matching of images and multi-view geometry theory, but these traditional techniques face many problems in the endoscopic environment. With the continuing development of deep learning techniques, a growing number of works use learning-based methods to address challenges such as inconsistent illumination and texture sparsity. We have reviewed over 170 papers published in the 10 years from 2013 to 2023. The commonly used public datasets and performance metrics are summarized. We also give a taxonomy of methods and analyze the advantages and drawbacks of the algorithms. Summary tables and a results atlas are provided to facilitate the comparison of the qualitative and quantitative performance of different methods in each category. In addition, we summarize commonly used scene representation methods in endoscopy and speculate on the prospects of depth estimation research in medical applications. We also compare the robustness, processing time, and scene representation of the methods to help doctors and researchers select appropriate methods for their surgical applications.
Affiliation(s)
- Zhuoyue Yang
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Ju Dai
- Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Junjun Pan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
12. Schmidt A, Mohareri O, DiMaio S, Yip MC, Salcudean SE. Tracking and mapping in medical computer vision: A review. Med Image Anal 2024; 94:103131. PMID: 38442528; DOI: 10.1016/j.media.2024.103131.
Abstract
As computer vision algorithms increase in capability, their applications in clinical systems will become more pervasive. These applications include: diagnostics, such as colonoscopy and bronchoscopy; guiding biopsies, minimally invasive interventions, and surgery; automating instrument motion; and providing image guidance using pre-operative scans. Many of these applications depend on the specific visual nature of medical scenes and require designing algorithms to perform in this environment. In this review, we provide an update on the field of camera-based tracking and scene mapping in surgery and diagnostics in medical computer vision. We begin by describing our review process, which results in a final list of 515 papers that we cover. We then give a high-level summary of the state of the art and provide relevant background for those who need tracking and mapping for their clinical applications. We then review datasets provided in the field and the clinical needs that motivate their design. Next, we delve into the algorithmic side and summarize recent developments. This summary should be especially useful for algorithm designers and for those looking to understand the capability of off-the-shelf methods. We maintain focus on algorithms for deformable environments while also reviewing the essential building blocks of rigid tracking and mapping, since there is a large amount of crossover in methods. With the field summarized, we discuss the current state of tracking and mapping methods along with needs for future algorithms, needs for quantification, and the viability of clinical applications. We then provide some research directions and questions. We conclude that new methods need to be designed or combined to support clinical applications in deformable environments, and that more focus needs to be put on collecting datasets for training and evaluation.
Affiliation(s)
- Adam Schmidt
- Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada.
- Omid Mohareri
- Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Simon DiMaio
- Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Michael C Yip
- Department of Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
- Septimiu E Salcudean
- Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada
13
Song J, Zhang R, Zhu Q, Lin J, Ghaffari M. BDIS-SLAM: a lightweight CPU-based dense stereo SLAM for surgery. Int J Comput Assist Radiol Surg 2024; 19:811-820. [PMID: 38238493 DOI: 10.1007/s11548-023-03055-1]
Abstract
PURPOSE Common dense stereo simultaneous localization and mapping (SLAM) approaches in minimally invasive surgery (MIS) require high-end parallel computational resources for real-time implementation. Yet, this is not always feasible, since the computational resources should be allocated to other tasks like segmentation, detection, and tracking. To solve the problem of limited parallel computational power, this research aims at a lightweight dense stereo SLAM system that works on a single-core CPU and achieves real-time performance (more than 30 Hz in typical scenarios). METHODS A new dense stereo mapping module is integrated with the ORB-SLAM2 system and named BDIS-SLAM. Our new dense stereo mapping module includes stereo matching and 3D dense depth mosaic methods. Stereo matching is achieved with the recently proposed CPU-level real-time matching algorithm Bayesian Dense Inverse Searching (BDIS). A BDIS-based shape recovery and a depth mosaic strategy are integrated as a new thread and coupled with the backbone ORB-SLAM2 system for real-time stereo shape recovery. RESULTS Experiments on in vivo datasets show that BDIS-SLAM runs at over 30 Hz on a modern single-core CPU in typical endoscopy/colonoscopy scenarios. BDIS-SLAM consumes only around 12% additional time compared with the backbone ORB-SLAM2. Although our lightweight BDIS-SLAM simplifies the process by ignoring deformation and fusion procedures, it can provide a usable dense mapping for modern MIS on computationally constrained devices. CONCLUSION The proposed BDIS-SLAM is a lightweight stereo dense SLAM system for MIS. It achieves 30 Hz on a modern single-core CPU in typical endoscopy/colonoscopy scenarios (image size around 640 × 480). BDIS-SLAM provides a low-cost solution for dense mapping in MIS and has the potential to be applied in surgical robots and AR systems. Code is available at https://github.com/JingweiSong/BDIS-SLAM.
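BDIS itself is a Bayesian patch-matching method (code at the linked repository); independent of the specific matcher, the dense-mapping step of any stereo pipeline ultimately converts per-pixel disparity into metric depth. A minimal sketch of that conversion, with made-up focal length and baseline values rather than anything from the paper:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert a stereo disparity map to metric depth.

    depth = f * B / d, where f is the focal length in pixels, B the
    stereo baseline in metres, and d the disparity in pixels.
    Pixels with (near-)zero disparity are marked invalid (NaN).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical endoscope-like parameters (not from the paper):
depth = disparity_to_depth(np.array([[8.0, 0.0], [4.0, 16.0]]),
                           focal_px=500.0, baseline_m=0.004)
```

Larger disparity means closer tissue; the zero-disparity pixel is flagged NaN rather than producing an infinite depth.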
Affiliation(s)
- Jingwei Song
- United Imaging Research Institute of Intelligent Imaging, Beijing, 100144, China.
- University of Michigan, Ann Arbor, MI, 48109, USA.
- Ray Zhang
- University of Michigan, Ann Arbor, MI, 48109, USA
- Qiuchen Zhu
- University of Technology Sydney, Sydney, NSW, 2007, Australia
- Jianyu Lin
- Imperial College London, London, SW7 2AZ, UK
14
Furukawa R, Chen E, Sagawa R, Oka S, Kawasaki H. Calibration-free structured-light-based 3D scanning system in laparoscope for robotic surgery. Healthc Technol Lett 2024; 11:196-205. [PMID: 38638488 PMCID: PMC11022229 DOI: 10.1049/htl2.12083]
Abstract
Accurate 3D shape measurement is crucial for surgical support and alignment in robotic surgery systems. Stereo cameras in laparoscopes offer a potential solution; however, their accuracy in stereo image matching diminishes when the target image has few textures. Although stereo matching with deep learning has gained significant attention, supervised learning requires a large dataset of images with depth annotations, which are scarce for laparoscopes. Thus, there is a strong demand to explore alternative methods for depth reconstruction or annotation for laparoscopes. Active stereo techniques are a promising approach for achieving 3D reconstruction without textures. In this study, a 3D shape reconstruction method is proposed using an ultra-small patterned projector attached to a laparoscopic arm to address these issues. The pattern projector emits a structured light with a grid-like pattern that features node-wise modulation for positional encoding. To scan the target object, multiple images are taken while the projector is in motion, and the relative poses of the projector and a camera are auto-calibrated using a differential rendering technique. In the experiment, the proposed method is evaluated by performing 3D reconstruction using images obtained from a surgical robot and comparing the results with a ground-truth shape obtained from X-ray CT.
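The paper's contribution is auto-calibrating the projector-camera poses via differential rendering; once those poses are known, active-stereo reconstruction reduces to ray-ray triangulation. A generic midpoint-triangulation sketch under the assumption of already-calibrated rays (not the authors' implementation):

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Closest-point (midpoint) triangulation of two 3D rays.

    Rays: p = o + t * d (d need not be unit length). Solves the least
    squares problem t1*d1 - t2*d2 = o2 - o1 and returns the midpoint
    of the two closest points on the rays.
    """
    o1, d1, o2, d2 = (np.asarray(v, dtype=float) for v in (o1, d1, o2, d2))
    A = np.stack([d1, -d2], axis=1)          # 3x2 system for [t1, t2]
    t, *_ = np.linalg.lstsq(A, o2 - o1, rcond=None)
    p1 = o1 + t[0] * d1                      # closest point on ray 1
    p2 = o2 + t[1] * d2                      # closest point on ray 2
    return 0.5 * (p1 + p2)

# Camera ray from the origin, projector ray from (2, 0, 0);
# they intersect exactly at (1, 1, 5):
pt = triangulate_midpoint([0, 0, 0], [1, 1, 5], [2, 0, 0], [-1, 1, 5])
```

With noisy real rays the two closest points no longer coincide, and the midpoint (or a reprojection-error minimiser) is the standard compromise.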
Affiliation(s)
- Ryo Furukawa
- Department of Informatics, Kindai University, Higashihiroshima, Japan
- Ryusuke Sagawa
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
- Hiroshi Kawasaki
- Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
15
Casella A, Lena C, Moccia S, Paladini D, De Momi E, Mattos LS. Toward a navigation framework for fetoscopy. Int J Comput Assist Radiol Surg 2023; 18:2349-2356. [PMID: 37587389 PMCID: PMC10632301 DOI: 10.1007/s11548-023-02974-3]
Abstract
PURPOSE Fetoscopic laser photocoagulation of placental anastomoses is the most effective treatment for twin-to-twin transfusion syndrome (TTTS). A robust mosaic of placenta and its vascular network could support surgeons' exploration of the placenta by enlarging the fetoscope field-of-view. In this work, we propose a learning-based framework for field-of-view expansion from intra-operative video frames. METHODS While current state of the art for fetoscopic mosaicking builds upon the registration of anatomical landmarks which may not always be visible, our framework relies on learning-based features and keypoints, as well as robust transformer-based image-feature matching, without requiring any anatomical priors. We further address the problem of occlusion recovery and frame relocalization, relying on the computed features and their descriptors. RESULTS Experiments were conducted on 10 in-vivo TTTS videos from two different fetal surgery centers. The proposed framework was compared with several state-of-the-art approaches, achieving higher [Formula: see text] on 7 out of 10 videos and a success rate of [Formula: see text] in occlusion recovery. CONCLUSION This work introduces a learning-based framework for placental mosaicking with occlusion recovery from intra-operative videos using a keypoint-based strategy and features. The proposed framework can compute the placental panorama and recover even in case of camera tracking loss where other methods fail. The results suggest that the proposed framework has large potential to pave the way to creating a surgical navigation system for TTTS by providing robust field-of-view expansion.
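Whatever matcher supplies the pairwise transforms, field-of-view expansion boils down to composing frame-to-frame homographies into a common panorama frame. A minimal sketch with hypothetical pure-translation homographies standing in for the learned matches:

```python
import numpy as np

def chain_homographies(pairwise_H):
    """Compose pairwise homographies H_{i->i+1} into H_{0->i} per frame.

    Mosaicking places every frame in the reference (frame-0) plane by
    accumulating the pairwise estimates; each accumulated matrix is
    normalised so that H[2, 2] == 1.
    """
    H_0_to_i = [np.eye(3)]
    for H in pairwise_H:
        acc = H @ H_0_to_i[-1]
        H_0_to_i.append(acc / acc[2, 2])
    return H_0_to_i

def warp_point(H, xy):
    """Apply a homography to a 2D point (homogeneous division)."""
    v = H @ np.array([xy[0], xy[1], 1.0])
    return v[:2] / v[2]

# Two 'frames', each shifted by (10, 0) pixels relative to the last:
T = np.array([[1.0, 0, 10], [0, 1.0, 0], [0, 0, 1.0]])
chain = chain_homographies([T, T])
p = warp_point(chain[2], (0.0, 0.0))   # frame-0 origin seen in frame 2
```

Chaining is also why drift accumulates, which is what the paper's relocalization step is meant to correct after occlusions.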
Affiliation(s)
- Alessandro Casella
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy.
- Department of Electronic, Information and Bioengineering, Politecnico di Milano, Milan, Italy.
- Chiara Lena
- Department of Electronic, Information and Bioengineering, Politecnico di Milano, Milan, Italy
- Sara Moccia
- Department of Excellence in Robotics and AI, The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, Italy
- Dario Paladini
- Department of Fetal and Perinatal Medicine, Istituto Giannina Gaslini, Genoa, Italy
- Elena De Momi
- Department of Electronic, Information and Bioengineering, Politecnico di Milano, Milan, Italy
- Leonardo S Mattos
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
16
Mao F, Huang T, Ma L, Zhang X, Liao H. A Monocular Variable Magnifications 3D Laparoscope System Using Double Liquid Lenses. IEEE J Transl Eng Health Med 2023; 12:32-42. [PMID: 38059130 PMCID: PMC10697296 DOI: 10.1109/jtehm.2023.3311022]
Abstract
During minimally invasive surgery (MIS), the laparoscope only provides a single viewpoint to the surgeon, leaving a lack of 3D perception. Many works have been proposed to obtain depth and 3D reconstruction by designing a new optical structure or by depending on the camera pose and image sequences. Most of these works modify the structure of conventional laparoscopes and cannot provide 3D reconstruction at different magnification views. In this study, we propose a laparoscopic system based on double liquid lenses, which provides doctors with variable magnification rates, near observation, and real-time monocular 3D reconstruction. Our system consists of an optical structure that achieves automatic magnification change and autofocus without any physically moving element, and a deep learning network based on the Depth from Defocus (DFD) method, trained to suit inconsistent camera intrinsic situations and estimate depth from images of different focal lengths. The optical structure is portable and can be mounted on conventional laparoscopes. The depth estimation network estimates depth in real time from monocular images of different focal lengths and magnification rates. Experiments show that our system provides a 0.68-1.44x zoom rate and can estimate depth at different magnification rates at 6 fps. Monocular 3D reconstruction reaches at least 6 mm accuracy. The system also provides a clear view even at a 1 mm close working distance. Ex vivo experiments and implementation on clinical images prove that our system provides doctors with a magnified clear view of the lesion, as well as quick monocular depth perception during laparoscopy, helping surgeons achieve better detection and size diagnosis of the abdomen during laparoscopic surgeries.
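The geometric intuition behind Depth from Defocus, which the paper's network learns end-to-end, is the thin-lens blur-circle relation: an object off the focus plane images to a disc whose diameter encodes its depth. A sketch of that forward relation with arbitrary illustrative parameters (the learned network, by contrast, also copes with the varying intrinsics of the liquid lenses):

```python
def blur_circle_diameter(focal_mm, aperture_mm, focus_dist_mm, obj_dist_mm):
    """Thin-lens blur (circle of confusion) for an object off the focus plane.

    Thin lens: 1/f = 1/s + 1/v, so an object at distance s images at
    v_s = 1 / (1/f - 1/s). With the sensor placed at v_f (focused for
    focus_dist_mm), similar triangles give the blur diameter
    c = A * |v_s - v_f| / v_s. Depth-from-defocus inverts this relation:
    measured blur size encodes object depth.
    """
    v_f = 1.0 / (1.0 / focal_mm - 1.0 / focus_dist_mm)
    v_s = 1.0 / (1.0 / focal_mm - 1.0 / obj_dist_mm)
    return aperture_mm * abs(v_s - v_f) / v_s

# Illustrative values: 5 mm focal length, 2 mm aperture, focused at 100 mm.
c_in_focus = blur_circle_diameter(5.0, 2.0, 100.0, 100.0)   # zero blur
c_closer = blur_circle_diameter(5.0, 2.0, 100.0, 50.0)      # blurred
```

An object on the focus plane produces zero blur, and the blur grows monotonically as the object moves off that plane, which is what makes the inversion well-posed (up to the near/far ambiguity that two-focal-length capture resolves).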
Affiliation(s)
- Fan Mao
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, 100084, China
- Tianqi Huang
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, 100084, China
- Longfei Ma
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, 100084, China
- Xinran Zhang
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, 100084, China
- Hongen Liao
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, 100084, China
17
Furukawa R, Sagawa R, Oka S, Tanaka S, Kawasaki H. Single and multi-frame auto-calibration for 3D endoscopy with differential rendering. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-5. [PMID: 38083062 DOI: 10.1109/embc40787.2023.10340381]
Abstract
The use of 3D measurement in endoscopic images offers practicality in cancer diagnosis, computer-assisted interventions, and making annotations for machine learning training data. An effective approach is the implementation of an active stereo system, using a micro-sized pattern projector and an endoscope camera, which has been intensively developed. One open problem for such a system is the necessity of strict and complex calibration of the projector-camera system to precisely recover the shapes. Moreover, since the head of an endoscope should have enough elasticity to avoid harming target objects, the position of the pattern projector cannot be tightly fixed to the head, resulting in limited accuracy. A straightforward approach to the problem is applying auto-calibration. However, it requires special markers in the pattern or a highly accurate initial position for stable calibration, which is impractical in real operation. In this paper, we propose a novel auto-calibration method based on recently proposed differential rendering techniques, which are drawing wide attention. To apply the method to an endoscopic system, where a diffractive optical element (DOE) is used, we propose a technique to simultaneously estimate the focal length of the DOE as well as the extrinsic parameters between the projector and the camera. We also propose a multi-frame optimization algorithm to jointly optimize the intrinsic and extrinsic parameters, the relative pose between frames, and the entire shape. Clinical relevance: One-shot endoscopic measurement of depth information is a practical solution for cancer diagnosis, computer-assisted interventions, and making annotations for machine learning training data.
18
Deng Z, Jiang P, Guo Y, Zhang S, Hu Y, Zheng X, He B. Safety-aware robotic steering of a flexible endoscope for nasotracheal intubation. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104504]
19
Chadebecq F, Lovat LB, Stoyanov D. Artificial intelligence and automation in endoscopy and surgery. Nat Rev Gastroenterol Hepatol 2023; 20:171-182. [PMID: 36352158 DOI: 10.1038/s41575-022-00701-y]
Abstract
Modern endoscopy relies on digital technology, from high-resolution imaging sensors and displays to electronics connecting configurable illumination and actuation systems for robotic articulation. In addition to enabling more effective diagnostic and therapeutic interventions, the digitization of the procedural toolset enables video data capture of the internal human anatomy at unprecedented levels. Interventional video data encapsulate functional and structural information about a patient's anatomy as well as events, activity and action logs about the surgical process. This detailed but difficult-to-interpret record from endoscopic procedures can be linked to preoperative and postoperative records or patient imaging information. Rapid advances in artificial intelligence, especially in supervised deep learning, can utilize data from endoscopic procedures to develop systems for assisting procedures leading to computer-assisted interventions that can enable better navigation during procedures, automation of image interpretation and robotically assisted tool manipulation. In this Perspective, we summarize state-of-the-art artificial intelligence for computer-assisted interventions in gastroenterology and surgery.
Affiliation(s)
- François Chadebecq
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Laurence B Lovat
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
20
Wang Y, Zhao L, Gong L, Chen X, Zuo S. A monocular SLAM system based on SIFT features for gastroscope tracking. Med Biol Eng Comput 2023; 61:511-523. [PMID: 36534372 DOI: 10.1007/s11517-022-02739-1]
Abstract
During flexible gastroscopy, physicians have extreme difficulty self-localizing. Camera tracking methods such as simultaneous localization and mapping (SLAM) have become a research hotspot in recent years, allowing tracking of the endoscope. However, most of the existing solutions have focused on tasks in which sufficient texture information is available, such as laparoscope tracking, and cannot be applied to gastroscope tracking, since gastroscopic images have fewer textures than laparoscopic images. This paper proposes a new monocular SLAM framework based on the scale-invariant feature transform (SIFT) and narrow-band imaging (NBI), which extracts SIFT features instead of oriented FAST (features from accelerated segment test) and rotated BRIEF (binary robust independent elementary features) features from gastroscopic NBI images, and performs feature retention based on a response sorting strategy to achieve more matches. Experimental results show that the root mean squared error of the proposed algorithm can reach a minimum of 2.074 mm, and the pose accuracy can be improved by up to 25.73% compared with oriented FAST and rotated BRIEF (ORB)-SLAM. SIFT features and the response sorting strategy achieve more accurate matching in gastroscopic NBI images than other features and the homogenization strategy, and the proposed algorithm also runs successfully on real clinical gastroscopic data. The proposed algorithm has potential clinical value in assisting physicians to locate the gastroscope during gastroscopy.
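The response-sorting retention strategy the abstract contrasts with homogenization can be illustrated generically: rather than spreading keypoints evenly over the image, keep the globally strongest detections, which suits low-texture gastroscopic frames where only a few strong features exist. A sketch over hypothetical (x, y, response) tuples, not the paper's code:

```python
def retain_by_response(keypoints, n_keep):
    """Keep the n_keep keypoints with the strongest detector response.

    keypoints: iterable of (x, y, response) tuples. Sorting globally by
    response (instead of spatially homogenising the distribution, as
    ORB-SLAM's quadtree does) favours the few strong features available
    in low-texture images, at the cost of possible spatial clustering.
    """
    ranked = sorted(keypoints, key=lambda kp: kp[2], reverse=True)
    return ranked[:n_keep]

# Hypothetical detections: (x, y, response)
kps = [(10, 20, 0.9), (15, 22, 0.1), (40, 5, 0.7), (8, 8, 0.4)]
best = retain_by_response(kps, 2)
```

The trade-off is exactly the one the paper measures: more repeatable matches in texture-poor frames versus a less uniform spatial spread of features.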
Affiliation(s)
- Yifan Wang
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
- Liang Zhao
- Faculty of Engineering and Information Technology, Robotics Institute, University of Technology Sydney, Sydney, Australia
- Lun Gong
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
- Xin Chen
- Tianjin Medical University General Hospital, Tianjin, China
- Siyang Zuo
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
21
Li Y. Deep causal learning for robotic intelligence. Front Neurorobot 2023; 17:1128591. [PMID: 36910267 PMCID: PMC9992986 DOI: 10.3389/fnbot.2023.1128591]
Abstract
This invited Review discusses causal learning in the context of robotic intelligence. The Review introduces the psychological findings on causal learning in human cognition, as well as the traditional statistical solutions for causal discovery and causal inference. Additionally, we examine recent deep causal learning algorithms, with a focus on their architectures and the benefits of using deep nets, and discuss the gap between deep causal learning and the needs of robotic intelligence.
Affiliation(s)
- Yangming Li
- RoCAL, Rochester Institute of Technology, Rochester, NY, United States
22
Furukawa R, Mikamo M, Sagawa R, Okamoto Y, Oka S, Tanaka S, Kawasaki H. Multi-frame optimisation for active stereo with inverse rendering to obtain consistent shape and projector-camera poses for 3D endoscopic system. Comput Methods Biomech Biomed Eng Imaging Vis 2022. [DOI: 10.1080/21681163.2022.2155578]
Affiliation(s)
- Ryo Furukawa
- Department of Informatics, Kindai University, Higashihiroshima, Hiroshima, Japan
- Ryusuke Sagawa
- Computer Vision Research Team, Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
- Yuki Okamoto
- Department of Endoscopy, Hiroshima University Hospital, Hiroshima, Hiroshima, Japan
- Shiro Oka
- Graduate School of Biomedical & Health Sciences, Hiroshima University, Hiroshima, Japan
- Shinji Tanaka
- Graduate School of Biomedical & Health Sciences, Hiroshima University, Hiroshima, Japan
- Hiroshi Kawasaki
- Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
23
Oliva Maza L, Steidle F, Klodmann J, Strobl K, Triebel R. An ORB-SLAM3-based Approach for Surgical Navigation in Ureteroscopy. Comput Methods Biomech Biomed Eng Imaging Vis 2022. [DOI: 10.1080/21681163.2022.2156392]
Affiliation(s)
- Laura Oliva Maza
- PEK (Perzeption und Kognition), Deutsches Zentrum für Luft- und Raumfahrt (DLR), Weßling, Germany
- Florian Steidle
- PEK (Perzeption und Kognition), Deutsches Zentrum für Luft- und Raumfahrt (DLR), Weßling, Germany
- Julian Klodmann
- PEK (Perzeption und Kognition), Deutsches Zentrum für Luft- und Raumfahrt (DLR), Weßling, Germany
- ARR (Analyse und Regelung komplexer Robotersysteme), Deutsches Zentrum für Luft- und Raumfahrt (DLR), Weßling, Germany
- Klaus Strobl
- PEK (Perzeption und Kognition), Deutsches Zentrum für Luft- und Raumfahrt (DLR), Weßling, Germany
- Rudolph Triebel
- PEK (Perzeption und Kognition), Deutsches Zentrum für Luft- und Raumfahrt (DLR), Weßling, Germany
24
Lamarca J, Gomez Rodriguez JJ, Tardos JD, Montiel J. Direct and Sparse Deformable Tracking. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3201253]
Affiliation(s)
- Jose Lamarca
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
- Juan J. Gomez Rodriguez
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
- Juan D. Tardos
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
- J.M.M. Montiel
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
25
Li B, Lu B, Wang Z, Zhong F, Dou Q, Liu YH. Learning Laparoscope Actions via Video Features for Proactive Robotic Field-of-View Control. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3173442]
Affiliation(s)
- Bin Li
- T Stone Robotics Institute, The Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong
- Bo Lu
- T Stone Robotics Institute, The Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong
- Ziyi Wang
- T Stone Robotics Institute, The Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong
- Fangxun Zhong
- T Stone Robotics Institute, The Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong
- Qi Dou
- Department of Computer Science and Engineering, and T Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong
- Yun-Hui Liu
- T Stone Robotics Institute, The Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong
26
Tracking better, tracking longer: automatic keyframe selection in model-based laparoscopic augmented reality. Int J Comput Assist Radiol Surg 2022; 17:1507-1511. [DOI: 10.1007/s11548-022-02643-x]
27
3D Texture Reconstruction of Abdominal Cavity Based on Monocular Vision SLAM for Minimally Invasive Surgery. Symmetry (Basel) 2022. [DOI: 10.3390/sym14020185]
Abstract
The depth information of the abdominal tissue surface and the position of the laparoscope are very important for accurate surgical navigation in computer-aided surgery. It is difficult to determine the lesion location by empirically matching the laparoscopic visual field with the preoperative image, which can easily cause intraoperative errors. Aiming at the complex abdominal environment, this paper constructs an improved monocular simultaneous localization and mapping (SLAM) system model, which can more accurately and truly reflect the abdominal cavity structure and spatial relationships. Firstly, in order to enhance the contrast between blood vessels and background, the contrast limited adaptive histogram equalization (CLAHE) algorithm is introduced to preprocess abdominal images. Secondly, combined with the AKAZE algorithm, the Oriented FAST and Rotated BRIEF (ORB) algorithm is improved to extract the features of abdominal images, which improves the accuracy of extracted symmetric feature point pairs, and the RANSAC algorithm is used to quickly eliminate the majority of mismatched pairs. A medical bag-of-words model is used to replace the traditional bag-of-words model to facilitate the comparison of similarity between abdominal images; it has stronger similarity calculation ability and reduces the matching time between the current abdominal image frame and historical abdominal image frames. Finally, Poisson surface reconstruction is used to transform the point cloud into a triangular mesh surface, and the abdominal cavity texture image is superimposed on the 3D surface described by the mesh to generate the texture of the abdominal cavity inner wall. The surface of the abdominal cavity 3D model is smooth and has a strong sense of reality. The experimental results show that the improved SLAM system increases feature-point registration accuracy and point-cloud density, and the visual effect of dense point cloud reconstruction is more realistic on the Hamlyn dataset.
The 3D reconstruction technology creates a realistic model to identify the blood vessels, nerves and other tissues in the patient's focal area, enabling three-dimensional visualization of the focal area, facilitating the surgeon's observation and diagnosis, and digital simulation of the surgical operation to optimize the surgical plan.
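The RANSAC mismatch-elimination step described in the abstract follows the usual hypothesize-and-verify loop. A sketch using a deliberately simple pure-translation model (a real pipeline would fit a homography or essential matrix, but the consensus logic is identical); the matched points are made up:

```python
import random
import numpy as np

def ransac_translation(src, dst, iters=200, tol=2.0, seed=0):
    """RANSAC with a pure-translation model to reject feature mismatches.

    src, dst: Nx2 arrays of matched pixel coordinates. A single match
    hypothesises a translation (the minimal sample for this model);
    matches farther than tol pixels from the hypothesis are outliers.
    Returns (refined_translation, inlier_mask).
    """
    rng = random.Random(seed)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best_mask = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.randrange(len(src))
        t = dst[i] - src[i]                        # minimal-sample model
        err = np.linalg.norm(dst - (src + t), axis=1)
        mask = err < tol
        if mask.sum() > best_mask.sum():
            best_mask = mask
    # Refit the model on the consensus set for the final estimate.
    t = (dst[best_mask] - src[best_mask]).mean(axis=0)
    return t, best_mask

src = [(0, 0), (10, 0), (0, 10), (5, 5)]
dst = [(3, 4), (13, 4), (3, 14), (50, 50)]         # last match is wrong
t, inliers = ransac_translation(src, dst)
```

Three matches agree on the (3, 4) shift, so the gross mismatch is voted out regardless of which sample is drawn first.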
28
Fletcher J. Methods and Applications of 3D Patient-Specific Virtual Reconstructions in Surgery. Adv Exp Med Biol 2022; 1356:53-71. [PMID: 35146617 DOI: 10.1007/978-3-030-87779-8_3]
Abstract
3D modelling has been highlighted as one of the key digital technologies likely to impact surgical practice in the next decade. 3D virtual models are reconstructed using traditional 2D imaging data through either direct volume or indirect surface rendering. One of the principal benefits of 3D visualisation in surgery relates to improved anatomical understanding, particularly in cases involving highly variable complex structures or where precision is required. Workflows begin with imaging segmentation, which is a key step in 3D reconstruction and is defined as the process of identifying and delineating structures of interest. Fully automated segmentation will be essential if 3D visualisation is to be feasibly incorporated into routine clinical workflows; however, most algorithmic solutions remain incomplete. 3D models must undergo a range of processing steps prior to visualisation, which typically include smoothing, decimation and colourization. Models used for illustrative purposes may undergo more advanced processing such as UV unwrapping, retopology and PBR texture mapping. Clinical applications are wide ranging and vary significantly between specialities. Beyond pure anatomical visualisation, 3D modelling offers new methods of interacting with imaging data: enabling patient-specific simulations/rehearsal, Computer-Aided Design (CAD) of custom implants/cutting guides, and serving as the substrate for augmented reality (AR) enhanced navigation. 3D may enable faster, safer surgery with reduced errors and complications, ultimately resulting in improved patient outcomes. However, the relative effectiveness of 3D visualisation remains poorly understood. Future research is needed to not only define the ideal application, specific user and optimal interface/platform for interacting with models but also identify means by which we can systematically evaluate the efficacy of 3D modelling in surgery.
29
Zhou H, Jayender J. EMDQ: Removal of Image Feature Mismatches in Real-Time. IEEE Trans Image Process 2021; 31:706-720. [PMID: 34914589 PMCID: PMC8777235 DOI: 10.1109/tip.2021.3134456]
Abstract
This paper proposes a novel method for removing image feature mismatches in real-time that can handle both rigid and smooth deforming environments. Image distortion, parallax and object deformation may cause the pixel coordinates of feature matches to have non-rigid deformations, which cannot be represented using a single analytical rigid transformation. To solve this problem, we propose an algorithm based on the re-weighting and 1-point RANSAC strategy (R1P-RNSC), which operates under the assumption that a non-rigid deformation can be approximately represented by multiple rigid transformations. R1P-RNSC is fast but suffers from the drawback that local smoothing information cannot be considered, thus limiting its accuracy. To solve this problem, we propose a non-parametric algorithm based on the expectation-maximization algorithm and the dual quaternion-based representation (EMDQ). EMDQ generates dense and smooth deformation fields by interpolating among the feature matches, simultaneously removing mismatches that are inconsistent with the deformation field. It relies on the rigid transformations obtained by R1P-RNSC to improve its accuracy. The experimental results demonstrate that EMDQ has superior accuracy compared to other state-of-the-art mismatch removal methods. The ability to build correspondences for all image pixels using the dense deformation field is another contribution of this paper.
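The dual-quaternion representation is what lets EMDQ blend the multiple rigid transformations from R1P-RNSC into a smooth deformation field. A minimal sketch of dual-quaternion construction and linear blending (DLB), simplified by omitting the hemisphere sign check and demonstrated on pure translations; this illustrates the representation, not the authors' algorithm:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions in [w, x, y, z] order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def dq_from_rt(q_rot, t):
    """Dual quaternion (real, dual) from unit rotation quaternion and translation."""
    t_quat = np.array([0.0, *t])
    return q_rot, 0.5 * qmul(t_quat, q_rot)

def dq_translation(q_rot, q_dual):
    """Recover the translation: t = 2 * q_dual * conj(q_rot)."""
    conj = np.asarray(q_rot) * np.array([1.0, -1.0, -1.0, -1.0])
    return 2.0 * qmul(q_dual, conj)[1:]

def dq_blend(dqs, weights):
    """Dual-quaternion linear blending (DLB): weighted sum, then renormalise.

    Renormalising by the real part keeps the blend a valid rigid
    transform, which is why interpolating in dual-quaternion space gives
    a smooth field of rigid motions rather than a distorted average.
    """
    real = sum(w * q for (q, _), w in zip(dqs, weights))
    dual = sum(w * d for (_, d), w in zip(dqs, weights))
    n = np.linalg.norm(real)
    return real / n, dual / n

# Blend two pure translations with equal weight:
ident = np.array([1.0, 0.0, 0.0, 0.0])
dq_a = dq_from_rt(ident, [2.0, 0.0, 0.0])
dq_b = dq_from_rt(ident, [0.0, 4.0, 0.0])
t = dq_translation(*dq_blend([dq_a, dq_b], [0.5, 0.5]))
```

For pure translations DLB reduces to the weighted-average translation; with rotations involved it additionally keeps the blended rotation on the unit-quaternion manifold.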
|
30
|
Recasens D, Lamarca J, Facil JM, Montiel JMM, Civera J. Endo-Depth-and-Motion: Reconstruction and Tracking in Endoscopic Videos Using Depth Networks and Photometric Constraints. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3095528] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
31
|
Zhou H, Jayender J. EMDQ-SLAM: Real-time High-resolution Reconstruction of Soft Tissue Surface from Stereo Laparoscopy Videos. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION : MICCAI ... INTERNATIONAL CONFERENCE ON MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION 2021; 12904:331-340. [PMID: 35664445 PMCID: PMC9165607 DOI: 10.1007/978-3-030-87202-1_32] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
We propose a novel stereo laparoscopy video-based non-rigid SLAM method called EMDQ-SLAM, which can incrementally reconstruct three-dimensional (3D) models of soft tissue surfaces in real-time and preserve high-resolution color textures. EMDQ-SLAM uses the expectation maximization and dual quaternion (EMDQ) algorithm combined with SURF features to track the camera motion and estimate tissue deformation between video frames. To overcome the problem of accumulative errors over time, we have integrated a g2o-based graph optimization method that combines the EMDQ mismatch removal and as-rigid-as-possible (ARAP) smoothing methods. Finally, the multi-band blending (MBB) algorithm has been used to obtain high-resolution color textures with real-time performance. Experimental results demonstrate that our method outperforms two state-of-the-art non-rigid SLAM methods: MISSLAM and DefSLAM. Quantitative evaluation shows an average error in the range of 0.8-2.2 mm for different cases.
Affiliation(s)
- Haoyin Zhou
- Surgical Planning Laboratory, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- Jagadeesan Jayender
- Surgical Planning Laboratory, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
|
32
|
Zhou H, Jayender J. Real-Time Nonrigid Mosaicking of Laparoscopy Images. IEEE TRANSACTIONS ON MEDICAL IMAGING 2021; 40:1726-1736. [PMID: 33690113 PMCID: PMC8169627 DOI: 10.1109/tmi.2021.3065030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
The ability to extend the field of view of laparoscopy images can help surgeons obtain a better understanding of the anatomical context. However, due to tissue deformation, complex camera motion and significantly non-planar three-dimensional (3D) anatomical surfaces, image pixels may undergo non-rigid deformation, and traditional mosaicking methods cannot work robustly on laparoscopy images in real-time. To solve this problem, a novel two-dimensional (2D) non-rigid simultaneous localization and mapping (SLAM) system is proposed in this paper, which is able to compensate for the deformation of pixels and perform image mosaicking in real-time. The key algorithm of this 2D non-rigid SLAM system is the expectation maximization and dual quaternion (EMDQ) algorithm, which can generate a smooth and dense deformation field from sparse and noisy image feature matches in real-time. An uncertainty-based loop closing method has been proposed to reduce accumulated errors. To achieve real-time performance, both CPU and GPU parallel computation technologies are used for dense mosaicking of all pixels. Experimental results on in vivo and synthetic data demonstrate the feasibility and accuracy of our non-rigid mosaicking method.
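The rigid core of image mosaicking, before any of the per-pixel deformation compensation this paper adds, amounts to pasting frames into a shared canvas at an estimated offset. A minimal sketch, assuming grayscale images and non-negative integer offsets (both simplifications not made by the paper):

```python
import numpy as np

def mosaic_translate(img_a, img_b, offset):
    """Paste `img_b` onto a canvas already holding `img_a`, at an
    integer pixel offset (dy, dx) relative to img_a's origin.
    Assumes dy, dx >= 0; overlapping pixels are simply overwritten
    (a real mosaicker would blend them)."""
    dy, dx = offset
    h = max(img_a.shape[0], img_b.shape[0] + dy)
    w = max(img_a.shape[1], img_b.shape[1] + dx)
    canvas = np.zeros((h, w), dtype=img_a.dtype)
    canvas[:img_a.shape[0], :img_a.shape[1]] = img_a
    canvas[dy:dy + img_b.shape[0], dx:dx + img_b.shape[1]] = img_b
    return canvas
```

The non-rigid SLAM system described above replaces the single offset with a dense per-pixel deformation field and blends the overlap region, which is what makes real-time operation on deforming tissue hard.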
|
33
|
Collins T, Pizarro D, Gasparini S, Bourdel N, Chauvet P, Canis M, Calvet L, Bartoli A. Augmented Reality Guided Laparoscopic Surgery of the Uterus. IEEE TRANSACTIONS ON MEDICAL IMAGING 2021; 40:371-380. [PMID: 32986548 DOI: 10.1109/tmi.2020.3027442] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/02/2023]
Abstract
A major research area in Computer Assisted Intervention (CAI) is to aid laparoscopic surgery teams with Augmented Reality (AR) guidance. This involves registering data from other modalities such as MR and fusing it with the laparoscopic video in real-time, to reveal the location of hidden critical structures. We present the first system for AR guided laparoscopic surgery of the uterus. This works with pre-operative MR or CT data and monocular laparoscopes, without requiring any additional interventional hardware such as optical trackers. We present novel and robust solutions to two main sub-problems: the initial registration, which is solved using a short exploratory video, and update registration, which is solved with real-time tracking-by-detection. These problems are challenging for the uterus because it is a weakly-textured, highly mobile organ that moves independently of surrounding structures. In the broader context, our system is the first that has successfully performed markerless real-time registration and AR of a mobile human organ with monocular laparoscopes in the OR.
|