1. Lu F, Zhou D, Chen H, Liu S, Ling X, Zhu L, Gong T, Sheng B, Liao X, Jin H, Li P, Feng DD. S2P-Matching: Self-Supervised Patch-Based Matching Using Transformer for Capsule Endoscopic Images Stitching. IEEE Trans Biomed Eng 2025;72:540-551. PMID: 39302789. DOI: 10.1109/tbme.2024.3462502.
Abstract
Magnetically Controlled Capsule Endoscopy (MCCE) has a limited shooting range, capturing numerous fragmented images and, unlike traditional endoscopy, lacking the ability to precisely locate and examine a region of interest (ROI). To address this issue, image stitching around the ROI can aid the diagnosis of gastrointestinal (GI) tract conditions. However, MCCE images have unique characteristics, such as weak texture, close-up shooting, and large-angle rotation, that challenge current image-matching methods. In this context, a method named S2P-Matching is proposed for self-supervised patch-based matching in MCCE image stitching. The method augments the raw data by simulating the capsule camera's behavior around the GI tract ROI. An improved contrastive learning encoder then extracts local features as deep feature descriptors; the encoder comprises two branches extracting features at different scales, which are combined along the channel dimension without manual labeling. These data-driven descriptors are fed into a Transformer model that produces patch-level matches by learning globally consistent matching priors from pseudo-ground-truth match pairs. Finally, the patch-level matches are refined and filtered to the pixel level. Experiments on real-world MCCE images demonstrate that S2P-Matching improves accuracy on challenging GI tract scenes with image parallax, with gains of up to 203 and 55.8% in terms of NCM (Number of Correct Matches) and SR (Success Rate), respectively. The approach is expected to facilitate wide adoption of MCCE-based gastrointestinal screening.
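As a rough illustration of the patch-level matching stage described above, the sketch below performs mutual-nearest-neighbour matching on L2-normalised patch descriptors and maps patch matches back to pixel centres. Random descriptors stand in for the paper's two-branch contrastive encoder and Transformer matcher; all names, sizes, and grid parameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of patch-level matching: mutual nearest neighbours on
# L2-normalised descriptors, then patch indices mapped back to pixel centres.
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Return (k, 2) index pairs that are each other's nearest neighbour."""
    sim = desc_a @ desc_b.T                    # cosine similarity (unit-norm rows)
    ab = sim.argmax(axis=1)                    # best patch in B for each patch in A
    ba = sim.argmax(axis=0)                    # best patch in A for each patch in B
    idx = np.arange(desc_a.shape[0])
    keep = ba[ab] == idx                       # mutual consistency check
    return np.stack([idx[keep], ab[keep]], axis=1)

def patch_centres(grid_hw, patch_px):
    """Pixel centre of each patch on a regular grid, in (x, y) order."""
    ys, xs = np.meshgrid(np.arange(grid_hw[0]), np.arange(grid_hw[1]), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=1) * patch_px + patch_px // 2

rng = np.random.default_rng(0)
grid, patch = (30, 40), 16                     # assumed: 30x40 grid of 16x16 px patches
da = rng.normal(size=(grid[0] * grid[1], 128)) # stand-in descriptors, image A
db = rng.normal(size=(grid[0] * grid[1], 128)) # stand-in descriptors, image B
da /= np.linalg.norm(da, axis=1, keepdims=True)
db /= np.linalg.norm(db, axis=1, keepdims=True)

pairs = mutual_nn_matches(da, db)
px = patch_centres(grid, patch)
print(f"{len(pairs)} patch matches; e.g. pixel {px[pairs[0, 0]]} -> {px[pairs[0, 1]]}")
```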
2. Boretto L, Pelanis E, Regensburger A, Petkov K, Palomar R, Fretland ÅA, Edwin B, Elle OJ. Intraoperative patient-specific volumetric reconstruction and 3D visualization for laparoscopic liver surgery. Healthc Technol Lett 2024;11:374-383. PMID: 39720761. PMCID: PMC11665787. DOI: 10.1049/htl2.12106.
Abstract
Despite the benefits of minimally invasive surgery, interventions such as laparoscopic liver surgery present unique challenges, including significant anatomical differences between preoperative images and the intraoperative scene caused by pneumoperitoneum, patient pose, and organ manipulation by surgical instruments. To address these challenges, a method is proposed for intraoperative three-dimensional reconstruction of the surgical scene, including vessels and tumors, without altering the surgical workflow. The technique combines neural radiance field reconstruction from tracked laparoscopic video with three-dimensional ultrasound compounding. Reconstruction accuracy is evaluated on a clinical laparoscopic liver ablation dataset comprising laparoscope and patient reference poses from optical tracking, laparoscopic and ultrasound videos, and preoperative and intraoperative computed tomography. A solution is proposed to compensate for liver deformation caused by the pressure applied during ultrasound acquisition, improving overall reconstruction accuracy relative to the ground-truth intraoperative computed tomography with pneumoperitoneum. Finally, a unified neural radiance field is trained from the ultrasound and laparoscope data, enabling real-time view synthesis that provides surgeons with comprehensive intraoperative visual information for laparoscopic liver surgery.
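The ultrasound three-dimensional compounding step lends itself to a compact illustration: each tracked 2D ultrasound pixel is mapped through the probe pose into a world-aligned voxel grid, and overlapping samples are averaged. The minimal numpy sketch below uses assumed spacings and a made-up pose; it is not the authors' pipeline, which additionally compensates for probe-pressure deformation and fuses the result into a neural radiance field.

```python
# Minimal numpy sketch of 3D ultrasound compounding under known tracking.
# All spacings, shapes, and the pose below are assumptions for illustration.
import numpy as np

def compound(frames, poses, px_spacing, vol_shape, vox_size):
    """frames: (H, W) images; poses: 4x4 image-to-world transforms (metres)."""
    acc = np.zeros(vol_shape)
    cnt = np.zeros(vol_shape, dtype=np.int64)
    for img, T in zip(frames, poses):
        h, w = img.shape
        v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        # homogeneous image-plane points: z = 0 in the probe frame
        pts = np.stack([u * px_spacing, v * px_spacing,
                        np.zeros_like(u, float), np.ones_like(u, float)], axis=-1)
        world = pts.reshape(-1, 4) @ T.T                     # map pixels into world frame
        ijk = np.round(world[:, :3] / vox_size).astype(int)  # nearest voxel index
        ok = np.all((ijk >= 0) & (ijk < np.array(vol_shape)), axis=1)
        np.add.at(acc, tuple(ijk[ok].T), img.reshape(-1)[ok])
        np.add.at(cnt, tuple(ijk[ok].T), 1)
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)

T = np.eye(4)
T[:3, 3] = 0.01                                 # assumed probe offset of 1 cm per axis
vol = compound([np.ones((64, 64))], [T], px_spacing=5e-4,
               vol_shape=(64, 64, 64), vox_size=1e-3)
print(vol.max())                                # 1.0 where the slice was deposited
```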
Affiliation(s)
- Luca Boretto: Siemens Healthcare AS, Oslo, Norway; Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
- Egidijus Pelanis: The Intervention Centre, Oslo University Hospital Rikshospitalet, Oslo, Norway
- Kaloian Petkov: Siemens Medical Solutions USA, Inc., Princeton, New Jersey, USA
- Rafael Palomar: The Intervention Centre, Oslo University Hospital Rikshospitalet, Oslo, Norway; Department of Computer Science, Norwegian University of Science and Technology, Gjøvik, Norway
- Åsmund Avdem Fretland: The Intervention Centre, Oslo University Hospital Rikshospitalet, Oslo, Norway; Department of HPB Surgery, Oslo University Hospital Rikshospitalet, Oslo, Norway
- Bjørn Edwin: The Intervention Centre, Oslo University Hospital Rikshospitalet, Oslo, Norway; Department of Computer Science, Norwegian University of Science and Technology, Gjøvik, Norway; Faculty of Medicine, Institute of Medicine, University of Oslo, Oslo, Norway
- Ole Jakob Elle: Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway; The Intervention Centre, Oslo University Hospital Rikshospitalet, Oslo, Norway
3. Yang Z, Dai J, Pan J. 3D reconstruction from endoscopy images: A survey. Comput Biol Med 2024;175:108546. PMID: 38704902. DOI: 10.1016/j.compbiomed.2024.108546.
Abstract
Three-dimensional reconstruction from endoscopic images plays a vital role in a growing number of medical applications. Clinical endoscopes are commonly classified as monocular or binocular, and this survey reviews depth-estimation methods according to endoscope type. Depth estimation fundamentally relies on image feature matching and multi-view geometry, but these traditional techniques face many problems in the endoscopic environment. With the rapid development of deep learning, a growing number of learning-based methods address challenges such as inconsistent illumination and texture sparsity. Over 170 papers published in the decade from 2013 to 2023 are reviewed. The commonly used public datasets and performance metrics are summarized, a taxonomy of methods is given, and the advantages and drawbacks of the algorithms are analyzed. Summary tables and a results atlas facilitate qualitative and quantitative comparison of the methods in each category. In addition, commonly used scene representation methods in endoscopy are summarized and the prospects of depth estimation research in medical applications are discussed. The robustness, processing time, and scene representation of the methods are also compared, to help doctors and researchers select appropriate methods for their surgical applications.
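For readers new to the multi-view geometry these methods build on, the rectified stereo case reduces to a one-line relation: depth Z = f · B / d for focal length f (pixels), baseline B, and disparity d (pixels). A minimal numeric illustration, with made-up calibration values:

```python
# Depth from rectified stereo disparity, Z = f * B / d; numbers are made up.
import numpy as np

f_px, baseline_mm = 520.0, 4.2                  # assumed stereo-scope calibration
disparity_px = np.array([18.0, 9.0, 4.5])       # example disparities
depth_mm = f_px * baseline_mm / disparity_px
print(depth_mm)                                 # smaller disparity -> greater depth
```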
Affiliation(s)
- Zhuoyue Yang: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Ju Dai: Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Junjun Pan: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
4. Song J, Zhang R, Zhu Q, Lin J, Ghaffari M. BDIS-SLAM: a lightweight CPU-based dense stereo SLAM for surgery. Int J Comput Assist Radiol Surg 2024;19:811-820. PMID: 38238493. DOI: 10.1007/s11548-023-03055-1.
Abstract
PURPOSE: Common dense stereo simultaneous localization and mapping (SLAM) approaches in minimally invasive surgery (MIS) require high-end parallel computing hardware for real-time operation. This is not always feasible, since computational resources may need to be allocated to other tasks such as segmentation, detection, and tracking. To address the problem of limited parallel computational power, this work targets a lightweight dense stereo SLAM system that runs on a single CPU core in real time (more than 30 Hz in typical scenarios).
METHODS: A new dense stereo mapping module is integrated with the ORB-SLAM2 system, yielding BDIS-SLAM. The module combines stereo matching with 3D dense depth mosaicking. Stereo matching uses the recently proposed CPU-level real-time algorithm Bayesian Dense Inverse Searching (BDIS). BDIS-based shape recovery and a depth mosaic strategy run in a new thread coupled with the backbone ORB-SLAM2 system for real-time stereo shape recovery.
RESULTS: Experiments on in vivo datasets show that BDIS-SLAM runs at over 30 Hz on a modern single CPU core in typical endoscopy/colonoscopy scenarios, consuming only about 12% more time than the backbone ORB-SLAM2. Although the lightweight BDIS-SLAM simplifies the pipeline by omitting deformation and fusion procedures, it provides usable dense mapping for modern MIS on computationally constrained devices.
CONCLUSION: BDIS-SLAM is a lightweight dense stereo SLAM system for MIS. It achieves 30 Hz on a modern single-core CPU in typical endoscopy/colonoscopy scenarios (image size around 640 × 480). BDIS-SLAM provides a low-cost solution for dense mapping in MIS and has the potential to be applied in surgical robots and AR systems. Code is available at https://github.com/JingweiSong/BDIS-SLAM.
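To make the threading idea concrete, the sketch below runs a stereo matcher in a separate mapping thread alongside a (stubbed) tracking loop and converts disparity to metric depth. OpenCV's StereoSGBM stands in for BDIS (the real matcher is in the linked repository), and the calibration values and matcher parameters are assumptions.

```python
# Sketch: dense mapping in a separate thread next to the tracking loop.
# cv2.StereoSGBM stands in for BDIS; calibration and parameters are assumed.
import queue
import threading
import cv2
import numpy as np

f_px, baseline_m = 540.0, 0.005                 # assumed stereo calibration
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
jobs = queue.Queue()

def mapping_worker():
    while True:
        left, right = jobs.get()
        disp = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM is fixed-point
        depth = np.zeros_like(disp)
        valid = disp > 0
        depth[valid] = f_px * baseline_m / disp[valid]  # disparity -> metric depth
        # ...a real system would mosaic this depth map into the global dense map...
        print(f"{int(valid.sum())} valid depth pixels")
        jobs.task_done()

threading.Thread(target=mapping_worker, daemon=True).start()
left = np.random.randint(0, 256, (480, 640), np.uint8)   # stand-in stereo pair
right = np.random.randint(0, 256, (480, 640), np.uint8)
jobs.put((left, right))
jobs.join()                                      # wait for the mapping thread
```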
Affiliation(s)
- Jingwei Song: United Imaging Research Institute of Intelligent Imaging, Beijing, 100144, China; University of Michigan, Ann Arbor, MI, 48109, USA
- Ray Zhang: University of Michigan, Ann Arbor, MI, 48109, USA
- Qiuchen Zhu: University of Technology Sydney, Sydney, NSW, 2007, Australia
- Jianyu Lin: Imperial College London, London, SW7 2AZ, UK
5. Li L, Mazomenos E, Chandler JH, Obstein KL, Valdastri P, Stoyanov D, Vasconcelos F. Robust endoscopic image mosaicking via fusion of multimodal estimation. Med Image Anal 2023;84:102709. PMID: 36549045. PMCID: PMC10636739. DOI: 10.1016/j.media.2022.102709.
Abstract
We propose an endoscopic image mosaicking algorithm that is robust to changing lighting conditions, specular reflections, and feature-less scenes. These conditions are especially common in minimally invasive surgery, where the light source moves with the camera to dynamically illuminate close-range scenes, making it difficult for any single image registration method to robustly track camera motion and generate consistent mosaics of the expanded surgical scene across heterogeneous environments. Instead of relying on one specialised feature extractor or registration method, we fuse different image registration algorithms according to their uncertainties, formulating the problem as affine pose graph optimisation. This allows landmarks, dense intensity registration, and learning-based approaches to be combined in a single framework. To demonstrate the application we consider deep learning-based optical flow, hand-crafted features, and intensity-based registration; however, the framework is general and could take other sources of motion estimation as input, including other sensor modalities. We validate the approach on three datasets with very different characteristics to highlight its generalisability, demonstrating the advantages of the proposed fusion framework: while each individual registration algorithm eventually fails drastically on certain surgical scenes, the fusion approach flexibly determines which algorithms to use, and in which proportion, to obtain consistent mosaics more robustly.
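The core fusion idea, stripped of the pose graph, can be sketched as information-weighted averaging: each registration method contributes an affine estimate with a covariance, and the fused estimate weights each by its inverse covariance. The single-edge version below uses invented numbers and is only a sketch of the principle, not the paper's optimiser.

```python
# Single-edge sketch of uncertainty-weighted fusion of affine registrations.
# The estimates and covariances below are invented for illustration only.
import numpy as np

def fuse_affine(estimates, covariances):
    """Fuse 6-vector affine estimates (flattened 2x3) by inverse-covariance weighting."""
    info_sum = np.zeros((6, 6))
    weighted = np.zeros(6)
    for x, P in zip(estimates, covariances):
        info = np.linalg.inv(P)                  # information = inverse covariance
        info_sum += info
        weighted += info @ x
    return np.linalg.solve(info_sum, weighted)   # information-weighted mean

a_flow = np.array([1.01, 0.02, 1.5, -0.01, 0.99, -0.8])  # e.g. from optical flow
a_feat = np.array([0.98, 0.00, 2.1, 0.02, 1.02, -1.2])   # e.g. from sparse features
fused = fuse_affine([a_flow, a_feat],
                    [np.eye(6) * 0.01, np.eye(6) * 0.04]) # flow trusted more here
print(fused.reshape(2, 3))
```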
Affiliation(s)
- Liang Li: Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK; College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, China
- Evangelos Mazomenos: Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- James H Chandler: Storm Lab UK, School of Electronic and Electrical Engineering, University of Leeds, Leeds LS2 9JT, UK
- Keith L Obstein: Division of Gastroenterology, Hepatology, and Nutrition, Vanderbilt University Medical Center, Nashville, TN 37232, USA; STORM Lab, Department of Mechanical Engineering, Vanderbilt University, Nashville, TN 37235, USA
- Pietro Valdastri: Storm Lab UK, School of Electronic and Electrical Engineering, University of Leeds, Leeds LS2 9JT, UK
- Danail Stoyanov: Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- Francisco Vasconcelos: Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
6. Zhou H, Jayender J. EMDQ: Removal of Image Feature Mismatches in Real-Time. IEEE Trans Image Process 2021;31:706-720. PMID: 34914589. PMCID: PMC8777235. DOI: 10.1109/tip.2021.3134456.
Abstract
This paper proposes a novel method for removing image feature mismatches in real time that can handle both rigid and smoothly deforming environments. Image distortion, parallax, and object deformation may give the pixel coordinates of feature matches non-rigid deformations that cannot be represented by a single analytical rigid transformation. To solve this problem, we propose an algorithm based on a re-weighting and 1-point RANSAC strategy (R1P-RNSC), which operates under the assumption that a non-rigid deformation can be approximately represented by multiple rigid transformations. R1P-RNSC is fast but cannot take local smoothing information into account, which limits its accuracy. To address this, we propose a non-parametric algorithm based on expectation-maximization and a dual quaternion-based representation (EMDQ). EMDQ generates dense, smooth deformation fields by interpolating among the feature matches while simultaneously removing mismatches that are inconsistent with the deformation field, and it relies on the rigid transformations obtained by R1P-RNSC to improve its accuracy. Experimental results demonstrate that EMDQ achieves superior accuracy compared with other state-of-the-art mismatch removal methods. The ability to build correspondences for all image pixels using the dense deformation field is another contribution of this paper.
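A stripped-down flavour of the 1-point RANSAC idea: each sampled match hypothesises a local rigid model (here just a translation), and nearby matches consistent with it are kept as inliers of that local model, so many local models jointly explain a non-rigid field. The real R1P-RNSC also estimates rotation and scale and re-weights samples; everything below (names, thresholds, synthetic data) is an illustrative assumption.

```python
# Toy 1-point RANSAC in the spirit of R1P-RNSC: one match per iteration
# hypothesises a local translation; nearby consistent matches become inliers.
import numpy as np

def local_rigid_inliers(src, dst, radius=60.0, tol=3.0, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    inlier = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                      # 1-point translation hypothesis
        near = np.linalg.norm(src - src[i], axis=1) < radius   # local neighbourhood
        ok = np.linalg.norm(dst - (src + t), axis=1) < tol     # consistent with model
        inlier |= near & ok                      # union over many local rigid models
    return inlier

rng = np.random.default_rng(1)
src = rng.uniform(0, 640, (300, 2))
dst = src + np.array([5.0, -3.0]) + rng.normal(0, 0.5, (300, 2))  # smooth motion
dst[:40] = rng.uniform(0, 640, (40, 2))          # inject 40 gross mismatches
kept = local_rigid_inliers(src, dst)
print(f"kept {int(kept.sum())} of {len(src)} matches")
```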