1. Yang Z, Dai J, Pan J. 3D reconstruction from endoscopy images: A survey. Comput Biol Med 2024;175:108546. [PMID: 38704902] [DOI: 10.1016/j.compbiomed.2024.108546]
Abstract
Three-dimensional reconstruction of images acquired through endoscopes plays a vital role in a growing number of medical applications. Endoscopes used in the clinic are commonly classified as monocular or binocular. We review depth estimation methods according to the type of endoscope. Fundamentally, depth estimation relies on feature matching between images and on multi-view geometry, but these traditional techniques face many problems in the endoscopic environment. With the rapid development of deep learning, a growing number of learning-based works address challenges such as inconsistent illumination and texture sparsity. We reviewed over 170 papers published in the 10 years from 2013 to 2023. The commonly used public datasets and performance metrics are summarized. We also give a taxonomy of methods and analyze the advantages and drawbacks of each algorithm class. Summary tables and a results atlas are provided to facilitate qualitative and quantitative comparison of methods in each category. In addition, we summarize commonly used scene representation methods in endoscopy and speculate on the prospects of depth estimation research in medical applications. We also compare the robustness, processing time, and scene representation of the methods to help doctors and researchers select appropriate methods for their surgical applications.
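As the survey notes, classical depth estimation builds on feature matching and multi-view geometry. A minimal sketch of the core primitive, two-view linear (DLT) triangulation of a matched point from known projection matrices, is shown below; the matrices and pixel coordinates are illustrative assumptions, not values from any cited paper.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: matched pixel coordinates (u, v) in each view.
    Returns the 3D point in the common world frame.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Illustrative example: identity camera and a camera translated 5 mm along x.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-5.0], [0], [0]])])
X = np.array([1.0, 2.0, 50.0])               # ground-truth point, 50 mm deep
x1 = P1 @ np.append(X, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X, 1); x2 = x2[:2] / x2[2]
print(triangulate_point(P1, P2, x1, x2))     # ~ [1. 2. 50.]
```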
Affiliation(s)
- Zhuoyue Yang
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Ju Dai
- Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Junjun Pan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China.
2. Yang Z, Pan J, Dai J, Sun Z, Xiao Y. Self-Supervised Lightweight Depth Estimation in Endoscopy Combining CNN and Transformer. IEEE Trans Med Imaging 2024;43:1934-1944. [PMID: 38198275] [DOI: 10.1109/tmi.2024.3352390]
Abstract
In recent years, an increasing number of medical engineering tasks, such as surgical navigation, pre-operative registration, and surgical robotics, have come to rely on 3D reconstruction techniques. Self-supervised depth estimation has attracted interest in endoscopic scenarios because it does not require ground truth. Most existing methods depend on increasing the number of parameters to improve performance. Therefore, designing a lightweight self-supervised model that achieves competitive results is a hot topic. We propose a lightweight network with a tight coupling of a convolutional neural network (CNN) and a Transformer for depth estimation. Unlike other methods that use CNN and Transformer branches to extract features separately and then fuse them at the deepest layer, we interleave CNN and Transformer modules to extract features at different scales in the encoder. This hierarchical structure leverages the advantages of CNNs in texture perception and of Transformers in shape extraction. At each scale, the CNN acquires local features while the Transformer encodes global information. Finally, we add multi-head attention modules to the pose network to improve the accuracy of predicted poses. Experiments demonstrate that our approach obtains comparable results on two datasets while effectively compressing the model parameters.
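A minimal sketch of the kind of per-scale CNN-plus-Transformer coupling the abstract describes is given below; the module names, channel sizes, and the way the two branches are combined are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    """One encoder stage: a convolution for local texture plus
    self-attention for global shape, fused by addition (assumed design)."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.GELU(),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        local = self.conv(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        tokens = self.norm(tokens)
        global_, _ = self.attn(tokens, tokens, tokens)
        global_ = global_.transpose(1, 2).reshape(b, c, h, w)
        return local + global_                            # fuse local and global

x = torch.randn(1, 32, 40, 48)        # a feature map at one encoder scale
print(LocalGlobalBlock(32)(x).shape)  # torch.Size([1, 32, 40, 48])
```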
3. Schmidt A, Mohareri O, DiMaio S, Yip MC, Salcudean SE. Tracking and mapping in medical computer vision: A review. Med Image Anal 2024;94:103131. [PMID: 38442528] [DOI: 10.1016/j.media.2024.103131]
Abstract
As computer vision algorithms increase in capability, their applications in clinical systems will become more pervasive. These applications include diagnostics, such as colonoscopy and bronchoscopy; guiding biopsies, minimally invasive interventions, and surgery; automating instrument motion; and providing image guidance using pre-operative scans. Many of these applications depend on the specific visual nature of medical scenes and require algorithms designed to perform in this environment. In this review, we provide an update on the field of camera-based tracking and scene mapping in surgery and diagnostics in medical computer vision. We begin by describing our review process, which yields a final list of 515 covered papers. We then give a high-level summary of the state of the art and provide relevant background for those who need tracking and mapping for their clinical applications. Next, we review datasets provided in the field and the clinical needs that motivate their design. We then delve into the algorithmic side and summarize recent developments; this summary should be especially useful for algorithm designers and for those looking to understand the capability of off-the-shelf methods. We maintain focus on algorithms for deformable environments while also reviewing the essential building blocks of rigid tracking and mapping, since there is a large amount of crossover in methods. With the field summarized, we discuss the current state of tracking and mapping methods, along with needs for future algorithms, needs for quantification, and the viability of clinical applications. We then provide some research directions and questions. We conclude that new methods need to be designed or combined to support clinical applications in deformable environments, and that more focus needs to be put into collecting datasets for training and evaluation.
Affiliation(s)
- Adam Schmidt
- Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada.
- Omid Mohareri
- Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Simon DiMaio
- Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Michael C Yip
- Department of Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
- Septimiu E Salcudean
- Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada
4. Zhang Z, Song H, Fan J, Fu T, Li Q, Ai D, Xiao D, Yang J. Dual-correlate optimized coarse-fine strategy for monocular laparoscopic videos feature matching via multilevel sequential coupling feature descriptor. Comput Biol Med 2024;169:107890. [PMID: 38168646] [DOI: 10.1016/j.compbiomed.2023.107890]
Abstract
Feature matching of monocular laparoscopic videos is crucial for visualization enhancement in computer-assisted surgery, and the keys to high-quality matching are accurate homography estimation, accurate relative pose estimation, sufficient matches, and fast computation. However, limited by monocular laparoscopic imaging characteristics such as highlight noise, motion blur, texture interference, and illumination variation, most existing feature matching methods struggle to produce high-quality matches efficiently and in sufficient numbers. To overcome these limitations, this paper presents a novel sequential coupling feature descriptor that extracts and expresses multilevel feature maps efficiently, and a dual-correlate optimized coarse-fine strategy that establishes dense matches at the coarse level and adjusts pixel-wise matches at the fine level. First, a novel sequential coupling Swin Transformer layer is designed in the feature descriptor to learn and extract rich multilevel feature representations without increasing complexity. Then, the dual-correlate optimized coarse-fine strategy matches coarse feature sequences at low resolution, and the correlated fine feature sequences are optimized to refine pixel-wise matches based on coarse matching priors. Finally, the sequential coupling feature descriptor and dual-correlate optimization are merged into the Sequential Coupling Dual-Correlate Network (SeCo DC-Net) to produce high-quality matches. Evaluation is conducted on two public laparoscopic datasets, SCARED and EndoSLAM, and the experimental results show that the proposed network outperforms state-of-the-art methods in homography estimation, relative pose estimation, reprojection error, number of matching pairs, and inference runtime. The source code is publicly available at https://github.com/Iheckzza/FeatureMatching.
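The coarse-to-fine idea itself can be illustrated compactly: match descriptors on a downsampled grid first, then refine each coarse match in a small window at full resolution. The sketch below is a generic mutual-nearest-neighbor version under that assumption, not the SeCo DC-Net pipeline.

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Mutual nearest-neighbor matching between two descriptor sets
    (rows are L2-normalized descriptors)."""
    sim = desc_a @ desc_b.T
    ab = sim.argmax(axis=1)            # best b for each a
    ba = sim.argmax(axis=0)            # best a for each b
    return [(i, j) for i, j in enumerate(ab) if ba[j] == i]

def refine_match(fine_a, fine_b, pa, pb, radius=2):
    """Refine a coarse match (pa -> pb) by searching a small window
    around pb in the fine feature map for the most similar cell."""
    h, w, _ = fine_b.shape
    query = fine_a[pa]
    best, best_sim = pb, -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = pb[0] + dy, pb[1] + dx
            if 0 <= y < h and 0 <= x < w:
                s = float(query @ fine_b[y, x])
                if s > best_sim:
                    best, best_sim = (y, x), s
    return best

# Illustrative usage with random unit descriptors on an 8x8 coarse grid.
rng = np.random.default_rng(0)
coarse_a = rng.normal(size=(64, 128))
coarse_a /= np.linalg.norm(coarse_a, axis=1, keepdims=True)
coarse_b = coarse_a[rng.permutation(64)]   # same descriptors, shuffled
print(len(mutual_nn_matches(coarse_a, coarse_b)))  # 64 mutual matches
```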
Affiliation(s)
- Ziang Zhang
- The School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
- The School of Computer Science & Technology, Beijing Institute of Technology, Beijing, 100081, China.
- Jingfan Fan
- The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China.
- Tianyu Fu
- The School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
- Qiang Li
- The School of Computer Science & Technology, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai
- The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Deqiang Xiao
- The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang
- The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China.
5. Liu S, Fan J, Yang Y, Xiao D, Ai D, Song H, Wang Y, Yang J. Monocular endoscopy images depth estimation with multi-scale residual fusion. Comput Biol Med 2024;169:107850. [PMID: 38145602] [DOI: 10.1016/j.compbiomed.2023.107850]
Abstract
BACKGROUND: Monocular depth estimation plays a fundamental role in clinical endoscopic surgery. However, the coherent illumination, smooth surfaces, and texture-less nature of endoscopy images present significant challenges, and traditional depth estimation methods struggle to perceive depth accurately in such settings. METHOD: To overcome these challenges, this paper proposes a novel multi-scale residual fusion method for estimating the depth of monocular endoscopy images. Specifically, we address the issue of coherent illumination by leveraging an image frequency-domain component space transformation, thereby enhancing the stability of the scene's light source. Moreover, we employ an image radiation intensity attenuation model to estimate the initial depth map. Finally, to refine the accuracy of depth estimation, we utilize a multi-scale residual fusion optimization technique. RESULTS: Extensive experiments were conducted on public datasets. The structural similarity measures for continuous frames in three distinct clinical data scenes reached 0.94, 0.82, and 0.84, respectively, demonstrating the effectiveness of our approach in capturing the intricate details of endoscopy images. Furthermore, the depth estimation accuracy reached 89.3% and 91.2% on the two models' data, respectively, underscoring the robustness of our method. CONCLUSIONS: The promising results obtained on public datasets highlight the significant potential of our method for clinical applications, facilitating reliable depth estimation and enhancing the quality of endoscopic surgical procedures.
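The initial-depth idea rests on the fact that, with the light source co-located with the endoscope, surface irradiance falls off roughly with the inverse square of distance, so depth can be roughed out as proportional to I^(-1/2). The sketch below implements only that generic attenuation model; the constants and normalization are assumptions, not the paper's calibrated model.

```python
import numpy as np

def initial_depth_from_intensity(image, k=1.0, eps=1e-6):
    """Rough depth prior from an inverse-square light falloff model:
    I ~ k / z**2  =>  z ~ sqrt(k / I).  `image` is a grayscale float
    array in [0, 1]; `k` is an unknown scale (assumed, uncalibrated)."""
    intensity = np.clip(image, eps, 1.0)
    depth = np.sqrt(k / intensity)
    # Normalize to [0, 1], since the absolute scale is unobservable
    # from a single monocular frame.
    depth -= depth.min()
    return depth / max(depth.max(), eps)

# Illustrative usage: a radial brightness falloff yields increasing depth.
y, x = np.mgrid[-1:1:128j, -1:1:128j]
fake_frame = np.exp(-(x**2 + y**2))         # bright center, dark border
d = initial_depth_from_intensity(fake_frame)
print(d[64, 64], d[0, 0])                    # center ~0 (near), corner ~1 (far)
```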
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China; China Center for Information Industry Development, Beijing, 100081, China
- Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China.
- Yun Yang
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, National Clinical Research Center for Digestive Diseases, Beijing 100050, China
- Deqiang Xiao
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China.
- Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
6. Luo X, Xie L, Zeng HQ, Wang X, Li S. Monocular endoscope 6-DoF tracking with constrained evolutionary stochastic filtering. Med Image Anal 2023;89:102928. [PMID: 37603943] [DOI: 10.1016/j.media.2023.102928]
Abstract
Monocular endoscopic 6-DoF camera tracking plays a vital role in surgical navigation that involves multimodal images to build augmented- or virtual-reality surgery. Such 6-DoF camera tracking can generally be formulated as a nonlinear optimization problem. To solve this nonlinear problem, this work proposes a new pipeline of constrained evolutionary stochastic filtering that introduces spatial constraints and evolutionary stochastic diffusion to deal with particle degeneracy and impoverishment in current stochastic filtering methods. Applied to endoscope 6-DoF tracking and validated on clinical data comprising more than 59,000 endoscopic video frames acquired from various surgical procedures, the new pipeline works much better than state-of-the-art tracking methods. In particular, it significantly improves the accuracy of current monocular endoscope tracking approaches from (4.83 mm, 10.2°) to (2.78 mm, 7.44°).
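At its core the method builds on stochastic (particle) filtering of the camera pose. Below is a minimal generic particle-filter skeleton for a 6-DoF state; the diffusion noise, the toy likelihood, and the state layout (x, y, z plus three Euler angles) are illustrative assumptions and do not reproduce the paper's spatial constraints or evolutionary diffusion.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500                                    # number of particles
state = rng.normal(0, 1.0, size=(N, 6))    # [x, y, z, roll, pitch, yaw]
weights = np.full(N, 1.0 / N)

def likelihood(particles, observation):
    """Toy observation model: Gaussian around an observed 6-DoF pose."""
    err = np.linalg.norm(particles - observation, axis=1)
    return np.exp(-0.5 * (err / 0.5) ** 2)

for obs in [np.zeros(6), np.full(6, 0.1)]:        # fake observation stream
    # 1) Predict: diffuse particles with process noise.
    state += rng.normal(0, 0.05, size=state.shape)
    # 2) Update: reweight by the observation likelihood.
    weights *= likelihood(state, obs)
    weights /= weights.sum()
    # 3) Resample when the effective sample size collapses
    #    (the degeneracy problem the paper targets).
    if 1.0 / np.sum(weights ** 2) < N / 2:
        idx = rng.choice(N, size=N, p=weights)
        state, weights = state[idx], np.full(N, 1.0 / N)

print("pose estimate:", weights @ state)   # weighted-mean 6-DoF estimate
```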
Affiliation(s)
- Xiongbiao Luo
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361102, China; Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China; Discipline of Intelligent Instrument and Equipment, Xiamen University, Xiamen 361102, China; Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, Xiamen 361005, China.
- Lixin Xie
- College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital, Beijing 100853, China
- Hui-Qing Zeng
- Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Xiamen University, Xiamen 361004, China.
- Xiaoying Wang
- Department of Liver Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China.
- Shiyue Li
- The First Affiliated Hospital of Guangzhou Medical University, Guangzhou 510120, China
7. Hirohata Y, Sogabe M, Miyazaki T, Kawase T, Kawashima K. Confidence-aware self-supervised learning for dense monocular depth estimation in dynamic laparoscopic scene. Sci Rep 2023;13:15380. [PMID: 37717055] [PMCID: PMC10505201] [DOI: 10.1038/s41598-023-42713-x]
Abstract
This paper tackles the challenge of accurate depth estimation from monocular laparoscopic images in dynamic surgical environments. The lack of reliable ground truth due to inconsistencies within these images makes this a complex task, and noise elements such as bleeding and smoke further complicate the learning process. We propose a model learning framework that uses a generic laparoscopic surgery video dataset for training, aimed at achieving precise monocular depth estimation in dynamic surgical settings. The architecture employs binocular disparity confidence information as a self-supervisory signal, along with the disparity information from a stereo laparoscope. Our method ensures robust learning amidst outliers caused by tissue deformation, smoke, and surgical instruments by utilizing a loss function that adjusts the selection and weighting of depth data for learning according to their confidence. We trained the model on the Hamlyn Dataset and verified it with Hamlyn test data and a static dataset. The results show exceptional generalization performance and efficacy across various scene dynamics, laparoscope types, and surgical sites.
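A confidence-weighted depth loss of the general kind described can be written in a few lines; the gating threshold and weighting scheme below are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def confidence_weighted_depth_loss(pred, target, confidence, tau=0.3):
    """L1 depth loss in which each pixel is weighted by a confidence
    map in [0, 1]; pixels below a threshold `tau` are excluded so that
    unreliable stereo disparities (smoke, instruments, deformation)
    do not act as supervision. Threshold and weighting are assumed."""
    mask = (confidence > tau).float()
    weight = confidence * mask
    per_pixel = (pred - target).abs()
    return (weight * per_pixel).sum() / weight.sum().clamp(min=1.0)

# Illustrative usage with random tensors standing in for network output,
# stereo-derived pseudo ground truth, and its confidence map.
pred = torch.rand(1, 1, 64, 80)
target = torch.rand(1, 1, 64, 80)
conf = torch.rand(1, 1, 64, 80)
print(confidence_weighted_depth_loss(pred, target, conf))
```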
Affiliation(s)
- Yasuhide Hirohata
- The Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
- Maina Sogabe
- The Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan.
- Tetsuro Miyazaki
- The Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
- Toshihiro Kawase
- The School of Engineering, Department of Information and Communication Engineering, Tokyo Denki University, Tokyo, 120-8551, Japan
- Kenji Kawashima
- The Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
8. Zhang X, Ji X, Wang J, Fan Y, Tao C. Renal surface reconstruction and segmentation for image-guided surgical navigation of laparoscopic partial nephrectomy. Biomed Eng Lett 2023;13:165-174. [PMID: 37124114] [PMCID: PMC10130295] [DOI: 10.1007/s13534-023-00263-1]
Abstract
An unpredictable, dynamic surgical environment makes it necessary to measure morphological information of the target tissue in real time for laparoscopic image-guided navigation. Among intraoperative tissue 3D reconstruction approaches, stereo vision has the most potential for clinical development, benefiting from high reconstruction accuracy and laparoscopy compatibility. However, existing stereo vision methods have difficulty achieving high reconstruction accuracy in real time, and intraoperative reconstruction results often contain complex background and instrument information that hinders clinical development of image-guided systems. Taking laparoscopic partial nephrectomy (LPN) as the research object, this paper realizes real-time dense reconstruction and extraction of the kidney tissue surface. A center-symmetric census-based semi-global block matching algorithm is proposed to generate a dense disparity map, and a GPU-based pixel-by-pixel connectivity segmentation mechanism is designed to segment the renal tissue area. Experiments on an in-vitro porcine heart, an in-vivo porcine kidney, and offline clinical LPN data were performed to evaluate the accuracy and effectiveness of our approach. The algorithm achieved a reconstruction accuracy of ±2 mm at a real-time update rate of 21 fps for an HD image size of 960 × 540, and 91.0% target tissue segmentation accuracy even with surgical instrument occlusions. The experimental results demonstrate that the proposed method can accurately reconstruct and extract the renal surface in real time in LPN, and the measurement results can be used directly by image-guided systems. Our method provides a new way to measure geometric information of target tissue intraoperatively in laparoscopic surgery.
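The matching cost at the heart of such a pipeline is easy to show in isolation: a census transform encodes each pixel's neighborhood as a bit string, and matching costs are Hamming distances between the strings. The sketch below is a plain (non-center-symmetric, non-GPU) census transform for illustration.

```python
import numpy as np

def census_transform(img, window=5):
    """Encode each pixel's neighborhood as a bit string: one bit per
    neighbor, set when the neighbor is darker than the center pixel.
    `img` is a 2D grayscale array; image borders wrap via np.roll."""
    r = window // 2
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            codes = (codes << np.uint64(1)) | (shifted < img).astype(np.uint64)
    return codes

def hamming_cost(code_a, code_b):
    """Matching cost between two census codes (lower = more similar)."""
    x = np.bitwise_xor(code_a, code_b)
    return bin(int(x)).count("1")

img = np.random.default_rng(2).random((32, 32))
c = census_transform(img)
print(hamming_cost(c[16, 16], c[16, 16]))  # 0: identical neighborhoods
print(hamming_cost(c[16, 16], c[16, 17]))  # nonzero cost for a neighbor
```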
Affiliation(s)
- Xiaohui Zhang
- School of Engineering Medicine, Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100083 China
- Xuquan Ji
- School of Biomedical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100083 China
- Junchen Wang
- School of Mechanical Engineering and Automation, Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100083 China
- Yubo Fan
- School of Engineering Medicine, Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100083 China
- School of Biomedical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100083 China
- Chunjing Tao
- School of Engineering Medicine, Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100083 China
9. Emaduddin M, Halic T, Demirel D, Bayrak C, Arikatla VS, De S. Specular Reflection Removal for 3D Reconstruction of Tissues using Endoscopy Videos. Proc IEEE SoutheastCon 2023;2023:246-252. [PMID: 37900192] [PMCID: PMC10603791] [DOI: 10.1109/southeastcon51012.2023.10115137]
Abstract
Endoscopy is widely employed for diagnostic examination of the interior of organs and body cavities and for numerous surgical interventions. Still, the inability to correlate individual 2D images with 3D organ morphology limits its applications, especially in intra-operative planning and navigation, disease physiology, and cancer surveillance. As a result, most endoscopy videos, which carry enormous data potential, are used only for real-time guidance and are discarded after collection. We present a complete method for 3D reconstruction of inner organs, comprising image extraction techniques for endoscopic videos and a novel image pre-processing technique, that reconstructs and visualizes a 3D organ model from an endoscopic video. We use advanced computer vision methods and do not require any modifications to clinical-grade endoscopy hardware. We have also formalized an image acquisition protocol through experimentation with a calibrated test bed, and we validate the accuracy and robustness of our reconstruction on a test bed with known ground truth. Our method can significantly contribute to endoscopy-based diagnostic and surgical procedures through comprehensive 3D visualization of tissue and tumors.
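A common baseline for the pre-processing step the title names is to mask saturated, low-saturation pixels as specular highlights and inpaint them; the thresholds below are illustrative assumptions, and this generic OpenCV baseline is not the authors' specific technique.

```python
import cv2
import numpy as np

def remove_specular_highlights(bgr, v_thresh=230, s_thresh=40):
    """Mask pixels that are very bright and nearly colorless (typical of
    specular reflections on wet tissue), then inpaint them from the
    surrounding texture. Thresholds are illustrative assumptions."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    _, s, v = cv2.split(hsv)
    mask = ((v > v_thresh) & (s < s_thresh)).astype(np.uint8) * 255
    # Dilate so the inpainting also covers the bloomed highlight borders.
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=1)
    restored = cv2.inpaint(bgr, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
    return restored, mask

# Illustrative usage on a synthetic frame with one saturated blob.
frame = np.full((120, 160, 3), (40, 60, 150), dtype=np.uint8)   # reddish tissue
cv2.circle(frame, (80, 60), 10, (255, 255, 255), -1)            # fake highlight
clean, m = remove_specular_highlights(frame)
print(int(m.sum() / 255), "pixels inpainted")
```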
Affiliation(s)
- Muhammad Emaduddin
- Department of Computer Science and Engineering, Texas A&M University, College Station, Texas
- Doga Demirel
- Department of Computer Science, Florida Polytechnic University, Lakeland, Florida
- Coskun Bayrak
- Department of Computer Science, Youngstown State University, Youngstown, OH
- Suvranu De
- College of Engineering, Florida A&M University - Florida State University, Tallahassee, Florida
10. Wang Y, Zhao L, Gong L, Chen X, Zuo S. A monocular SLAM system based on SIFT features for gastroscope tracking. Med Biol Eng Comput 2023;61:511-523. [PMID: 36534372] [DOI: 10.1007/s11517-022-02739-1]
Abstract
During flexible gastroscopy, physicians have extreme difficulty self-localizing. Camera tracking methods such as simultaneous localization and mapping (SLAM), which allow tracking of the endoscope, have become a research hotspot in recent years. However, most existing solutions focus on tasks in which sufficient texture information is available, such as laparoscope tracking, and cannot be applied to gastroscope tracking, since gastroscopic images have fewer textures than laparoscopic images. This paper proposes a new monocular SLAM framework based on the scale-invariant feature transform (SIFT) and narrow-band imaging (NBI), which extracts SIFT features from gastroscopic NBI images instead of ORB features (oriented FAST, i.e., features from accelerated segment test, and rotated BRIEF, i.e., binary robust independent elementary features), and performs feature retention with a response-sorting strategy to achieve more matches. Experimental results show that the root mean squared error of the proposed algorithm can reach a minimum of 2.074 mm, and that pose accuracy can be improved by up to 25.73% compared with ORB-SLAM. SIFT features and the response-sorting strategy achieve more accurate matching in gastroscopic NBI images than other features and the homogenization strategy, and the proposed algorithm also runs successfully on real clinical gastroscopic data. The proposed algorithm has potential clinical value for assisting physicians in locating the gastroscope during gastroscopy.
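Response-based retention is straightforward to demonstrate with OpenCV: detect SIFT keypoints, sort them by detector response, and keep the strongest ones rather than spatially homogenizing them. This is a generic sketch of that strategy, not the paper's full SLAM front end.

```python
import cv2
import numpy as np

def sift_top_responses(gray, keep=500):
    """Detect SIFT keypoints and retain the `keep` strongest by detector
    response (contrast), mimicking a response-sorting retention strategy."""
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray, None)
    keypoints = sorted(keypoints, key=lambda kp: kp.response, reverse=True)[:keep]
    # Compute descriptors only for the retained keypoints.
    keypoints, descriptors = sift.compute(gray, keypoints)
    return keypoints, descriptors

# Illustrative usage on a synthetic textured image.
gray = (np.random.default_rng(3).random((240, 320)) * 255).astype(np.uint8)
kps, desc = sift_top_responses(gray, keep=200)
print(len(kps), None if desc is None else desc.shape)  # up to 200 x 128
```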
Affiliation(s)
- Yifan Wang
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
- Liang Zhao
- Faculty of Engineering and Information Technology, Robotics Institute, University of Technology Sydney, Sydney, Australia
- Lun Gong
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
- Xin Chen
- Tianjin Medical University General Hospital, Tianjin, China
- Siyang Zuo
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China.
12. Tukra S, Lidströmer N, Ashrafian H, Giannarou S. AI in Surgical Robotics. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_323]
13. Zhang Z, Wang L, Zheng W, Yin L, Hu R, Yang B. Endoscope image mosaic based on pyramid ORB. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103261]
14. Edwards PJE, Psychogyios D, Speidel S, Maier-Hein L, Stoyanov D. SERV-CT: A disparity dataset from cone-beam CT for validation of endoscopic 3D reconstruction. Med Image Anal 2021;76:102302. [PMID: 34906918] [PMCID: PMC8961000] [DOI: 10.1016/j.media.2021.102302]
Abstract
Highlights:
- Full torso porcine CT model for stereo-endoscopic reconstruction validation.
- CT of endoscope and anatomy with constrained manual alignment provides a reference.
- Accuracy analysis of repeated alignments and performance of existing algorithms presented.
- Open-sourced dataset for stereo reconstruction validation.
In computer vision, reference datasets from simulation and real outdoor scenes have been highly successful in promoting algorithmic development in stereo reconstruction. Endoscopic stereo reconstruction for surgical scenes gives rise to specific problems, including the lack of clear corner features, highly specular surface properties, and the presence of blood and smoke. These issues present difficulties both for stereo reconstruction itself and for standardised dataset production. Previous datasets have been produced using computed tomography (CT) or structured-light reconstruction on phantom or ex vivo models. We present a stereo-endoscopic reconstruction validation dataset based on cone-beam CT (SERV-CT). Two ex vivo small porcine full-torso cadavers were placed within the view of the endoscope, with both the endoscope and target anatomy visible in the CT scan. The orientation of the endoscope was then manually aligned to match the stereoscopic view, and benchmark disparities, depths, and occlusions were calculated. The requirement of a CT scan limited the number of stereo pairs to 8 from each ex vivo sample. For the second sample, an RGB surface was acquired to aid alignment of smooth, featureless surfaces. Repeated manual alignments showed an RMS disparity accuracy of around 2 pixels and a depth accuracy of about 2 mm. A simplified reference dataset is provided, consisting of endoscope image pairs with corresponding calibration, disparities, depths, and occlusions covering the majority of the endoscopic image and a range of tissue types, including smooth specular surfaces, as well as significant variation in depth. We assessed the performance of various stereo algorithms from online repositories; there is significant variation between algorithms, highlighting some of the challenges of surgical endoscopic images. The SERV-CT dataset provides an easy-to-use stereoscopic validation resource for surgical applications, with smooth reference disparities and depths covering the majority of the endoscopic image. This complements existing resources well, and we hope it will aid the development of surgical endoscopic anatomical reconstruction algorithms.
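The quantities the dataset benchmarks are linked by standard rectified stereo geometry: depth Z = f·B/d for focal length f, baseline B, and disparity d. A minimal conversion helper follows; the numbers are illustrative assumptions, not SERV-CT calibration values.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_mm, min_disp=1e-3):
    """Convert a rectified-stereo disparity map (pixels) to depth (mm)
    via Z = f * B / d; zero/negative disparities map to invalid (NaN)."""
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(d, np.nan)
    valid = d > min_disp
    depth[valid] = focal_px * baseline_mm / d[valid]
    return depth

# Illustrative values: 1000 px focal length, 4.5 mm stereo baseline.
disp = np.array([[90.0, 45.0], [0.0, 30.0]])
print(disparity_to_depth(disp, focal_px=1000.0, baseline_mm=4.5))
# [[ 50. 100.]
#  [ nan 150.]]
```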
Affiliation(s)
- P J Eddie Edwards
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), Charles Bell House, 43-45 Foley Street, London W1W 7TS, UK.
- Dimitris Psychogyios
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), Charles Bell House, 43-45 Foley Street, London W1W 7TS, UK
- Stefanie Speidel
- Division of Translational Surgical Oncology, National Center for Tumor Diseases (NCT) Dresden, Dresden, 01307, Germany
- Lena Maier-Hein
- Division of Medical and Biological Informatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), Charles Bell House, 43-45 Foley Street, London W1W 7TS, UK
15. Recasens D, Lamarca J, Facil JM, Montiel JMM, Civera J. Endo-Depth-and-Motion: Reconstruction and Tracking in Endoscopic Videos Using Depth Networks and Photometric Constraints. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3095528]
16. Fu Z, Jin Z, Zhang C, Dai Y, Gao X, Wang Z, Li L, Ding G, Hu H, Wang P, Ye X. Visual-electromagnetic system: A novel fusion-based monocular localization, reconstruction, and measurement for flexible ureteroscopy. Int J Med Robot 2021;17:e2274. [PMID: 33960604] [DOI: 10.1002/rcs.2274]
Abstract
BACKGROUND: During flexible ureteroscopy (FURS), surgeons may lose orientation due to intrarenal structural similarities and the complex shape of the pyelocaliceal cavity. Decision-making required after initially misjudging stone size also increases operative time and the risk of severe complications. METHODS: An intraoperative navigation system based on electromagnetic tracking (EMT) and simultaneous localization and mapping (SLAM) is proposed to track the tip of the ureteroscope and reconstruct a dense intrarenal three-dimensional (3D) map. Furthermore, the contours of stones are segmented to measure their size. RESULTS: Our system was evaluated on a kidney phantom, achieving an absolute trajectory root mean square error (RMSE) of 0.6 mm. The median errors of the longitudinal and transversal measurements were 0.061 and 0.074 mm, respectively. An in vivo experiment also demonstrated the system's effectiveness. CONCLUSION: The proposed system works effectively for tracking and measurement, and can be extended to other surgical applications involving cavities and branches, as well as intelligent robotic surgery.
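The headline accuracy figure is an absolute trajectory error (ATE) RMSE, which can be computed in a few lines once the estimated and ground-truth trajectories are expressed in a common frame (alignment, e.g. via the Umeyama method, is assumed done here).

```python
import numpy as np

def ate_rmse(estimated, ground_truth):
    """Absolute trajectory error RMSE between two N x 3 position arrays
    that are already time-associated and expressed in the same frame."""
    diffs = np.asarray(estimated) - np.asarray(ground_truth)
    return float(np.sqrt(np.mean(np.sum(diffs ** 2, axis=1))))

# Illustrative usage: a noisy copy of a short 3D trajectory.
rng = np.random.default_rng(4)
gt = np.cumsum(rng.normal(0, 1.0, size=(100, 3)), axis=0)   # fake path (mm)
est = gt + rng.normal(0, 0.5, size=gt.shape)                 # tracking noise
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} mm")
```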
Affiliation(s)
- Zuoming Fu
- Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Ziyi Jin
- Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Chongan Zhang
- Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Yu Dai
- College of Artificial Intelligence, Nankai University, Tianjin, China
- Xiaofeng Gao
- Department of Urology, Changhai Hospital, Shanghai, China
- Zeyu Wang
- Department of Urology, Changhai Hospital, Shanghai, China
- Ling Li
- Department of Urology, Changhai Hospital, Shanghai, China
- Guoqing Ding
- Department of Urology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Haiyi Hu
- Department of Urology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Peng Wang
- Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Xuesong Ye
- Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
17. Ma R, Wang R, Zhang Y, Pizer S, McGill SK, Rosenman J, Frahm JM. RNNSLAM: Reconstructing the 3D colon to visualize missing regions during a colonoscopy. Med Image Anal 2021;72:102100. [PMID: 34102478] [DOI: 10.1016/j.media.2021.102100]
Abstract
Colonoscopy is the gold standard for pre-cancerous polyp screening and treatment. The polyp detection rate is highly tied to the percentage of surveyed colonic surface. However, current colonoscopy technique cannot guarantee that the entire colonic surface is well examined, because of incomplete camera orientations and occlusions, and the missing regions can hardly be noticed from a continuous first-person perspective. A useful contribution would therefore be an automatic system that computes missing regions from an endoscopic video in real time and alerts the endoscopist when a large missing region is detected. We present a novel method that reconstructs dense chunks of a 3D colon in real time, leaving the unsurveyed part unreconstructed. The method combines a standard SLAM system with a depth and pose prediction network to achieve much more robust tracking and less drift, addressing the difficulties that existing simultaneous localization and mapping (SLAM) systems and end-to-end deep learning methods face on colonoscopic images.
Affiliation(s)
- Ruibin Ma
- University of North Carolina at Chapel Hill, Chapel Hill, NC 27705, USA.
- Rui Wang
- University of North Carolina at Chapel Hill, Chapel Hill, NC 27705, USA
- Yubo Zhang
- University of North Carolina at Chapel Hill, Chapel Hill, NC 27705, USA
- Stephen Pizer
- University of North Carolina at Chapel Hill, Chapel Hill, NC 27705, USA
- Sarah K McGill
- University of North Carolina at Chapel Hill, Chapel Hill, NC 27705, USA
- Julian Rosenman
- University of North Carolina at Chapel Hill, Chapel Hill, NC 27705, USA
- Jan-Michael Frahm
- University of North Carolina at Chapel Hill, Chapel Hill, NC 27705, USA
18. Ozyoruk KB, Gokceler GI, Bobrow TL, Coskun G, Incetan K, Almalioglu Y, Mahmood F, Curto E, Perdigoto L, Oliveira M, Sahin H, Araujo H, Alexandrino H, Durr NJ, Gilbert HB, Turan M. EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Med Image Anal 2021;71:102058. [PMID: 33930829] [DOI: 10.1016/j.media.2021.102058]
Abstract
Deep learning techniques hold promise for developing dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings, synthetically generated data, and a recording of a phantom colon made with a conventional endoscope in clinical use, with computed tomography (CT) scan ground truth. A Panda robotic arm, two commercially available capsule endoscopes, three conventional endoscopes with different camera properties, two high-precision 3D scanners, and a CT scanner were employed to collect data from eight ex-vivo porcine gastrointestinal (GI)-tract organs and a silicone colon phantom model. In total, 35 sub-datasets are provided with 6D pose ground truth for the ex-vivo part: 18 sub-datasets for the colon, 12 for the stomach, and 5 for the small intestine, four of which contain polyp-mimicking elevations created by an expert gastroenterologist. To verify the applicability of this data to real clinical systems, we recorded a video sequence with a state-of-the-art colonoscope from a full-representation silicone colon phantom. Synthetic capsule endoscopy frames from the stomach, colon, and small intestine with both depth and pose annotations are included to facilitate the study of simulation-to-real transfer learning algorithms. Additionally, we propose Endo-SfMLearner, an unsupervised monocular depth and pose estimation method that combines residual networks with a spatial attention module to direct the network's focus toward distinguishable and highly textured tissue regions. The proposed approach uses a brightness-aware photometric loss to improve robustness under the fast frame-to-frame illumination changes commonly seen in endoscopic videos. To exemplify the use of the EndoSLAM dataset, the performance of Endo-SfMLearner is extensively compared with the state of the art: SC-SfMLearner, Monodepth2, and SfMLearner. The code and a link to the dataset are publicly available at https://github.com/CapsuleEndoscope/EndoSLAM. A video demonstrating the experimental setup and procedure is accessible as Supplementary Video 1.
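The brightness-aware idea can be sketched as a photometric loss that compensates a global affine brightness change between the target frame and the warped source frame before comparing them; the affine, least-squares form below is an illustrative assumption, not the exact Endo-SfMLearner loss.

```python
import torch

def brightness_aware_photometric_loss(target, warped):
    """L1 photometric loss after removing a per-image affine brightness
    change (warped ~ a * target + b), so that fast global illumination
    shifts between frames are not penalized. Affine model is assumed."""
    t = target.flatten(1)                    # (B, N)
    w = warped.flatten(1)
    t_mean, w_mean = t.mean(1, keepdim=True), w.mean(1, keepdim=True)
    var_t = ((t - t_mean) ** 2).mean(1, keepdim=True).clamp(min=1e-6)
    a = ((t - t_mean) * (w - w_mean)).mean(1, keepdim=True) / var_t
    b = w_mean - a * t_mean
    aligned = a * t + b                      # brightness-aligned target
    return (aligned - w).abs().mean()

# Illustrative usage: the loss ignores a pure gain/offset change.
target = torch.rand(2, 1, 64, 80)
warped = 1.3 * target + 0.05                 # same content, brighter
print(brightness_aware_photometric_loss(target, warped))  # ~0
```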
Affiliation(s)
- Taylor L Bobrow
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Gulfize Coskun
- Institute of Biomedical Engineering, Bogazici University, Turkey
- Kagan Incetan
- Institute of Biomedical Engineering, Bogazici University, Turkey
- Faisal Mahmood
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Cancer Data Science, Dana Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Eva Curto
- Institute for Systems and Robotics, University of Coimbra, Portugal
- Luis Perdigoto
- Institute for Systems and Robotics, University of Coimbra, Portugal
- Marina Oliveira
- Institute for Systems and Robotics, University of Coimbra, Portugal
- Hasan Sahin
- Institute of Biomedical Engineering, Bogazici University, Turkey
- Helder Araujo
- Institute for Systems and Robotics, University of Coimbra, Portugal
- Henrique Alexandrino
- Faculty of Medicine, Clinical Academic Center of Coimbra, University of Coimbra, Coimbra, Portugal
- Nicholas J Durr
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Hunter B Gilbert
- Department of Mechanical and Industrial Engineering, Louisiana State University, Baton Rouge, LA, USA
- Mehmet Turan
- Institute of Biomedical Engineering, Bogazici University, Turkey.
19. Widya AR, Monno Y, Okutomi M, Suzuki S, Gotoda T, Miki K. Stomach 3D Reconstruction Using Virtual Chromoendoscopic Images. IEEE J Transl Eng Health Med 2021;9:1700211. [PMID: 33796417] [PMCID: PMC8009143] [DOI: 10.1109/jtehm.2021.3062226]
Abstract
Gastric endoscopy is a gold standard in the clinical process that enables medical practitioners to diagnose various lesions inside a patient's stomach. When a lesion is found, successfully identifying its location relative to a global view of the stomach leads to better decision-making for subsequent clinical treatment. Our previous research showed that lesion localization could be achieved by reconstructing the whole stomach shape from chromoendoscopic indigo carmine (IC) dye-sprayed images using a structure-from-motion (SfM) pipeline. However, spraying the IC dye over the whole stomach requires additional time, which is undesirable for both patients and practitioners. Our objective is to propose an alternative way to achieve whole-stomach 3D reconstruction without the IC dye. We generate virtual IC-sprayed (VIC) images using image-to-image style translation trained on unpaired real no-IC and IC-sprayed images, and we investigate the effect of input and output color-channel selection for generating the VIC images. We validate our reconstruction results by comparing them with those obtained from real IC-sprayed images and confirm that the resulting stomach 3D structures are comparable. We also propose a local reconstruction technique for obtaining a more detailed surface and texture around a region of interest. The proposed method achieves whole-stomach SfM reconstruction without real IC dye, and we found that translating no-IC green-channel images to IC-sprayed red-channel images gives the best SfM reconstruction result. Clinical impact: We offer a method for frame localization and local 3D reconstruction of a found gastric lesion using standard endoscopy images, leading to better clinical decision-making.
Affiliation(s)
- Aji Resindra Widya
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Yusuke Monno
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Masatoshi Okutomi
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Sho Suzuki
- Division of Gastroenterology and Hepatology, Department of Medicine, Nihon University School of Medicine, Tokyo 101-8309, Japan
- Takuji Gotoda
- Division of Gastroenterology and Hepatology, Department of Medicine, Nihon University School of Medicine, Tokyo 101-8309, Japan
- Kenji Miki
- Department of Internal Medicine, Tsujinaka Hospital Kashiwanoha, Kashiwa 277-0871, Japan
20. Lamarca J, Parashar S, Bartoli A, Montiel JMM. DefSLAM: Tracking and Mapping of Deforming Scenes From Monocular Sequences. IEEE Trans Robot 2021. [DOI: 10.1109/tro.2020.3020739]
21. Tukra S, Lidströmer N, Ashrafian H, Giannarou S. AI in Surgical Robotics. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_323-1]
22. Zhou Y, Eimen RL, Seibel EJ, Bowden AK. Cost-Efficient Video Synthesis and Evaluation for Development of Virtual 3D Endoscopy. IEEE J Transl Eng Health Med 2021;9:1800711. [PMID: 34950539] [PMCID: PMC8673697] [DOI: 10.1109/jtehm.2021.3132193]
Affiliation(s)
- Yaxuan Zhou
- Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, USA
- Human Photonics Laboratory, Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
- Rachel L Eimen
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37232, USA
- Eric J Seibel
- Human Photonics Laboratory, Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
- Audrey K Bowden
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37232, USA
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37232, USA
23. Xie T, Wang K, Li R, Tang X. Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy. Sensors 2020;20:6943. [PMID: 33291774] [PMCID: PMC7730972] [DOI: 10.3390/s20236943]
Abstract
A traditional CNN for 6D robot relocalization outputs pose estimates without indicating whether the model is making sensible predictions or just guessing at random. We found that convnet representations trained on classification problems generalize well to other tasks. Thus, we propose a multi-task CNN for robot relocalization that simultaneously performs pose regression and scene recognition. Scene recognition determines whether the input image belongs to the scene in which the robot is currently located, not only reducing relocalization error but also telling us with what confidence we can trust the prediction. Meanwhile, we found that when there is a large visual difference between testing and training images, pose precision degrades. Based on this, we present the dual-level image-similarity strategy (DLISS), which consists of an initial level and an iteration level. The initial level performs feature-vector clustering on the training set and feature-vector acquisition for testing images. The iteration level, namely a PSO-based image-block selection algorithm, selects the testing images most similar to the training images based on the initial level, enabling higher pose accuracy on the testing set. Our method considers both the accuracy and the robustness of relocalization, and it operates indoors and outdoors in real time, taking at most 27 ms per frame. Finally, we evaluated our method on the Microsoft 7-Scenes dataset and the Cambridge Landmarks dataset, obtaining approximately 0.33 m and 7.51° accuracy on 7-Scenes and approximately 1.44 m and 4.83° accuracy on Cambridge Landmarks. Compared with PoseNet, our CNN reduced the average positional error by 25% and the average angular error by 27.79% on 7-Scenes, and reduced the average positional error by 40% and the average angular error by 28.55% on Cambridge Landmarks. We show that our multi-task CNN localizes from high-level features and is robust to images that are not in the current scene. Furthermore, our multi-task CNN achieves higher relocalization accuracy when using testing images selected by DLISS.
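The positional and angular errors quoted in such evaluations are typically the Euclidean distance between camera centers and the geodesic angle between rotations; a minimal implementation of both metrics is sketched below.

```python
import numpy as np

def positional_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))

def angular_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices:
    theta = arccos((trace(R_est^T R_gt) - 1) / 2)."""
    cos_theta = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Illustrative check: a 10-degree rotation about z and a 0.3 m offset.
a = np.radians(10.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0],
               [np.sin(a),  np.cos(a), 0],
               [0, 0, 1]])
print(positional_error([0.3, 0, 0], [0, 0, 0]))   # 0.3
print(angular_error_deg(Rz, np.eye(3)))           # ~10.0
```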
Affiliation(s)
- Tao Xie
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, 92 Xidazhi Street, Harbin 150006, China; (X.T.); (K.W.)
- Ke Wang
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, 92 Xidazhi Street, Harbin 150006, China
- Ruifeng Li
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, 92 Xidazhi Street, Harbin 150006, China
- Xinyue Tang
- MFIN, Faculty of Business and Economics, The University of Hong Kong, Pokfulam Road, Hong Kong 999077, China
24. Hartwig R, Ostler D, Feußner H, Berlet M, Yu K, Rosenthal JC, Wilhelm D. COMPASS: localization in laparoscopic visceral surgery. Current Directions in Biomedical Engineering 2020. [DOI: 10.1515/cdbme-2020-0013]
Abstract
Tracking of surgical instruments is an essential step toward modernizing the surgical workflow with a comprehensive surgical landscape guidance system (COMPASS). Real-time tracking of the laparoscopic camera used in minimally invasive surgery is required for applications in surgical workflow documentation, machine learning, image localization, and intra-operative visualization. In our approach, an inertial measurement unit (IMU) assists tool tracking in situations where no line of sight is available for infrared (IR)-based tracking of the laparoscopic camera. The novelty of this approach lies in a localization method adapted to laparoscopic visceral surgery, particularly when the line of sight is lost. It is based on IMU tracking and on the position of the trocar entry point, which acts as a remote center of motion (RCM) and thereby reduces the degrees of freedom. We developed a method to tackle localization together with a real-time tool for position and orientation estimation. The main error sources are identified and evaluated in a test scenario, which reveals that for small changes in penetration length (e.g., pivoting), the IMU's accuracy determines the error.
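Under an RCM constraint the camera pose collapses to an orientation plus an insertion depth through a fixed trocar point, which is what makes IMU-only bridging feasible. The sketch below computes a laparoscope tip position from those quantities; the frame conventions and values are illustrative assumptions.

```python
import numpy as np

def tip_from_rcm(trocar_point, R_body, insertion_mm):
    """Laparoscope tip position under a remote-center-of-motion model:
    the shaft passes through a fixed trocar point, is oriented by the
    IMU-derived rotation `R_body` (shaft axis assumed to be body z),
    and extends `insertion_mm` beyond the trocar."""
    shaft_axis = R_body @ np.array([0.0, 0.0, 1.0])
    return np.asarray(trocar_point) + insertion_mm * shaft_axis

# Illustrative usage: trocar at the abdominal wall, 15-degree pivot about x.
a = np.radians(15.0)
Rx = np.array([[1, 0, 0],
               [0, np.cos(a), -np.sin(a)],
               [0, np.sin(a),  np.cos(a)]])
trocar = np.array([100.0, 50.0, 0.0])          # mm, in the tracker frame
print(tip_from_rcm(trocar, Rx, insertion_mm=120.0))
```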
Affiliation(s)
- Regine Hartwig
- Research Group MITI, Technical University of Munich, Munich, Germany
- Daniel Ostler
- Research Group MITI, Technical University of Munich, Munich, Germany
- Hubertus Feußner
- Research Group MITI, Technical University of Munich, Munich, Germany
- Maximilian Berlet
- Research Group MITI, Technical University of Munich, Munich, Germany
- Kevin Yu
- Research Group MITI, Technical University of Munich, Munich, Germany
- Dirk Wilhelm
- Research Group MITI, Technical University of Munich, Munich, Germany
25. Chu Y, Yang X, Li H, Ai D, Ding Y, Fan J, Song H, Yang J. Multi-level feature aggregation network for instrument identification of endoscopic images. Phys Med Biol 2020;65:165004. [PMID: 32344381] [DOI: 10.1088/1361-6560/ab8dda]
Abstract
Identification of surgical instruments is crucial for understanding surgical scenarios and providing assistance in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting high-level semantic information in the feature flow network. Second, a modified interaction path for cross-channel features is proposed to increase the nonlinear combination of features at the same level and improve the efficiency of information propagation. Third, a multiview feature-fusion branch is built to aggregate location-sensitive information of the same level across different views, increase the information diversity of features, and enhance object localization. By utilizing this latent information, the proposed multilevel feature aggregation network accomplishes multitask instrument identification with a single network. Three tasks are handled: object detection, which classifies the instrument type and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoints of instrument parts. Experiments were performed on laparoscopic images from the MICCAI 2017 Endoscopic Vision Challenge, with mean average precision (AP) and average recall (AR) used to quantify segmentation and pose estimation results. For bounding-box regression, the AP and AR are 79.1% and 63.2%, respectively; for mask segmentation, 78.1% and 62.1%; and for pose estimation, 67.1% and 55.7%. The experiments demonstrate that our method efficiently improves instrument recognition accuracy in endoscopic images and outperforms other state-of-the-art methods.
Affiliation(s)
- Yakui Chu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, People's Republic of China. The authors contributed equally to this article.
26. Application of artificial intelligence in surgery. Front Med 2020;14:417-430. [DOI: 10.1007/s11684-020-0770-0]
|
27
|
Chu Y, Li H, Li X, Ding Y, Yang X, Ai D, Chen X, Wang Y, Yang J. Endoscopic image feature matching via motion consensus and global bilateral regression. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2020; 190:105370. [PMID: 32036206 DOI: 10.1016/j.cmpb.2020.105370] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Revised: 12/17/2019] [Accepted: 01/26/2020] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVE Feature matching of endoscopic images is of crucial importance in many clinical applications, such as object tracking and surface reconstruction. However, in the presence of low texture, specular reflections and deformations, feature matching methods designed for natural scenes face great challenges in minimally invasive surgery (MIS) scenarios. We propose a novel motion consensus-based method for endoscopic image feature matching to address these problems. METHODS Our method starts by correcting the radial distortion with a spherical projection model and removing specular reflection regions with an adaptive detection method, which helps to eliminate image distortion and to reduce the number of outliers. We solve the matching problem with a two-stage strategy that progressively estimates a consensus of inliers; the result is a precisely smoothed motion field. First, we construct a spatial motion field from candidate feature matches and estimate its maximum posterior with an expectation-maximization algorithm, which is computationally efficient and obtains a smoothed motion field quickly. Second, we extend the smoothed motion field to the affine domain and refine it with bilateral regression to preserve locally subtle motions. True matches can be identified by checking the deviation of each feature's motion from the estimated field. RESULTS Evaluations are performed on two simulated deformation datasets (218 images) and four different types of endoscopic datasets (1032 images). Our method is compared with three other state-of-the-art methods and achieves the best performance on affine transformation and nonrigid deformation simulations, with inlier ratios of 86.7% and 94.3%, sensitivities of 90.0% and 96.2%, precisions of 88.2% and 93.9%, and F1-scores of 89.1% and 95.0%, respectively. In evaluations on clinical datasets, the proposed method achieves an average reprojection error of 3.7 pixels and consistent performance in multi-image correspondence across sequential images. Furthermore, we also present a surface reconstruction result from rhinoscopic images to validate the reliability of our method, which shows high-quality feature matching results. CONCLUSIONS The proposed motion consensus-based feature matching method proves effective and robust for endoscopic image correspondence. This demonstrates its capability to generate reliable feature matches for surface reconstruction and other meaningful applications in MIS scenarios.
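The consensus test at the heart of such methods, keeping only matches whose motion agrees with a field estimated from their neighbours, can be sketched as below. A single Gaussian-weighted smoothing pass stands in for the paper's full EM and bilateral-regression pipeline; sigma and tau are illustrative.

```python
# Hedged sketch of motion-consensus inlier selection (not the paper's
# two-stage EM / bilateral regression method).
import numpy as np

def consensus_inliers(pts1, pts2, sigma=40.0, tau=5.0):
    """pts1, pts2: (N, 2) arrays of matched keypoint coordinates (pixels)."""
    motion = pts2 - pts1                                # per-match motion vector
    d2 = ((pts1[:, None, :] - pts1[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))                  # spatial affinity weights
    np.fill_diagonal(w, 0.0)                            # exclude self-support
    field = w @ motion / np.maximum(w.sum(1, keepdims=True), 1e-9)
    residual = np.linalg.norm(motion - field, axis=1)   # deviation from field
    return residual < tau                               # boolean inlier mask

rng = np.random.default_rng(0)
pts1 = rng.uniform(0, 640, (200, 2))
pts2 = pts1 + np.array([3.0, 1.0]) + rng.normal(0, 0.5, (200, 2))
pts2[:20] += rng.uniform(-60, 60, (20, 2))              # inject gross outliers
print(consensus_inliers(pts1, pts2).sum(), "inliers of", len(pts1))
```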
Collapse
Affiliation(s)
- Yakui Chu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Electronics, Beijing Institute of Technology, Beijing 100081, China
| | - Heng Li
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Electronics, Beijing Institute of Technology, Beijing 100081, China.
| | - Xu Li
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Electronics, Beijing Institute of Technology, Beijing 100081, China
| | - Yuan Ding
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Electronics, Beijing Institute of Technology, Beijing 100081, China
| | - Xilin Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Electronics, Beijing Institute of Technology, Beijing 100081, China
| | - Danni Ai
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Electronics, Beijing Institute of Technology, Beijing 100081, China
| | - Xiaohong Chen
- Department of Otolaryngology, Head and Neck Surgery, Beijing Tongren Hospital, Beijing 100730, China
| | - Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Electronics, Beijing Institute of Technology, Beijing 100081, China
| | - Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Electronics, Beijing Institute of Technology, Beijing 100081, China.
| |
Collapse
|
28
|
Widya AR, Monno Y, Imahori K, Okutomi M, Suzuki S, Gotoda T, Miki K. 3D Reconstruction of Whole Stomach from Endoscope Video Using Structure-from-Motion. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2020; 2019:3900-3904. [PMID: 31946725 DOI: 10.1109/embc.2019.8857964] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Gastric endoscopy is a common clinical practice that enables medical doctors to diagnose conditions inside the stomach. In order to identify the location of a gastric lesion, such as early gastric cancer, within the stomach, this work addresses the reconstruction of the 3D shape of a whole stomach, with color texture information, from a standard monocular endoscope video. Previous works have tried to reconstruct the 3D structures of various organs from endoscope images; however, they have mainly focused on partial surfaces. In this work, we investigated how to enable structure-from-motion (SfM) to reconstruct the whole shape of a stomach from a standard endoscope video. We specifically investigated the combined effect of chromo-endoscopy and color channel selection on SfM. Our study found that 3D reconstruction of the whole stomach can be achieved by using red-channel images captured under chromo-endoscopy, with indigo carmine (IC) dye spread on the stomach surface.
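The channel-selection step is easy to probe in isolation: extract the red channel of a chromo-endoscopy frame and compare feature counts against grayscale, as a quick proxy for SfM feature density. A minimal OpenCV sketch; the file name is a placeholder.

```python
# Hedged sketch: red-channel feature counting on a chromo-endoscopy frame.
import cv2

frame = cv2.imread("chromo_frame.png")                  # BGR image (placeholder)
red = frame[:, :, 2]                                    # OpenCV is BGR: index 2 = red
sift = cv2.SIFT_create()
for name, img in [("gray", cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)),
                  ("red", red)]:
    kps, _ = sift.detectAndCompute(img, None)
    print(f"{name}: {len(kps)} keypoints")
```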
Collapse
|
29
|
Liu X, Sinha A, Ishii M, Hager GD, Reiter A, Taylor RH, Unberath M. Dense Depth Estimation in Monocular Endoscopy With Self-Supervised Learning Methods. IEEE TRANSACTIONS ON MEDICAL IMAGING 2020; 39:1438-1447. [PMID: 31689184 PMCID: PMC7289272 DOI: 10.1109/tmi.2019.2950936] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method only requires monocular endoscopic videos and a multi-view stereo method, e.g., structure from motion, to supervise learning in a sparse manner. Consequently, our method requires neither manual labeling nor patient computed tomography (CT) scans in the training and application phases. In a cross-patient experiment using CT scans as ground truth, the proposed method achieved submillimeter mean residual error. In a comparison with recent self-supervised depth estimation methods designed for natural video, on in vivo sinus endoscopy data, we demonstrate that the proposed approach outperforms the previous methods by a large margin. The source code for this work is publicly available online at https://github.com/lppllppl920/EndoscopyDepthEstimation-Pytorch.
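Sparse supervision of this kind penalizes predicted depth only where SfM produced a 3D point, and only up to an unknown global scale. A hedged PyTorch sketch (not the authors' exact loss):

```python
# Hedged sketch of sparse, scale-invariant depth supervision from SfM points.
import torch

def sparse_depth_loss(pred, sfm_depth, mask, eps=1e-6):
    """pred, sfm_depth: (B, 1, H, W); mask: 1 where an SfM point projects."""
    p = pred[mask > 0].clamp_min(eps)
    s = sfm_depth[mask > 0].clamp_min(eps)
    scale = (s / p).median()                   # align the unknown global scale
    return (torch.log(p * scale) - torch.log(s)).abs().mean()

pred = torch.rand(2, 1, 8, 8) + 0.1
sfm = 2.5 * pred.detach()                      # same structure, different scale
mask = torch.zeros(2, 1, 8, 8)
mask[..., ::2, ::2] = 1.0                      # sparse projected SfM points
print(sparse_depth_loss(pred, sfm, mask))      # ~0: the loss ignores scale
```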
Collapse
|
30
|
Qiu L, Ren H. Endoscope navigation with SLAM-based registration to computed tomography for transoral surgery. INTERNATIONAL JOURNAL OF INTELLIGENT ROBOTICS AND APPLICATIONS 2020. [DOI: 10.1007/s41315-020-00127-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
|
31
|
Zhou H, Jagadeesan J. Real-Time Dense Reconstruction of Tissue Surface From Stereo Optical Video. IEEE TRANSACTIONS ON MEDICAL IMAGING 2020; 39:400-412. [PMID: 31283478 PMCID: PMC6946894 DOI: 10.1109/tmi.2019.2927436] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
We propose an approach to reconstruct a dense three-dimensional (3D) model of a tissue surface from stereo optical videos in real time. The basic idea is to first extract 3D information from video frames by using stereo matching, and then to mosaic the reconstructed 3D models. To handle the common low-texture regions on tissue surfaces, we propose effective post-processing steps for the local stereo matching method to enlarge the radius of constraint, including outlier removal, hole filling, and smoothing. Since the tissue models obtained by stereo matching are limited to the field of view of the imaging modality, we propose a model mosaicking method that uses a novel feature-based simultaneous localization and mapping (SLAM) method to align the models. Low-texture regions and varying illumination conditions may lead to a large percentage of feature matching outliers. To solve this problem, we propose several algorithms to improve the robustness of the SLAM, which mainly include 1) a histogram voting-based method to roughly select possible inliers from the feature matching results; 2) a novel 1-point RANSAC-based PnP algorithm, DynamicR1PPnP, to track the camera motion; and 3) a GPU-based iterative closest points (ICP) and bundle adjustment (BA) method to refine the camera motion estimation results. Experimental results on ex vivo and in vivo data showed that the reconstructed 3D models have high-resolution texture with an accuracy error of less than 2 mm. Most algorithms are highly parallelized for GPU computation, and the average runtime for processing one key frame is 76.3 ms on stereo images with 960×540 resolution.
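The patch score underpinning this kind of stereo matching is typically zero-mean normalized cross-correlation (ZNCC), which is invariant to local gain and offset and therefore robust to the illumination changes typical of endoscopy. A minimal sketch:

```python
# Hedged sketch of the ZNCC patch score; 1.0 means a perfect match.
import numpy as np

def zncc(a, b, eps=1e-9):
    """ZNCC between two equally sized patches."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

patch = np.random.rand(11, 11)
print(zncc(patch, patch))              # 1.0
print(zncc(patch, 0.5 * patch + 0.2))  # ~1.0: invariant to gain and offset
```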
Collapse
|
32
|
Furukawa R, Nagamatsu G, Oka S, Kotachi T, Okamoto Y, Tanaka S, Kawasaki H. Simultaneous shape and camera-projector parameter estimation for 3D endoscopic system using CNN-based grid-oneshot scan. Healthc Technol Lett 2019; 6:249-254. [PMID: 32038866 PMCID: PMC6943237 DOI: 10.1049/htl.2019.0070] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2019] [Accepted: 10/02/2019] [Indexed: 11/20/2022] Open
Abstract
For effective in situ endoscopic diagnosis and treatment, measurement of polyp sizes is important. For this purpose, 3D endoscopic systems have been researched. Among such systems, an active stereo technique, which projects a special pattern wherein each feature is coded, is a promising approach because of its simplicity and high precision. However, previous works based on this approach have two problems. First, the quality of 3D reconstruction depended on the stability of feature extraction from the images captured by the endoscope camera. Second, due to the limited pattern projection area, the reconstructed region was relatively small. In this Letter, the authors propose a learning-based technique using convolutional neural networks to solve the first problem, and an extended bundle adjustment technique, which integrates multiple shapes into a consistent single shape, to address the second. The effectiveness of the proposed techniques compared to previous techniques was evaluated experimentally.
Collapse
Affiliation(s)
- Ryo Furukawa
- Graduate School of Information Sciences, Hiroshima City University, Hiroshima, Japan
| | - Genki Nagamatsu
- Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
| | - Shiro Oka
- Department of Gastroenterology & Metabolism, Hiroshima University Hospital, Hiroshima, Japan
| | - Takahiro Kotachi
- Department of Endoscopy, Hiroshima University Hospital, Hiroshima, Japan
| | - Yuki Okamoto
- Department of Endoscopy, Hiroshima University Hospital, Hiroshima, Japan
| | - Shinji Tanaka
- Department of Endoscopy, Hiroshima University Hospital, Hiroshima, Japan
| | - Hiroshi Kawasaki
- Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
| |
Collapse
|
33
|
Widya AR, Monno Y, Okutomi M, Suzuki S, Gotoda T, Miki K. Whole Stomach 3D Reconstruction and Frame Localization From Monocular Endoscope Video. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2019; 7:3300310. [PMID: 32309059 PMCID: PMC6830857 DOI: 10.1109/jtehm.2019.2946802] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/23/2019] [Revised: 09/03/2019] [Accepted: 09/25/2019] [Indexed: 12/22/2022]
Abstract
Gastric endoscopy is a common clinical practice that enables medical doctors to diagnose various lesions inside a stomach. In order to identify the location of a gastric lesion, such as early cancer or a peptic ulcer, within the stomach, this work addresses reconstructing the color-textured 3D model of a whole stomach from a standard monocular endoscope video and localizing any selected video frame with respect to the 3D model. We examine how to enable structure-from-motion (SfM) to reconstruct the whole shape of a stomach from endoscope images, which is a challenging task due to the texture-less nature of the stomach surface. We specifically investigate the combined effect of chromo-endoscopy and color channel selection on SfM to increase the number of feature points. We also design a plane fitting-based algorithm for 3D point outlier removal to improve the 3D model quality. We show that whole-stomach 3D reconstruction can be achieved (more than 90% of the frames can be reconstructed) by using red-channel images captured under chromo-endoscopy, with indigo carmine (IC) dye spread on the stomach surface. In experimental results, we demonstrate the reconstructed 3D models for seven subjects and the application to lesion localization and reconstruction. The methodology and results presented in this paper could offer a valuable reference for other researchers and an excellent tool for gastric surgeons in various computer-aided diagnosis applications.
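A plane-fitting outlier filter of the kind mentioned can be sketched with a plain RANSAC plane fit that drops points far from the dominant plane. The thresholds below are illustrative; the paper's actual algorithm is more elaborate.

```python
# Hedged sketch of RANSAC plane fitting for 3D point outlier removal.
import numpy as np

def ransac_plane_inliers(points, iters=500, thresh=2.0, seed=0):
    """points: (N, 3). Returns a boolean mask of points near the best plane."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                          # degenerate (collinear) sample
        dist = np.abs((points - sample[0]) @ (n / norm))
        mask = dist < thresh                  # point-to-plane distance test
        if mask.sum() > best.sum():
            best = mask
    return best

rng = np.random.default_rng(1)
surface = np.c_[rng.uniform(0, 100, (300, 2)), rng.normal(0, 0.5, 300)]
outliers = rng.uniform(0, 100, (40, 3))
pts = np.vstack([surface, outliers])
print(ransac_plane_inliers(pts).sum(), "of", len(pts), "points kept")
```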
Collapse
Affiliation(s)
- Aji Resindra Widya
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
| | - Yusuke Monno
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
| | - Masatoshi Okutomi
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
| | - Sho Suzuki
- Division of Gastroenterology and Hepatology, Department of Medicine, Nihon University School of Medicine, Tokyo 101-8309, Japan
| | - Takuji Gotoda
- Division of Gastroenterology and Hepatology, Department of Medicine, Nihon University School of Medicine, Tokyo 101-8309, Japan
| | - Kenji Miki
- Department of Internal Medicine, Tsujinaka Hospital Kashiwanoha, Kashiwa 277-0871, Japan
| |
Collapse
|
34
|
Mahmoud N, Collins T, Hostettler A, Soler L, Doignon C, Montiel JMM. Live Tracking and Dense Reconstruction for Handheld Monocular Endoscopy. IEEE TRANSACTIONS ON MEDICAL IMAGING 2019; 38:79-89. [PMID: 30010552 DOI: 10.1109/tmi.2018.2856109] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Contemporary endoscopic simultaneous localization and mapping (SLAM) methods accurately compute endoscope poses; however, they only provide a sparse 3-D reconstruction that poorly describes the surgical scene. We propose a novel dense SLAM method whose qualities are: 1) monocular, requiring only RGB images of a handheld monocular endoscope; 2) fast, providing endoscope positional tracking and 3-D scene reconstruction, running in parallel threads; 3) dense, yielding an accurate dense reconstruction; 4) robust, to the severe illumination changes, poor texture and small deformations that are typical in endoscopy; and 5) self-contained, without needing fiducials or external tracking devices, and therefore it can be smoothly integrated into the surgical workflow. It works as follows. The system segments the video into clusters of frames according to parallax criteria, and accurate cluster frame poses are estimated using the sparse SLAM feature matches. Next, dense matches between cluster frames are computed in parallel by a variational approach that combines zero-mean normalized cross-correlation and a gradient Huber-norm regularizer. This combination copes with challenging lighting and textures at an affordable time budget on a modern GPU. It can outperform pure stereo reconstructions, because the frame clusters can provide larger parallax from the endoscope's motion. We provide an extensive experimental validation on real sequences of the porcine abdominal cavity, both in vivo and ex vivo. We also show a qualitative evaluation on human liver. In addition, we show a comparison with other dense SLAM methods, showing the performance gain in terms of accuracy, density, and computation time.
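A variational energy of the kind described, combining a ZNCC data term with a Huber-regularized depth gradient, might be written as follows. The notation is ours, assuming a reference image I_r, a cluster image I_k, a per-pixel depth map d, and a warp π; it is a sketch of the general form, not the authors' exact formulation.

```latex
% Hedged sketch: ZNCC data term plus Huber-regularized depth gradient.
E(d) = \sum_{\mathbf{x}} \Big( 1 - \mathrm{ZNCC}\big(I_r(\mathbf{x}),\,
       I_k(\pi(\mathbf{x}, d(\mathbf{x})))\big) \Big)
     + \lambda \sum_{\mathbf{x}} \lVert \nabla d(\mathbf{x}) \rVert_{\epsilon},
\qquad
\lVert z \rVert_{\epsilon} =
\begin{cases}
  \lVert z \rVert^{2} / (2\epsilon) & \lVert z \rVert \le \epsilon \\
  \lVert z \rVert - \epsilon/2      & \text{otherwise}
\end{cases}
```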
Collapse
|
35
|
Mahmood F, Chen R, Durr NJ. Unsupervised Reverse Domain Adaptation for Synthetic Medical Images via Adversarial Training. IEEE TRANSACTIONS ON MEDICAL IMAGING 2018; 37:2572-2581. [PMID: 29993538 DOI: 10.1109/tmi.2018.2842767] [Citation(s) in RCA: 97] [Impact Index Per Article: 16.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]
Abstract
To realize the full potential of deep learning for medical imaging, large annotated datasets are required for training. Such datasets are difficult to acquire due to privacy issues, lack of experts available for annotation, underrepresentation of rare conditions, and poor standardization. The lack of annotated data has been addressed in conventional vision applications using synthetic images refined via unsupervised adversarial training to look like real images. However, this approach is difficult to extend to general medical imaging because of the complex and diverse set of features found in real human tissues. We propose a novel framework that uses a reverse flow, where adversarial training is used to make real medical images more like synthetic images, and clinically-relevant features are preserved via self-regularization. These domain-adapted synthetic-like images can then be accurately interpreted by networks trained on large datasets of synthetic medical images. We implement this approach on the notoriously difficult task of depth-estimation from monocular endoscopy which has a variety of applications in colonoscopy, robotic surgery, and invasive endoscopic procedures. We train a depth estimator on a large data set of synthetic images generated using an accurate forward model of an endoscope and an anatomically-realistic colon. Our analysis demonstrates that the structural similarity of endoscopy depth estimation in a real pig colon predicted from a network trained solely on synthetic data improved by 78.7% by using reverse domain adaptation.
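The reverse-flow idea, mapping real frames toward the synthetic domain with an adversarial term while an L1 self-regularization preserves image content, can be sketched as a generator loss. A hedged PyTorch sketch; lambda_reg is illustrative and D, G are any discriminator/generator pair.

```python
# Hedged sketch of adversarial domain adaptation with self-regularization.
import torch
import torch.nn.functional as F

def generator_loss(D, G, real, lambda_reg=10.0):
    """Push real frames toward the synthetic domain while an L1 term
    keeps the (clinically relevant) image content close to the input."""
    fake_synth = G(real)                       # real -> synthetic-like image
    logits = D(fake_synth)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    reg = F.l1_loss(fake_synth, real)          # self-regularization term
    return adv + lambda_reg * reg
```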
Collapse
|
36
|
Speers AD, Ma B, Jarnagin WR, Himidan S, Simpson AL, Wildes RP. Fast and accurate vision-based stereo reconstruction and motion estimation for image-guided liver surgery. Healthc Technol Lett 2018; 5:208-214. [PMID: 30464852 PMCID: PMC6222177 DOI: 10.1049/htl.2018.5071] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2018] [Accepted: 08/20/2018] [Indexed: 11/25/2022] Open
Abstract
Image-guided liver surgery aims to enhance the precision of resection and ablation by providing fast localisation of tumours and adjacent complex vasculature to improve oncologic outcome. This Letter presents a novel end-to-end solution for fast stereo reconstruction and motion estimation that demonstrates high accuracy with phantom and clinical data. The authors’ computationally efficient coarse-to-fine (CTF) stereo approach facilitates liver imaging by accounting for low texture regions, enabling precise three-dimensional (3D) boundary recovery through the use of adaptive windows and utilising a robust 3D motion estimator to reject spurious data. To the best of their knowledge, theirs is the only adaptive CTF matching approach to reconstruction and motion estimation that registers time series of reconstructions to a single key frame for registration to a volumetric computed tomography scan. The system is evaluated empirically in controlled laboratory experiments with a liver phantom and motorised stages for precise quantitative evaluation. Additional evaluation is provided through testing with patient data during liver resection.
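Coarse-to-fine stereo of this flavour seeds matching at a downsampled resolution and propagates disparities upward, doubling them at each level. A minimal sketch with OpenCV's block matcher; the paper's adaptive windows and per-level refinement are omitted.

```python
# Hedged sketch of coarse-to-fine disparity seeding (no per-level refinement).
import cv2
import numpy as np

def ctf_disparity(left_gray, right_gray, levels=3):
    """left_gray, right_gray: uint8 grayscale rectified stereo pair."""
    pyr_l, pyr_r = [left_gray], [right_gray]
    for _ in range(levels - 1):
        pyr_l.append(cv2.pyrDown(pyr_l[-1]))
        pyr_r.append(cv2.pyrDown(pyr_r[-1]))
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disp = matcher.compute(pyr_l[-1], pyr_r[-1]).astype(np.float32) / 16.0
    for lvl in range(levels - 2, -1, -1):       # walk back to full resolution
        h, w = pyr_l[lvl].shape[:2]
        disp = 2.0 * cv2.resize(disp, (w, h))   # upsample; disparity doubles
    return disp
```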
Collapse
Affiliation(s)
- Andrew D Speers
- Department of Electrical Engineering and Computer Science, York University, Toronto, ON, Canada
| | - Burton Ma
- Department of Electrical Engineering and Computer Science, York University, Toronto, ON, Canada
| | - William R Jarnagin
- Hepatopancreatobiliary Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Sharifa Himidan
- Department of Surgery, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Surgery, University of Toronto, Toronto, ON, Canada
| | - Amber L Simpson
- Hepatopancreatobiliary Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Richard P Wildes
- Department of Electrical Engineering and Computer Science, York University, Toronto, ON, Canada
| |
Collapse
|
37
|
Wide-Area Shape Reconstruction by 3D Endoscopic System Based on CNN Decoding, Shape Registration and Fusion. ACTA ACUST UNITED AC 2018. [DOI: 10.1007/978-3-030-01201-4_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]
|
38
|
Chen L, Tang W, John NW, Wan TR, Zhang JJ. SLAM-based dense surface reconstruction in monocular Minimally Invasive Surgery and its application to Augmented Reality. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 158:135-146. [PMID: 29544779 DOI: 10.1016/j.cmpb.2018.02.006] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/14/2017] [Revised: 01/03/2018] [Accepted: 02/02/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE While Minimally Invasive Surgery (MIS) offers considerable benefits to patients, it also imposes significant challenges on a surgeon's performance due to well-known issues and restrictions associated with the field of view (FOV), hand-eye misalignment and disorientation, as well as the lack of stereoscopic depth perception in monocular endoscopy. Augmented Reality (AR) technology can help to overcome these limitations by augmenting the real scene with annotations, labels, tumour measurements or even a 3D reconstruction of anatomical structures at the target surgical locations. However, previous attempts to use AR technology in monocular MIS surgical scenes have mainly focused on information overlay without addressing correct spatial calibration, which can lead to incorrect localization of annotations and labels, and inaccurate depth cues and tumour measurements. In this paper, we present a novel intra-operative dense surface reconstruction framework that is capable of providing geometry information from only monocular MIS videos for geometry-aware AR applications such as site measurements and depth cues. We address a number of compelling issues in augmenting a scene for a monocular MIS environment, such as drifting and inaccurate planar mapping. METHODS A state-of-the-art Simultaneous Localization And Mapping (SLAM) algorithm used in robotics has been extended to deal with monocular MIS surgical scenes for reliable endoscopic camera tracking and salient point mapping. A robust global 3D surface reconstruction framework has been developed for building a dense surface using only the unorganized sparse point clouds extracted from the SLAM. The 3D surface reconstruction framework employs the Moving Least Squares (MLS) smoothing algorithm and the Poisson surface reconstruction framework for real-time processing of the point cloud data set. Finally, the 3D geometric information of the surgical scene allows better understanding of the scene and accurate placement of AR augmentations based on a robust 3D calibration. RESULTS We demonstrate the clinical relevance of our proposed system through two examples: (a) measurement of the surface; (b) depth cues in monocular endoscopy. The performance and accuracy evaluations of the proposed framework consist of two steps. First, we created a computer-generated endoscopy simulation video to quantify the accuracy of the camera tracking by comparing the results of the video camera tracking with the recorded ground-truth camera trajectories. The accuracy of the surface reconstruction is assessed by evaluating the Root Mean Square Distance (RMSD) of the surface vertices of the reconstructed mesh against the ground-truth 3D models. An error of 1.24 mm for the camera trajectories has been obtained, and the RMSD for surface reconstruction is 2.54 mm, which compares favourably with previous approaches. Second, in vivo laparoscopic videos are used to examine the quality of accurate AR-based annotation and measurement, and the creation of depth cues. These results show the potential promise of our geometry-aware AR technology for use in MIS surgical scenes. CONCLUSIONS The results show that the new framework is robust and accurate in dealing with challenging situations such as rapid endoscope camera movements in monocular MIS scenes. Both camera tracking and surface reconstruction based on a sparse point cloud are effective and operate in real time. This demonstrates the potential of our algorithm for accurate AR localization and depth augmentation with geometric cues and correct surface measurements in MIS with monocular endoscopes.
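The sparse-points-to-dense-surface step (normal estimation followed by Poisson reconstruction, with smoothing playing the denoising role that MLS plays in the paper) can be sketched with Open3D as a stand-in. File names and parameters are placeholders.

```python
# Hedged sketch: sparse SLAM point cloud -> dense surface mesh via Poisson.
import open3d as o3d

pcd = o3d.io.read_point_cloud("slam_points.ply")       # placeholder input
pcd = pcd.voxel_down_sample(voxel_size=1.0)            # thin noisy clusters
pcd.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=5.0, max_nn=30))
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=8)                                      # densities discarded
o3d.io.write_triangle_mesh("surface.ply", mesh)
```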
Collapse
|
39
|
A non-rigid map fusion-based direct SLAM method for endoscopic capsule robots. INTERNATIONAL JOURNAL OF INTELLIGENT ROBOTICS AND APPLICATIONS 2017; 1:399-409. [PMID: 29250588 PMCID: PMC5727175 DOI: 10.1007/s41315-017-0036-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2017] [Accepted: 11/06/2017] [Indexed: 02/07/2023]
Abstract
Since the development of capsule endoscopy technology, medical device companies and research groups have made significant progress in turning passive capsule endoscopes into robotic active capsule endoscopes. However, the use of robotic capsules in endoscopy still faces some challenges. One such challenge is the precise localization of the actively controlled robot in real time. In this paper, we propose a non-rigid map fusion-based direct simultaneous localization and mapping method for endoscopic capsule robots. The proposed method achieves high accuracy in extensive evaluations of pose estimation and map reconstruction performed on a non-rigid, realistic surgical EsophagoGastroDuodenoscopy simulator, and outperforms state-of-the-art methods.
Collapse
|
40
|
Marmol A, Peynot T, Eriksson A, Jaiprakash A, Roberts J, Crawford R. Evaluation of Keypoint Detectors and Descriptors in Arthroscopic Images for Feature-Based Matching Applications. IEEE Robot Autom Lett 2017. [DOI: 10.1109/lra.2017.2714150] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
41
|
Chen L, Tang W, John NW. Real-time geometry-aware augmented reality in minimally invasive surgery. Healthc Technol Lett 2017; 4:163-167. [PMID: 29184658 PMCID: PMC5683199 DOI: 10.1049/htl.2017.0068] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2017] [Accepted: 07/31/2017] [Indexed: 11/25/2022] Open
Abstract
The potential of augmented reality (AR) technology to assist minimally invasive surgery (MIS) lies in its computational performance and accuracy in dealing with challenging MIS scenes. Even with the latest hardware and software technologies, achieving both real-time and accurate augmented information overlay in MIS is still a formidable task. In this Letter, the authors present a novel real-time AR framework for MIS that achieves interactive geometry-aware AR in endoscopic surgery with stereo views. The authors' framework tracks the movement of the endoscopic camera and simultaneously reconstructs a dense geometric mesh of the MIS scene. The movement of the camera is predicted by minimising the re-projection error to achieve fast tracking performance, while the three-dimensional mesh is incrementally built by a dense zero-mean normalised cross-correlation stereo-matching method to improve the accuracy of the surface reconstruction. The proposed system does not require any prior template or pre-operative scan and can infer the geometric information intra-operatively in real time. With the geometric information available, the proposed AR framework is able to interactively add annotations, tumour and vessel localisation, and measurement labels with greater precision and accuracy than state-of-the-art approaches.
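Pose tracking by re-projection error minimisation is commonly implemented with a PnP solver over map-point correspondences. A hedged OpenCV sketch; K and the correspondences are assumed to come from the tracking front end, and the thresholds are illustrative.

```python
# Hedged sketch of camera tracking by minimising re-projection error.
import cv2
import numpy as np

def track_camera(map_pts3d, obs_pts2d, K):
    """map_pts3d: (N, 3) float32; obs_pts2d: (N, 2) float32; K: 3x3 float64."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_pts3d, obs_pts2d, K, None,                 # None: no distortion
        reprojectionError=3.0, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("pose tracking failed")
    proj, _ = cv2.projectPoints(map_pts3d, rvec, tvec, K, None)
    err = np.linalg.norm(proj.reshape(-1, 2) - obs_pts2d, axis=1).mean()
    return rvec, tvec, err                             # pose and mean error
```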
Collapse
Affiliation(s)
- Long Chen
- Department of Creative Technology, Bournemouth University, Poole, UK
| | - Wen Tang
- Department of Creative Technology, Bournemouth University, Poole, UK
| | - Nigel W. John
- Department of Computer Science, University of Chester, Chester, UK
| |
Collapse
|
42
|
Deray J, Sola J, Andrade-Cetto J. Word Ordering and Document Adjacency for Large Loop Closure Detection in 2-D Laser Maps. IEEE Robot Autom Lett 2017. [DOI: 10.1109/lra.2017.2657796] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
43
|
Furukawa R, Sanomura Y, Tanaka S, Yoshida S, Sagawa R, Visentini-Scarzanella M, Kawasaki H. 3D endoscope system using DOE projector. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2017; 2016:2091-2094. [PMID: 28268743 DOI: 10.1109/embc.2016.7591140] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
For effective in situ endoscopic diagnosis and treatment, size measurement and shape characterization of lesions, such as tumors, are important. For this purpose, in the past we have developed a range of 3D endoscopic systems based on active stereo to measure the shape and size of living tissues. In those works, the main shortcoming was that the target area could only be reconstructed at a specific distance from the scope because of off-focus blurring effects and aberrations in the periphery of the field of view. These issues were compounded by reconstruction instability due to the strong subsurface scattering common in internal tissue. In this paper, we tackle these shortcomings by developing a new micro pattern laser projector to be inserted into the scope's tool channel. The new projector uses a Diffractive Optical Element (DOE) instead of a single lens, which solves the off-focus blur. We also propose a new line-based grid pattern with gap coding to counter the subsurface scattering effect. In our experiments on ex vivo human tumor samples, we show that the tissue shapes were successfully reconstructed regardless of depth variance and strong subsurface scattering effects.
Collapse
|
44
|
The status of augmented reality in laparoscopic surgery as of 2016. Med Image Anal 2017; 37:66-90. [DOI: 10.1016/j.media.2017.01.007] [Citation(s) in RCA: 183] [Impact Index Per Article: 26.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2016] [Revised: 01/16/2017] [Accepted: 01/23/2017] [Indexed: 12/27/2022]
|
45
|
Lurie KL, Angst R, Zlatev DV, Liao JC, Ellerbee Bowden AK. 3D reconstruction of cystoscopy videos for comprehensive bladder records. BIOMEDICAL OPTICS EXPRESS 2017; 8:2106-2123. [PMID: 28736658 PMCID: PMC5516821 DOI: 10.1364/boe.8.002106] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/16/2016] [Revised: 02/04/2017] [Accepted: 02/04/2017] [Indexed: 05/06/2023]
Abstract
White light endoscopy is widely used for diagnostic imaging of the interior of organs and body cavities, but the inability to correlate individual 2D images with 3D organ morphology limits its utility for quantitative or longitudinal studies of disease physiology or cancer surveillance. As a result, most endoscopy videos, which carry enormous data potential, are used only for real-time guidance and are discarded after collection. We present a computational method to reconstruct and visualize a 3D model of organs from an endoscopic video that captures the shape and surface appearance of the organ. A key aspect of our strategy is the use of advanced computer vision techniques and unmodified, clinical-grade endoscopy hardware with few constraints on the image acquisition protocol, which presents a low barrier to clinical translation. We validate the accuracy and robustness of our reconstruction and co-registration method using cystoscopy videos from tissue-mimicking bladder phantoms and show clinical utility during cystoscopy in the operating room for bladder cancer evaluation. As our method can powerfully augment the visual medical record of the appearance of internal organs, it is broadly applicable to endoscopy and represents a significant advance in cancer surveillance opportunities for big-data cancer research.
Collapse
Affiliation(s)
- Kristen L. Lurie
- Dept. of Electrical Engineering, Stanford University, Stanford, CA, USA
- Dept. of Urology, Stanford University, Stanford, CA, USA
| | | | | | - Joseph C. Liao
- Dept. of Urology, Stanford University, Stanford, CA, USA
- Corresponding author:
| | | |
Collapse
|
46
|
ORBSLAM-Based Endoscope Tracking and 3D Reconstruction. COMPUTER-ASSISTED AND ROBOTIC ENDOSCOPY 2017. [DOI: 10.1007/978-3-319-54057-3_7] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|
47
|
Cadena C, Carlone L, Carrillo H, Latif Y, Scaramuzza D, Neira J, Reid I, Leonard JJ. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE T ROBOT 2016. [DOI: 10.1109/tro.2016.2624754] [Citation(s) in RCA: 1565] [Impact Index Per Article: 195.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
48
|
Furukawa R, Aoyama M, Hiura S, Aoki H, Kominami Y, Sanomura Y, Yoshida S, Tanaka S, Sagawa R, Kawasaki H. Calibration of a 3D endoscopic system based on active stereo method for shape measurement of biological tissues and specimen. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016; 2014:4991-4. [PMID: 25571113 DOI: 10.1109/embc.2014.6944745] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
For endoscopic medical treatment, measuring the size and shape of a lesion, such as a tumor, is important for improving diagnostic accuracy. We are developing a system to measure the shapes and sizes of living tissue by an active stereo method, using a normal endoscope to which a micro pattern projector is attached. In order to perform 3D reconstruction, the intrinsic and extrinsic parameters of the endoscopic camera and the pattern projector must be estimated; calibration of the pattern projector is particularly difficult. In this paper, we propose a method for simultaneous estimation of both the intrinsic and extrinsic parameters of the pattern projector. This simplifies the calibration procedure required in practical scenes. Furthermore, we have developed an efficient user interface to intuitively operate the calibration and reconstruction procedures. Using the developed system, we measured the shape of internal tissue of a human soft palate and of a biological specimen.
Collapse
|
49
|
On-patient see-through augmented reality based on visual SLAM. Int J Comput Assist Radiol Surg 2016; 12:1-11. [DOI: 10.1007/s11548-016-1444-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2016] [Accepted: 06/07/2016] [Indexed: 11/26/2022]
|
50
|
Shape Acquisition and Registration for 3D Endoscope Based on Grid Pattern Projection. COMPUTER VISION – ECCV 2016 2016. [DOI: 10.1007/978-3-319-46466-4_24] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
|