1
Zhang C, Liu X, Fu Z, Ding G, Qin L, Wang P, Zhang H, Ye X. Registration, Path Planning and Shape Reconstruction for Soft Tools in Robot-Assisted Intraluminal Procedures: A Review. Int J Med Robot 2025; 21:e70066. [PMID: 40237632] [DOI: 10.1002/rcs.70066]
Abstract
BACKGROUND Robotic and navigation systems can ease surgeons' difficulty in operating delicately and safely in tortuous lumens during traditional intraluminal procedures (IP). This paper reviews the three key components of these systems: registration, path planning, and shape reconstruction, and highlights their limitations and future perspectives. METHODS An electronic search for relevant studies was performed in the Web of Science and Google Scholar databases up to 2024. RESULTS For 2D-3D registration in IP, we focus on analysing feature extraction. For path planning, this paper proposes a new classification method and focuses on the selection of the planning space and the establishment of path cost. Regarding shape reconstruction, the pros and cons of existing methods are analysed, with emphasis on methods based on fibre-optic sensors and electromagnetic (EM) tracking. CONCLUSION These three technologies in IP have made great progress, but challenges remain that require further research.
Affiliation(s)
- Chongan Zhang
- Biosensor National Special Laboratory, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Xiaoyue Liu
- Biosensor National Special Laboratory, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Zuoming Fu
- Biosensor National Special Laboratory, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Guoqing Ding
- Department of Urology, School of Medicine, Sir Run Run Shaw Hospital, Zhejiang University, Hangzhou, China
- Liping Qin
- Zhejiang Institute of Medical Device Supervision and Testing, Hangzhou, China
- Key Laboratory of Safety Evaluation of Medical Devices of Zhejiang Province, Hangzhou, China
- Peng Wang
- Biosensor National Special Laboratory, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Hong Zhang
- Biosensor National Special Laboratory, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Xuesong Ye
- Biosensor National Special Laboratory, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- State Key Laboratory of CAD and CG, Zhejiang University, Hangzhou, China
2
Furukawa R, Kawasaki H, Sagawa R. Incremental shape integration with inter-frame shape consistency using neural SDF for a 3D endoscopic system. Healthc Technol Lett 2025; 12:e70001. [PMID: 39885982] [PMCID: PMC11780497] [DOI: 10.1049/htl2.70001]
Abstract
3D measurement for endoscopic systems is in high demand. One promising approach is to use active-stereo systems with a micro-sized pattern projector attached to the head of an endoscope. Furthermore, multi-frame integration is also desired to enlarge the reconstructed area. This paper proposes an incremental optimization technique for both the shape-field parameters and the positional parameters of the cameras and projectors. The method assumes that the input data are temporally sequential images, that is, endoscopic videos, and that the relative positions between the camera and the projector may vary continuously. As a solution, a differential volume rendering algorithm in conjunction with a neural signed distance field (NeuralSDF) representation is proposed to simultaneously optimize the 3D scene and the camera/projector poses. An incremental optimization strategy in which the optimized frames are gradually increased is also proposed. In the experiments, the proposed method is evaluated by performing 3D reconstruction using both synthetic and real images, demonstrating its effectiveness.
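To make the joint optimization concrete, the sketch below shows the core pattern in PyTorch: an MLP signed distance field rendered to depth via a VolSDF-style density, with per-frame pose parameters added incrementally and re-optimized jointly against all frames seen so far. The network size, the SDF-to-density mapping, the translation-only pose stand-in, and the constant stand-in depth measurements are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: incremental joint SDF/pose optimization via differentiable
# volume rendering, in the spirit of the NeuralSDF approach described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralSDF(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1))
    def forward(self, x):              # x: (N, 3) points -> (N,) signed distance
        return self.net(x).squeeze(-1)

def render_depth(sdf, origins, dirs, n_samples=64, near=0.01, far=0.2, beta=0.01):
    """Differentiable depth from an SDF: map |sdf| to density with a
    Laplace-style kernel (a VolSDF-like assumption), then alpha-composite."""
    t = torch.linspace(near, far, n_samples)                          # (S,)
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]   # (R,S,3)
    sigma = torch.exp(-sdf(pts.reshape(-1, 3)).reshape(pts.shape[:2]).abs() / beta) / beta
    alpha = 1.0 - torch.exp(-sigma * (far - near) / n_samples)        # (R,S)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-7], dim=1), dim=1)[:, :-1]
    return (alpha * trans * t[None, :]).sum(dim=1)                    # expected depth

sdf = NeuralSDF()
poses = nn.ParameterList()                     # one learnable pose per frame
frames = []                                    # (ray dirs, measured depth) per frame
opt = torch.optim.Adam(sdf.parameters(), lr=1e-3)
for frame_id in range(3):                      # stand-in for an endoscopic video
    poses.append(nn.Parameter(torch.zeros(3)))             # translation-only pose
    opt.add_param_group({"params": [poses[-1]]})
    frames.append((F.normalize(torch.randn(256, 3), dim=1),
                   torch.full((256,), 0.1)))   # stand-in rays and projector depth
    for _ in range(50):                        # re-optimize all frames jointly:
        opt.zero_grad()                        # this is the inter-frame consistency
        loss = sum((render_depth(sdf, p.expand(256, 3), d) - m).abs().mean()
                   for p, (d, m) in zip(poses, frames))
        loss.backward(); opt.step()
```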
Affiliation(s)
- Ryo Furukawa
- Department of Informatics, Graduate School of System Engineering, Kindai University, Higashihiroshima, Japan
- Hiroshi Kawasaki
- Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
- Ryusuke Sagawa
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
3
Lu Y, Gao H, Qiu J, Qiu Z, Liu J, Bai X. DSIFNet: Implicit feature network for nasal cavity and vestibule segmentation from 3D head CT. Comput Med Imaging Graph 2024; 118:102462. [PMID: 39556905] [DOI: 10.1016/j.compmedimag.2024.102462]
Abstract
This study is dedicated to accurately segmenting the nasal cavity and its intricate internal anatomy from head CT images, which is critical for understanding nasal physiology, diagnosing diseases, and planning surgeries. The nasal cavity and its anatomical structures, such as the sinuses and vestibule, exhibit significant scale differences, with complex shapes and variable microstructures. These features require a segmentation method with strong cross-scale feature extraction capabilities. To address this challenge, we propose an image segmentation network named the Deeply Supervised Implicit Feature Network (DSIFNet). This network incorporates an Implicit Feature Function Module Guided by Local and Global Positional Information (LGPI-IFF), enabling effective fusion of features across scales and enhancing the network's ability to recognize details and overall structures. Additionally, we introduce a deep supervision mechanism based on implicit feature functions in the network's decoding phase, optimizing the utilization of multi-scale feature information and thus improving segmentation precision and detail representation. Furthermore, we constructed a dataset comprising 7116 CT volumes (1,292,508 slices) and implemented PixPro-based self-supervised pretraining to exploit unlabeled data for enhanced feature extraction. Our tests on nasal cavity and vestibule segmentation, conducted on a dataset of 128 head CT volumes (34,006 slices), demonstrate the robustness and superior performance of the proposed method, achieving leading results across multiple segmentation metrics.
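The deep-supervision mechanism described above can be illustrated with a minimal pattern: auxiliary prediction heads attached at several decoder scales, each penalized against a resized ground truth. The sketch below uses plain 1x1 convolution heads in place of the paper's implicit feature functions; all names, channel counts, and loss weights are assumptions.

```python
# Generic deep-supervision pattern for a segmentation decoder (simplified
# stand-in for DSIFNet's implicit-feature-based deep supervision).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeplySupervisedHead(nn.Module):
    def __init__(self, channels=(256, 128, 64), n_classes=2):
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(c, n_classes, 1) for c in channels)

    def forward(self, decoder_feats):
        # One logit map per decoder scale, coarsest first.
        return [h(f) for h, f in zip(self.heads, decoder_feats)]

def deep_supervision_loss(logits_per_scale, target, weights=(0.25, 0.5, 1.0)):
    """Cross-entropy at every decoder scale, with ground truth resized to
    each scale; deeper (coarser) outputs get smaller weights."""
    loss = 0.0
    for w, logits in zip(weights, logits_per_scale):
        tgt = F.interpolate(target[:, None].float(), size=logits.shape[-2:],
                            mode="nearest").squeeze(1).long()
        loss = loss + w * F.cross_entropy(logits, tgt)
    return loss

# Example with dummy decoder features at three scales of a 64x64 slice.
feats = [torch.randn(1, c, s, s) for c, s in [(256, 16), (128, 32), (64, 64)]]
target = torch.randint(0, 2, (1, 64, 64))
print(deep_supervision_loss(DeeplySupervisedHead()(feats), target))
```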
Affiliation(s)
- Yi Lu
- Image Processing Center, Beihang University, Beijing 102206, China
- Hongjian Gao
- Image Processing Center, Beihang University, Beijing 102206, China
- Jikuan Qiu
- Department of Otolaryngology, Head and Neck Surgery, Peking University First Hospital, Beijing 100034, China
- Zihan Qiu
- Department of Otorhinolaryngology, Head and Neck Surgery, The Sixth Affiliated Hospital of Sun Yat-sen University, Sun Yat-sen University, Guangzhou 510655, China
- Junxiu Liu
- Department of Otolaryngology, Head and Neck Surgery, Peking University First Hospital, Beijing 100034, China
- Xiangzhi Bai
- Image Processing Center, Beihang University, Beijing 102206, China; The State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China; Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100191, China
4
Furukawa R, Sagawa R, Oka S, Kawasaki H. NeRF-based multi-frame 3D integration for 3D endoscopy using active stereo. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-5. [PMID: 40040184] [DOI: 10.1109/embc53108.2024.10782699]
Abstract
3D measurement for endoscopic systems has large potential not only for cancer diagnosis and computer-assisted medical systems, but also for providing ground truth for supervised training of deep neural networks. One promising approach is the implementation of an active-stereo system using a micro-sized pattern projector attached to the head of the endoscope. Furthermore, a multi-frame optimization algorithm for the endoscopic active-stereo system has been proposed to improve accuracy and robustness; in that approach, a differential rendering algorithm is used to simultaneously optimize the 3D scene, represented by triangle meshes, and the camera/projector poses. One issue with the approach is its dependency on the accuracy of the initial 3D triangle mesh; however, achieving sufficient accuracy is not easy for actual endoscopic systems, which reduces the practicality of the algorithm. In this paper, we adapt a neural radiance field (NeRF) based 3D scene representation to integrate multi-frame data captured by an active-stereo system, where the 3D scene as well as the camera/projector poses are simultaneously optimized without using an initial shape. In the experiments, the proposed method is evaluated by performing 3D reconstruction using both synthetic and real images obtained by a consumer endoscopic camera fitted with a micro-pattern projector. Clinical relevance: One-shot endoscopic measurement of depth information is a practical solution for cancer diagnosis, computer-assisted interventions, and making annotations for machine learning training data.
5
Yang Z, Dai J, Pan J. 3D reconstruction from endoscopy images: A survey. Comput Biol Med 2024; 175:108546. [PMID: 38704902] [DOI: 10.1016/j.compbiomed.2024.108546]
Abstract
Three-dimensional reconstruction of images acquired through endoscopes is playing a vital role in an increasing number of medical applications. Endoscopes used in the clinic are commonly classified as monocular or binocular. We review the classification of depth estimation methods according to the type of endoscope. Basically, depth estimation relies on feature matching across images and multi-view geometry theory. However, these traditional techniques face many problems in the endoscopic environment. With the continuing development of deep learning, a growing number of works use learning-based methods to address challenges such as inconsistent illumination and texture sparsity. We reviewed over 170 papers published in the 10 years from 2013 to 2023. The commonly used public datasets and performance metrics are summarized. We also give a taxonomy of methods and analyze the advantages and drawbacks of the algorithms. Summary tables and a results atlas are provided to facilitate comparison of the qualitative and quantitative performance of different methods in each category. In addition, we summarize commonly used scene representation methods in endoscopy and speculate on the prospects of depth estimation research in medical applications. We also compare the robustness, processing time, and scene representation of the methods to help doctors and researchers select appropriate methods for their surgical applications.
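For readers comparing methods across such surveys, the standard monocular depth error and accuracy metrics (AbsRel, SqRel, RMSE, log-RMSE, and the delta thresholds) are reproduced below in NumPy, following their usual definitions rather than any one paper's evaluation code.

```python
# Standard depth-estimation metrics used throughout the surveyed literature.
import numpy as np

def depth_metrics(gt, pred, eps=1e-6):
    gt, pred = np.asarray(gt, float), np.asarray(pred, float)
    valid = gt > eps                       # ignore pixels without ground truth
    gt, pred = gt[valid], np.clip(pred[valid], eps, None)
    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel":  np.mean(np.abs(gt - pred) / gt),
        "sq_rel":   np.mean((gt - pred) ** 2 / gt),
        "rmse":     np.sqrt(np.mean((gt - pred) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
        "delta_1":  np.mean(thresh < 1.25),
        "delta_2":  np.mean(thresh < 1.25 ** 2),
        "delta_3":  np.mean(thresh < 1.25 ** 3),
    }

# e.g. depth_metrics(gt_depth_map, predicted_depth_map)
```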
Affiliation(s)
- Zhuoyue Yang
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Ju Dai
- Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Junjun Pan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
6
Furukawa R, Chen E, Sagawa R, Oka S, Kawasaki H. Calibration-free structured-light-based 3D scanning system in laparoscope for robotic surgery. Healthc Technol Lett 2024; 11:196-205. [PMID: 38638488] [PMCID: PMC11022229] [DOI: 10.1049/htl2.12083]
Abstract
Accurate 3D shape measurement is crucial for surgical support and alignment in robotic surgery systems. Stereo cameras in laparoscopes offer a potential solution; however, their stereo-matching accuracy diminishes when the target image has few textures. Although stereo matching with deep learning has gained significant attention, supervised learning requires a large dataset of images with depth annotations, which are scarce for laparoscopes. Thus, there is a strong demand for alternative methods of depth reconstruction or annotation for laparoscopes. Active stereo techniques are a promising approach for achieving 3D reconstruction without textures. In this study, a 3D shape reconstruction method is proposed using an ultra-small patterned projector attached to a laparoscopic arm to address these issues. The pattern projector emits structured light with a grid-like pattern that features node-wise modulation for positional encoding. To scan the target object, multiple images are taken while the projector is in motion, and the relative poses of the projector and camera are auto-calibrated using a differential rendering technique. In the experiment, the proposed method is evaluated by performing 3D reconstruction using images obtained from a surgical robot and comparing the results with a ground-truth shape obtained from X-ray CT.
Affiliation(s)
- Ryo Furukawa
- Department of Informatics, Kindai University, Higashihiroshima, Japan
- Ryusuke Sagawa
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
- Hiroshi Kawasaki
- Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
7
Barbour MC, Amin SN, Friedman SD, Perez FA, Bly RA, Johnson KE, Parikh SR, Richardson CM, Dahl JP, Aliseda A. Surface Reconstruction of the Pediatric Larynx via Structure from Motion Photogrammetry: A Pilot Study. Otolaryngol Head Neck Surg 2024; 170:1195-1199. [PMID: 38168480] [PMCID: PMC10960702] [DOI: 10.1002/ohn.635]
Abstract
Endoscopy is the gold standard for characterizing pediatric airway disorders; however, it is limited for quantitative analysis by the lack of three-dimensional (3D) vision and poor stereotactic depth perception. We utilize structure from motion (SfM) photogrammetry to reconstruct 3D surfaces of pathologic and healthy pediatric larynges from monocular two-dimensional (2D) endoscopy. Models of pediatric subglottic stenosis were 3D printed and airway endoscopies were simulated. 3D surfaces were successfully reconstructed from endoscopic videos of all models using an SfM analysis toolkit. The average subglottic surface error between SfM-reconstructed surfaces and 3D-printed models was 0.65 mm as measured by the Modified Hausdorff Distance. The average volumetric similarity between SfM surfaces and printed models was 0.82 as measured by the Jaccard index. SfM can be used to accurately reconstruct 3D surface renderings of the larynx from 2D endoscopy video. This technique has immense potential for quantitative analysis of airway geometry and virtual surgical planning.
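The two agreement measures reported here are straightforward to reproduce; the sketch below gives plain NumPy/SciPy versions of the Modified Hausdorff Distance (Dubuisson and Jain's max of the two mean nearest-neighbour distances) between surface samples, and the Jaccard index between voxelized volumes. Variable names are illustrative.

```python
# Surface and volume agreement measures as used in the study above.
import numpy as np
from scipy.spatial import cKDTree

def modified_hausdorff(a, b):
    """a, b: (N,3)/(M,3) surface point samples; max of the two mean
    nearest-neighbour distances."""
    d_ab = cKDTree(b).query(a)[0].mean()
    d_ba = cKDTree(a).query(b)[0].mean()
    return max(d_ab, d_ba)

def jaccard(vol_a, vol_b):
    """vol_a, vol_b: boolean occupancy grids on the same voxel lattice."""
    inter = np.logical_and(vol_a, vol_b).sum()
    union = np.logical_or(vol_a, vol_b).sum()
    return inter / union

# e.g. modified_hausdorff(sfm_vertices_mm, printed_model_vertices_mm)
```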
Affiliation(s)
- Michael C Barbour
- Department of Mechanical Engineering, University of Washington, Seattle, Washington, USA
- Shaunak N Amin
- Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, Washington, USA
- Seth D Friedman
- Center for Respiratory Biology and Therapeutics, Seattle Children's Hospital, Seattle, Washington, USA
- Francisco A Perez
- Department of Pediatric Radiology, Seattle Children's Hospital, Seattle, Washington, USA
- Randall A Bly
- Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, Washington, USA
- Division of Pediatric Otolaryngology-Head and Neck Surgery, Seattle Children's Hospital, Seattle, Washington, USA
- Kaalan E Johnson
- Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, Washington, USA
- Division of Pediatric Otolaryngology-Head and Neck Surgery, Seattle Children's Hospital, Seattle, Washington, USA
- Sanjay R Parikh
- Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, Washington, USA
- Division of Pediatric Otolaryngology-Head and Neck Surgery, Seattle Children's Hospital, Seattle, Washington, USA
- Clare M Richardson
- Division of Pediatric Otolaryngology-Head and Neck Surgery, Seattle Children's Hospital, Seattle, Washington, USA
- Division of Pediatric Otolaryngology-Head and Neck Surgery, Phoenix Children's Hospital, Phoenix, Arizona, USA
- John P Dahl
- Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, Washington, USA
- Division of Pediatric Otolaryngology-Head and Neck Surgery, Seattle Children's Hospital, Seattle, Washington, USA
- Alberto Aliseda
- Department of Mechanical Engineering, University of Washington, Seattle, Washington, USA
8
Liu M, Han Y, Wang J, Wang C, Wang Y, Meijering E. LSKANet: Long Strip Kernel Attention Network for Robotic Surgical Scene Segmentation. IEEE Trans Med Imaging 2024; 43:1308-1322. [PMID: 38015689] [DOI: 10.1109/tmi.2023.3335406]
Abstract
Surgical scene segmentation is a critical task in robot-assisted surgery. However, the complexity of the surgical scene, which mainly includes local feature similarity (e.g., between different anatomical tissues), intraoperative complex artifacts, and indistinguishable boundaries, poses significant challenges to accurate segmentation. To tackle these problems, we propose the Long Strip Kernel Attention network (LSKANet), which includes two well-designed modules named the Dual-block Large Kernel Attention module (DLKA) and the Multiscale Affinity Feature Fusion module (MAFF), and which can implement precise segmentation of surgical images. Specifically, by introducing strip convolutions with different topologies (cascaded and parallel) in two blocks and a large kernel design, DLKA can make full use of region- and strip-like surgical features and extract both visual and structural information to reduce the false segmentation caused by local feature similarity. In MAFF, affinity matrices calculated from multiscale feature maps are applied as feature fusion weights, which helps to address the interference of artifacts by suppressing the activations of irrelevant regions. In addition, a hybrid loss with a Boundary Guided Head (BGH) is proposed to help the network segment indistinguishable boundaries effectively. We evaluate the proposed LSKANet on three datasets with different surgical scenes. The experimental results show that our method achieves new state-of-the-art results on all three datasets, with improvements of 2.6%, 1.4%, and 3.4% mIoU, respectively. Furthermore, our method is compatible with different backbones and can significantly increase their segmentation accuracy. Code is available at https://github.com/YubinHan73/LSKANet.
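Strip convolutions of the kind DLKA builds on can be sketched compactly: depthwise 1xk and kx1 convolutions approximate a large kernel cheaply, in cascaded and parallel topologies, with the result used as an attention-style gate. The block below is a simplified illustration under assumed channel counts and kernel size, not the published module.

```python
# Minimal strip-convolution attention block in the spirit of DLKA.
import torch
import torch.nn as nn

class StripConvBlock(nn.Module):
    def __init__(self, channels, k=11):
        super().__init__()
        pad = k // 2
        # Cascaded topology: horizontal strip followed by vertical strip.
        self.cascaded = nn.Sequential(
            nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels),
            nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels))
        # Parallel topology: the two strips applied side by side, then summed.
        self.horiz = nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels)
        self.vert = nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels)
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.mix(self.cascaded(x) + self.horiz(x) + self.vert(x))
        return x * torch.sigmoid(attn)     # attention-style gating of the input

x = torch.randn(1, 64, 128, 128)
print(StripConvBlock(64)(x).shape)         # torch.Size([1, 64, 128, 128])
```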
9
Liu S, Fan J, Yang Y, Xiao D, Ai D, Song H, Wang Y, Yang J. Monocular endoscopy images depth estimation with multi-scale residual fusion. Comput Biol Med 2024; 169:107850. [PMID: 38145602] [DOI: 10.1016/j.compbiomed.2023.107850]
Abstract
BACKGROUND Monocular depth estimation plays a fundamental role in clinical endoscopic surgery. However, the coherent illumination, smooth surfaces, and texture-less nature of endoscopy images present significant challenges to traditional depth estimation methods, which struggle to accurately perceive depth in such settings. METHOD To overcome these challenges, this paper proposes a novel multi-scale residual fusion method for estimating the depth of monocular endoscopy images. Specifically, we address the issue of coherent illumination by leveraging an image frequency-domain component space transformation, thereby enhancing the stability of the scene's light source. Moreover, we employ an image radiation intensity attenuation model to estimate the initial depth map. Finally, to refine the accuracy of depth estimation, we utilize a multi-scale residual fusion optimization technique. RESULTS To evaluate the performance of the proposed method, extensive experiments were conducted on public datasets. The structural similarity measures for continuous frames in three distinct clinical data scenes reached 0.94, 0.82, and 0.84, respectively, demonstrating the effectiveness of our approach in capturing the intricate details of endoscopy images. Furthermore, the depth estimation accuracy reached 89.3% and 91.2% on the two models' data, respectively, underscoring the robustness of our method. CONCLUSIONS Overall, the promising results obtained on public datasets highlight the significant potential of our method for clinical applications, facilitating reliable depth estimation and enhancing the quality of endoscopic surgical procedures.
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China; China Center for Information Industry Development, Beijing, 100081, China
- Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Yun Yang
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, National Clinical Research Center for Digestive Diseases, Beijing, 100050, China
- Deqiang Xiao
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
10
Liu S, Fan J, Zang L, Yang Y, Fu T, Song H, Wang Y, Yang J. Pose estimation via structure-depth information from monocular endoscopy images sequence. Biomed Opt Express 2024; 15:460-478. [PMID: 38223180] [PMCID: PMC10783895] [DOI: 10.1364/boe.498262]
Abstract
Image-based endoscopy pose estimation has been shown to significantly improve the visualization and accuracy of minimally invasive surgery (MIS). This paper proposes a pose estimation method based on structure-depth information from a monocular endoscopy image sequence. First, the initial frame location is constrained using the image structure difference (ISD) network. Second, endoscopy image depth information is used to estimate the pose of sequence frames. Finally, adaptive boundary constraints are used to optimize continuous-frame endoscopy pose estimation, resulting in more accurate intraoperative estimates. Evaluations were conducted on publicly available datasets, with pose estimation errors on bronchoscopy and colonoscopy datasets of 1.43 mm and 3.64 mm, respectively. These results meet the real-time requirements of various scenarios, demonstrating the capability of this method to generate reliable pose estimates for endoscopy images and its meaningful applications in clinical practice. This method enables accurate localization of endoscopy images during surgery, assisting physicians in performing safer and more effective procedures.
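Pose errors like those quoted above are typically computed as an absolute trajectory error after a rigid alignment of the estimated and ground-truth camera paths. A generic NumPy version of that metric (Kabsch/Umeyama alignment followed by position RMSE) is sketched below; it is a standard metric sketch, not the paper's exact evaluation protocol.

```python
# Absolute trajectory error (ATE) for scoring estimated endoscope poses.
import numpy as np

def ate_rmse(est, gt):
    """est, gt: (N,3) camera positions in corresponding temporal order."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    H = (est - mu_e).T @ (gt - mu_g)            # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T                          # best rotation est -> gt
    aligned = (R @ (est - mu_e).T).T + mu_g
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))
```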
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- China Center for Information Industry Development, Beijing 100081, China
- Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- Liugeng Zang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- Yun Yang
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University; National Clinical Research Center for Digestive Diseases, Beijing 100050, China
- Tianyu Fu
- Institute of Engineering Medicine, Beijing Institute of Technology, Beijing 100081, China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
11
Hirohata Y, Sogabe M, Miyazaki T, Kawase T, Kawashima K. Confidence-aware self-supervised learning for dense monocular depth estimation in dynamic laparoscopic scene. Sci Rep 2023; 13:15380. [PMID: 37717055] [PMCID: PMC10505201] [DOI: 10.1038/s41598-023-42713-x]
Abstract
This paper tackles the challenge of accurate depth estimation from monocular laparoscopic images in dynamic surgical environments. The lack of reliable ground truth due to inconsistencies within these images makes this a complex task, and the presence of noise elements such as bleeding and smoke further complicates learning. We propose a model learning framework that uses a generic laparoscopic surgery video dataset for training, aimed at achieving precise monocular depth estimation in dynamic surgical settings. The architecture employs binocular disparity confidence information as a self-supervisory signal, along with the disparity information from a stereo laparoscope. Our method ensures robust learning amid outliers caused by tissue deformation, smoke, and surgical instruments by utilizing a unique loss function that adjusts the selection and weighting of depth data based on their confidence. We trained the model on the Hamlyn dataset and verified it with Hamlyn test data and a static dataset. The results show excellent generalization performance and efficacy for various scene dynamics, laparoscope types, and surgical sites.
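The confidence-gated supervision described here can be illustrated in a few lines: per-pixel stereo depth is weighted by its confidence, and pixels below a threshold are dropped, so outliers from smoke, blood, or instruments contribute little. The thresholding-plus-weighting scheme below is an illustrative assumption, not the paper's exact loss.

```python
# Sketch of a confidence-weighted depth supervision term.
import torch

def confidence_weighted_l1(pred_depth, stereo_depth, confidence, tau=0.5):
    """pred_depth, stereo_depth, confidence: (B,1,H,W); confidence in [0,1]."""
    mask = (confidence > tau).float()           # select trustworthy pixels only
    weights = confidence * mask                 # weight survivors by confidence
    residual = (pred_depth - stereo_depth).abs()
    return (weights * residual).sum() / weights.sum().clamp(min=1.0)

loss = confidence_weighted_l1(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64),
                              torch.rand(2, 1, 64, 64))
```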
Affiliation(s)
- Yasuhide Hirohata
- Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
- Maina Sogabe
- Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
- Tetsuro Miyazaki
- Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
- Toshihiro Kawase
- Department of Information and Communication Engineering, School of Engineering, Tokyo Denki University, Tokyo, 120-8551, Japan
- Kenji Kawashima
- Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
12
Furukawa R, Sagawa R, Oka S, Tanaka S, Kawasaki H. Single and multi-frame auto-calibration for 3D endoscopy with differential rendering. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-5. [PMID: 38083062] [DOI: 10.1109/embc40787.2023.10340381]
Abstract
The use of 3D measurement in endoscopic images offers practicality in cancer diagnosis, computer-assisted interventions, and making annotations for machine learning training data. An effective approach is the implementation of an active stereo system using a micro-sized pattern projector and an endoscope camera, which has been intensively developed. One open problem for such a system is the necessity of strict and complex calibration of the projector-camera system to precisely recover shapes. Moreover, since the head of an endoscope should have enough elasticity to avoid harming target objects, the position of the pattern projector cannot be tightly fixed to the head, resulting in limited accuracy. A straightforward approach to this problem is auto-calibration; however, it requires special markers in the pattern or a highly accurate initial position for stable calibration, which is impractical in real operation. In this paper, we propose a novel auto-calibration method based on differential rendering techniques, which have recently been proposed and are drawing wide attention. To apply the method to an endoscopic system in which a diffractive optical element (DOE) is used, we propose a technique to simultaneously estimate the focal length of the DOE as well as the extrinsic parameters between the projector and the camera. We also propose a multi-frame optimization algorithm to jointly optimize the intrinsic and extrinsic parameters, the relative pose between frames, and the entire shape. Clinical relevance: One-shot endoscopic measurement of depth information is a practical solution for cancer diagnosis, computer-assisted interventions, and making annotations for machine learning training data.
13
Liu Y, Zuo S. Self-supervised monocular depth estimation for gastrointestinal endoscopy. Comput Methods Programs Biomed 2023; 238:107619. [PMID: 37235969] [DOI: 10.1016/j.cmpb.2023.107619]
Abstract
BACKGROUND AND OBJECTIVE Gastrointestinal (GI) endoscopy represents a promising tool for GI cancer screening. However, the limited field of view and the uneven skills of endoscopists make it difficult to accurately identify polyps and follow up on precancerous lesions under endoscopy. Estimating depth from GI endoscopic sequences is essential for a series of AI-assisted surgical techniques. Nonetheless, depth estimation for GI endoscopy is a challenging task due to the particularity of the environment and the limitations of available datasets. In this paper, we propose a self-supervised monocular depth estimation method for GI endoscopy. METHODS A depth estimation network and a camera ego-motion estimation network are first constructed to obtain the depth and pose information of the sequence, respectively; the model is then trained in a self-supervised manner by computing the multi-scale structural similarity with L1 norm (MS-SSIM+L1) loss between the target frame and the reconstructed image as part of the training loss. The MS-SSIM+L1 loss is good at preserving high-frequency information and maintains invariance to brightness and color. Our model consists of a U-shaped convolutional network with a dual-attention mechanism, which is beneficial for capturing multi-scale contextual information and greatly improves the accuracy of depth estimation. We evaluated our method qualitatively and quantitatively against different state-of-the-art methods. RESULTS AND CONCLUSIONS The experimental results show that our method has superior generality, achieving lower error metrics and higher accuracy metrics on both the UCL and EndoSLAM datasets. The proposed method has also been validated on clinical GI endoscopy, demonstrating the potential clinical value of the model.
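A compact stand-in for the MS-SSIM+L1 objective is sketched below: SSIM computed at several dyadic scales and averaged, mixed with an L1 term. True MS-SSIM combines per-scale contrast/structure terms multiplicatively, so the averaged variant and the mixing weight here are simplifying assumptions.

```python
# Simplified multi-scale SSIM + L1 reconstruction loss.
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01**2, C2=0.03**2):
    mu_x = F.avg_pool2d(x, 3, 1, 1); mu_y = F.avg_pool2d(y, 3, 1, 1)
    sx = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sy = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sxy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sxy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sx + sy + C2)
    return (num / den).mean()

def ms_ssim_l1_loss(pred, target, scales=3, alpha=0.84):
    # SSIM at full, half, and quarter resolution, averaged (simplification).
    s = torch.stack([ssim(F.avg_pool2d(pred, 2 ** i) if i else pred,
                          F.avg_pool2d(target, 2 ** i) if i else target)
                     for i in range(scales)]).mean()
    return alpha * (1 - s) + (1 - alpha) * (pred - target).abs().mean()

loss = ms_ssim_l1_loss(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
```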
Affiliation(s)
- Yuying Liu
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
- Siyang Zuo
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
14
Deng Z, Jiang P, Guo Y, Zhang S, Hu Y, Zheng X, He B. Safety-aware robotic steering of a flexible endoscope for nasotracheal intubation. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104504]
15
Furukawa R, Mikamo M, Sagawa R, Okamoto Y, Oka S, Tanaka S, Kawasaki H. Multi-frame optimisation for active stereo with inverse rendering to obtain consistent shape and projector-camera poses for 3D endoscopic system. Comput Methods Biomech Biomed Eng Imaging Vis 2022. [DOI: 10.1080/21681163.2022.2155578]
Affiliation(s)
- Ryo Furukawa
- Department of Informatics, Kindai University, Higashihiroshima, Hiroshima, Japan
- Ryusuke Sagawa
- Computer Vision Research Team, Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
- Yuki Okamoto
- Department of Endoscopy, Hiroshima University Hospital, Hiroshima, Japan
- Shiro Oka
- Graduate School of Biomedical & Health Sciences, Hiroshima University, Hiroshima, Japan
- Shinji Tanaka
- Graduate School of Biomedical & Health Sciences, Hiroshima University, Hiroshima, Japan
- Hiroshi Kawasaki
- Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
16
Yang Z, Pan J, Li R, Qin H. Scene-graph-driven semantic feature matching for monocular digestive endoscopy. Comput Biol Med 2022; 146:105616. [DOI: 10.1016/j.compbiomed.2022.105616]
17
Xu C, Huang B, Elson DS. Self-supervised Monocular Depth Estimation with 3D Displacement Module for Laparoscopic Images. IEEE Trans Med Robot Bionics 2022; 4:331-334. [PMID: 36148138] [PMCID: PMC7613618] [DOI: 10.1109/tmrb.2022.3170206]
Abstract
We present a novel self-supervised training framework with a 3D displacement (3DD) module for accurately estimating per-pixel depth maps from single laparoscopic images. Recently, several self-supervised monocular depth estimation models have achieved good results on the KITTI dataset under the hypothesis that the camera is dynamic and the objects are stationary; however, this hypothesis is often reversed in the surgical setting (the laparoscope is stationary, while the surgical instruments and tissues are dynamic). Therefore, a 3DD module is proposed to establish the relation between frames instead of ego-motion estimation. In the 3DD module, a convolutional neural network (CNN) analyses source and target frames to predict the 3D displacement of a 3D point cloud from the target frame to the source frame in camera coordinates. Since it is difficult to constrain the depth displacement from two 2D images, a novel depth consistency module is proposed to maintain consistency between displacement-updated depth and model-estimated depth, constraining the 3D displacement effectively. Our proposed method achieves remarkable performance for monocular depth estimation on the Hamlyn surgical dataset and acquired ground-truth depth maps, outperforming the monodepth, monodepth2, and packnet models.
Affiliation(s)
- Chi Xu
- The Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London SW7 2AZ, UK
- Baoru Huang
- The Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London SW7 2AZ, UK
- Daniel S. Elson
- The Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London SW7 2AZ, UK
18
Liu S, Fan J, Song D, Fu T, Lin Y, Xiao D, Song H, Wang Y, Yang J. Joint estimation of depth and motion from a monocular endoscopy image sequence using a multi-loss rebalancing network. Biomed Opt Express 2022; 13:2707-2727. [PMID: 35774318] [PMCID: PMC9203100] [DOI: 10.1364/boe.457475]
Abstract
Building an in vivo three-dimensional (3D) surface model from a monocular endoscope is an effective technology for improving the intuitiveness and precision of clinical laparoscopic surgery. This paper proposes a multi-loss rebalancing-based method for joint estimation of depth and motion from a monocular endoscopy image sequence. Feature descriptors are used to provide supervision signals for the depth estimation network and the motion estimation network. The depth estimation network incorporates the epipolar constraints between sequential frames into neighborhood spatial information to enhance the accuracy of depth estimation. The reprojection information from depth estimation is used by the motion estimation network to reconstruct the camera motion with a multi-view relative pose fusion mechanism. Relative response loss, feature consistency loss, and epipolar consistency loss functions are defined to improve the robustness and accuracy of the proposed unsupervised learning-based method. Evaluations were implemented on public datasets. The motion estimation error in three scenes decreased by 42.1%, 53.6%, and 50.2%, respectively, and the average 3D reconstruction error is 6.456 ± 1.798 mm. This demonstrates the method's capability to generate reliable depth estimation and trajectory reconstruction results for endoscopy images, with meaningful clinical applications.
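The epipolar consistency idea can be made concrete with the Sampson distance, a standard first-order residual of the constraint x2' F x1 = 0 that is often used as a loss term; the NumPy sketch below derives F from an assumed relative pose and intrinsics. This is a generic rendering of the constraint, not the paper's loss.

```python
# Epipolar (Sampson) residual for matched pixels between two frames.
import numpy as np

def fundamental_from_pose(K, R, t):
    tx = np.array([[0, -t[2], t[1]],
                   [t[2], 0, -t[0]],
                   [-t[1], t[0], 0]])          # skew-symmetric [t]_x
    Kinv = np.linalg.inv(K)
    return Kinv.T @ tx @ R @ Kinv              # F = K^-T [t]_x R K^-1

def sampson_distance(F, x1, x2):
    """x1, x2: (N,2) matched pixel coordinates in the two frames."""
    h1 = np.hstack([x1, np.ones((len(x1), 1))])
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    Fx1 = h1 @ F.T                              # rows are F @ x1
    Ftx2 = h2 @ F                               # rows are F^T @ x2
    num = np.einsum("ij,ij->i", h2, Fx1) ** 2   # (x2^T F x1)^2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den
```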
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Dengpan Song
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Tianyu Fu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Yucong Lin
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Deqiang Xiao
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
19
Vagdargi P, Uneri A, Jones CK, Wu P, Han R, Luciano MG, Anderson WS, Helm PA, Hager GD, Siewerdsen JH. Pre-Clinical Development of Robot-Assisted Ventriculoscopy for 3D Image Reconstruction and Guidance of Deep Brain Neurosurgery. IEEE Trans Med Robot Bionics 2022; 4:28-37. [PMID: 35368731] [PMCID: PMC8967072] [DOI: 10.1109/tmrb.2021.3125322]
Abstract
Conventional neuro-navigation can be challenged in targeting deep brain structures via transventricular neuroendoscopy due to unresolved geometric error following soft-tissue deformation. Current robot-assisted endoscopy techniques are fairly limited, primarily serving to execute planned trajectories and provide a stable scope holder. We report the implementation of a robot-assisted ventriculoscopy (RAV) system for 3D reconstruction, registration, and augmentation of the neuroendoscopic scene with intraoperative imaging, enabling guidance even in the presence of tissue deformation and providing visualization of structures beyond the endoscopic field of view. Phantom studies were performed to quantitatively evaluate image sampling requirements, registration accuracy, and computational runtime for two reconstruction methods and a variety of clinically relevant ventriculoscope trajectories. A median target registration error of 1.2 mm was achieved with an update rate of 2.34 frames per second, validating the RAV concept and motivating translation to future clinical studies.
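Registration of a reconstructed scene to intraoperative imaging, and the target registration error reported above, can be sketched with a minimal ICP loop: alternate nearest-neighbour correspondence with a closed-form Kabsch alignment, then evaluate the resulting transform on held-out targets. Correspondence handling and convergence tests are deliberately minimal, and all variable names are illustrative.

```python
# Minimal rigid ICP plus target registration error (TRE) evaluation.
import numpy as np
from scipy.spatial import cKDTree

def kabsch(src, dst):
    """Least-squares R, t with dst ~= src @ R.T + t; src, dst: (N,3)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def icp(moving, fixed, iters=30):
    """Rigidly register moving (N,3) onto fixed (M,3) by iterating
    nearest-neighbour correspondence and least-squares alignment."""
    tree = cKDTree(fixed)
    R_total, t_total = np.eye(3), np.zeros(3)
    pts = moving.copy()
    for _ in range(iters):
        idx = tree.query(pts)[1]                # closest fixed point per moving point
        R, t = kabsch(pts, fixed[idx])
        pts = pts @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t  # compose transforms
    return R_total, t_total

# TRE: distance between registered and true positions of held-out targets.
# R, t = icp(reconstructed_surface_pts, intraop_image_pts)
# tre = np.linalg.norm(targets_moving @ R.T + t - targets_fixed, axis=1)
```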
Affiliation(s)
- Prasad Vagdargi
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Ali Uneri
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Craig K. Jones
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
- Pengwei Wu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Runze Han
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Mark G. Luciano
- Department of Neurosurgery, Johns Hopkins Medicine, Baltimore, MD, USA
- Gregory D. Hager
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey H. Siewerdsen
- Department of Biomedical Engineering and Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
20
Bardozzo F, Collins T, Forgione A, Hostettler A, Tagliaferri R. StaSiS-Net: a stacked and siamese disparity estimation network for depth reconstruction in modern 3D laparoscopy. Med Image Anal 2022; 77:102380. [DOI: 10.1016/j.media.2022.102380]
21
Shao S, Pei Z, Chen W, Zhu W, Wu X, Sun D, Zhang B. Self-Supervised monocular depth and ego-Motion estimation in endoscopy: Appearance flow to the rescue. Med Image Anal 2021; 77:102338. [PMID: 35016079] [DOI: 10.1016/j.media.2021.102338]
Abstract
Recently, self-supervised learning technology has been applied to calculate depth and ego-motion from monocular videos, achieving remarkable performance in autonomous driving scenarios. One widely adopted assumption in depth and ego-motion self-supervised learning is that image brightness remains constant across nearby frames. Unfortunately, the endoscopic scene does not meet this assumption, because there are severe brightness fluctuations induced by illumination variations, non-Lambertian reflections, and interreflections during data collection, and these fluctuations inevitably deteriorate depth and ego-motion estimation accuracy. In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem. The appearance flow takes into consideration any variations in the brightness pattern and enables us to develop a generalized dynamic image constraint. Furthermore, we build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes, comprising a structure module, a motion module, an appearance module, and a correspondence module, to accurately reconstruct the appearance and calibrate the image brightness. Extensive experiments were conducted on the SCARED and EndoSLAM datasets, and the proposed unified framework exceeds other self-supervised approaches by a large margin. To validate generalization across different patients and cameras, we train our model on SCARED but test it on the SERV-CT and Hamlyn datasets without any fine-tuning; the superior results reveal its strong generalization ability. Code is available at: https://github.com/ShuweiShao/AF-SfMLearner.
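The appearance-flow idea can be miniaturized as follows: before the photometric comparison, a small network predicts a per-pixel brightness correction for the warped frame so the constancy assumption approximately holds. The additive correction and tiny CNN below are illustrative simplifications of the paper's appearance module, not its architecture.

```python
# Brightness-calibrated photometric loss with a learned appearance field.
import torch
import torch.nn as nn

class AppearanceFlow(nn.Module):
    """Tiny CNN predicting a per-pixel brightness adjustment field."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Tanh())
    def forward(self, warped, target):
        return self.net(torch.cat([warped, target], dim=1))  # (B,1,H,W) in [-1,1]

def calibrated_photometric_loss(warped, target, appearance):
    # Correct the warped frame's brightness before the photometric comparison.
    corrected = (warped + appearance(warped, target)).clamp(0, 1)
    return (corrected - target).abs().mean()

af = AppearanceFlow()
loss = calibrated_photometric_loss(torch.rand(2, 3, 64, 64),
                                   torch.rand(2, 3, 64, 64), af)
```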
Affiliation(s)
- Shuwei Shao
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
- Zhongcai Pei
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute, Beihang University, Hangzhou, China
- Weihai Chen
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute, Beihang University, Hangzhou, China
- Xingming Wu
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
- Dianmin Sun
- Shandong Cancer Hospital Affiliated to Shandong University, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
- Baochang Zhang
- Institute of Artificial Intelligence, Beihang University, Beijing, China
22
Baudoin T, Gregurić T, Bacan F, Jelavić B, Geber G, Košec A. A systematic review of common landmarks in navigated endoscopic sinus surgery (NESS). Comput Assist Surg (Abingdon) 2021; 26:77-84. [PMID: 34874220] [DOI: 10.1080/24699322.2021.1992504]
Abstract
BACKGROUND Navigation has brought about a tremendous improvement in functional endoscopic sinus surgery (FESS). When upgraded accordingly, FESS becomes navigated endoscopic sinus surgery (NESS). Indications for the intraoperative use of navigation can be broadened to almost any FESS case. NESS in advanced sinus surgery is currently still not used routinely and requires systematic practice guidelines. PURPOSE The purpose of this paper is to report on commonly identified landmarks in advanced NESS according to evidence-based medicine (EBM) principles. MATERIAL AND METHODS This review was assembled following PRISMA guidelines. A PubMed and Scopus (EMBASE) search on anatomical landmarks in functional endoscopic and navigated sinus surgery yielded 47 results. Of these, only 14 (29.8%) contained original data, constituting the synthesis of the best available evidence. RESULTS Anatomical landmarks are considered the most important points of orientation for optimal use of navigation systems during FESS. The most commonly identified significant landmarks are: (1) maxillary sinus ostium; (2) orbital wall; (3) frontal recess; (4) skull base; (5) ground lamella; (6) fovea posterior; (7) sphenoid sinus ostium. CONCLUSIONS Establishing common landmarks is essential in performing NESS. This is true for advanced and novice surgeons alike, and offers the possibility to use navigation systems systematically, taking advantage of all the benefits of navigated endoscopic surgery.
Affiliation(s)
- Tomislav Baudoin
- Department of Otorhinolaryngology & Head and Neck Surgery, University Hospital Center Sestre Milosrdnice, Zagreb University School of Medicine, Zagreb, Croatia
- Tomislav Gregurić
- Clinical Department of Diagnostic and Interventional Radiology, University Hospital Center Sestre Milosrdnice, Zagreb University School of Dental Medicine, Zagreb, Croatia
- Filip Bacan
- Department of Otorhinolaryngology & Head and Neck Surgery, University Hospital Center Sestre Milosrdnice, Zagreb University School of Medicine, Zagreb, Croatia
- Boris Jelavić
- Department of Otorhinolaryngology & Head and Neck Surgery, University of Mostar School of Medicine, Mostar, Bosnia and Herzegovina
- Goran Geber
- Department of Otorhinolaryngology & Head and Neck Surgery, University Hospital Center Sestre Milosrdnice, Zagreb University School of Dental Medicine, Zagreb, Croatia
- Andro Košec
- Department of Otorhinolaryngology & Head and Neck Surgery, University Hospital Center Sestre Milosrdnice, Zagreb University School of Medicine, Zagreb, Croatia
23
Applying Self-Supervised Learning to Medicine: Review of the State of the Art and Medical Implementations. Informatics 2021. [DOI: 10.3390/informatics8030059]
Abstract
Machine learning has become an increasingly ubiquitous technology, as big data continue to inform and influence everyday life and decision-making. Currently, in medicine and healthcare, as in most other industries, the two most prevalent machine learning paradigms are supervised learning and transfer learning. Both rely on large-scale, manually annotated datasets to train increasingly complex models. However, the requirement for data to be manually labeled leaves an excess of unused, unlabeled data in both public and private repositories. Self-supervised learning (SSL) is a growing area of machine learning that can take advantage of unlabeled data. In contrast to other machine learning paradigms, SSL algorithms create artificial supervisory signals from unlabeled data and pretrain algorithms on these signals. The aim of this review is two-fold: first, we provide a formal definition of SSL, divide SSL algorithms into four unique subsets, and review the state of the art published in each of those subsets between 2014 and 2020; second, we survey recent SSL algorithms published in healthcare, in order to give medical experts a clearer picture of how they can integrate SSL into their research, with the objective of leveraging unlabeled data.
24
Ozyoruk KB, Gokceler GI, Bobrow TL, Coskun G, Incetan K, Almalioglu Y, Mahmood F, Curto E, Perdigoto L, Oliveira M, Sahin H, Araujo H, Alexandrino H, Durr NJ, Gilbert HB, Turan M. EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Med Image Anal 2021; 71:102058. [PMID: 33930829] [DOI: 10.1016/j.media.2021.102058]
Abstract
Deep learning techniques hold promise for developing dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings, synthetically generated data, as well as a clinically used conventional endoscope recording of a phantom colon with computed tomography (CT) scan ground truth. A Panda robotic arm, two commercially available capsule endoscopes, three conventional endoscopes with different camera properties, two high-precision 3D scanners, and a CT scanner were employed to collect data from eight ex-vivo porcine gastrointestinal (GI) tract organs and a silicone colon phantom model. In total, 35 sub-datasets are provided with 6D pose ground truth for the ex-vivo part: 18 sub-datasets for the colon, 12 for the stomach, and 5 for the small intestine, while four of these contain polyp-mimicking elevations carried out by an expert gastroenterologist. To verify the applicability of this data for use with real clinical systems, we recorded a video sequence with a state-of-the-art colonoscope from a full-representation silicone colon phantom. Synthetic capsule endoscopy frames from the stomach, colon, and small intestine with both depth and pose annotations are included to facilitate the study of simulation-to-real transfer learning algorithms. Additionally, we propose Endo-SfMLearner, an unsupervised monocular depth and pose estimation method that combines residual networks with a spatial attention module to encourage the network to focus on distinguishable and highly textured tissue regions. The proposed approach makes use of a brightness-aware photometric loss to improve robustness under the fast frame-to-frame illumination changes commonly seen in endoscopic videos. To exemplify the use of the EndoSLAM dataset, the performance of Endo-SfMLearner is extensively compared with the state of the art: SC-SfMLearner, Monodepth2, and SfMLearner. The code and a link to the dataset are publicly available at https://github.com/CapsuleEndoscope/EndoSLAM. A video demonstrating the experimental setup and procedure is accessible as Supplementary Video 1.
Affiliation(s)
- Taylor L Bobrow, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Gulfize Coskun, Institute of Biomedical Engineering, Bogazici University, Turkey
- Kagan Incetan, Institute of Biomedical Engineering, Bogazici University, Turkey
- Faisal Mahmood, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Cancer Data Science, Dana Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Eva Curto, Institute for Systems and Robotics, University of Coimbra, Portugal
- Luis Perdigoto, Institute for Systems and Robotics, University of Coimbra, Portugal
- Marina Oliveira, Institute for Systems and Robotics, University of Coimbra, Portugal
- Hasan Sahin, Institute of Biomedical Engineering, Bogazici University, Turkey
- Helder Araujo, Institute for Systems and Robotics, University of Coimbra, Portugal
- Henrique Alexandrino, Faculty of Medicine, Clinical Academic Center of Coimbra, University of Coimbra, Coimbra, Portugal
- Nicholas J Durr, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Hunter B Gilbert, Department of Mechanical and Industrial Engineering, Louisiana State University, Baton Rouge, LA, USA
- Mehmet Turan, Institute of Biomedical Engineering, Bogazici University, Turkey
25
Qin F, Lin S, Li Y, Bly RA, Moe KS, Hannaford B. Towards Better Surgical Instrument Segmentation in Endoscopic Vision: Multi-Angle Feature Aggregation and Contour Supervision. IEEE Robot Autom Lett 2020. [DOI: 10.1109/lra.2020.3009073]
26
Linxweiler M, Pillong L, Kopanja D, Kühn JP, Wagenpfeil S, Radosa JC, Wang J, Morris LGT, Al Kadah B, Bochen F, Körner S, Schick B. Augmented reality-enhanced navigation in endoscopic sinus surgery: A prospective, randomized, controlled clinical trial. Laryngoscope Investig Otolaryngol 2020; 5:621-629. [PMID: 32864433] [PMCID: PMC7444769] [DOI: 10.1002/lio2.436]
Abstract
OBJECTIVE Endoscopic sinus surgery is the gold standard for the surgical treatment of chronic sinus disease, and navigation systems can be of distinct use during these procedures. In our study, we tested the recently developed KARL STORZ NAV1 SinusTracker navigation software, which incorporates elements of augmented reality (AR) to support preoperative planning and provide guidance during the surgical procedure. METHODS One hundred patients with chronic sinus disease were operated on using either conventional navigation software (n = 52, non-AR, control group) or navigation software incorporating AR elements (n = 48, AR, intervention group). The incidence of postoperative complications, duration of surgery, surgeon-reported benefit from the navigation system, and patient-reported postoperative rehabilitation were assessed. RESULTS Surgeons reported a greater benefit during surgery, used the navigation system for more surgical steps, and spent more time on preoperative image analysis with the AR system than with the non-AR system. No significant differences were seen in postoperative complications, target registration error, operation time, or postoperative rehabilitation. CONCLUSION The AR-enhanced navigation software shows high acceptance among sinus surgeons at different stages of surgical training and offers potential benefits during surgery without affecting the duration of the operation or the incidence of postoperative complications. LEVEL OF EVIDENCE 1b.
Affiliation(s)
- Maximilian Linxweiler, Department of Otorhinolaryngology, Head and Neck Surgery, Saarland University Medical Centre, Homburg, Germany
- Lukas Pillong, Department of Otorhinolaryngology, Head and Neck Surgery, Saarland University Medical Centre, Homburg, Germany
- Dragan Kopanja, Department of Otorhinolaryngology, Head and Neck Surgery, Saarland University Medical Centre, Homburg, Germany
- Jan P. Kühn, Department of Otorhinolaryngology, Head and Neck Surgery, Saarland University Medical Centre, Homburg, Germany
- Stefan Wagenpfeil, Institute of Medical Biometry, Epidemiology and Medical Informatics, Saarland University Medical Centre, Homburg, Germany
- Julia C. Radosa, Department of Gynecology, Obstetrics and Reproductive Medicine, Saarland University Medical Centre, Homburg, Germany
- Jingming Wang, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Luc G. T. Morris, Immunogenomics and Precision Oncology Platform, Memorial Sloan Kettering Cancer Center, New York, New York, USA; Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Basel Al Kadah, Department of Otorhinolaryngology, Bethanien Hospital, Plauen, Germany
- Florian Bochen, Department of Otorhinolaryngology, Head and Neck Surgery, Saarland University Medical Centre, Homburg, Germany
- Sandrina Körner, Department of Otorhinolaryngology, Head and Neck Surgery, Saarland University Medical Centre, Homburg, Germany
- Bernhard Schick, Department of Otorhinolaryngology, Head and Neck Surgery, Saarland University Medical Centre, Homburg, Germany
27
Liu X, Sinha A, Ishii M, Hager GD, Reiter A, Taylor RH, Unberath M. Dense Depth Estimation in Monocular Endoscopy With Self-Supervised Learning Methods. IEEE Trans Med Imaging 2020; 39:1438-1447. [PMID: 31689184] [PMCID: PMC7289272] [DOI: 10.1109/tmi.2019.2950936]
Abstract
We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method requires only monocular endoscopic videos and a multi-view stereo method, e.g., structure from motion, to supervise learning in a sparse manner. Consequently, it requires neither manual labeling nor patient computed tomography (CT) scans in either the training or application phase. In a cross-patient experiment using CT scans as ground truth, the proposed method achieved submillimeter mean residual error. In a comparison on in vivo sinus endoscopy data against recent self-supervised depth estimation methods designed for natural video, we demonstrate that the proposed approach outperforms the previous methods by a large margin. The source code for this work is publicly available at https://github.com/lppllppl920/EndoscopyDepthEstimation-Pytorch.
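The core idea of supervising a dense depth network with sparse structure-from-motion points can be sketched as below. This is a hedged, simplified illustration (the paper's actual losses are more elaborate): the prediction is compared to SfM depths only where a landmark exists, after normalizing both by their mean over the sparse support, since monocular SfM depth is defined only up to scale.

```python
# Sketch of sparse, scale-invariant depth supervision from SfM points
# (illustrative; not the paper's exact loss formulation).
import torch

def sparse_depth_loss(pred_depth, sfm_depth, mask, eps=1e-6):
    """Compare dense predicted depth against sparse SfM depths only where
    SfM produced a point (mask == 1). Both maps are normalized by their
    mean over the sparse support to remove the scale ambiguity."""
    n = mask.sum().clamp(min=1)
    pred_n = pred_depth / ((pred_depth * mask).sum() / n + eps)
    sfm_n = sfm_depth / ((sfm_depth * mask).sum() / n + eps)
    return ((pred_n - sfm_n).abs() * mask).sum() / n

pred = torch.rand(1, 1, 120, 160) + 0.1     # dense network output
sfm = torch.zeros(1, 1, 120, 160)
mask = torch.zeros(1, 1, 120, 160)
idx = torch.randint(0, 120 * 160, (200,))   # ~200 sparse SfM landmarks
mask.view(-1)[idx] = 1.0
sfm.view(-1)[idx] = torch.rand(200) + 0.1
print(sparse_depth_loss(pred, sfm, mask))
```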
28
Qiu L, Ren H. Endoscope navigation with SLAM-based registration to computed tomography for transoral surgery. Int J Intell Robot Appl 2020. [DOI: 10.1007/s41315-020-00127-2]
29
Vercauteren T, Unberath M, Padoy N, Navab N. CAI4CAI: The Rise of Contextual Artificial Intelligence in Computer Assisted Interventions. Proc IEEE 2020; 108:198-214. [PMID: 31920208] [PMCID: PMC6952279] [DOI: 10.1109/jproc.2019.2946993]
Abstract
Data-driven computational approaches have evolved to enable extraction of information from medical images with a reliability, accuracy, and speed that are already transforming their interpretation and exploitation in clinical practice. While similar benefits are longed for in the field of interventional imaging, this ambition is challenged by much greater heterogeneity. Clinical workflows within interventional suites and operating theatres are extremely complex and typically rely on poorly integrated intraoperative devices, sensors, and support infrastructure. Taking stock of some of the most exciting developments in machine learning and artificial intelligence for computer assisted interventions, we highlight the crucial need to take context and human factors into account in order to address these challenges. Contextual artificial intelligence for computer assisted intervention, or CAI4CAI, arises as an emerging opportunity feeding into the broader field of surgical data science. Central challenges being addressed in CAI4CAI include how to integrate the ensemble of prior knowledge and instantaneous sensory information from experts, sensors, and actuators; how to create and communicate a faithful, actionable shared representation of the surgery among a mixed human-AI actor team; and how to design interventional systems and associated cognitive shared-control schemes for online uncertainty-aware collaborative decision making, ultimately producing more precise and reliable interventions.
Affiliation(s)
- Tom Vercauteren, School of Biomedical Engineering & Imaging Sciences, King's College London, London WC2R 2LS, U.K.
- Mathias Unberath, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Nicolas Padoy, ICube institute, CNRS, IHU Strasbourg, University of Strasbourg, 67081 Strasbourg, France
- Nassir Navab, Fakultät für Informatik, Technische Universität München, 80333 Munich, Germany
30
Gastroscopic Panoramic View: Application to Automatic Polyps Detection under Gastroscopy. Comput Math Methods Med 2019; 2019:4393124. [PMID: 31885680] [PMCID: PMC6925673] [DOI: 10.1155/2019/4393124]
Abstract
Endoscopic diagnosis is an important means of gastric polyp detection. In this paper, a panoramic gastroscopy image is developed that displays the inner surface of the stomach intuitively and comprehensively. Moreover, the proposed automatic detection solution can help doctors locate polyps and reduce missed diagnoses. The main contributions of this paper are threefold. First, a gastroscopic panorama reconstruction method is developed; the reconstruction requires no additional hardware and properly handles texture misalignment and illumination imbalance. Second, an end-to-end multi-object detector for the gastroscopic panorama is trained within a deep learning framework; compared with traditional solutions, this automatic polyp detection system can locate all polyps on the inner wall of the stomach in real time and assist doctors in finding lesions. Third, the system was evaluated at the Affiliated Hospital of Zhejiang University: the average error of the panorama is less than 2 mm, the accuracy of polyp detection is 95%, and the recall rate is 99%. In addition, the research roadmap of this paper can guide endoscopy-assisted detection in other soft human cavities.
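For readers unfamiliar with panorama construction, the sketch below shows the generic idea of stitching overlapping video frames into a single mosaic using OpenCV's high-level Stitcher. This is only an illustration of the concept; the paper builds its panorama with its own reconstruction pipeline, and the filename here is a placeholder.

```python
# Generic frame-stitching sketch with OpenCV (illustrative only; not the
# paper's gastroscopic panorama reconstruction method).
import cv2

def stitch_frames(frames):
    """Stitch a list of overlapping BGR frames into one mosaic.
    Returns None if OpenCV cannot find enough matching features."""
    stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)
    status, pano = stitcher.stitch(frames)
    return pano if status == cv2.Stitcher_OK else None

# Hypothetical usage with frames grabbed from an endoscopic video:
cap = cv2.VideoCapture("gastroscopy.mp4")   # placeholder filename
frames = []
ok, frame = cap.read()
while ok:
    frames.append(frame)
    for _ in range(10):                      # keep every 10th frame
        ok, frame = cap.read()
        if not ok:
            break
pano = stitch_frames(frames)
if pano is not None:
    cv2.imwrite("panorama.png", pano)
```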
31
Sinha A, Billings SD, Reiter A, Liu X, Ishii M, Hager GD, Taylor RH. The deformable most-likely-point paradigm. Med Image Anal 2019; 55:148-164. [PMID: 31078111] [PMCID: PMC6681672] [DOI: 10.1016/j.media.2019.04.013]
Abstract
In this paper, we present three deformable registration algorithms designed within a paradigm that uses 3D statistical shape models to accomplish two tasks simultaneously: 1) register point features from previously unseen data to a statistically derived shape (e.g., mean shape), and 2) deform the statistically derived shape to estimate the shape represented by the point features. This paradigm, called the deformable most-likely-point paradigm, is motivated by the idea that generative shape models built from available data can be used to estimate previously unseen data. We developed three deformable registration algorithms within this paradigm using statistical shape models built from reliably segmented objects with correspondences. Results from several experiments show that our algorithms produce accurate registrations and reconstructions in a variety of applications with errors up to CT resolution on medical datasets. Our code is available at https://github.com/AyushiSinha/cisstICP.
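The alternating structure behind this kind of statistical-shape-model registration can be sketched in a few lines of NumPy. This is a toy illustration of the general idea only, not the authors' algorithms (rigid pose estimation and probabilistic correspondence weighting are omitted): match measured points to the current shape estimate, then re-solve for the PCA mode weights, and repeat.

```python
# Toy sketch of alternating correspondence / mode-weight estimation for
# deformable registration with a PCA shape model (illustrative only).
import numpy as np

def fit_shape_model(mean_shape, modes, target_pts, n_iter=10, lam=1e-2):
    """mean_shape: (N, 3) vertices; modes: (K, N, 3) PCA modes;
    target_pts: (M, 3) measured points. Alternates closest-point
    correspondence with ridge-regularized mode-weight estimation."""
    w = np.zeros(modes.shape[0])
    for _ in range(n_iter):
        shape = mean_shape + np.tensordot(w, modes, axes=1)      # (N, 3)
        # 1) correspondence: nearest shape vertex for each target point
        d2 = ((target_pts[:, None, :] - shape[None, :, :]) ** 2).sum(-1)
        nn = d2.argmin(axis=1)                                   # (M,)
        # 2) least squares for w: modes[:, nn] @ w ~ target - mean[nn]
        A = modes[:, nn, :].reshape(modes.shape[0], -1).T        # (3M, K)
        b = (target_pts - mean_shape[nn]).ravel()                # (3M,)
        w = np.linalg.solve(A.T @ A + lam * np.eye(len(w)), A.T @ b)
    return w, mean_shape + np.tensordot(w, modes, axes=1)

# Tiny synthetic check: recover known mode weights from noisy samples.
rng = np.random.default_rng(0)
mean = rng.normal(size=(50, 3))
modes = rng.normal(size=(2, 50, 3)) * 0.1
true_w = np.array([1.5, -0.5])
samples = mean + np.tensordot(true_w, modes, axes=1) + rng.normal(0, 0.01, (50, 3))
w, est = fit_shape_model(mean, modes, samples)
print("recovered weights:", w.round(2))
```

Solving for the weights deforms the statistical shape toward the measurements, which is how the paradigm registers and reconstructs simultaneously.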
Affiliation(s)
- Ayushi Sinha, Department of Computer Science, The Johns Hopkins University, Baltimore, MD, USA
- Seth D Billings, Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA
- Austin Reiter, Department of Computer Science, The Johns Hopkins University, Baltimore, MD, USA
- Xingtong Liu, Department of Computer Science, The Johns Hopkins University, Baltimore, MD, USA
- Masaru Ishii, Department of Otolaryngology - Head and Neck Surgery, Johns Hopkins Medical Institutions, Baltimore, MD, USA
- Gregory D Hager, Department of Computer Science, The Johns Hopkins University, Baltimore, MD, USA
- Russell H Taylor, Department of Computer Science, The Johns Hopkins University, Baltimore, MD, USA
32
Sinha A, Ishii M, Hager GD, Taylor RH. Endoscopic navigation in the clinic: registration in the absence of preoperative imaging. Int J Comput Assist Radiol Surg 2019; 14:1495-1506. [PMID: 31152381] [DOI: 10.1007/s11548-019-02005-0]
Abstract
PURPOSE Clinical examinations that involve endoscopic exploration of the nasal cavity and sinuses often do not have a reference preoperative image, like a computed tomography (CT) scan, to provide structural context to the clinician. The aim of this work is to provide structural context during clinical exploration without requiring additional CT acquisition. METHODS We present a method for registration during clinical endoscopy in the absence of CT scans by making use of shape statistics from past CT scans. Using a deformable registration algorithm that uses these shape statistics along with dense point clouds from video, we simultaneously achieve two goals: (1) register the statistically mean shape of the target anatomy with the video point cloud, and (2) estimate patient shape by deforming the mean shape to fit the video point cloud. Finally, we use statistical tests to assign confidence to the computed registration. RESULTS We are able to achieve submillimeter errors in registrations and patient shape reconstructions using simulated data. We establish and evaluate the confidence criteria for our registrations using simulated data. Finally, we evaluate our registration method on in vivo clinical data and assign confidence to these registrations using the criteria established in simulation. All registrations that are not rejected by our criteria produce submillimeter residual errors. CONCLUSION Our deformable registration method can produce submillimeter registrations and reconstructions as well as statistical scores that can be used to assign confidence to the registrations.
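The abstract's idea of assigning statistical confidence to a registration from its residual errors can be illustrated generically as below. This is a hedged sketch under assumed Gaussian residuals, not the paper's specific criteria: if per-point residuals under a correct registration are roughly N(0, sigma^2) per axis, the normalized sum of squares follows a chi-square distribution, and registrations landing in its upper tail are rejected.

```python
# Generic sketch of a residual-based acceptance test for a registration
# (illustrative; not the statistical criteria established in the paper).
import numpy as np
from scipy.stats import chi2

def accept_registration(residuals_mm, sigma_mm=0.5, alpha=0.05):
    """residuals_mm: (M, 3) per-point registration residuals. Assumes
    residuals ~ N(0, sigma^2) per axis under a correct registration, so
    the normalized sum of squares is chi^2 with 3M degrees of freedom."""
    r = np.asarray(residuals_mm).ravel()
    stat = np.sum((r / sigma_mm) ** 2)
    threshold = chi2.ppf(1.0 - alpha, df=r.size)
    return stat <= threshold

rng = np.random.default_rng(1)
good = rng.normal(0.0, 0.4, size=(100, 3))   # small residuals: accepted
bad = rng.normal(0.0, 1.5, size=(100, 3))    # inflated residuals: rejected
print(accept_registration(good))  # expected: True
print(accept_registration(bad))   # expected: False
```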
Affiliation(s)
- Ayushi Sinha, Laboratory for Computational and Sensing Robotics, The Johns Hopkins University, Baltimore, MD 21218, USA
- Masaru Ishii, Department of Otolaryngology - Head and Neck Surgery, Johns Hopkins Medical Institutions, Baltimore, MD 21205, USA
- Gregory D Hager, Laboratory for Computational and Sensing Robotics, The Johns Hopkins University, Baltimore, MD 21218, USA
- Russell H Taylor, Laboratory for Computational and Sensing Robotics, The Johns Hopkins University, Baltimore, MD 21218, USA