1. Yang Z, Dai J, Pan J. 3D reconstruction from endoscopy images: A survey. Comput Biol Med 2024;175:108546. [PMID: 38704902] [DOI: 10.1016/j.compbiomed.2024.108546]
Abstract
Three-dimensional reconstruction of images acquired through endoscopes plays a vital role in a growing number of medical applications. Endoscopes used in the clinic are commonly classified as monocular or binocular. We review methods for depth estimation according to the type of endoscope. Traditionally, depth estimation relies on feature matching across images and multi-view geometry. However, these techniques face many problems in the endoscopic environment. With the rapid development of deep learning, a growing body of work uses learning-based methods to address challenges such as inconsistent illumination and sparse texture. We reviewed over 170 papers published in the ten years from 2013 to 2023. The commonly used public datasets and performance metrics are summarized. We also give a taxonomy of methods and analyze the advantages and drawbacks of the algorithms. Summary tables and a result atlas are provided to facilitate comparison of the qualitative and quantitative performance of the methods in each category. In addition, we summarize commonly used scene representation methods in endoscopy and speculate on the prospects of depth estimation research in medical applications. We also compare the robustness, processing time, and scene representation of the methods to help doctors and researchers select appropriate methods for their surgical applications.
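As a concrete illustration of the multi-view geometry foundation this survey builds on, below is a minimal sketch of classical binocular depth recovery from a disparity map; the image size, focal length, and baseline are hypothetical values, not taken from the paper:

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_mm):
    """Classical stereo relation: depth Z = f * B / d.

    disparity   -- per-pixel disparity map in pixels (d > 0 where matched)
    focal_px    -- focal length of the rectified cameras, in pixels
    baseline_mm -- distance between the two endoscope cameras, in mm
    """
    depth = np.full_like(disparity, np.inf, dtype=np.float64)
    valid = disparity > 0  # unmatched pixels keep infinite depth
    depth[valid] = focal_px * baseline_mm / disparity[valid]
    return depth

# Hypothetical values for a binocular endoscope.
disparity = np.random.uniform(1.0, 20.0, size=(480, 640))
depth_mm = depth_from_disparity(disparity, focal_px=700.0, baseline_mm=4.0)
```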
Affiliation(s)
- Zhuoyue Yang
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Ju Dai
- Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Junjun Pan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China.
2. Yang Z, Pan J, Dai J, Sun Z, Xiao Y. Self-Supervised Lightweight Depth Estimation in Endoscopy Combining CNN and Transformer. IEEE Trans Med Imaging 2024;43:1934-1944. [PMID: 38198275] [DOI: 10.1109/tmi.2024.3352390]
Abstract
In recent years, an increasing number of medical engineering tasks, such as surgical navigation, pre-operative registration, and surgical robotics, rely on 3D reconstruction techniques. Self-supervised depth estimation has attracted interest in endoscopic scenarios because it does not require ground truth. Most existing methods improve performance by expanding the number of parameters; designing a lightweight self-supervised model that achieves competitive results is therefore a hot topic. We propose a lightweight network that tightly couples a convolutional neural network (CNN) and a Transformer for depth estimation. Unlike other methods that use a CNN and a Transformer to extract features separately and then fuse them at the deepest layer, we employ CNN and Transformer modules to extract features at different scales in the encoder. This hierarchical structure leverages the advantages of CNNs in texture perception and Transformers in shape extraction. At each scale, the CNN acquires local features while the Transformer encodes global information. Finally, we add multi-head attention modules to the pose network to improve the accuracy of predicted poses. Experiments on two datasets demonstrate that our approach obtains comparable results while effectively compressing the model parameters.
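To make the per-scale CNN/Transformer coupling concrete, below is a minimal PyTorch sketch of one encoder scale in which a convolutional branch extracts local features and a self-attention branch encodes global context; the layer configuration and fusion by addition are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """One encoder scale: a convolutional branch captures local texture,
    a self-attention branch encodes global shape; outputs are summed.
    Layer sizes are illustrative, not the paper's configuration."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        local = self.local(x)
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)       # global mixing
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return local + glob

feat = torch.randn(1, 64, 32, 40)   # a downsampled endoscopic feature map
out = HybridBlock(64)(feat)         # same shape, locally and globally mixed
```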
3. Schmidt A, Mohareri O, DiMaio S, Yip MC, Salcudean SE. Tracking and mapping in medical computer vision: A review. Med Image Anal 2024;94:103131. [PMID: 38442528] [DOI: 10.1016/j.media.2024.103131]
Abstract
As computer vision algorithms increase in capability, their applications in clinical systems will become more pervasive. These applications include diagnostics, such as colonoscopy and bronchoscopy; guiding biopsies, minimally invasive interventions, and surgery; automating instrument motion; and providing image guidance using pre-operative scans. Many of these applications depend on the specific visual nature of medical scenes and require algorithms designed to perform in this environment. In this review, we provide an update on the field of camera-based tracking and scene mapping in surgery and diagnostics in medical computer vision. We begin by describing our review process, which yields a final list of 515 papers. We then give a high-level summary of the state of the art and provide relevant background for those who need tracking and mapping for their clinical applications. Next, we review the datasets provided in the field and the clinical needs that motivate their design. We then delve into the algorithmic side and summarize recent developments; this summary should be especially useful for algorithm designers and for those seeking to understand the capability of off-the-shelf methods. We focus on algorithms for deformable environments while also reviewing the essential building blocks of rigid tracking and mapping, since there is substantial crossover in methods. With the field summarized, we discuss the current state of tracking and mapping methods, the needs of future algorithms, the need for quantification, and the viability of clinical applications. We then provide some research directions and questions. We conclude that new methods need to be designed or combined to support clinical applications in deformable environments, and that more focus needs to be put on collecting datasets for training and evaluation.
Affiliation(s)
- Adam Schmidt
- Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada.
- Omid Mohareri
- Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Simon DiMaio
- Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Michael C Yip
- Department of Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
- Septimiu E Salcudean
- Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada
4. Liu M, Han Y, Wang J, Wang C, Wang Y, Meijering E. LSKANet: Long Strip Kernel Attention Network for Robotic Surgical Scene Segmentation. IEEE Trans Med Imaging 2024;43:1308-1322. [PMID: 38015689] [DOI: 10.1109/tmi.2023.3335406]
Abstract
Surgical scene segmentation is a critical task in robot-assisted surgery. However, the complexity of the surgical scene, which mainly comprises local feature similarity (e.g., between different anatomical tissues), complex intraoperative artifacts, and indistinguishable boundaries, poses significant challenges to accurate segmentation. To tackle these problems, we propose the Long Strip Kernel Attention network (LSKANet), which includes two well-designed modules, the Dual-block Large Kernel Attention module (DLKA) and the Multiscale Affinity Feature Fusion module (MAFF), to achieve precise segmentation of surgical images. Specifically, by introducing strip convolutions with different topologies (cascaded and parallel) in the two blocks together with a large-kernel design, DLKA makes full use of region- and strip-like surgical features and extracts both visual and structural information, reducing false segmentation caused by local feature similarity. In MAFF, affinity matrices calculated from multiscale feature maps are applied as feature fusion weights, which suppresses activations of irrelevant regions and thereby addresses interference from artifacts. In addition, a hybrid loss with a Boundary Guided Head (BGH) is proposed to help the network segment indistinguishable boundaries effectively. We evaluate LSKANet on three datasets with different surgical scenes. The experimental results show that our method achieves new state-of-the-art results on all three datasets, with improvements of 2.6%, 1.4%, and 3.4% mIoU, respectively. Furthermore, our method is compatible with different backbones and can significantly increase their segmentation accuracy. Code is available at https://github.com/YubinHan73/LSKANet.
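To illustrate strip convolutions with a large-kernel design, below is a minimal PyTorch sketch of a cascaded strip-attention block in the spirit of DLKA; the kernel length, depthwise design, and sigmoid gating are assumptions for illustration, not the paper's exact module:

```python
import torch
import torch.nn as nn

class StripAttention(nn.Module):
    """Cascaded strip-kernel attention: depthwise horizontal and vertical
    strip convolutions approximate one large kernel cheaply; their response
    gates the input so strip-like structures (e.g., instruments) are
    emphasized. Kernel length and topology are illustrative assumptions."""

    def __init__(self, channels: int, k: int = 11):
        super().__init__()
        pad = k // 2
        # 1 x k followed by k x 1: a separable approximation of a k x k kernel.
        self.h_strip = nn.Conv2d(channels, channels, (1, k),
                                 padding=(0, pad), groups=channels)
        self.v_strip = nn.Conv2d(channels, channels, (k, 1),
                                 padding=(pad, 0), groups=channels)
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.mix(self.v_strip(self.h_strip(x)))
        return x * torch.sigmoid(attn)  # gate region- and strip-like features

x = torch.randn(1, 32, 64, 64)
y = StripAttention(32)(x)  # same shape, strip-aware reweighting
```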
5. Rueckert T, Rueckert D, Palm C. Methods and datasets for segmentation of minimally invasive surgical instruments in endoscopic images and videos: A review of the state of the art. Comput Biol Med 2024;169:107929. [PMID: 38184862] [DOI: 10.1016/j.compbiomed.2024.107929]
Abstract
In the field of computer- and robot-assisted minimally invasive surgery, enormous progress has been made in recent years based on the recognition of surgical instruments in endoscopic images and videos. In particular, the determination of the position and type of instruments is of great interest. Current work involves both spatial and temporal information, with the idea that predicting the movement of surgical tools over time may improve the quality of the final segmentations. The provision of publicly available datasets has recently encouraged the development of new methods, mainly based on deep learning. In this review, we identify and characterize datasets used for method development and evaluation and quantify their frequency of use in the literature. We further present an overview of the current state of research regarding the segmentation and tracking of minimally invasive surgical instruments in endoscopic images and videos. The paper focuses on methods that work purely visually, without markers of any kind attached to the instruments, considering both single-frame semantic and instance segmentation approaches, as well as those that incorporate temporal information. The publications analyzed were identified through the platforms Google Scholar, Web of Science, and PubMed. The search terms used were "instrument segmentation", "instrument tracking", "surgical tool segmentation", and "surgical tool tracking", resulting in a total of 741 articles published between 01/2015 and 07/2023, of which 123 were included using systematic selection criteria. A discussion of the reviewed literature is provided, highlighting existing shortcomings and emphasizing the available potential for future developments.
Affiliation(s)
- Tobias Rueckert
- Regensburg Medical Image Computing (ReMIC), Ostbayerische Technische Hochschule Regensburg (OTH Regensburg), Germany.
- Daniel Rueckert
- Artificial Intelligence in Healthcare and Medicine, Klinikum rechts der Isar, Technical University of Munich, Germany; Department of Computing, Imperial College London, UK
- Christoph Palm
- Regensburg Medical Image Computing (ReMIC), Ostbayerische Technische Hochschule Regensburg (OTH Regensburg), Germany; Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Germany
6. Wang Y, Lam HK, Xu Y, Yin F, Qian K. Multi-task learning framework to predict the status of central venous catheter based on radiographs. Artif Intell Med 2023;146:102721. [PMID: 38042594] [DOI: 10.1016/j.artmed.2023.102721]
Abstract
Hospital patients may have catheters and lines inserted during their admission to deliver medicines, notably the central venous catheter (CVC). However, a malpositioned CVC can lead to many complications, even death. Clinicians therefore check the status of the catheter on X-ray images to avoid these issues. To reduce the workload of clinicians and improve the efficiency of CVC status detection, a multi-task learning framework for catheter status classification based on a convolutional neural network (CNN) is proposed. The framework contains three significant components: a modified HRNet, multi-task supervision comprising segmentation supervision and heatmap-regression supervision, and a classification branch. The modified HRNet maintains high-resolution features from start to end, ensuring the generation of high-quality auxiliary information for classification. The multi-task supervision helps mitigate the interference of other line-like structures, such as other tubes and anatomical structures visible in the X-ray image. During inference, this module also serves as an interpretation interface that shows where the framework pays attention. Finally, the classification branch predicts the status class of the catheter. A public CVC dataset is used to evaluate the proposed method, which achieves an AUC (area under the ROC curve) of 0.823 and an accuracy of 82.6% on the test set. Compared with two state-of-the-art methods (the ATCM and EDMC methods), the proposed method performs best.
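To make the multi-task supervision concrete, below is a minimal PyTorch sketch of a combined loss over a classification head, a segmentation head, and a heatmap-regression head; the loss choices, weights, and tensor shapes are hypothetical, not the paper's configuration:

```python
import torch
import torch.nn as nn

# Hypothetical losses for the three heads a shared backbone would feed:
bce = nn.BCEWithLogitsLoss()  # catheter-status classification (multi-label)
seg = nn.CrossEntropyLoss()   # pixel-wise catheter segmentation
mse = nn.MSELoss()            # heatmap regression along the catheter course

def multitask_loss(cls_logits, cls_target, seg_logits, seg_target,
                   heatmap_pred, heatmap_target,
                   w_cls=1.0, w_seg=0.5, w_hm=0.5):
    """Weighted sum of the three supervision signals (weights assumed)."""
    return (w_cls * bce(cls_logits, cls_target)
            + w_seg * seg(seg_logits, seg_target)
            + w_hm * mse(heatmap_pred, heatmap_target))

# Dummy batch of 2 images, 4 status classes, 2 segmentation classes.
loss = multitask_loss(
    torch.randn(2, 4), torch.rand(2, 4).round(),
    torch.randn(2, 2, 64, 64), torch.randint(0, 2, (2, 64, 64)),
    torch.randn(2, 1, 64, 64), torch.rand(2, 1, 64, 64),
)
```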
Affiliation(s)
- Yuhan Wang
- Department of Engineering, King's College London, Strand, London, WC2R 2LS, United Kingdom
- Hak Keung Lam
- Department of Engineering, King's College London, Strand, London, WC2R 2LS, United Kingdom.
- Yujia Xu
- Department of Engineering, King's College London, Strand, London, WC2R 2LS, United Kingdom
- Faliang Yin
- Department of Engineering, King's College London, Strand, London, WC2R 2LS, United Kingdom
- Kun Qian
- Center for the Developing Brain, School of Biomedical Engineering and Imaging Sciences, King's College London, St Thomas' Campus, St Thomas' Hospital, Westminster Bridge Road, London, SE1 7EH, United Kingdom
7. Liu Z, Zheng L, Yang S, Zhong Z, Zhang G. MFF-Net: Multiscale feature fusion semantic segmentation network for intracranial surgical instruments. Int J Med Robot 2023:e2595. [PMID: 37932905] [DOI: 10.1002/rcs.2595]
Abstract
BACKGROUND: In robot-assisted surgery, automatic segmentation of surgical instrument images is crucial for surgical safety. The proposed method addresses challenges of the craniotomy environment, such as occlusion and illumination, through an efficient surgical instrument segmentation network.
METHODS: The network uses YOLOv8 as the object detection framework and integrates a semantic segmentation head to provide both detection and segmentation capabilities. A concatenation of multi-channel feature maps is designed to enhance model generalisation by fusing deep and shallow features. The GBC2f module keeps the network lightweight while capturing global information.
RESULTS: Experimental validation on an intracranial glioma surgical instrument dataset shows excellent performance: an MPA of 94.9%, an MIoU of 89.9%, and 126.6 FPS.
CONCLUSIONS: The experimental results show that the proposed segmentation model has significant advantages over other state-of-the-art models, providing a valuable reference for the further development of intelligent surgical robots.
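To illustrate the fusion of deep and shallow features by concatenation, below is a minimal PyTorch sketch; the channel counts and resolutions are hypothetical, not the paper's configuration:

```python
import torch
import torch.nn.functional as F

# A deep, low-resolution map is upsampled and concatenated with a shallow,
# high-resolution map so the segmentation head sees both semantics and detail.
shallow = torch.randn(1, 64, 80, 80)   # early-layer features (spatial detail)
deep = torch.randn(1, 256, 20, 20)     # late-layer features (semantics)

deep_up = F.interpolate(deep, size=shallow.shape[-2:],
                        mode="bilinear", align_corners=False)
fused = torch.cat([shallow, deep_up], dim=1)  # (1, 320, 80, 80)
```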
Affiliation(s)
- Zhenzhong Liu
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), Tianjin, China
- Laiwang Zheng
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), Tianjin, China
- Shubin Yang
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), Tianjin, China
- Zichen Zhong
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), Tianjin, China
- Guobin Zhang
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), Tianjin, China
8. Liu Z, Zheng L, Gu L, Yang S, Zhong Z, Zhang G. InstrumentNet: An integrated model for real-time segmentation of intracranial surgical instruments. Comput Biol Med 2023;166:107565. [PMID: 37839219] [DOI: 10.1016/j.compbiomed.2023.107565]
Abstract
In robot-assisted surgery, precise surgical instrument segmentation can provide accurate location and pose data for surgeons, helping them perform a series of surgical operations efficiently and safely. However, interfering factors remain, such as surgical instruments being covered by tissue, multiple instruments interlacing with each other, and instrument shaking during surgery. To address these issues, an effective surgical instrument segmentation network called InstrumentNet is proposed, which adopts YOLOv7 as the object detection framework to achieve real-time detection. Specifically, a multiscale feature fusion network is constructed to avoid problems such as feature redundancy and feature loss and to enhance generalization. Furthermore, an adaptive feature-weighted fusion mechanism is introduced to regulate network learning and convergence. Finally, a semantic segmentation head is introduced to integrate the detection and segmentation functions, and a multi-task learning loss function is designed to optimize segmentation performance. The proposed model is validated on a dataset of intracranial surgical instruments provided by seven experts from Beijing Tiantan Hospital, achieving an mAP of 93.5%, a Dice score of 82.49%, and an MIoU of 85.48%, demonstrating its generality and strong performance. The experimental results show that the proposed model achieves good segmentation performance compared with other advanced models and can provide a reference for the development of intelligent medical robots.
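To illustrate adaptive feature-weighted fusion, below is a minimal PyTorch sketch using softmax-normalized learnable scalars over same-shape feature maps, a common normalized-fusion pattern; the paper's exact mechanism may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightedFusion(nn.Module):
    """Each input feature map gets a learnable scalar weight, normalized
    with softmax, so the network learns how much each scale contributes.
    A sketch of the normalized-fusion pattern, not the paper's module."""

    def __init__(self, num_inputs: int):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))

    def forward(self, feats):  # iterable of same-shape tensors
        w = F.softmax(self.weights, dim=0)
        return sum(wi * fi for wi, fi in zip(w, feats))

f1, f2, f3 = (torch.randn(1, 128, 40, 40) for _ in range(3))
fused = AdaptiveWeightedFusion(3)((f1, f2, f3))  # learned blend of scales
```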
Affiliation(s)
- Zhenzhong Liu
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), China
- Laiwang Zheng
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), China
- Lin Gu
- RIKEN (Rikagaku Kenkyusho), Tokyo, Japan; Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
- Shubin Yang
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), China
- Zichen Zhong
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), China
- Guobin Zhang
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), China.