1
Zhao Z, Liu Y, Wu H, Wang M, Li Y, Wang S, Teng L, Liu D, Cui Z, Wang Q, Shen D. CLIP in medical imaging: A survey. Med Image Anal 2025;102:103551. [PMID: 40127590] [DOI: 10.1016/j.media.2025.103551]
Abstract
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training paradigm, successfully introduces text supervision to vision models. It has shown promising results across various tasks due to its generalizability and interpretability. The use of CLIP has recently gained increasing interest in the medical imaging domain, serving as a pre-training paradigm for image-text alignment or as a critical component in diverse clinical tasks. With the aim of facilitating a deeper understanding of this promising direction, this survey offers an in-depth exploration of CLIP within the domain of medical imaging, regarding both refined CLIP pre-training and CLIP-driven applications. In this paper, we (1) first start with a brief introduction to the fundamentals of CLIP methodology; (2) then investigate the adaptation of CLIP pre-training in the medical imaging domain, focusing on how to optimize CLIP given the characteristics of medical images and reports; (3) further explore practical utilization of CLIP pre-trained models in various tasks, including classification, dense prediction, and cross-modal tasks; and (4) finally discuss existing limitations of CLIP in the context of medical imaging, and propose forward-looking directions to address the demands of the medical imaging domain. Studies featuring technical and practical value are both investigated. We expect this survey will provide researchers with a holistic understanding of the CLIP paradigm and its potential implications. The project page of this survey can also be found on GitHub.
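As a concrete illustration of the pre-training paradigm summarized above, the following minimal sketch shows the symmetric image-text contrastive loss that CLIP-style training optimizes; the embedding dimension, batch size, and temperature are illustrative assumptions, not values from the survey.

```python
# Minimal sketch of a CLIP-style symmetric contrastive loss on paired embeddings.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """image_emb, text_emb: (batch, dim) outputs of the two encoders."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)            # match each image to its report
    loss_t2i = F.cross_entropy(logits.t(), targets)        # match each report to its image
    return 0.5 * (loss_i2t + loss_t2i)

# toy usage with random embeddings
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt).item())
```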
Affiliation(s)
- Zihao Zhao: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Yuxiao Liu: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Han Wu: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Mei Wang: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China; School of Biomedical Engineering, Southern Medical University, Guangzhou, China
- Yonghao Li: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Sheng Wang: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China; School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Lin Teng: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Disheng Liu: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Zhiming Cui: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Qian Wang: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Dinggang Shen: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China; Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China; Shanghai Clinical Research and Trial Center, Shanghai, China
2
Bai X, Bai F, Huo X, Ge J, Lu J, Ye X, Shu M, Yan K, Xia Y. UAE: Universal Anatomical Embedding on multi-modality medical images. Med Image Anal 2025;103:103562. [PMID: 40209554] [DOI: 10.1016/j.media.2025.103562]
Abstract
Identifying anatomical structures (e.g., lesions or landmarks) is crucial for medical image analysis. Exemplar-based landmark detection methods are gaining attention as they allow the detection of arbitrary points during inference without needing annotated landmarks during training. These methods use self-supervised learning to create a discriminative voxel embedding and match corresponding landmarks via nearest-neighbor searches, showing promising results. However, current methods still face challenges in (1) differentiating voxels with similar appearance but different semantic meanings (e.g., two adjacent structures without clear borders); (2) matching voxels with similar semantics but markedly different appearance (e.g., the same vessel before and after contrast injection); and (3) cross-modality matching (e.g., CT-MRI landmark-based registration). To overcome these challenges, we propose a Unified framework for learning Anatomical Embeddings (UAE). UAE is designed to learn appearance, semantic, and cross-modality anatomical embeddings. Specifically, UAE incorporates three key innovations: (1) semantic embedding learning with prototypical contrastive loss; (2) a fixed-point-based matching strategy; and (3) an iterative approach for cross-modality embedding learning. We thoroughly evaluated UAE across intra- and inter-modality tasks, including one-shot landmark detection, lesion tracking on longitudinal CT scans, and CT-MRI affine/rigid registration with varying fields of view. Our results suggest that UAE outperforms state-of-the-art methods, offering a robust and versatile approach for landmark-based medical image analysis tasks. Code and trained models are available at: https://github.com/alibaba-damo-academy/self-supervised-anatomical-embedding-v2.
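The exemplar-based matching step described above can be illustrated with a minimal nearest-neighbor search over dense voxel embeddings; the array shapes and function names below are illustrative assumptions, not the UAE implementation.

```python
# Minimal sketch: match a template landmark embedding to the most similar voxel
# embedding in a target scan via cosine nearest-neighbor search.
import numpy as np

def match_landmark(query_emb: np.ndarray, target_emb: np.ndarray) -> tuple:
    """query_emb: (C,) embedding of the template landmark voxel.
    target_emb: (C, D, H, W) dense voxel embeddings of the target scan."""
    c, d, h, w = target_emb.shape
    flat = target_emb.reshape(c, -1)                          # (C, D*H*W)
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    q = query_emb / (np.linalg.norm(query_emb) + 1e-8)
    sim = q @ flat                                            # cosine similarity per voxel
    idx = int(np.argmax(sim))
    return np.unravel_index(idx, (d, h, w))                   # (z, y, x) of best match

# toy usage with random embeddings
query = np.random.randn(32)
target = np.random.randn(32, 16, 32, 32)
print(match_landmark(query, target))
```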
Affiliation(s)
- Xiaoyu Bai: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Fan Bai: Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Jia Ge: The First Affiliated Hospital, Zhejiang University, Hangzhou, China
- Jingjing Lu: Peking Union Medical College Hospital, Beijing, China
- Xianghua Ye: The First Affiliated Hospital, Zhejiang University, Hangzhou, China
- Minglei Shu: Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong 250014, China
- Ke Yan: DAMO Academy, Alibaba Group, China; Hupan Lab, Hangzhou 310023, China
- Yong Xia: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
3
Albuquerque C, Henriques R, Castelli M. Deep learning-based object detection algorithms in medical imaging: Systematic review. Heliyon 2025;11:e41137. [PMID: 39758372] [PMCID: PMC11699422] [DOI: 10.1016/j.heliyon.2024.e41137]
Abstract
Over the past decade, Deep Learning (DL) techniques have demonstrated remarkable advancements across various domains, driving their widespread adoption. Particularly in medical image analysis, DL has received particular attention for tasks like image segmentation, object detection, and classification. This paper provides an overview of DL-based object recognition in medical images, exploring recent methods and emphasizing different imaging techniques and anatomical applications. Utilizing a meticulous quantitative and qualitative analysis following PRISMA guidelines, we examined publications based on citation rates to explore the utilization of DL-based object detectors across imaging modalities and anatomical domains. Our findings reveal a consistent rise in the utilization of DL-based object detection models, indicating unexploited potential in medical image analysis. Predominantly within the Medicine and Computer Science domains, research in this area is most active in the US, China, and Japan. Notably, DL-based object detection methods have garnered significant interest across diverse medical imaging modalities and anatomical domains. These methods have been applied to a range of techniques including CR scans, pathology images, and endoscopic imaging, showcasing their adaptability. Moreover, diverse anatomical applications, particularly in digital pathology and microscopy, have been explored. The analysis underscores the presence of varied datasets, often with significant discrepancies in size, with a notable percentage being labeled as private or internal, and with prospective studies in this field remaining scarce. Our review of existing trends in DL-based object detection in medical images offers insights for future research directions. The continuous evolution of DL algorithms highlighted in the literature underscores the dynamic nature of this field, emphasizing the need for ongoing research and tailored optimization for specific applications.
4
Liu L, Liu J, Santra B, Parnell C, Mukherjee P, Mathai T, Zhu Y, Anand A, Summers RM. Utilizing domain knowledge to improve the classification of intravenous contrast phase of CT scans. Comput Med Imaging Graph 2025;119:102458. [PMID: 39740481] [DOI: 10.1016/j.compmedimag.2024.102458]
Abstract
Multiple intravenous contrast phases of CT scans are commonly used in clinical practice to facilitate disease diagnosis. However, contrast phase information is commonly missing or incorrect due to discrepancies in CT series descriptions and imaging practices. This work aims to develop a classification algorithm to automatically determine the contrast phase of a CT scan. We hypothesize that the image intensities of key organs (e.g., aorta, inferior vena cava) affected by contrast enhancement are inherent features for deciding the contrast phase. These organs are segmented by TotalSegmentator, and intensity features are then generated for each segmented organ region. Two internal and one external dataset were collected to validate the classification accuracy. In comparison with the baseline ResNet classification method that did not make use of key organ features, the proposed method achieved a comparable accuracy of 92.5% and F1 score of 92.5% on one internal dataset. On the other internal dataset, the proposed method improved accuracy from 63.9% to 79.8% and F1 score from 43.9% to 65.0%. On the external dataset, accuracy improved from 63.5% to 85.1% and F1 score from 56.4% to 83.9%. Image intensity features from key organs are critical for improving the classification accuracy of contrast phases of CT scans. The classification method based on these features is robust to different scanners and imaging protocols from different institutions. Our results suggest improved classification accuracy over existing approaches, which advances the application of automatic contrast phase classification toward real clinical practice. The code for this work can be found at https://github.com/rsummers11/CT_Contrast_Phase_Classifier.
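A minimal sketch of the core idea above: summarize CT intensities inside key segmented organs into features and feed them to an off-the-shelf classifier. The organ names, feature set, and classifier choice are illustrative assumptions rather than the paper's exact pipeline.

```python
# Minimal sketch: per-organ intensity statistics as features for contrast phase classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def organ_intensity_features(ct: np.ndarray, masks: dict) -> np.ndarray:
    """ct: 3-D volume in HU; masks: {organ_name: boolean mask of the same shape}."""
    feats = []
    for name in sorted(masks):
        vals = ct[masks[name]]
        feats.extend([vals.mean(), vals.std(), np.percentile(vals, 90)])
    return np.asarray(feats)

# toy training example with random volumes and masks
rng = np.random.default_rng(0)
X = np.stack([organ_intensity_features(rng.normal(60, 20, (8, 8, 8)),
                                        {"aorta": rng.random((8, 8, 8)) > 0.5,
                                         "ivc": rng.random((8, 8, 8)) > 0.5})
              for _ in range(20)])
y = rng.integers(0, 4, size=20)   # e.g., non-contrast / arterial / venous / delayed
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict(X[:3]))
```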
Affiliation(s)
- Liangchen Liu: Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Clinical Center, National Institutes of Health, United States of America
- Jianfei Liu: Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Clinical Center, National Institutes of Health, United States of America
- Bikash Santra: Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Clinical Center, National Institutes of Health, United States of America; Indian Institute of Technology, Jodhpur, India
- Pritam Mukherjee: Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Clinical Center, National Institutes of Health, United States of America
- Tejas Mathai: Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Clinical Center, National Institutes of Health, United States of America
- Yingying Zhu: The University of Texas at Arlington, United States of America
- Akshaya Anand: The University of Maryland, United States of America
- Ronald M Summers: Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Clinical Center, National Institutes of Health, United States of America
5
Bai X, Chen G, Ma B, Li C, Zhang J, Xia Y. Exploratory Training for Universal Lesion Detection: Enhancing Lesion Mining Quality Through Temporal Verification. IEEE J Biomed Health Inform 2024;28:6117-6129. [PMID: 38905094] [DOI: 10.1109/jbhi.2024.3417274]
Abstract
Universal lesion detection (ULD) has great value in clinical practice as it can detect various lesions across multiple organs. Deep learning-based detectors have great potential but require high-quality annotated training data. In practice, incomplete annotations are common due to cost, expertise requirements, and the diverse nature of lesions. Directly training ULD detectors under this condition can yield suboptimal results. Leading pseudo-label methods rely on a dynamic lesion-mining mechanism operating at the mini-batch level to address this issue. However, the quality of mined lesions is inconsistent across different iterations, potentially limiting performance enhancement. Inspired by the observation that deep models learn concepts with increasing complexity, we propose an exploratory-training-based ULD (ET-ULD) method to assess the reliability of mined lesions over time. Our approach uses a teacher-student detection model in which the teacher mines suspicious lesions, which are then combined with the incomplete annotations to train the student. On top of that, we design a bounding-box bank to record the mining timestamps. Each image is trained over several rounds, giving a sequence of timestamps for each mined lesion. If a mined lesion consistently reappears, it is likely to be a true lesion; otherwise, it may be noise. This serves as a crucial criterion for selecting reliable mined lesions for retraining. Experimental results show that ET-ULD surpasses existing state-of-the-art methods on two distinct lesion image datasets. Notably, on the DeepLesion dataset, ET-ULD achieved a 5.4% improvement in Average Precision (AP) over previous methods, demonstrating its superior performance.
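The temporal-verification idea above can be sketched as a small bounding-box bank that records the rounds in which each mined box reappears and keeps only consistently mined boxes; the matching key and threshold below are illustrative assumptions.

```python
# Minimal sketch: a bank of mined boxes with per-round "timestamps"; only boxes mined
# often enough across rounds are treated as reliable pseudo-labels.
from collections import defaultdict

class BoxBank:
    def __init__(self, min_hits: int):
        self.hits = defaultdict(set)          # (image_id, box) -> set of rounds it was mined
        self.min_hits = min_hits

    def record(self, image_id: str, box: tuple, round_idx: int) -> None:
        self.hits[(image_id, box)].add(round_idx)

    def reliable_boxes(self, image_id: str) -> list:
        """Boxes mined in at least `min_hits` rounds are kept for retraining."""
        return [box for (img, box), rounds in self.hits.items()
                if img == image_id and len(rounds) >= self.min_hits]

# usage: a box mined in 3 rounds is kept; one mined once is discarded as noise
bank = BoxBank(min_hits=3)
for r in (0, 1, 3):
    bank.record("img_001", (10, 20, 50, 60), r)
bank.record("img_001", (70, 80, 90, 100), 2)
print(bank.reliable_boxes("img_001"))
```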
6
Ma J, Yoon JH, Lu L, Yang H, Guo P, Yang D, Li J, Shen J, Schwartz LH, Zhao B. A quantitative analysis of the improvement provided by comprehensive annotation on CT lesion detection using deep learning. J Appl Clin Med Phys 2024;25:e14434. [PMID: 39078867] [PMCID: PMC11492393] [DOI: 10.1002/acm2.14434]
Abstract
BACKGROUND Data collected from hospitals are usually partially annotated by radiologists due to time constraints. Developing and evaluating deep learning models on these data may result in over- or underestimation. PURPOSE We aimed to quantitatively investigate how the percentage of annotated lesions in CT images influences the performance of universal lesion detection (ULD) algorithms. METHODS We trained a multi-view feature pyramid network with position-aware attention (MVP-Net) to perform ULD. Three versions of the DeepLesion dataset were created for training MVP-Net. The Original DeepLesion Dataset (OriginalDL) is the publicly available, widely studied DeepLesion dataset, which includes 32 735 lesions in 4427 patients that were partially labeled during routine clinical practice. The Enriched DeepLesion Dataset (EnrichedDL) is an enhanced dataset that was fully labeled at one or more time points for 4145 patients, with 34 317 lesions. UnionDL is the union of OriginalDL and EnrichedDL, with 54 510 labeled lesions in 4427 patients. Each dataset was used separately to train MVP-Net, resulting in the following models: OriginalCNN (replicating the original result), EnrichedCNN (testing the effect of increased annotation), and UnionCNN (featuring the greatest number of annotations). RESULTS Although the reported mean sensitivity of OriginalCNN was 84.3% using the OriginalDL testing set, the performance fell sharply when tested on the EnrichedDL testing set, yielding mean sensitivities of 56.1%, 66.0%, and 67.8% for OriginalCNN, EnrichedCNN, and UnionCNN, respectively. We also found that increasing the percentage of annotated lesions in the training set increased sensitivity, but the margin of improvement gradually diminished according to a power law. CONCLUSIONS We expanded and improved the existing DeepLesion dataset by annotating an additional 21 775 lesions, and we demonstrated that using fully labeled CT images avoided overestimation of MVP-Net's performance while increasing the algorithm's sensitivity, which may have a substantial impact on future CT lesion detection research. The annotated lesions are available at https://github.com/ComputationalImageAnalysisLab/DeepLesionData.
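The diminishing-returns behavior noted in the results can be illustrated by fitting a power law to sensitivity versus annotated fraction; the numbers below are made-up illustration data, not the study's measurements.

```python
# Minimal sketch: fit sensitivity = a * fraction**b to illustrate diminishing returns.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b):
    return a * np.power(x, b)

frac_annotated = np.array([0.2, 0.4, 0.6, 0.8, 1.0])     # fraction of lesions annotated (hypothetical)
sensitivity = np.array([0.45, 0.55, 0.61, 0.65, 0.68])    # hypothetical mean sensitivity

(a, b), _ = curve_fit(power_law, frac_annotated, sensitivity)
print(f"fitted sensitivity ~ {a:.2f} * fraction^{b:.2f}")
```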
Affiliation(s)
- Jingchen Ma: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Jin H. Yoon: Department of Radiology, Columbia University Irving Medical Center, New York, New York, USA
- Lin Lu: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Hao Yang: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Pingzhen Guo: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Dawei Yang: Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Jing Li: Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Jingxian Shen: Medical Imaging Department, Sun Yat-Sen University Cancer Center, State Key Laboratory of Oncology in South China, Guangzhou, China
- Lawrence H. Schwartz: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Binsheng Zhao: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
7
Kim S, Park H, Kang M, Jin KH, Adeli E, Pohl KM, Park SH. Federated learning with knowledge distillation for multi-organ segmentation with partially labeled datasets. Med Image Anal 2024;95:103156. [PMID: 38603844] [DOI: 10.1016/j.media.2024.103156]
Abstract
State-of-the-art multi-organ CT segmentation relies on deep learning models, which only generalize when trained on large samples of carefully curated data. However, it is challenging to train a single model that can segment all organs and types of tumors, since most large datasets are partially labeled or are acquired across multiple institutes that may differ in their acquisition protocols. A possible solution is federated learning, which is often used to train models on multi-institutional datasets where the data are not shared across sites. However, predictions of federated learning can be unreliable after the model is locally updated at sites, due to 'catastrophic forgetting'. Here, we address this issue by using knowledge distillation (KD) so that the local training is regularized with the knowledge of a global model and pre-trained organ-specific segmentation models. We implement the models in a multi-head U-Net architecture that learns a shared embedding space for different organ segmentations, thereby obtaining multi-organ predictions without repeated processes. We evaluate the proposed method using 8 publicly available abdominal CT datasets covering 7 different organs. Of those datasets, 889 CTs were used for training, 233 for internal testing, and 30 volumes for external testing. Experimental results verify that our proposed method substantially outperforms other state-of-the-art methods in terms of accuracy, inference time, and the number of parameters.
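A minimal sketch of the knowledge-distillation regularizer described above: local training combines the ordinary segmentation loss with a distillation term toward the global (or organ-specific teacher) predictions. The temperature and weighting are illustrative assumptions.

```python
# Minimal sketch: segmentation loss plus a KL-based distillation term toward teacher logits.
import torch
import torch.nn.functional as F

def kd_regularized_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        labels: torch.Tensor,
                        alpha: float = 0.5,
                        temperature: float = 2.0) -> torch.Tensor:
    """logits: (batch, classes, H, W) voxel-wise predictions; labels: (batch, H, W) integer mask."""
    seg_loss = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  F.softmax(teacher_logits / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2
    return (1 - alpha) * seg_loss + alpha * kd

# toy usage on small 2-D "volumes"
student = torch.randn(2, 4, 8, 8)
teacher = torch.randn(2, 4, 8, 8)
labels = torch.randint(0, 4, (2, 8, 8))
print(kd_regularized_loss(student, teacher, labels).item())
```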
Affiliation(s)
- Soopil Kim: Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology, Republic of Korea; Department of Psychiatry and Behavioral Sciences, Stanford University, CA 94305, USA
- Heejung Park: Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology, Republic of Korea
- Myeongkyun Kang: Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology, Republic of Korea; Department of Psychiatry and Behavioral Sciences, Stanford University, CA 94305, USA
- Kyong Hwan Jin: School of Electrical Engineering, Korea University, Republic of Korea
- Ehsan Adeli: Department of Psychiatry and Behavioral Sciences, Stanford University, CA 94305, USA
- Kilian M Pohl: Department of Psychiatry and Behavioral Sciences, Stanford University, CA 94305, USA
- Sang Hyun Park: Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology, Republic of Korea
8
Xu R, Liu Z, Luo Y, Hu H, Shen L, Du B, Kuang K, Yang J. SGDA: Towards 3-D Universal Pulmonary Nodule Detection via Slice Grouped Domain Attention. IEEE/ACM Trans Comput Biol Bioinform 2024;21:1093-1105. [PMID: 37028322] [DOI: 10.1109/tcbb.2023.3253713]
Abstract
Lung cancer is the leading cause of cancer death worldwide. The best solution for lung cancer is to diagnose pulmonary nodules at an early stage, which is usually accomplished with the aid of thoracic computed tomography (CT). As deep learning thrives, convolutional neural networks (CNNs) have been introduced into pulmonary nodule detection to help doctors in this labor-intensive task and have been demonstrated to be very effective. However, current pulmonary nodule detection methods are usually domain-specific and cannot satisfy the requirement of working in diverse real-world scenarios. To address this issue, we propose a slice grouped domain attention (SGDA) module to enhance the generalization capability of pulmonary nodule detection networks. This attention module works in the axial, coronal, and sagittal directions. In each direction, we divide the input feature into groups, and for each group we utilize a universal adapter bank to capture the feature subspaces of the domains spanned by all pulmonary nodule datasets. The bank outputs are then combined from the perspective of domain to modulate the input group. Extensive experiments demonstrate that SGDA enables substantially better multi-domain pulmonary nodule detection performance compared with state-of-the-art multi-domain learning methods.
9
Jenke AC, Bodenstedt S, Kolbinger FR, Distler M, Weitz J, Speidel S. One model to use them all: training a segmentation model with complementary datasets. Int J Comput Assist Radiol Surg 2024;19:1233-1241. [PMID: 38678102] [PMCID: PMC11178567] [DOI: 10.1007/s11548-024-03145-8]
Abstract
PURPOSE Understanding surgical scenes is crucial for computer-assisted surgery systems to provide intelligent assistance functionality. One way of achieving this is via scene segmentation using machine learning (ML). However, such ML models require large amounts of annotated training data, containing examples of all relevant object classes, which are rarely available. In this work, we propose a method to combine multiple partially annotated datasets, providing complementary annotations, into one model, enabling better scene segmentation and the use of multiple readily available datasets. METHODS Our method aims to combine available data with complementary labels by leveraging mutual exclusive properties to maximize information. Specifically, we propose to use positive annotations of other classes as negative samples and to exclude background pixels of these binary annotations, as we cannot tell if a positive prediction by the model is correct. RESULTS We evaluate our method by training a DeepLabV3 model on the publicly available Dresden Surgical Anatomy Dataset, which provides multiple subsets of binary segmented anatomical structures. Our approach successfully combines 6 classes into one model, significantly increasing the overall Dice Score by 4.4% compared to an ensemble of models trained on the classes individually. By including information on multiple classes, we were able to reduce the confusion between classes, e.g., a 24% drop for stomach and colon. CONCLUSION By leveraging multiple datasets and applying mutual exclusion constraints, we developed a method that improves surgical scene segmentation performance without the need for fully annotated datasets. Our results demonstrate the feasibility of training a model on multiple complementary datasets. This paves the way for future work that further alleviates the need for a single large, fully segmented dataset by instead making use of already existing datasets.
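A minimal sketch of the mutual-exclusion idea above: pixels positively annotated for another class supervise the current class as negative, while un-annotated background pixels are excluded from the loss. Tensor layouts are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: masked binary cross-entropy that uses other classes' positives as negatives
# and ignores unlabeled background pixels.
import torch
import torch.nn.functional as F

def complementary_bce(logits: torch.Tensor,
                      own_pos: torch.Tensor,
                      other_pos: torch.Tensor) -> torch.Tensor:
    """logits: (N, H, W) for one class; own_pos/other_pos: boolean masks of the same shape."""
    target = own_pos.float()                        # positives of this class
    supervised = (own_pos | other_pos).float()      # other classes' positives act as negatives
    loss = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    return (loss * supervised).sum() / supervised.sum().clamp(min=1)

# toy usage on a single image
logits = torch.randn(1, 16, 16)
own = torch.zeros(1, 16, 16, dtype=torch.bool); own[0, 2:6, 2:6] = True
other = torch.zeros(1, 16, 16, dtype=torch.bool); other[0, 10:14, 10:14] = True
print(complementary_bce(logits, own, other).item())
```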
Affiliation(s)
- Alexander C Jenke: Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC) Dresden, Fetscherstraße 74, Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
- Sebastian Bodenstedt: Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC) Dresden, Fetscherstraße 74, Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technical University Dresden, CeTI Exzellenz-Cluster, Dresden, Saxony, Germany
- Fiona R Kolbinger: Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine Carl Gustav Carus, Technical University Dresden, Fetscherstraße 74, Dresden, Saxony, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technical University Dresden, CeTI Exzellenz-Cluster, Dresden, Saxony, Germany; Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, USA
- Marius Distler: Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine Carl Gustav Carus, Technical University Dresden, Fetscherstraße 74, Dresden, Saxony, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technical University Dresden, CeTI Exzellenz-Cluster, Dresden, Saxony, Germany
- Jürgen Weitz: Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine Carl Gustav Carus, Technical University Dresden, Fetscherstraße 74, Dresden, Saxony, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technical University Dresden, CeTI Exzellenz-Cluster, Dresden, Saxony, Germany
- Stefanie Speidel: Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC) Dresden, Fetscherstraße 74, Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technical University Dresden, CeTI Exzellenz-Cluster, Dresden, Saxony, Germany
10
Zhan F, Wang W, Chen Q, Guo Y, He L, Wang L. Three-Direction Fusion for Accurate Volumetric Liver and Tumor Segmentation. IEEE J Biomed Health Inform 2024;28:2175-2186. [PMID: 38109246] [DOI: 10.1109/jbhi.2023.3344392]
Abstract
Biomedical image segmentation of organs, tissues, and lesions has gained increasing attention in clinical treatment planning and navigation, which involves the exploration of two-dimensional (2D) and three-dimensional (3D) contexts in the biomedical image. Compared to 2D methods, 3D methods pay more attention to inter-slice correlations, which offer additional spatial information for image segmentation. An organ or tumor has a 3D structure that can be observed from three directions. Previous studies focus only on the vertical axis, limiting the understanding of the relationship between a tumor and its surrounding tissues. Important information can also be obtained from the sagittal and coronal axes. Therefore, spatial information of organs and tumors can be obtained from three directions, i.e., the sagittal, coronal, and vertical axes, to better understand the invasion depth of a tumor and its relationship with the surrounding tissues. Moreover, the edges of organs and tumors in biomedical images may be blurred. To address these problems, we propose a three-direction fusion volumetric segmentation (TFVS) model for segmenting 3D biomedical images from three perspectives in the sagittal, coronal, and transverse planes, respectively. We use the dataset of the liver task provided by the Medical Segmentation Decathlon challenge to train our model. The TFVS method demonstrates competitive performance on the 3D-IRCADB dataset. In addition, the t-test and Wilcoxon signed-rank test are performed to show the statistical significance of the improvement achieved by the proposed method compared with the baseline methods. The proposed method is expected to be beneficial in guiding and facilitating clinical diagnosis and treatment.
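A minimal sketch of three-direction fusion: run a 2D segmenter over slices along each axis and average the resulting probability volumes. The slice-wise model below is a stand-in function, not the paper's network.

```python
# Minimal sketch: fuse slice-wise predictions from the three anatomical directions.
import numpy as np

def segment_slice(slice_2d: np.ndarray) -> np.ndarray:
    """Stand-in for a 2-D segmentation model returning a probability map."""
    return 1.0 / (1.0 + np.exp(-(slice_2d - slice_2d.mean())))

def three_direction_fusion(volume: np.ndarray) -> np.ndarray:
    probs = np.zeros_like(volume, dtype=float)
    for axis in range(3):                                 # axial, coronal, sagittal
        moved = np.moveaxis(volume, axis, 0)
        pred = np.stack([segment_slice(s) for s in moved])
        probs += np.moveaxis(pred, 0, axis)
    return probs / 3.0                                    # fused probability volume

vol = np.random.randn(16, 16, 16)
print(three_direction_fusion(vol).shape)
```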
11
Abstract
Artificial intelligence (AI) is an epoch-making technology; its two most advanced branches, machine learning and the deep learning algorithms developed from it, have been partially applied to assist EUS diagnosis. AI-assisted EUS diagnosis has been reported to have great value in the diagnosis of pancreatic tumors and chronic pancreatitis, gastrointestinal stromal tumors, early esophageal cancer, and biliary tract and liver lesions. The application of AI in EUS diagnosis still has some urgent problems to be solved. First, the development of sensitive AI diagnostic tools requires a large amount of high-quality training data. Second, there is overfitting and bias in current AI algorithms, leading to poor diagnostic reliability. Third, the value of AI still needs to be determined in prospective studies. Fourth, the ethical risks of AI need to be considered and avoided.
Affiliation(s)
- Deyu Zhang: Department of Gastroenterology, Changhai Hospital, Naval Medical University, Shanghai 200433, China
- Chang Wu: Department of Gastroenterology, Changhai Hospital, Naval Medical University, Shanghai 200433, China
- Zhenghui Yang: Department of Gastroenterology, Changhai Hospital, Naval Medical University, Shanghai 200433, China
- Hua Yin: Department of Gastroenterology, General Hospital of Ningxia Medical University, Yinchuan 750004, Ningxia Hui Autonomous Region, China
- Yue Liu: Department of Gastroenterology, Changhai Hospital, Naval Medical University, Shanghai 200433, China
- Wanshun Li: Department of Gastroenterology, Changhai Hospital, Naval Medical University, Shanghai 200433, China
- Haojie Huang: Department of Gastroenterology, Changhai Hospital, Naval Medical University, Shanghai 200433, China
- Zhendong Jin: Department of Gastroenterology, Changhai Hospital, Naval Medical University, Shanghai 200433, China
12
Han J, Wei X, Faisal AA. EEG decoding for datasets with heterogenous electrode configurations using transfer learning graph neural networks. J Neural Eng 2023;20:066027. [PMID: 37931308] [DOI: 10.1088/1741-2552/ad09ff]
Abstract
Objective. Brain-machine interfacing (BMI) has greatly benefited from adopting machine learning methods for feature learning that require extensive data for training, which are often unavailable from a single dataset. Yet, it is difficult to combine data across labs or even data within the same lab collected over the years due to the variation in recording equipment and electrode layouts resulting in shifts in data distribution, changes in data dimensionality, and altered identity of data dimensions. Our objective is to overcome this limitation and learn from many different and diverse datasets across labs with different experimental protocols. Approach. To tackle the domain adaptation problem, we developed a novel machine learning framework combining graph neural networks (GNNs) and transfer learning methodologies for non-invasive motor imagery (MI) EEG decoding, as an example of BMI. Empirically, we focus on the challenges of learning from EEG data with different electrode layouts and varying numbers of electrodes. We utilize three MI EEG databases collected using very different numbers of EEG sensors (from 22 channels to 64) and layouts (from custom layouts to 10-20). Main results. Our model achieved the highest accuracy with lower standard deviations on the testing datasets. This indicates that the GNN-based transfer learning framework can effectively aggregate knowledge from multiple datasets with different electrode layouts, leading to improved generalization in subject-independent MI EEG classification. Significance. The findings of this study have important implications for brain-computer-interface research, as they highlight a promising method for overcoming the limitations posed by non-unified experimental setups. By enabling the integration of diverse datasets with varying electrode layouts, our proposed approach can help advance the development and application of BMI technologies.
Affiliation(s)
- Jinpei Han: Brain & Behaviour Lab, Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom
- Xiaoxi Wei: Brain & Behaviour Lab, Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom
- A Aldo Faisal: Brain & Behaviour Lab, Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom; Chair in Digital Health & Data Science, University of Bayreuth, 95447 Bayreuth, Germany
13
Kumar G, Sharma N, Paul A. An extremely lightweight CNN model for the diagnosis of chest radiographs in resource-constrained environments. Med Phys 2023;50:7568-7578. [PMID: 37665774] [DOI: 10.1002/mp.16722]
Abstract
BACKGROUND In recent years, deep learning methods have been successfully used for chest x-ray diagnosis. However, such deep learning models often contain millions of trainable parameters and have high computation demands. As a result, providing the benefits of cutting-edge deep learning technology to areas with low computational resources would not be easy. Computationally lightweight deep learning models may potentially alleviate this problem. PURPOSE We aim to create a computationally lightweight model for the diagnosis of chest radiographs. Our model has only 0.14M parameters and a size of 550 KB. These make the proposed model potentially useful for deployment in resource-constrained environments. METHODS We fuse the concept of depthwise convolutions with squeeze and expand blocks to design the proposed architecture. The basic building block of our model is called the Depthwise Convolution In Squeeze and Expand (DCISE) block. Using these DCISE blocks, we design ExLNet, a computationally lightweight convolutional neural network (CNN) model for chest x-ray diagnosis. RESULTS We perform rigorous experiments on three publicly available datasets, namely the National Institutes of Health (NIH), VinBig, and CheXpert datasets, for binary and multi-class classification tasks. We train the proposed architecture on the NIH dataset and evaluate the performance on the VinBig and CheXpert datasets. The proposed method outperforms several state-of-the-art approaches for both binary and multi-class classification tasks despite having a significantly smaller number of parameters. CONCLUSIONS We design a lightweight CNN architecture for the chest x-ray classification task by introducing ExLNet, which uses novel DCISE blocks to reduce the computational burden. We show the effectiveness of the proposed architecture through various experiments performed on publicly available datasets. The proposed architecture shows consistent performance in binary as well as multi-class classification tasks and outperforms other lightweight CNN architectures. Due to a significant reduction in computational requirements, our method can be useful for resource-constrained clinical environments as well.
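A minimal sketch of a block combining depthwise convolution with squeeze-and-expand channel bottlenecking, in the spirit of the DCISE block described above; channel counts, kernel sizes, and ordering are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch: 1x1 squeeze -> depthwise 3x3 -> 1x1 expand, as a lightweight building block.
import torch
import torch.nn as nn

class DepthwiseSqueezeExpand(nn.Module):
    def __init__(self, in_ch: int, squeeze_ch: int, out_ch: int):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)        # 1x1 squeeze
        self.depthwise = nn.Conv2d(squeeze_ch, squeeze_ch, kernel_size=3,
                                   padding=1, groups=squeeze_ch)          # depthwise 3x3
        self.expand = nn.Conv2d(squeeze_ch, out_ch, kernel_size=1)        # 1x1 expand
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.squeeze(x))
        x = self.act(self.depthwise(x))
        return self.act(self.expand(x))

block = DepthwiseSqueezeExpand(32, 8, 64)
print(block(torch.randn(1, 32, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])
```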
Affiliation(s)
- Gautam Kumar: Indian Institute of Technology Jodhpur, Jodhpur, Rajasthan, India
- Nirbhay Sharma: Indian Institute of Technology Jodhpur, Jodhpur, Rajasthan, India
- Angshuman Paul: Indian Institute of Technology Jodhpur, Jodhpur, Rajasthan, India
14
Xu X, Deng HH, Gateno J, Yan P. Federated Multi-Organ Segmentation With Inconsistent Labels. IEEE Trans Med Imaging 2023;42:2948-2960. [PMID: 37097793] [PMCID: PMC10592562] [DOI: 10.1109/tmi.2023.3270140]
Abstract
Federated learning is an emerging paradigm allowing large-scale decentralized learning without sharing data across different data owners, which helps address the concern of data privacy in medical image analysis. However, the requirement for label consistency across clients in existing methods largely narrows its application scope. In practice, each clinical site may only annotate certain organs of interest, with partial or no overlap with other sites. Incorporating such partially labeled data into a unified federation is an unexplored problem with clinical significance and urgency. This work tackles the challenge by using a novel federated multi-encoding U-Net (Fed-MENU) method for multi-organ segmentation. In our method, a multi-encoding U-Net (MENU-Net) is proposed to extract organ-specific features through different encoding sub-networks. Each sub-network can be seen as an expert on a specific organ and is trained for the corresponding client. Moreover, to encourage the organ-specific features extracted by different sub-networks to be informative and distinctive, we regularize the training of the MENU-Net by designing an auxiliary generic decoder (AGD). Extensive experiments on six public abdominal CT datasets show that our Fed-MENU method can effectively obtain a federated learning model using the partially labeled datasets, with superior performance to models trained by either localized or centralized learning. Source code is publicly available at https://github.com/DIAL-RPI/Fed-MENU.
15
Song Y, Yu L, Lei B, Choi KS, Qin J. Data Discernment for Affordable Training in Medical Image Segmentation. IEEE Trans Med Imaging 2023;42:1431-1445. [PMID: 37015694] [DOI: 10.1109/tmi.2022.3228316]
Abstract
Collecting sufficient high-quality training data for deep neural networks is often expensive or even unaffordable in medical image segmentation tasks. We thus propose to train the network by using external data that can be collected in a cheaper way, e.g., crowd-sourcing. We show that by data discernment, the network is able to mine valuable knowledge from external data, even though the data distribution is very different from that of the original (internal) data. We discern the external data by learning an importance weight for each sample, with the goal of enhancing the contribution of informative external data to network updating, while suppressing data that are 'useless' or even 'harmful'. An iterative algorithm that alternately estimates the importance weights and updates the network is developed by formulating the data discernment as a constrained nonlinear programming problem. It estimates the importance weights according to the distribution discrepancy between the external data and the internal dataset, and imposes a constraint to drive the network to learn more effectively compared with the network without using the external data. We evaluate the proposed algorithm on two tasks, abdominal CT image and cervical smear image segmentation, using a total of six publicly available datasets. The effectiveness of the algorithm is demonstrated by extensive experiments. Source codes are available at: https://github.com/YouyiSong/Data-Discernment.
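A minimal sketch of importance-weighted training on external data in the spirit of the data-discernment idea above; the constrained-programming step that estimates the weights is not reproduced, and fixed example weights are used instead.

```python
# Minimal sketch: per-sample importance weights scale each external sample's loss contribution.
import torch
import torch.nn.functional as F

def weighted_external_loss(logits: torch.Tensor,
                           labels: torch.Tensor,
                           weights: torch.Tensor) -> torch.Tensor:
    """logits: (N, C, H, W); labels: (N, H, W); weights: (N,) per-sample importance."""
    per_sample = F.cross_entropy(logits, labels, reduction="none").mean(dim=(1, 2))
    weights = weights / weights.sum().clamp(min=1e-8)     # normalize contributions
    return (weights * per_sample).sum()

# toy usage: a zero weight suppresses a "harmful" external sample
logits = torch.randn(4, 3, 8, 8)
labels = torch.randint(0, 3, (4, 8, 8))
weights = torch.tensor([1.0, 0.2, 0.0, 0.8])
print(weighted_external_loss(logits, labels, weights).item())
```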
16
Teramoto A, Shibata T, Yamada H, Hirooka Y, Saito K, Fujita H. Detection and Characterization of Gastric Cancer Using Cascade Deep Learning Model in Endoscopic Images. Diagnostics (Basel) 2022;12:1996. [PMID: 36010346] [PMCID: PMC9406996] [DOI: 10.3390/diagnostics12081996]
Abstract
Endoscopy is widely applied in the examination of gastric cancer. However, extensive knowledge and experience are required, owing to the need to examine the lesion while manipulating the endoscope. Various diagnostic support techniques have been reported for this examination. In our previous study, segmentation of invasive areas of gastric cancer was performed directly from endoscopic images, and the detection sensitivity per case was 0.98. That method faced challenges of false positives and computational cost because segmentation was applied to all of the healthy images captured during the examination. In this study, we propose a cascaded deep learning model that performs categorization of endoscopic images and identification of the invasive region to solve the above challenges. Endoscopic images are first classified as normal, showing early gastric cancer, or showing advanced gastric cancer using a convolutional neural network. Segmentation of the extent of gastric cancer invasion is then performed for the images classified as showing cancer, using two separate U-Net models. In an experiment, 1208 endoscopic images collected from healthy subjects, 533 images collected from patients with early stage gastric cancer, and 637 images from patients with advanced gastric cancer were used for evaluation. The sensitivity and specificity of the proposed approach in the detection of gastric cancer via image classification were 97.0% and 99.4%, respectively. Furthermore, both detection sensitivity and specificity reached 100% in a case-based evaluation. The extent of invasion was also identified at an acceptable level, suggesting that the proposed method may be considered useful for the classification of endoscopic images and identification of the extent of cancer invasion.
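A minimal sketch of the cascade described above: a classifier first labels each frame as normal, early cancer, or advanced cancer, and only frames classified as cancer are passed to a stage-specific segmentation step. The models below are stand-in functions, not the trained CNN and U-Nets from the study.

```python
# Minimal sketch: classification stage gating a stage-specific segmentation stage.
import numpy as np

def classify(image: np.ndarray) -> str:
    """Stand-in classifier returning one of: 'normal', 'early', 'advanced'."""
    m = image.mean()
    return "normal" if m < 0.4 else ("early" if m < 0.6 else "advanced")

def segment_early(image): return image > image.mean()
def segment_advanced(image): return image > np.percentile(image, 75)

def cascade_inference(image: np.ndarray):
    label = classify(image)
    if label == "normal":
        return label, None                          # healthy frames skip segmentation
    seg = segment_early(image) if label == "early" else segment_advanced(image)
    return label, seg

img = np.random.rand(64, 64)
label, mask = cascade_inference(img)
print(label, None if mask is None else int(mask.sum()))
```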
Affiliation(s)
- Atsushi Teramoto: School of Medical Sciences, Fujita Health University, Toyoake 470-1192, Japan
- Tomoyuki Shibata: Department of Gastroenterology and Hepatology, Fujita Health University, Toyoake 470-1192, Japan
- Hyuga Yamada: Department of Gastroenterology and Hepatology, Fujita Health University, Toyoake 470-1192, Japan
- Yoshiki Hirooka: Department of Gastroenterology and Hepatology, Fujita Health University, Toyoake 470-1192, Japan
- Kuniaki Saito: School of Medical Sciences, Fujita Health University, Toyoake 470-1192, Japan
- Hiroshi Fujita: Faculty of Engineering, Gifu University, Gifu 501-1194, Japan
17
Chen X, Wang X, Zhang K, Fung KM, Thai TC, Moore K, Mannel RS, Liu H, Zheng B, Qiu Y. Recent advances and clinical applications of deep learning in medical image analysis. Med Image Anal 2022;79:102444. [PMID: 35472844] [PMCID: PMC9156578] [DOI: 10.1016/j.media.2022.102444]
Abstract
Deep learning has received extensive research interest in developing new medical image processing algorithms, and deep learning based models have been remarkably successful in a variety of medical imaging tasks supporting disease detection and diagnosis. Despite this success, further improvement of deep learning models in medical image analysis is largely bottlenecked by the lack of large-sized and well-annotated datasets. In the past five years, many studies have focused on addressing this challenge. In this paper, we review and summarize these recent studies to provide a comprehensive overview of applying deep learning methods in various medical image analysis tasks. In particular, we emphasize the latest progress and contributions of state-of-the-art unsupervised and semi-supervised deep learning in medical image analysis, summarized according to different application scenarios, including classification, segmentation, detection, and image registration. We also discuss major technical challenges and suggest possible solutions for future research efforts.
Affiliation(s)
- Xuxin Chen: School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Ximin Wang: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Ke Zhang: School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Kar-Ming Fung: Department of Pathology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Theresa C Thai: Department of Radiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Kathleen Moore: Department of Obstetrics and Gynecology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Robert S Mannel: Department of Obstetrics and Gynecology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Hong Liu: School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Bin Zheng: School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Yuchen Qiu: School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
18
Tushar FI, D’Anniballe VM, Hou R, Mazurowski MA, Fu W, Samei E, Rubin GD, Lo JY. Classification of Multiple Diseases on Body CT Scans Using Weakly Supervised Deep Learning. Radiol Artif Intell 2022;4:e210026. [PMID: 35146433] [PMCID: PMC8823458] [DOI: 10.1148/ryai.210026]
Abstract
PURPOSE To design multidisease classifiers for body CT scans for three different organ systems using automatically extracted labels from radiology text reports. MATERIALS AND METHODS This retrospective study included a total of 12 092 patients (mean age, 57 years ± 18 [standard deviation]; 6172 women) for model development and testing. Rule-based algorithms were used to extract 19 225 disease labels from 13 667 body CT scans performed between 2012 and 2017. Using a three-dimensional DenseVNet, three organ systems were segmented: lungs and pleura, liver and gallbladder, and kidneys and ureters. For each organ system, a three-dimensional convolutional neural network classified each as showing no apparent disease or the presence of four common diseases, for a total of 15 different labels across all three models. Testing was performed on a subset of 2158 CT volumes relative to 2875 manually derived reference labels from 2133 patients (mean age, 58 years ± 18; 1079 women). Performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method. RESULTS Manual validation of the extracted labels confirmed 91%-99% accuracy across the 15 different labels. AUCs for lungs and pleura labels were as follows: atelectasis, 0.77 (95% CI: 0.74, 0.81); nodule, 0.65 (95% CI: 0.61, 0.69); emphysema, 0.89 (95% CI: 0.86, 0.92); effusion, 0.97 (95% CI: 0.96, 0.98); and no apparent disease, 0.89 (95% CI: 0.87, 0.91). AUCs for liver and gallbladder were as follows: hepatobiliary calcification, 0.62 (95% CI: 0.56, 0.67); lesion, 0.73 (95% CI: 0.69, 0.77); dilation, 0.87 (95% CI: 0.84, 0.90); fatty, 0.89 (95% CI: 0.86, 0.92); and no apparent disease, 0.82 (95% CI: 0.78, 0.85). AUCs for kidneys and ureters were as follows: stone, 0.83 (95% CI: 0.79, 0.87); atrophy, 0.92 (95% CI: 0.89, 0.94); lesion, 0.68 (95% CI: 0.64, 0.72); cyst, 0.70 (95% CI: 0.66, 0.73); and no apparent disease, 0.79 (95% CI: 0.75, 0.83). CONCLUSION Weakly supervised deep learning models were able to classify diverse diseases in multiple organ systems from CT scans. Keywords: CT, Diagnosis/Classification/Application Domain, Semisupervised Learning, Whole-Body Imaging. © RSNA, 2022.
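A minimal sketch of rule-based label extraction from report text in the spirit of the weak-labeling step described above; the keyword rules and negation handling are illustrative assumptions, not the study's actual rules.

```python
# Minimal sketch: keyword rules with simple same-sentence negation for weak labels.
import re

RULES = {
    "effusion": r"\beffusion\b",
    "emphysema": r"\bemphysema\b",
    "nodule": r"\bnodule(s)?\b",
}
NEGATION = r"\b(no|without|negative for)\b[^.]*"

def extract_labels(report: str) -> dict:
    text = report.lower()
    labels = {}
    for name, pattern in RULES.items():
        mentioned = re.search(pattern, text) is not None
        negated = re.search(NEGATION + pattern, text) is not None
        labels[name] = mentioned and not negated
    return labels

print(extract_labels("Small right pleural effusion. No pulmonary nodules."))
```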
19
Automated Detection of Gastric Cancer by Retrospective Endoscopic Image Dataset Using U-Net R-CNN. Appl Sci (Basel) 2021. [DOI: 10.3390/app112311275]
Abstract
Upper gastrointestinal endoscopy is widely performed to detect early gastric cancers. An automated method for detecting early gastric cancer from endoscopic images using an object detection model, a deep learning technique, was previously proposed. However, reducing false positives in the detected results remained a challenge. In this study, we propose a novel object detection model, U-Net R-CNN, based on a semantic segmentation technique that extracts target objects by performing a local analysis of the images. U-Net was introduced as a semantic segmentation method to detect candidate regions of early gastric cancer. These candidates were then classified as gastric cancer or false positives using box classification with a convolutional neural network. In the experiments, the detection performance was evaluated via the 5-fold cross-validation method using 1208 images of healthy subjects and 533 images of gastric cancer patients. When DenseNet169 was used as the convolutional neural network for box classification, the detection sensitivity and the number of false positives evaluated on a lesion basis were 98% and 0.01 per image, respectively, improving the detection performance compared with the previous method. These results indicate that the proposed method will be useful for the automated detection of early gastric cancer from endoscopic images.