1
Kang X, Ma Z, Liu K, Li Y, Miao Q. Modeling multi-scale uncertainty with evidence integration for reliable polyp segmentation. Neural Netw 2025; 189:107553. [PMID: 40409011 DOI: 10.1016/j.neunet.2025.107553]
Abstract
Polyp segmentation is critical in medical image analysis. Traditional methods, while capable of producing precise outputs in well-defined regions, often struggle with blurry or ambiguous areas in medical images, which can lead to errors in clinical decision-making. Additionally, these methods typically generate only a single deterministic segmentation result, failing to account for the inherent uncertainty in the segmentation process. This limitation undermines the reliability of segmentation models in clinical practice, as they lack the ability to provide insights into the confidence or certainty of their predictions, leaving clinicians skeptical of their utility. To address these challenges, we propose a novel multi-scale uncertainty modeling framework for polyp segmentation, grounded in evidence theory. Our approach leverages the Dirichlet distribution to classify pixels within polyp images while integrating uncertainty across different scales. We first employ an Uncertainty Region Enhancement Process (UREP) to refine uncertain regions and an Integrated Balance Module (IBM) to dynamically balance the weights between different feature maps for generating semantic fusion feature maps. Subsequently, we utilize two feature extraction sub-networks to learn feature representations from the original images and the semantic fusion feature maps. We further develop a Multi-scale Evidence Integration Network (MEIN) to robustly model uncertainty through subjective logic, merging the results from the two sub-networks to ensure a comprehensive understanding of uncertainty and produce reliable segmentation results. In contrast to most existing methods, our approach not only generates segmentation results but also provides uncertainty estimates, offering clinicians valuable insights into the reliability of the predictions. Experimental results on five polyp segmentation datasets demonstrate that our proposed method remains competitive and generates effective uncertainty estimations compared to existing representative methods. The code is available at https://github.com/q1216355254/MEIN.
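For readers unfamiliar with evidential segmentation, the sketch below shows the standard subjective-logic/Dirichlet formulation this abstract builds on: non-negative evidence parameterizes a per-pixel Dirichlet distribution whose strength yields both class probabilities and an explicit uncertainty mass. This is the generic formulation, not the MEIN code; the function name and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits: torch.Tensor):
    """Per-pixel evidential uncertainty from raw network outputs.

    logits: (B, K, H, W) raw outputs for K classes.
    Returns expected class probabilities and a per-pixel uncertainty map.
    """
    evidence = F.softplus(logits)          # non-negative evidence e_k
    alpha = evidence + 1.0                 # Dirichlet parameters alpha_k = e_k + 1
    S = alpha.sum(dim=1, keepdim=True)     # Dirichlet strength
    prob = alpha / S                       # expected class probabilities
    K = logits.shape[1]
    uncertainty = K / S                    # subjective-logic uncertainty mass u = K / S
    return prob, uncertainty

# Toy usage: a 2-class (polyp / background) map on a 4x4 image.
logits = torch.randn(1, 2, 4, 4)
prob, u = evidential_uncertainty(logits)
print(prob.shape, u.shape)  # torch.Size([1, 2, 4, 4]) torch.Size([1, 1, 4, 4])
```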
Affiliation(s)
- Xiaolu Kang
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China
- Zhuoqi Ma
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China
- Kang Liu
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China
- Yunan Li
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China
- Qiguang Miao
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China.
2
Huang K, Zhou T, Fu H, Zhang Y, Zhou Y, Gong C, Liang D. Learnable Prompting SAM-Induced Knowledge Distillation for Semi-Supervised Medical Image Segmentation. IEEE Trans Med Imaging 2025; 44:2295-2306. [PMID: 40030924 DOI: 10.1109/tmi.2025.3530097]
Abstract
The limited availability of labeled data has driven advancements in semi-supervised learning for medical image segmentation. Modern large-scale models tailored for general segmentation, such as the Segment Anything Model (SAM), have revealed robust generalization capabilities. However, applying these models directly to medical image segmentation still exposes performance degradation. In this paper, we propose a learnable prompting SAM-induced Knowledge distillation framework (KnowSAM) for semi-supervised medical image segmentation. Firstly, we propose a Multi-view Co-training (MC) strategy that trains two distinct sub-networks in a co-teaching paradigm, resulting in more robust outcomes. Secondly, we present a Learnable Prompt Strategy (LPS) to dynamically produce dense prompts and integrate an adapter to fine-tune SAM specifically for medical image segmentation tasks. Moreover, we propose SAM-induced Knowledge Distillation (SKD) to transfer useful knowledge from SAM to the two sub-networks, enabling them to learn from SAM's predictions and alleviate the effects of incorrect pseudo-labels during training. Notably, the predictions generated by our subnets are used to produce mask prompts for SAM, facilitating effective inter-module information exchange. Extensive experimental results on various medical segmentation tasks demonstrate that our model outperforms state-of-the-art semi-supervised segmentation approaches. Crucially, our SAM distillation framework can be seamlessly integrated into other semi-supervised segmentation methods to enhance performance. The code will be released upon acceptance of this manuscript at https://github.com/taozh2017/KnowSAM.
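A minimal sketch of the distillation idea described above: a student subnet is supervised by SAM's softened mask predictions. This is generic logit distillation under assumed tensor shapes, not the paper's exact SKD objective.

```python
import torch
import torch.nn.functional as F

def sam_distillation_loss(student_logits, sam_logits, T: float = 2.0):
    """Soft-label distillation from a SAM-style teacher mask to a student subnet.

    Both tensors are assumed (B, 1, H, W) binary-segmentation logits;
    T is a softening temperature.
    """
    teacher = torch.sigmoid(sam_logits / T)   # softened teacher mask
    student = torch.sigmoid(student_logits / T)
    return F.binary_cross_entropy(student, teacher.detach())

student_logits = torch.randn(2, 1, 64, 64)
sam_logits = torch.randn(2, 1, 64, 64)
print(sam_distillation_loss(student_logits, sam_logits).item())
```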
3
Wang M, Xu C, Fan K. An efficient fine tuning strategy of segment anything model for polyp segmentation. Sci Rep 2025; 15:14088. [PMID: 40269089 PMCID: PMC12019216 DOI: 10.1038/s41598-025-97802-w]
Abstract
Colon cancer is a prevalent disease on a global scale, thus making its detection and prevention a critical area in the medical field. In addressing the challenges of high annotation costs and the need for improved accuracy in colon polyp detection, this study explores the application and fine-tuning of the Segment Anything Model (SAM) for colon polyp segmentation. Conventional full fine-tuning approaches frequently result in catastrophic forgetting, thereby compromising the model's generalization capabilities. To address this challenge, this paper proposes an efficient fine-tuning method, PSF-SAM, which mitigates catastrophic forgetting while enhancing performance in few-shot scenarios. This is achieved by freezing most SAM parameters and optimizing only specific structures. The efficacy of PSF-SAM is substantiated by experimental evaluations on the Kvasir-SEG and CVC-ClinicDB datasets, which demonstrate its superior performance in metrics such as mDice and mIoU, as well as its notable advantages in few-shot learning scenarios when compared to existing fine-tuning methods.
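The core of this kind of parameter-efficient fine-tuning is freezing most of SAM and training only selected substructures. A minimal PyTorch sketch; the keyword list below is an illustrative guess, not PSF-SAM's actual choice of trainable structures.

```python
import torch.nn as nn

def freeze_for_peft(model: nn.Module, trainable_keywords=("adapter", "mask_decoder")):
    """Freeze most parameters, leaving only named substructures trainable.

    `trainable_keywords` is a hypothetical selection of SAM components;
    the paper defines its own set of optimized structures.
    """
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in trainable_keywords)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable}/{total} parameters")

# Toy demo; with a real SAM checkpoint, only adapter/mask-decoder weights would stay trainable.
freeze_for_peft(nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2)))
```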
Affiliation(s)
- Mingyan Wang
- Information Technology Center, Tsinghua University, Beijing, 100084, China.
- Cun Xu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Kefeng Fan
- China Electronics Standardization Institute, Beijing, 100007, China
4
Su C, Luo X, Li S, Chen L, Wang J. VMKLA-UNet: vision Mamba with KAN linear attention U-Net. Sci Rep 2025; 15:13258. [PMID: 40246881 PMCID: PMC12006406 DOI: 10.1038/s41598-025-97397-2]
Abstract
In the domain of medical image segmentation, while convolutional neural networks (CNNs) and Transformer-based architectures have attained notable success, they continue to face substantial challenges. CNNs are often limited in their ability to capture long-range dependencies, while Transformer models are frequently constrained by significant computational overhead. Recently, the Vision Mamba model, combined with KAN linear attention, has emerged as a highly promising alternative. In this study, we propose a novel model for medical image segmentation, termed VMKLA-UNet. The encoder of this architecture harnesses the VMamba framework, which employs a bidirectional state-space model for global visual context modeling and positional embedding, thus enabling efficient feature extraction and representation learning. For the decoder, we introduce the MKCSA architecture, which incorporates KAN linear attention, rooted in the Mamba framework, alongside a channel-spatial attention mechanism. KAN linear attention substantially mitigates computational complexity while enhancing the model's capacity to focus on salient regions of interest, thereby facilitating efficient global context comprehension. The channel attention mechanism dynamically modulates the importance of each feature channel, accentuating critical features and bolstering the model's ability to differentiate between various tissue types or lesion areas. Concurrently, the spatial attention mechanism refines the model's focus on key regions within the image, enhancing segmentation boundary accuracy and detail resolution. This synergistic integration of channel and spatial attention mechanisms augments the model's adaptability, leading to superior segmentation performance across diverse lesion types. Extensive experiments on public datasets, including Polyp, ISIC 2017, ISIC 2018, PH2, and Synapse, demonstrate that VMKLA-UNet consistently achieves high segmentation accuracy and robustness, establishing it as a highly effective solution for medical image segmentation tasks.
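As a rough illustration of the channel and spatial attention described above, here is a generic CBAM-style module; the paper's MKCSA block, which additionally includes KAN linear attention, is more involved, so treat this as a sketch of the attention idea only.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Generic channel + spatial attention (a CBAM-style sketch)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_mlp = nn.Sequential(      # squeeze-and-excite channel gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(     # 7x7 conv over pooled maps -> spatial gate
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_mlp(x)  # reweight channels
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial_conv(pooled)  # reweight spatial positions

x = torch.randn(1, 32, 64, 64)
print(ChannelSpatialAttention(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```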
Affiliation(s)
- Chenhong Su
- School of Electronic Information Engineering, China West Normal University, No. 1 Shida Road, Nanchong, 637009, Sichuan, China
- Institute of Artificial Intelligence, China West Normal University, No. 1 Shida Road, Nanchong, 637009, Sichuan, China
- Xuegang Luo
- School of Mathematics and Computer Science, Panzhihua University, Panzhihua, 617000, Sichuan, China
- Shiqing Li
- Department of Gastroenterology, The Second Clinical College of North Sichuan Medical College, Nanchong City Central Hospital, Nanchong, 637000, Sichuan, China
- Li Chen
- Department of Radiology, Affiliated Hospital of North Sichuan Medical College, Nanchong, 637000, Sichuan, China
- Juan Wang
- School of Computer Science, China West Normal University, No. 1 Shida Road, Nanchong, 637009, Sichuan, China.
- Institute of Artificial Intelligence, China West Normal University, No. 1 Shida Road, Nanchong, 637009, Sichuan, China.
5
Wang Z, Li T, Liu M, Jiang J, Liu X. DCATNet: polyp segmentation with deformable convolution and contextual-aware attention network. BMC Med Imaging 2025; 25:120. [PMID: 40229681 PMCID: PMC11998341 DOI: 10.1186/s12880-025-01661-w]
Abstract
Polyp segmentation is crucial in computer-aided diagnosis but remains challenging due to the complexity of medical images and anatomical variations. Current state-of-the-art methods struggle with accurate polyp segmentation due to the variability in size, shape, and texture. These factors make boundary detection challenging, often resulting in incomplete or inaccurate segmentation. To address these challenges, we propose DCATNet, a novel deep learning architecture specifically designed for polyp segmentation. DCATNet is a U-shaped network that combines ResNetV2-50 as an encoder for capturing local features and a Transformer for modeling long-range dependencies. It integrates three key components: the Geometry Attention Module (GAM), the Contextual Attention Gate (CAG), and the Multi-scale Feature Extraction (MSFE) block. We evaluated DCATNet on five public datasets. On Kvasir-SEG and CVC-ClinicDB, the model achieved mean Dice scores of 0.9351 and 0.9444, respectively, outperforming previous state-of-the-art (SOTA) methods. Cross-validation further demonstrated its superior generalization capability. Ablation studies confirmed the effectiveness of each component in DCATNet. Integrating GAM, CAG, and MSFE effectively improves feature representation and fusion, leading to precise and reliable segmentation results. These findings underscore DCATNet's potential for clinical application across a wide range of medical image segmentation tasks.
Affiliation(s)
- Zenan Wang
- Department of Gastroenterology, Beijing Chaoyang Hospital, The Third Clinical Medical College of Capital Medical University, Beijing, China
- Tianshu Li
- Department of Gastroenterology, Beijing Chaoyang Hospital, The Third Clinical Medical College of Capital Medical University, Beijing, China
- Ming Liu
- Hunan Key Laboratory of Nonferrous Resources and Geological Hazard Exploration, Changsha, China
- Jue Jiang
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
- Xinjuan Liu
- Department of Gastroenterology, Beijing Chaoyang Hospital, The Third Clinical Medical College of Capital Medical University, Beijing, China.
6
Chitca DD, Popescu V, Dumitrescu A, Botezatu C, Mastalier B. Advancing Colorectal Cancer Diagnostics from Barium Enema to AI-Assisted Colonoscopy. Diagnostics (Basel) 2025; 15:974. [PMID: 40310348 PMCID: PMC12026282 DOI: 10.3390/diagnostics15080974]
Abstract
Colorectal cancer (CRC) remains a major global health burden, necessitating continuous advancements in diagnostic methodologies. Traditional screening techniques, including barium enema and fecal occult blood tests, have been progressively replaced by more precise modalities, such as colonoscopy, liquid biopsy, and artificial intelligence (AI)-assisted imaging. Objective: This review explores the evolution of CRC diagnostic tools, from conventional imaging methods to cutting-edge AI-driven approaches, emphasizing their clinical utility, cost-effectiveness, and integration into multidisciplinary healthcare settings. Methods: A comprehensive literature search was conducted using the PubMed, Medline, and Scopus databases, selecting studies that evaluate various CRC diagnostic tools, including endoscopic advancements, liquid biopsy applications, and AI-assisted imaging techniques. Key inclusion criteria include studies on diagnostic accuracy, sensitivity, specificity, clinical outcomes, and economic feasibility. Results: AI-assisted colonoscopy has demonstrated superior adenoma detection rates (ADR), reduced interobserver variability, and enhanced real-time lesion classification, offering a cost-effective alternative to liquid biopsy, particularly in high-volume healthcare institutions. While liquid biopsy provides a non-invasive means of molecular profiling, it remains cost-intensive and requires frequent testing, making it more suitable for post-treatment surveillance and high-risk patient monitoring. Conclusions: The future of CRC diagnostics lies in a hybrid model, leveraging AI-assisted endoscopic precision with molecular insights from liquid biopsy. This integration is expected to revolutionize early detection, risk stratification, and personalized treatment approaches, ultimately improving patient outcomes and healthcare efficiency.
Affiliation(s)
- Dumitru-Dragos Chitca
- General Surgery Clinic, Colentina Clinical Hospital, 020125 Bucharest, Romania; (V.P.); (C.B.); (B.M.)
- Valentin Popescu
- General Surgery Clinic, Colentina Clinical Hospital, 020125 Bucharest, Romania; (V.P.); (C.B.); (B.M.)
- General Surgery Clinic, Carol Davila University of Medicine and Pharmacy, 050474 Bucharest, Romania
- Anca Dumitrescu
- Family Medicine, Vitan Polyclinic, 031087 Bucharest, Romania;
- Cristian Botezatu
- General Surgery Clinic, Colentina Clinical Hospital, 020125 Bucharest, Romania; (V.P.); (C.B.); (B.M.)
- General Surgery Clinic, Carol Davila University of Medicine and Pharmacy, 050474 Bucharest, Romania
- Bogdan Mastalier
- General Surgery Clinic, Colentina Clinical Hospital, 020125 Bucharest, Romania; (V.P.); (C.B.); (B.M.)
- General Surgery Clinic, Carol Davila University of Medicine and Pharmacy, 050474 Bucharest, Romania
7
Du X, Zhang X, Chen J, Li L. Boosting polyp screening with improved point-teacher weakly semi-supervised. Comput Biol Med 2025; 191:109998. [PMID: 40198989 DOI: 10.1016/j.compbiomed.2025.109998]
Abstract
Polyps, like a silent time bomb in the gut, are always lurking and can explode into deadly colorectal cancer at any time. Many methods have been proposed to maximize the early detection of colon polyps by screening; however, several challenges remain: (i) the scarcity of per-pixel annotation data, together with clinical features such as the blurred boundary and low contrast of polyps, results in poor performance; (ii) existing weakly semi-supervised methods that directly use pseudo-labels to supervise the student tend to ignore the value brought by intermediate features in the teacher. To adapt the point-prompt teacher model to the challenging scenarios of complex medical images and limited annotation data, we creatively leverage the diverse inductive biases of CNNs and Transformers to extract robust and complementary representations of polyp features (boundary and context). At the same time, a newly designed teacher-student intermediate feature distillation method is introduced rather than just using pseudo-labels to guide student learning. Comprehensive experiments demonstrate that our proposed method effectively handles scenarios with limited annotations and exhibits good segmentation performance. All code is available at https://github.com/dxqllp/WSS-Polyp.
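The intermediate feature distillation idea can be sketched as aligning a student feature map to the teacher's through a projection. The 1x1 projection and MSE objective below are common stand-ins, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def feature_distillation_loss(student_feat, teacher_feat, proj: nn.Conv2d):
    """Align a student's intermediate feature map with the teacher's.

    `proj` is a 1x1 conv mapping student channels to teacher channels;
    spatial sizes are matched by bilinear resizing when they differ.
    """
    s = proj(student_feat)
    if s.shape[-2:] != teacher_feat.shape[-2:]:
        s = F.interpolate(s, size=teacher_feat.shape[-2:], mode="bilinear",
                          align_corners=False)
    return F.mse_loss(s, teacher_feat.detach())

student_feat = torch.randn(2, 64, 32, 32)
teacher_feat = torch.randn(2, 128, 16, 16)
proj = nn.Conv2d(64, 128, kernel_size=1)
print(feature_distillation_loss(student_feat, teacher_feat, proj).item())
```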
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China
- Xuejun Zhang
- School of Computer Science and Technology, Anhui University, Hefei, China
- Jiajia Chen
- School of Computer Science and Technology, Anhui University, Hefei, China
- Lei Li
- Department of Neurology, Shuyang Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Suqian, China.
8
Wang H, Wang KN, Hua J, Tang Y, Chen Y, Zhou GQ, Li S. Dynamic spectrum-driven hierarchical learning network for polyp segmentation. Med Image Anal 2025; 101:103449. [PMID: 39847953 DOI: 10.1016/j.media.2024.103449]
Abstract
Accurate automatic polyp segmentation in colonoscopy is crucial for the prompt prevention of colorectal cancer. However, the heterogeneous nature of polyps and differences in lighting and visibility conditions present significant challenges in achieving reliable and consistent segmentation across different cases. Therefore, this study proposes a novel dynamic spectrum-driven hierarchical learning model (DSHNet), the first to specifically leverage image frequency domain information to explore region-level salience differences among and within polyps for precise segmentation. A novel spectral decoupler is advanced to separate low-frequency and high-frequency components, leveraging their distinct characteristics to guide the model in learning valuable frequency features without bias through automatic masking. The low-frequency driven region-level saliency modeling then generates dynamic convolution kernels with individual frequency-aware features, which regulate region-level saliency modeling together with the supervision of the hierarchy of labels, thus enabling simultaneous adaptation to polyp heterogeneity and illumination variation. Meanwhile, the high-frequency attention module is designed to preserve the detailed information at the skip connections, which complements the focus on spatial features at various stages. Experimental results demonstrate that the proposed method outperforms other state-of-the-art polyp segmentation techniques, achieving robust and superior results on five diverse datasets. Codes are available at https://github.com/gardnerzhou/DSHNet.
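A spectral decoupler of the kind described starts from a plain FFT low/high-frequency split. The sketch below shows that baseline decomposition; the paper's module additionally learns automatic masking, and the cutoff radius here is an illustrative parameter.

```python
import torch

def spectral_split(img: torch.Tensor, radius_frac: float = 0.1):
    """Split an image batch into low- and high-frequency components via FFT.

    img: (B, C, H, W); radius_frac sets the low-pass cutoff as a fraction
    of the smaller image dimension.
    """
    B, C, H, W = img.shape
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = ((yy - H / 2) ** 2 + (xx - W / 2) ** 2).sqrt()
    mask = (dist <= radius_frac * min(H, W)).to(img.dtype)  # centered low-pass disk
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real
    high = img - low                                        # residual high frequencies
    return low, high

low, high = spectral_split(torch.randn(1, 3, 64, 64))
print(low.shape, high.shape)
```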
Affiliation(s)
- Haolin Wang
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Kai-Ni Wang
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Jie Hua
- The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yi Tang
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Yang Chen
- Laboratory of Image Science and Technology, Southeast University, Nanjing, China; Key Laboratory of Computer Network and Information Integration, Southeast University, Nanjing, China
- Guang-Quan Zhou
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China.
- Shuo Li
- Department of Computer and Data Science and Department of Biomedical Engineering, Case Western Reserve University, Cleveland, USA
9
Wang Z, Guo L, Zhao S, Zhang S, Zhao X, Fang J, Wang G, Lu H, Yu J, Tian Q. Multi-Scale Group Agent Attention-Based Graph Convolutional Decoding Networks for 2D Medical Image Segmentation. IEEE J Biomed Health Inform 2025; 29:2718-2730. [PMID: 40030822 DOI: 10.1109/jbhi.2024.3523112]
Abstract
Automated medical image segmentation plays a crucial role in assisting doctors in diagnosing diseases. Feature decoding is a critical yet challenging issue for medical image segmentation. To address this issue, this work proposes a novel feature decoding network, called multi-scale group agent attention-based graph convolutional decoding networks (MSGAA-GCDN), to learn local-global features in graph structures for 2D medical image segmentation. The proposed MSGAA-GCDN combines graph convolutional network (GCN) and a lightweight multi-scale group agent attention (MSGAA) mechanism to represent features globally and locally within a graph structure. Moreover, in skip connections a simple yet efficient attention-based upsampling convolution fusion (AUCF) module is designed to enhance encoder-decoder feature fusion in both channel and spatial dimensions. Extensive experiments are conducted on three typical medical image segmentation tasks, namely Synapse abdominal multi-organs, Cardiac organs, and Polyp lesions. Experimental results demonstrate that the proposed MSGAA-GCDN outperforms the state-of-the-art methods, and the designed MSGAA is a lightweight yet effective attention architecture. The proposed MSGAA-GCDN can be easily taken as a plug-and-play decoder cascaded with other encoders for general medical image segmentation tasks.
10
Zhang Z, Jiang Y, Wang Y, Xie B, Zhang W, Li Y, Chen Z, Jin X, Zeng W. Exploring Contrastive Pre-Training for Domain Connections in Medical Image Segmentation. IEEE Trans Med Imaging 2025; 44:1686-1698. [PMID: 40030864 DOI: 10.1109/tmi.2024.3525095]
Abstract
Unsupervised domain adaptation (UDA) in medical image segmentation aims to improve the generalization of deep models by alleviating domain gaps caused by inconsistency across equipment, imaging protocols, and patient conditions. However, existing UDA works remain insufficiently explored and present notable limitations: 1) they exhibit cumbersome designs that prioritize aligning statistical metrics and distributions, which limits the model's flexibility and generalization while also overlooking the potential knowledge embedded in unlabeled data; 2) they tend to be applicable only within a certain domain and lack the generalization capability to handle the diverse shifts encountered in clinical scenarios. To overcome these limitations, we introduce MedCon, a unified framework that leverages general unsupervised contrastive pre-training to establish domain connections, effectively handling diverse domain shifts without tailored adjustments. Specifically, it initially explores a general contrastive pre-training to establish domain connections by leveraging the rich prior knowledge from unlabeled images. Thereafter, the pre-trained backbone is fine-tuned using source-based images to ultimately identify per-pixel semantic categories. To capture both intra- and inter-domain connections of anatomical structures, we construct positive-negative pairs from a hybrid aspect of both local and global scales. In this regard, a shared-weight encoder-decoder is employed to generate pixel-level representations, which are then mapped into hyper-spherical space using a non-learnable projection head to facilitate positive pair matching. Comprehensive experiments on diverse medical image datasets confirm that MedCon outperforms previous methods by effectively managing a wide range of domain shifts and showcasing superior generalization capabilities.
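The contrastive pre-training objective can be illustrated with a standard InfoNCE loss over matched representation pairs; this is the generic formulation, not MedCon's exact pair-construction scheme, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature: float = 0.1):
    """InfoNCE contrastive loss over L2-normalized embeddings.

    anchor, positive: (N, D) pixel- or region-level representations that
    should match one-to-one; every other row in the batch acts as a negative.
    """
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)   # i-th anchor matches i-th positive

print(info_nce(torch.randn(8, 128), torch.randn(8, 128)).item())
```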
11
Kim Y, Keum JS, Kim JH, Chun J, Oh SI, Kim KN, Yoon YH, Park H. Real-World Colonoscopy Video Integration to Improve Artificial Intelligence Polyp Detection Performance and Reduce Manual Annotation Labor. Diagnostics (Basel) 2025; 15:901. [PMID: 40218251 PMCID: PMC11988911 DOI: 10.3390/diagnostics15070901]
Abstract
Background/Objectives: Artificial intelligence (AI) integration in colon polyp detection often exhibits high sensitivity but notably low specificity in real-world settings, primarily due to reliance on publicly available datasets alone. To address this limitation, we proposed a semi-automatic annotation method using real colonoscopy videos to enhance AI model performance and reduce manual labeling labor. Methods: An integrated AI model was trained and validated on 86,258 training images and 17,616 validation images. Model 1 utilized only publicly available datasets, while Model 2 additionally incorporated images obtained from real colonoscopy videos of patients through a semi-automatic annotation process, significantly reducing the labeling burden on expert endoscopists. Results: The integrated AI model (Model 2) significantly outperformed the public-dataset-only model (Model 1). At epoch 35, Model 2 achieved a sensitivity of 90.6%, a specificity of 96.0%, an overall accuracy of 94.5%, and an F1 score of 89.9%. All polyps in the test videos were successfully detected, demonstrating considerable enhancement in detection performance compared to the public-dataset-only model. Conclusions: Integrating real-world colonoscopy video data using semi-automatic annotation markedly improved diagnostic accuracy while potentially reducing the need for extensive manual annotation typically performed by expert endoscopists. However, the findings need validation through multicenter external datasets to ensure generalizability.
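For reference, the reported metrics follow the standard confusion-matrix definitions. The counts in the sketch below are toy values for illustration, not the study's data.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int):
    """Standard binary detection metrics (toy inputs, not the study's data)."""
    sensitivity = tp / (tp + fn)                    # recall on polyp frames
    specificity = tn / (tn + fp)                    # recall on normal frames
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, accuracy, f1

print(detection_metrics(tp=906, fp=60, tn=1440, fn=94))
```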
Affiliation(s)
- Yuna Kim
- Department of Internal Medicine, Division of Gastroenterology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Republic of Korea; (Y.K.)
- Ji-Soo Keum
- Waycen Inc., Seoul 06167, Republic of Korea; (J.-S.K.)
- Jie-Hyun Kim
- Department of Internal Medicine, Division of Gastroenterology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Republic of Korea; (Y.K.)
- Jaeyoung Chun
- Department of Internal Medicine, Division of Gastroenterology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Republic of Korea; (Y.K.)
- Sang-Il Oh
- Waycen Inc., Seoul 06167, Republic of Korea; (J.-S.K.)
- Kyung-Nam Kim
- Waycen Inc., Seoul 06167, Republic of Korea; (J.-S.K.)
- Young-Hoon Yoon
- Department of Internal Medicine, Division of Gastroenterology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Republic of Korea; (Y.K.)
- Hyojin Park
- Department of Internal Medicine, Division of Gastroenterology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Republic of Korea; (Y.K.)
12
Zhu M, Mao A, Liu J, Yuan Y. DEeR: Deviation Eliminating and Noise Regulating for Privacy-Preserving Federated Low-Rank Adaptation. IEEE Trans Med Imaging 2025; 44:1783-1795. [PMID: 40030741 DOI: 10.1109/tmi.2024.3518539]
Abstract
Integrating low-rank adaptation (LoRA) with federated learning (FL) has received widespread attention recently, aiming to adapt pretrained foundation models (FMs) to downstream medical tasks via privacy-preserving decentralized training. However, owing to the direct combination of LoRA and FL, current methods generally suffer from two problems, i.e., aggregation deviation and the differential privacy (DP) noise amplification effect. To address these problems, we propose a novel privacy-preserving federated finetuning framework called Deviation Eliminating and Noise Regulating (DEeR). Specifically, we first theoretically prove that the necessary condition to eliminate aggregation deviation is guaranteeing the equivalence between the LoRA parameters of clients. Based on this theoretical insight, a deviation eliminator is designed to use an alternating minimization algorithm to iteratively optimize the zero-initialized and non-zero-initialized parameter matrices of LoRA, ensuring that the aggregation deviation remains zero throughout training. Furthermore, we also conduct an in-depth analysis of the noise amplification effect and find that this problem is mainly caused by the "linear relationship" between DP noise and LoRA parameters. To suppress the noise amplification effect, we propose a noise regulator that exploits two regulator factors to decouple the relationship between DP and LoRA, thereby achieving robust privacy protection and excellent finetuning performance. Additionally, we perform comprehensive ablation experiments to verify the effectiveness of the deviation eliminator and noise regulator. DEeR shows better performance on public medical datasets in comparison with state-of-the-art approaches. The code is available at https://github.com/CUHK-AIM-Group/DEeR.
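To make the "linear relationship" between DP noise and LoRA parameters concrete, here is a minimal LoRA layer: the low-rank update enters the effective weight through the product of the two factor matrices, so noise injected into either factor propagates multiplicatively, which is the amplification effect the paper's noise regulator targets. Rank, scaling, and initialization values are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W + (alpha/r) * B @ A."""

    def __init__(self, in_features: int, out_features: int, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # pretrained weights stay frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # non-zero-initialized
        self.B = nn.Parameter(torch.zeros(out_features, r))        # zero-initialized
        self.scale = alpha / r

    def forward(self, x):
        # DP noise added to A or B perturbs the product B @ A, not just one matrix.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

print(LoRALinear(16, 32)(torch.randn(4, 16)).shape)  # torch.Size([4, 32])
```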
13
Zhang Z, Li Y, Shin BS. Enhancing generalization of medical image segmentation via game theory-based domain selection. J Biomed Inform 2025; 164:104802. [PMID: 40049504 DOI: 10.1016/j.jbi.2025.104802]
Abstract
Medical image segmentation models often fail to generalize well to new datasets due to substantial variability in imaging conditions, anatomical differences, and patient demographics. Conventional domain generalization (DG) methods focus on learning domain-agnostic features but often overlook the importance of maintaining performance balance across different domains, leading to suboptimal results. To address these issues, we propose a novel approach using game theory to model the training process as a zero-sum game, aiming for a Nash equilibrium to enhance adaptability and robustness against domain shifts. Specifically, our adaptive domain selection method, guided by the Beta distribution and optimized via reinforcement learning, dynamically adjusts to the variability across different domains, thus improving model generalization. We conducted extensive experiments on benchmark datasets for polyp segmentation, optic cup/optic disc (OC/OD) segmentation, and prostate segmentation. Our method achieved an average Dice score improvement of 1.75% compared with other methods, demonstrating the effectiveness of our approach in enhancing the generalization performance of medical image segmentation models.
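Beta-distribution-guided domain selection can be sketched as Thompson-style sampling over per-domain outcome statistics: domains where training recently helped, or that remain under-explored, are sampled more often. This is an illustrative reading of the abstract; the paper's reinforcement-learning formulation is more elaborate, and the success/failure bookkeeping below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_domain(successes, failures):
    """Thompson-style domain selection from per-domain Beta posteriors.

    successes/failures: per-domain counts of whether training on that
    domain improved a held-out metric (hypothetical bookkeeping).
    """
    draws = [rng.beta(s + 1, f + 1) for s, f in zip(successes, failures)]
    return int(np.argmax(draws))

# Three source domains; counts of "improved validation Dice" outcomes.
successes, failures = [5, 2, 8], [3, 6, 1]
print(select_domain(successes, failures))
```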
Affiliation(s)
- Zuyu Zhang
- Key Laboratory of Big Data Intelligent Computing, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Yan Li
- Department of Electrical and Computer Engineering, Inha University, Incheon, 22212, Republic of Korea.
- Byeong-Seok Shin
- Department of Electrical and Computer Engineering, Inha University, Incheon, 22212, Republic of Korea.
14
Peng L, Liu W, Xie S, Ye L, Ye P, Xiao F, Bian L. Uncertainty-Driven Parallel Transformer-Based Segmentation for Oral Disease Dataset. IEEE Trans Image Process 2025; 34:1632-1644. [PMID: 40036515 DOI: 10.1109/tip.2025.3544139]
Abstract
Accurate oral disease segmentation is a challenging task, for three major reasons: 1) The same type of oral disease has a diversity of size, color and texture; 2) The boundary between oral lesions and their surrounding mucosa is not sharp; 3) There is a lack of public large-scale oral disease segmentation datasets. To address these issues, we first report an oral disease segmentation network termed Oralformer, which can tackle multiple oral diseases. Specifically, we use a parallel design to combine local-window self-attention (LWSA) with channel-wise convolution (CWC), modeling cross-window connections to enlarge the receptive fields while maintaining linear complexity. Meanwhile, we connect these two branches with bi-directional interactions to form a basic parallel Transformer block, namely the LC-block. We insert the LC-block as the main building block in a U-shape encoder-decoder architecture to form Oralformer. Second, we introduce an uncertainty-driven self-adaptive loss function which can reinforce the network's attention on the lesion's edge regions that are easily confused, thus improving the segmentation accuracy of these regions. Third, we construct a large-scale oral disease segmentation (ODS) dataset containing 2602 image pairs. It covers three common oral diseases (including dental plaque, calculus and caries) and all age groups, which we hope will advance the field. Extensive experiments on six challenging datasets show that our Oralformer achieves state-of-the-art segmentation accuracy, and presents advantages in terms of generalizability and real-time segmentation efficiency (35 fps). The code and ODS dataset will be publicly available at https://github.com/LintaoPeng/Oralformer.
15
Li G, Wang J, Wei J, Xu Z. IRFNet: Cognitive-Inspired Iterative Refinement Fusion Network for Camouflaged Object Detection. Sensors (Basel) 2025; 25:1555. [PMID: 40096411 PMCID: PMC11902440 DOI: 10.3390/s25051555]
Abstract
Camouflaged Object Detection (COD) aims to identify objects that are intentionally concealed within their surroundings through appearance, texture, or pattern adaptations. Despite recent advances, extreme object-background similarity causes existing methods to struggle with accurately capturing discriminative features and effectively modeling multiscale patterns while preserving fine details. To address these challenges, we propose the Iterative Refinement Fusion Network (IRFNet), a novel framework that mimics human visual cognition through progressive feature enhancement and iterative optimization. Our approach incorporates the following: (1) a Hierarchical Feature Enhancement Module (HFEM) coupled with a dynamic channel-spatial attention mechanism, which enriches multiscale feature representations through bilateral and trilateral fusion pathways; and (2) a Context-guided Iterative Optimization Framework (CIOF) that combines transformer-based global context modeling with iterative refinement through dual-branch supervision. Extensive experiments on three challenging benchmark datasets (CAMO, COD10K, and NC4K) demonstrate that IRFNet consistently outperforms fourteen state-of-the-art methods, achieving improvements of 0.9-13.7% across key metrics. Comprehensive ablation studies validate the effectiveness of each proposed component and demonstrate how our iterative refinement strategy enables progressive improvement in detection accuracy.
Affiliation(s)
- Guohan Li
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; (G.L.); (J.W.)
- School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Jingxin Wang
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; (G.L.); (J.W.)
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Jianming Wei
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; (G.L.); (J.W.)
- Zhengyi Xu
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; (G.L.); (J.W.)
16
Waheed Z, Gui J, Heyat MBB, Parveen S, Hayat MAB, Iqbal MS, Aya Z, Nawabi AK, Sawan M. A novel lightweight deep learning based approaches for the automatic diagnosis of gastrointestinal disease using image processing and knowledge distillation techniques. Comput Methods Programs Biomed 2025; 260:108579. [PMID: 39798279 DOI: 10.1016/j.cmpb.2024.108579]
Abstract
BACKGROUND Gastrointestinal (GI) diseases pose significant challenges for healthcare systems, largely due to the complexities involved in their detection and treatment. Despite the advancements in deep neural networks, their high computational demands hinder their practical use in clinical environments. OBJECTIVE This study aims to address the computational inefficiencies of deep neural networks by proposing a lightweight model that integrates model compression techniques, ConvLSTM layers, and ConvNext Blocks, all optimized through Knowledge Distillation (KD). METHODS A dataset of 6000 endoscopic images of various GI diseases was utilized. Advanced image preprocessing techniques, including adaptive noise reduction and image detail enhancement, were employed to improve accuracy and interpretability. The model's performance was assessed in terms of accuracy, computational cost, and disk space usage. RESULTS The proposed lightweight model achieved an exceptional overall accuracy of 99.38 %. It operates efficiently with a computational cost of 0.61 GFLOPs and occupies only 3.09 MB of disk space. Additionally, Grad-CAM visualizations demonstrated enhanced model saliency and interpretability, offering insights into the decision-making process of the model post-KD. CONCLUSION The proposed model represents a significant advancement in the diagnosis of GI diseases. It provides a cost-effective and efficient alternative to traditional deep neural network methods, overcoming their computational limitations and contributing valuable insights for improved clinical application.
Affiliation(s)
- Zafran Waheed
- School of Computer Science and Engineering, Central South University, China.
- Jinsong Gui
- School of Electronic Information, Central South University, China.
- Md Belal Bin Heyat
- CenBRAIN Neurotech Center of Excellence, School of Engineering, Westlake University, Zhejiang, Hangzhou, China.
- Saba Parveen
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Mohd Ammar Bin Hayat
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, China
- Muhammad Shahid Iqbal
- Department of Computer Science and Information Technology, Women University of Azad Jammu & Kashmir, Pakistan
- Zouheir Aya
- College of Mechanical Engineering, Changsha University of Science and Technology, Changsha, Hunan, China
- Awais Khan Nawabi
- Department of Electronics, Computer science and Electrical Engineering, University of Pavia, Italy
- Mohamad Sawan
- CenBRAIN Neurotech Center of Excellence, School of Engineering, Westlake University, Zhejiang, Hangzhou, China
17
Elamin S, Johri S, Rajpurkar P, Geisler E, Berzin TM. From data to artificial intelligence: evaluating the readiness of gastrointestinal endoscopy datasets. J Can Assoc Gastroenterol 2025; 8:S81-S86. [PMID: 39990508 PMCID: PMC11842897 DOI: 10.1093/jcag/gwae041]
Abstract
The incorporation of artificial intelligence (AI) into gastrointestinal (GI) endoscopy represents a promising advancement in gastroenterology. With over 40 published randomized controlled trials and numerous ongoing clinical trials, gastroenterology leads other medical disciplines in AI research. Computer-aided detection algorithms for identifying colorectal polyps have achieved regulatory approval and are in routine clinical use, while other AI applications for GI endoscopy are in advanced development stages. Near-term opportunities include the potential for computer-aided diagnosis to replace conventional histopathology for diagnosing small colon polyps and increased AI automation in capsule endoscopy. Despite significant development in research settings, the generalizability and robustness of AI models in real clinical practice remain inconsistent. The GI field lags behind other medical disciplines in the breadth of novel AI algorithms, with only 13 out of 882 Food and Drug Administration (FDA)-approved AI models focussed on GI endoscopy as of June 2024. Additionally, existing GI endoscopy image databases are disproportionately focussed on colon polyps, lacking representation of the diversity of other endoscopic findings. High-quality datasets, encompassing a wide range of patient demographics, endoscopic equipment types, and disease states, are crucial for developing effective AI models for GI endoscopy. This article reviews the current state of GI endoscopy datasets, barriers to progress, including dataset size, data diversity, annotation quality, and ethical issues in data collection and usage, and future needs for advancing AI in GI endoscopy.
Affiliation(s)
- Sami Elamin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Shreya Johri
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Pranav Rajpurkar
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Enrik Geisler
- Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA
- Tyler M Berzin
- Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA
18
Pinto L, Figueiredo IN, Figueiredo PN. Reducing reading time and assessing disease in capsule endoscopy videos: A deep learning approach. Int J Med Inform 2025; 195:105792. [PMID: 39817978 DOI: 10.1016/j.ijmedinf.2025.105792]
Abstract
BACKGROUND The wireless capsule endoscope (CE) is a valuable diagnostic tool in gastroenterology, offering a safe and minimally invasive visualization of the gastrointestinal tract. One of the few drawbacks identified by the gastroenterology community is the time-consuming task of analyzing CE videos. OBJECTIVES This article investigates the feasibility of a computer-aided diagnostic method to speed up CE video analysis. We aim to generate a significantly smaller CE video with all the anomalies (i.e., diseases) identified by the medical doctors in the original video. METHODS The summarized video consists of the original video frames classified as anomalous by a pre-trained convolutional neural network (CNN). We evaluate our approach on a testing dataset with eight CE videos captured with five CE types and displaying multiple anomalies. RESULTS On average, the summarized videos contain 93.33% of the anomalies identified in the original videos. The average playback time of the summarized videos is just 10 min, compared to 58 min for the original videos. CONCLUSION Our findings demonstrate the potential of deep learning-aided diagnostic methods to accelerate CE video analysis.
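The summarization step reduces to filtering frames by a pretrained classifier's anomaly score. A minimal sketch, where the threshold and the stand-in classifier are assumptions rather than the paper's configuration:

```python
import numpy as np

def summarize_video(frames, classifier, threshold: float = 0.5):
    """Keep only frames a pretrained CNN flags as anomalous.

    `classifier` is any callable mapping a frame to an anomaly probability.
    """
    return [f for f in frames if classifier(f) >= threshold]

# Toy usage with a placeholder classifier (mean-intensity heuristic, not a CNN).
frames = [np.random.rand(224, 224, 3) for _ in range(100)]
fake_classifier = lambda f: float(f.mean())
summary = summarize_video(frames, fake_classifier)
print(f"kept {len(summary)} of {len(frames)} frames")
```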
Affiliation(s)
- Luís Pinto
- University of Coimbra, CMUC, Department of Mathematics, Coimbra, Portugal.
- Pedro N Figueiredo
- University of Coimbra, Faculty of Medicine, Coimbra, Portugal; Department of Gastroenterology, Centro Hospitalar e Universitário de Coimbra, Coimbra, Portugal.
19
Ke X, Chen G, Liu H, Guo W. MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation. Comput Biol Med 2025; 186:109601. [PMID: 39740513 DOI: 10.1016/j.compbiomed.2024.109601]
Abstract
Accurate polyp segmentation is crucial for early diagnosis and treatment of colorectal cancer. This is a challenging task for three main reasons: (i) the problem of model overfitting and weak generalization due to the multi-center distribution of data; (ii) the problem of interclass ambiguity caused by motion blur and overexposure to endoscopic light; and (iii) the problem of intraclass inconsistency caused by the variety of morphologies and sizes of the same type of polyps. To address these challenges, we propose a new high-precision polyp segmentation framework, MEFA-Net, which consists of three modules, including the plug-and-play Mask Enhancement Module (MEG), Separable Path Attention Enhancement Module (SPAE), and Dynamic Global Attention Pool Module (DGAP). Specifically, firstly, the MEG module masks high-energy regions of the environment and polyps, which guides the model to rely on only a small amount of information to distinguish between polyps and background features, preventing the model from overfitting to environmental information and improving its robustness. At the same time, this module can effectively counteract the "dark corner phenomenon" in the dataset and further improve the generalization performance of the model. Next, the SPAE module can effectively alleviate the inter-class fuzzy problem by strengthening the feature expression. Then, the DGAP module solves the intra-class inconsistency problem by extracting the invariance of scale, shape and position. Finally, we propose a new evaluation metric, MultiColoScore, for comprehensively evaluating the segmentation performance of the model on five datasets with different domains. We evaluated the new method quantitatively and qualitatively on five datasets using four metrics. Experimental results show that MEFA-Net significantly improves the accuracy of polyp segmentation and outperforms current state-of-the-art algorithms. Code posted on https://github.com/847001315/MEFA-Net.
Affiliation(s)
- Xiao Ke
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Guanhong Chen
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Hao Liu
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Wenzhong Guo
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China.
20
He D, Liu Z, Yin X, Liu H, Gao W, Fu Y. Synthesized colonoscopy dataset from high-fidelity virtual colon with abnormal simulation. Comput Biol Med 2025; 186:109672. [PMID: 39826299 DOI: 10.1016/j.compbiomed.2025.109672]
Abstract
With the advent of the deep learning-based colonoscopy system, the need for a vast amount of high-quality colonoscopy image datasets for training is crucial. However, the generalization ability of deep learning models is challenged by the limited availability of colonoscopy images due to regulatory restrictions and privacy concerns. In this paper, we propose a method for rendering high-fidelity 3D colon models and synthesizing diversified colonoscopy images with abnormalities such as polyps, bleeding, and ulcers, which can be used to train deep learning models. The geometric model of the colon is derived from CT images. We employed dedicated surface mesh deformation to mimic the shapes of polyps and ulcers and applied texture mapping techniques to generate realistic, lifelike appearances. The generated polyp models were then attached to the inner surface of the colon model, while the ulcers were created directly on the inner surface of the colon model. To realistically model blood behavior, we developed a simulation of the blood diffusion process on the colon's inner surface and colored vertices in the traversed region to reflect blood flow. Ultimately, we generated a comprehensive dataset comprising high-fidelity rendered colonoscopy images with the abnormalities. To validate the effectiveness of the synthesized colonoscopy dataset, we trained state-of-the-art deep learning models on it and other publicly available datasets and assessed the performance of these models in abnormal classification, detection, and segmentation. Notably, the models trained on the synthesized dataset exhibit an enhanced performance in the aforementioned tasks, as evident from the results.
Affiliation(s)
- Dongdong He
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150080, China
- Ziteng Liu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150080, China
- Xunhai Yin
- Department of Gastroenterology, The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, China
- Hao Liu
- State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China
- Wenpeng Gao
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150080, China; State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, 150080, China.
- Yili Fu
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, 150080, China.
21
Lin L, Liu Y, Wu J, Cheng P, Cai Z, Wong KKY, Tang X. FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-Supervised Medical Image Segmentation. IEEE Trans Med Imaging 2025; 44:1127-1139. [PMID: 39423080 DOI: 10.1109/tmi.2024.3483221]
Abstract
Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing imperative to curtail annotation costs has amplified the importance of weakly-supervised techniques which utilize sparse annotations such as points, scribbles, etc. A pragmatic FL paradigm should accommodate diverse annotation formats across different sites, a research topic that remains under-investigated. In this context, we propose a novel personalized FL framework with learnable prompt and aggregation (FedLPPA) to uniformly leverage heterogeneous weak supervision for medical image segmentation. In FedLPPA, a learnable universal knowledge prompt is maintained, complemented by multiple learnable personalized data distribution prompts and prompts representing the supervision sparsity. Integrated with sample features through a dual-attention mechanism, those prompts empower each local task decoder to adeptly adjust to both the local distribution and the supervision form. Concurrently, a dual-decoder strategy, predicated on prompt similarity, is introduced for enhancing the generation of pseudo-labels in weakly-supervised learning, alleviating overfitting and noise accumulation inherent to local data, while an adaptable aggregation method is employed to customize the task decoder on a parameter-wise basis. Extensive experiments on four distinct medical image segmentation tasks involving different modalities underscore the superiority of FedLPPA, with its efficacy closely paralleling that of fully supervised centralized training. Our code and data will be available at https://github.com/llmir/FedLPPA.
Collapse
|
22
|
Ovi TB, Bashree N, Nyeem H, Wahed MA. FocusU2Net: Pioneering dual attention with gated U-Net for colonoscopic polyp segmentation. Comput Biol Med 2025; 186:109617. [PMID: 39793349 DOI: 10.1016/j.compbiomed.2024.109617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2024] [Revised: 12/09/2024] [Accepted: 12/22/2024] [Indexed: 01/13/2025]
Abstract
The detection and excision of colorectal polyps, precursors to colorectal cancer (CRC), can improve survival rates by up to 90%. Automated polyp segmentation in colonoscopy images expedites diagnosis and aids in the precise identification of adenomatous polyps, thus mitigating the burden of manual image analysis. This study introduces FocusU2Net, an innovative bi-level nested U-structure integrated with a dual-attention mechanism. The model integrates Focus Gate (FG) modules for spatial and channel-wise attention and Residual U-blocks (RSU) with multi-scale receptive fields for capturing diverse contextual information. Comprehensive evaluations on five benchmark datasets - Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS-Larib, and EndoScene - demonstrate Dice score improvements of 3.14% to 43.59% over state-of-the-art models, with an 85% success rate in cross-dataset validations, significantly surpassing prior competing models with sub-5% success rates. The model combines high segmentation accuracy with computational efficiency, featuring 46.64 million parameters, 78.09 GFLOPs, and 39.02 GMacs, making it suitable for real-time applications. Enhanced with Explainable AI techniques, FocusU2Net provides clear insights into its decision-making process, improving interpretability. This combination of high performance, efficiency, and transparency positions FocusU2Net as a powerful, scalable solution for automated polyp segmentation in clinical practice, advancing medical image analysis and computer-aided diagnosis.
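For readers unfamiliar with dual attention, the generic gate below applies channel attention ("what" to attend to) followed by spatial attention ("where"); it illustrates the mechanism only and is not the Focus Gate module itself.

```python
# A generic spatial + channel attention gate in PyTorch, to make the idea of
# dual attention concrete; the actual Focus Gate of FocusU2Net differs in detail.
import torch
import torch.nn as nn

class DualAttentionGate(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # channel attention: squeeze the feature map to a per-channel weight
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        # spatial attention: squeeze the feature map to a per-pixel weight
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)    # reweight channels ("what")
        x = x * self.spatial(x)    # reweight locations ("where")
        return x

feat = torch.randn(1, 64, 32, 32)
print(DualAttentionGate(64)(feat).shape)   # torch.Size([1, 64, 32, 32])
```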
Collapse
Affiliation(s)
- Tareque Bashar Ovi
- Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka, 1216, Bangladesh.
| | - Nomaiya Bashree
- Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka, 1216, Bangladesh.
| | - Hussain Nyeem
- Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka, 1216, Bangladesh.
| | - Md Abdul Wahed
- Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka, 1216, Bangladesh.
| |
Collapse
|
23
|
Jin K, Chu X, Qian J. Arginine and colorectal cancer: Exploring arginine-related therapeutic strategies and novel insights into cancer immunotherapies. Int Immunopharmacol 2025; 148:114146. [PMID: 39879835 DOI: 10.1016/j.intimp.2025.114146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2024] [Revised: 01/02/2025] [Accepted: 01/21/2025] [Indexed: 01/31/2025]
Abstract
With the progression of societies and the evolution of lifestyle and dietary habits, the risk of developing human malignancies, particularly colorectal cancer (CRC), has markedly escalated, positioning CRC as one of the most prevalent and lethal forms of cancer globally. Empirical evidence indicates that the metabolic processes of cancerous and healthy cells can significantly impact immune responses and the fate of tumors. Arginine, a multifaceted amino acid, assumes a crucial and paradoxical role in various metabolic pathways, as certain tumors exhibit arginine auxotrophy while others do not. Notably, CRC is classified as arginine non-auxotrophic, possessing the ability to synthesize arginine from citrulline. Systemic arginine deprivation and the inhibition of arginine uptake represent two prevalent therapeutic strategies in oncological treatment. However, given the divergent behaviors of tumors concerning the metabolism and synthesis of arginine, one of these therapeutic approaches, namely systemic arginine deprivation, does not apply to CRC. This review elucidates the characteristics of arginine uptake inhibition and systemic arginine deprivation alongside their respective benefits and limitations in CRC. Furthermore, the involvement of arginine in immunotherapeutic strategies is examined in light of the most recent discoveries on various human malignancies.
Collapse
Affiliation(s)
- Ketao Jin
- Department of Colorectal and Anal Surgery, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, Zhejiang 310003, China.
| | - Xiufeng Chu
- Department of General Surgery, Shaoxing Central Hospital, Shaoxing, Zhejiang 312030, China
| | - Jun Qian
- Department of Colorectal Surgery, Xinchang People's Hospital, Affiliated Xinchang Hospital, Wenzhou Medical University, Xinchang, Zhejiang 312500, China.
| |
Collapse
|
24
|
Li W, Zhang Y, Zhou H, Yang W, Xie Z, He Y. CLMS: Bridging domain gaps in medical imaging segmentation with source-free continual learning for robust knowledge transfer and adaptation. Med Image Anal 2025; 100:103404. [PMID: 39616943 DOI: 10.1016/j.media.2024.103404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2024] [Revised: 10/01/2024] [Accepted: 11/19/2024] [Indexed: 12/16/2024]
Abstract
Deep learning shows promise for medical image segmentation but suffers performance declines when applied to diverse healthcare sites due to data discrepancies among the different sites. Translating deep learning models to new clinical environments is challenging, especially when the original source data used for training is unavailable due to privacy restrictions. Source-free domain adaptation (SFDA) aims to adapt models to new unlabeled target domains without requiring access to the original source data. However, existing SFDA methods face challenges such as error propagation, misalignment of visual and structural features, and inability to preserve source knowledge. This paper introduces Continual Learning Multi-Scale domain adaptation (CLMS), an end-to-end SFDA framework integrating multi-scale reconstruction, continual learning, and style alignment to bridge domain gaps across medical sites using only unlabeled target data or publicly available data. Compared to the current state-of-the-art methods, CLMS consistently and significantly achieved top performance for different tasks, including prostate MRI segmentation (improved Dice of 10.87%), colonoscopy polyp segmentation (improved Dice of 17.73%), and plus disease classification from retinal images (improved AUC of 11.19%). Crucially, CLMS preserved source knowledge for all the tasks, avoiding catastrophic forgetting. CLMS demonstrates a promising solution for translating deep learning models to new clinical imaging domains towards safe, reliable deployment across diverse healthcare settings.
Collapse
Affiliation(s)
- Weilu Li
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Yun Zhang
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Hao Zhou
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Wenhan Yang
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
| | - Yao He
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
| |
Collapse
|
25
|
Mao X, Li H, Li X, Bai C, Ming W. C2E-Net: Cascade attention and context-aware cross-level fusion network via edge learning guidance for polyp segmentation. Comput Biol Med 2025; 185:108770. [PMID: 39653624 DOI: 10.1016/j.compbiomed.2024.108770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2024] [Revised: 05/15/2024] [Accepted: 06/15/2024] [Indexed: 01/26/2025]
Abstract
Colorectal polyps are one of the most direct causes of colorectal cancer. Polypectomy can effectively block the process of colorectal cancer, but accurate polyp segmentation methods are required as an auxiliary means. However, there are several challenges associated with achieving accurate polyp segmentation, such as the large semantic gap between the encoder and decoder, incomplete edges, and the potential confusion between folds in uncertain areas and target objects. To address these challenges, an advanced polyp segmentation network (C2E-Net) is proposed, leveraging a cascaded attention mechanism and context-aware cross-level fusion guided by edge learning. Firstly, a cascade attention (CA) module is proposed to capture local feature details and increase the receptive field by setting different dilation rates in different convolutional layers, combined with a criss-cross attention mechanism to bridge the semantic gap between codecs. Subsequently, an edge learning guidance (ELG) module is designed that employs parallel axial attention operations to capture complementary edge information with sufficient detail to enrich feature details and edge features. Ultimately, to effectively integrate cross-level features and obtain rich global contextual information, a context-aware cross-level fusion (CCF) module is introduced through a multi-scale channel attention mechanism to minimize potential confusion between folds in uncertain areas and target objects. Extensive experimental results show that C2E-Net is superior to state-of-the-art methods, with average Dice coefficients of 94.54%, 92.23%, 82.24%, 79.53% and 89.84% on five polyp datasets.
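The receptive-field growth that varying dilation rates buy can be seen in a stripped-down form below; the rates (1, 2, 4) and the additive fusion are illustrative choices, not the published CA module.

```python
# Minimal sketch of the cascade idea: stacked 3x3 convolutions with increasing
# dilation rates enlarge the receptive field without extra parameters per layer.
import torch
import torch.nn as nn

class DilatedCascade(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x):
        outputs = []
        for conv in self.stages:
            x = torch.relu(conv(x))   # each stage sees an ever larger context
            outputs.append(x)
        return sum(outputs)           # fuse the multi-receptive-field responses

print(DilatedCascade(32)(torch.randn(1, 32, 64, 64)).shape)  # (1, 32, 64, 64)
```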
Collapse
Affiliation(s)
- Xu Mao
- School of Information, Yunnan University, Kunming, 650504, China
| | - Haiyan Li
- School of Information, Yunnan University, Kunming, 650504, China.
| | - Xiangxian Li
- School of Software, Shandong University, Jinan, 250101, China
| | - Chongbin Bai
- Otolaryngology Department, Honghe Prefecture Second People's Hospital, Jianshui, 654300, China
| | - Wenjun Ming
- The Primary School Affiliated to Yunnan University, Kunming, 650000, China
| |
Collapse
|
26
|
Chu J, Liu W, Tian Q, Lu W. PFPRNet: A Phase-Wise Feature Pyramid With Retention Network for Polyp Segmentation. IEEE J Biomed Health Inform 2025; 29:1137-1150. [PMID: 40030242 DOI: 10.1109/jbhi.2024.3500026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2025]
Abstract
Early detection of colonic polyps is crucial for the prevention and diagnosis of colorectal cancer. Currently, deep learning-based polyp segmentation methods have become mainstream and achieved remarkable results. Acquiring large amounts of labeled data is time-consuming and labor-intensive, and the presence of numerous similar wrinkles in polyp images further hampers model prediction performance. In this paper, we propose a novel approach called Phase-wise Feature Pyramid with Retention Network (PFPRNet), which leverages a pre-trained Transformer-based Encoder to obtain multi-scale feature maps. A Phase-wise Feature Pyramid with Retention Decoder is designed to gradually integrate global features into local features and guide the model's attention towards key regions. Additionally, our custom Enhance Perception module enables capturing image information from a broader perspective. Finally, we introduce an innovative Low-layer Retention module as an alternative to Transformer for more efficient global attention modeling. Evaluation results on several widely-used polyp segmentation datasets demonstrate that our proposed method has strong learning ability and generalization capability, and outperforms the state-of-the-art approaches.
Collapse
|
27
|
Du Y, Jiang Y, Tan S, Liu SQ, Li Z, Li G, Wan X. Highlighted Diffusion Model as Plug-In Priors for Polyp Segmentation. IEEE J Biomed Health Inform 2025; 29:1209-1220. [PMID: 39446534 DOI: 10.1109/jbhi.2024.3485767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2024]
Abstract
Automated polyp segmentation from colonoscopy images is crucial for colorectal cancer diagnosis. The accuracy of such segmentation, however, is challenged by two main factors. First, the variability in polyps' size, shape, and color, coupled with the scarcity of well-annotated data due to the need for specialized manual annotation, hampers the efficacy of existing deep learning methods. Second, concealed polyps often blend with adjacent intestinal tissues, leading to poor contrast that challenges segmentation models. Recently, diffusion models have been explored and adapted for polyp segmentation tasks. However, the significant domain gap between RGB-colonoscopy images and grayscale segmentation masks, along with the low efficiency of the diffusion generation process, hinders the practical implementation of these models. To mitigate these challenges, we introduce the Highlighted Diffusion Model Plus (HDM+), a two-stage polyp segmentation framework. This framework incorporates the Highlighted Diffusion Model (HDM) to provide explicit semantic guidance, thereby enhancing segmentation accuracy. In the initial stage, the HDM is trained using highlighted ground-truth data, which emphasizes polyp regions while suppressing the background in the images. This approach reduces the domain gap by focusing on the image itself rather than on the segmentation mask. In the subsequent second stage, we employ the highlighted features from the trained HDM's U-Net model as plug-in priors for polyp segmentation, rather than generating highlighted images, thereby increasing efficiency. Extensive experiments conducted on six polyp segmentation benchmarks demonstrate the effectiveness of our approach.
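The "highlighting" operation described above can be pictured as keeping polyp pixels intact while attenuating everything else, so the diffusion model trains on images rather than binary masks. The suppression factor below is an assumed hyperparameter, and the function is a sketch rather than the HDM training code.

```python
# Sketch of highlighted ground truth: keep polyp pixels, attenuate background.
import numpy as np

def highlight(image, mask, background_gain=0.2):
    """image: HxWx3 float array in [0,1]; mask: HxW binary polyp mask.
    background_gain is an assumed hyperparameter for this illustration."""
    m = mask[..., None].astype(float)
    return image * m + image * (1.0 - m) * background_gain

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
msk = np.zeros((8, 8), dtype=int)
msk[2:6, 2:6] = 1                                 # toy polyp region
out = highlight(img, msk)
assert np.allclose(out[3, 3], img[3, 3])          # polyp kept as-is
assert np.allclose(out[0, 0], 0.2 * img[0, 0])    # background suppressed
```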
Collapse
|
28
|
Liu J, Shi Y, Huang D, Qu J. Neural Radiance Fields for High-Fidelity Soft Tissue Reconstruction in Endoscopy. SENSORS (BASEL, SWITZERLAND) 2025; 25:565. [PMID: 39860938 PMCID: PMC11769054 DOI: 10.3390/s25020565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/19/2024] [Revised: 01/04/2025] [Accepted: 01/10/2025] [Indexed: 01/27/2025]
Abstract
The advancement of neural radiance fields (NeRFs) has facilitated the high-quality 3D reconstruction of complex scenes. However, for most NeRFs, reconstructing 3D tissues from endoscopy images poses significant challenges due to the occlusion of soft tissue regions by invalid pixels, deformations in soft tissue, and poor image quality, which severely limits their application in endoscopic scenarios. To address the above issues, we propose a novel framework to reconstruct high-fidelity soft tissue scenes from low-quality endoscopic images. We first construct an EndoTissue dataset of soft tissue regions in endoscopic images and fine-tune the Segment Anything Model (SAM) based on EndoTissue to obtain a potent segmentation network. Given a sequence of monocular endoscopic images, this segmentation network can quickly obtain the tissue mask images. Additionally, we incorporate tissue masks into a dynamic scene reconstruction method called Tensor4D to effectively guide the reconstruction of 3D deformable soft tissues. Finally, we propose adopting the image enhancement model EDAU-Net to improve the quality of the rendered views. The experimental results show that our method can effectively focus on the soft tissue regions in the image, achieving higher fidelity in detail and geometric structural integrity in reconstruction compared to state-of-the-art algorithms. Feedback from the user study indicates high participant scores for our method.
Collapse
Affiliation(s)
- Jinhua Liu
- Shanghai Film Academy, Shanghai University, Shanghai 200072, China; (J.L.); (D.H.); (J.Q.)
| | - Yongsheng Shi
- Shanghai Film Academy, Shanghai University, Shanghai 200072, China; (J.L.); (D.H.); (J.Q.)
| | - Dongjin Huang
- Shanghai Film Academy, Shanghai University, Shanghai 200072, China; (J.L.); (D.H.); (J.Q.)
- Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai 200072, China
| | - Jiantao Qu
- Shanghai Film Academy, Shanghai University, Shanghai 200072, China; (J.L.); (D.H.); (J.Q.)
| |
Collapse
|
29
|
Oukdach Y, Garbaz A, Kerkaou Z, Ansari ME, Koutti L, Ouafdi AFE, Salihoun M. InCoLoTransNet: An Involution-Convolution and Locality Attention-Aware Transformer for Precise Colorectal Polyp Segmentation in GI Images. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025:10.1007/s10278-025-01389-7. [PMID: 39825142 DOI: 10.1007/s10278-025-01389-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/12/2024] [Revised: 12/18/2024] [Accepted: 12/19/2024] [Indexed: 01/20/2025]
Abstract
Gastrointestinal (GI) disease examination presents significant challenges to doctors due to the intricate structure of the human digestive system. Colonoscopy and wireless capsule endoscopy are the most commonly used tools for GI examination. However, the large amount of data generated by these technologies requires the expertise and intervention of doctors for disease identification, making manual analysis a very time-consuming task. Thus, the development of a computer-assisted system is highly desirable to assist clinical professionals in making decisions in a low-cost and effective way. In this paper, we introduce a novel framework called InCoLoTransNet, designed for polyp segmentation. The study is based on a transformer and convolution-involution neural network, following the encoder-decoder architecture. We employed the vision transformer in the encoder section to focus on the global context, while the decoder involves a convolution-involution collaboration for resampling the polyp features. Involution enhances the model's ability to adaptively capture spatial and contextual information, while convolution focuses on local information, leading to more accurate feature extraction. The essential features captured by the transformer encoder are passed to the decoder through two skip connection pathways. The CBAM module refines the features and passes them to the convolution block, leveraging attention mechanisms to emphasize relevant information. Meanwhile, locality self-attention is employed to pass essential features to the involution block, reinforcing the model's ability to capture more global features in the polyp regions. Experiments were conducted on five public datasets: CVC-ClinicDB, CVC-ColonDB, Kvasir-SEG, Etis-LaribPolypDB, and CVC-300. Compared with 15 state-of-the-art polyp segmentation methods, InCoLoTransNet achieves the best results, with the highest mean Dice score of 93% and mean intersection over union of 90% on CVC-ColonDB. Additionally, InCoLoTransNet distinguishes itself in terms of polyp segmentation generalization performance, achieving high mean Dice coefficient and mean intersection over union scores on unseen datasets: 85% and 79% on CVC-ColonDB, 91% and 87% on CVC-300, and 79% and 70% on Etis-LaribPolypDB, respectively.
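Involution, the less familiar of the two operators, generates its kernel per pixel from the input itself instead of sharing weights across positions. The sketch below is the standard stride-1 formulation (Li et al., CVPR 2021) with illustrative hyperparameters, not the InCoLoTransNet block.

```python
# A stride-1 2D involution in PyTorch: the kernel is produced per pixel from
# the input, the opposite of convolution's position-shared weights.
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    def __init__(self, channels, kernel_size=3, groups=4, reduction=4):
        super().__init__()
        self.k, self.g = kernel_size, groups
        self.kernel_gen = nn.Sequential(             # per-pixel kernel generator
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, kernel_size**2 * groups, 1),
        )
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        kernel = self.kernel_gen(x).view(b, self.g, 1, self.k**2, h, w)
        patches = self.unfold(x).view(b, self.g, c // self.g, self.k**2, h, w)
        return (kernel * patches).sum(dim=3).view(b, c, h, w)

print(Involution2d(16)(torch.randn(2, 16, 32, 32)).shape)  # (2, 16, 32, 32)
```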
Collapse
Affiliation(s)
- Yassine Oukdach
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco.
| | - Anass Garbaz
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Zakaria Kerkaou
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Mohamed El Ansari
- Informatics and Applications Laboratory, Department of Computer Sciences, Faculty of Science, Moulay Ismail University, B.P 11201, Meknès, 52000, Morocco
| | - Lahcen Koutti
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Ahmed Fouad El Ouafdi
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Mouna Salihoun
- Faculty of Medicine and Pharmacy of Rabat, Mohammed V University of Rabat, Rabat, 10000, Morocco
| |
Collapse
|
30
|
Du X, Xu X, Chen J, Zhang X, Li L, Liu H, Li S. UM-Net: Rethinking ICGNet for polyp segmentation with uncertainty modeling. Med Image Anal 2025; 99:103347. [PMID: 39316997 DOI: 10.1016/j.media.2024.103347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2023] [Revised: 05/26/2024] [Accepted: 09/10/2024] [Indexed: 09/26/2024]
Abstract
Automatic segmentation of polyps from colonoscopy images plays a critical role in the early diagnosis and treatment of colorectal cancer. Nevertheless, some bottlenecks still exist. In our previous work, we mainly focused on polyps with intra-class inconsistency and low contrast, using ICGNet to solve them. Due to the different equipment, specific locations and properties of polyps, the color distribution of the collected images is inconsistent. ICGNet was designed primarily with reverse-contour guide information and local-global context information, ignoring this inconsistent color distribution, which leads to overfitting problems and makes it difficult to focus only on beneficial image content. In addition, a trustworthy segmentation model should not only produce high-precision results but also provide a measure of uncertainty to accompany its predictions so that physicians can make informed decisions. However, ICGNet only gives the segmentation result and lacks an uncertainty measure. To cope with these novel bottlenecks, we further extend the original ICGNet to a comprehensive and effective network (UM-Net) with two main contributions that experiments have shown to have substantial practical value. Firstly, we employ a color transfer operation to weaken the relationship between color and polyps, making the model more concerned with the shape of the polyps. Secondly, we provide uncertainty to represent the reliability of the segmentation results and use variance to rectify uncertainty. Our improved method is evaluated on five polyp datasets, showing competitive results compared to other advanced methods in both learning ability and generalization capability. The source code is available at https://github.com/dxqllp/UM-Net.
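A common way to realize a color transfer operation of this kind is Reinhard-style statistics matching; whether UM-Net uses this exact recipe (or a different color space) is not stated in the abstract, so the snippet below is only a plausible stand-in.

```python
# Reinhard-style color transfer: shift each channel of the source image to the
# per-channel mean/std of a reference image. A sketch, assuming RGB statistics.
import numpy as np

def color_transfer(source, reference, eps=1e-6):
    """source, reference: HxWx3 float arrays in [0,1]. Returns source
    recoloured to match the reference's per-channel mean and std."""
    src_mu, src_sd = source.mean((0, 1)), source.std((0, 1))
    ref_mu, ref_sd = reference.mean((0, 1)), reference.std((0, 1))
    out = (source - src_mu) / (src_sd + eps) * ref_sd + ref_mu
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
src, ref = rng.random((64, 64, 3)), rng.random((64, 64, 3)) * 0.5
print(color_transfer(src, ref).mean((0, 1)))   # close to ref's channel means
```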
Collapse
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China
| | - Xuebin Xu
- School of Computer Science and Technology, Anhui University, Hefei, China
| | - Jiajia Chen
- School of Computer Science and Technology, Anhui University, Hefei, China
| | - Xuejun Zhang
- School of Computer Science and Technology, Anhui University, Hefei, China
| | - Lei Li
- Department of Neurology, Shuyang Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Suqian, China.
| | - Heng Liu
- Department of Gastroenterology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
| | - Shuo Li
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, USA
| |
Collapse
|
31
|
Song Z, Kang X, Wei X, Li S. Pixel-Centric Context Perception Network for Camouflaged Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:18576-18589. [PMID: 37819817 DOI: 10.1109/tnnls.2023.3319323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/13/2023]
Abstract
Camouflaged object detection (COD) aims to identify object pixels visually embedded in the background environment. Existing deep learning methods fail to utilize the context information around different pixels adequately and efficiently. In order to solve this problem, a novel pixel-centric context perception network (PCPNet) is proposed, the core of which is to customize the personalized context of each pixel based on the automatic estimation of its surroundings. Specifically, PCPNet first employs an elegant encoder equipped with the designed vital component generation (VCG) module to obtain a set of compact features rich in low-level spatial and high-level semantic information across multiple subspaces. Then, we present a parameter-free pixel importance estimation (PIE) function based on multiwindow information fusion. Object pixels with complex backgrounds will be assigned with higher PIE values. Subsequently, PIE is utilized to regularize the optimization loss. In this way, the network can pay more attention to those pixels with higher PIE values in the decoding stage. Finally, a local continuity refinement module (LCRM) is used to refine the detection results. Extensive experiments on four COD benchmarks, five salient object detection (SOD) benchmarks, and five polyp segmentation benchmarks demonstrate the superiority of PCPNet with respect to other state-of-the-art methods.
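As a rough analogue of a parameter-free, multi-window pixel importance score, the sketch below averages local intensity variance over several window sizes and uses it to weight a segmentation loss. The window sizes and the variance statistic are assumptions made for illustration; PIE's actual fusion rule is defined in the paper.

```python
# Multi-window pixel importance via local variance, then loss weighting.
import torch
import torch.nn.functional as F

def pixel_importance(gray, windows=(3, 7, 11)):
    """gray: Bx1xHxW image. Higher score = more complex surroundings."""
    scores = []
    for k in windows:
        mean = F.avg_pool2d(gray, k, stride=1, padding=k // 2)
        mean_sq = F.avg_pool2d(gray * gray, k, stride=1, padding=k // 2)
        scores.append((mean_sq - mean**2).clamp(min=0))   # local variance
    return torch.stack(scores).mean(0)

def weighted_bce(logits, target, gray):
    w = 1.0 + pixel_importance(gray)            # emphasise hard pixels
    return F.binary_cross_entropy_with_logits(logits, target, weight=w)

gray = torch.rand(2, 1, 64, 64)
target = torch.randint(0, 2, (2, 1, 64, 64)).float()
print(weighted_bce(torch.randn(2, 1, 64, 64), target, gray))
```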
Collapse
|
32
|
Erol T, Sarikaya D. PlutoNet: An efficient polyp segmentation network with modified partial decoder and decoder consistency training. Healthc Technol Lett 2024; 11:365-373. [PMID: 39720760 PMCID: PMC11665777 DOI: 10.1049/htl2.12105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2024] [Accepted: 11/25/2024] [Indexed: 12/26/2024] Open
Abstract
Deep learning models are used to minimize the number of polyps that go unnoticed by experts and to accurately segment the detected polyps during interventions. Although state-of-the-art models have been proposed, it remains a challenge to define representations that generalize well and that mediate between capturing low-level features and higher-level semantic details without being redundant. Another challenge with these models is that they are computation- and memory-intensive, which can pose a problem for real-time applications. To address these problems, PlutoNet is proposed for polyp segmentation; it requires only 9 GFLOPs and 2,626,537 parameters, less than 10% of the parameters required by its counterparts. With PlutoNet, a novel decoder consistency training approach is proposed that consists of a shared encoder; the modified partial decoder, which is a combination of the partial decoder and full-scale connections that capture salient features at different scales without redundancy; and the auxiliary decoder, which focuses on higher-level semantic features. The modified partial decoder and the auxiliary decoder are trained with a combined loss to enforce consistency, which helps strengthen learned representations. Ablation studies and experiments show that PlutoNet performs significantly better than state-of-the-art models, particularly on unseen datasets.
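The consistency idea can be written as a combined objective: supervise both decoders and add a term that pulls their predictions together. The weighting and the MSE agreement term below are illustrative assumptions, not PlutoNet's exact loss.

```python
# Sketch of decoder consistency training: supervised loss on two decoder
# heads plus a consistency term; lam is an assumed loss weight.
import torch
import torch.nn.functional as F

def consistency_objective(main_logits, aux_logits, target, lam=0.5):
    supervised = (F.binary_cross_entropy_with_logits(main_logits, target)
                  + F.binary_cross_entropy_with_logits(aux_logits, target))
    # consistency: the decoders should agree even where labels are uninformative
    consistency = F.mse_loss(torch.sigmoid(main_logits), torch.sigmoid(aux_logits))
    return supervised + lam * consistency

t = torch.randint(0, 2, (2, 1, 32, 32)).float()
print(consistency_objective(torch.randn(2, 1, 32, 32), torch.randn(2, 1, 32, 32), t))
```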
Collapse
Affiliation(s)
- Tugberk Erol
- Computer Engineering, Graduate School of Natural and Applied Sciences, Gazi University, Ankara, Türkiye
| | - Duygu Sarikaya
- School of Computer Science, University of Leeds, Leeds, United Kingdom
| |
Collapse
|
33
|
Xu Z, Miao Y, Chen G, Liu S, Chen H. GLGFormer: Global Local Guidance Network for Mucosal Lesion Segmentation in Gastrointestinal Endoscopy Images. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:2983-2995. [PMID: 38940891 PMCID: PMC11612111 DOI: 10.1007/s10278-024-01162-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/07/2024] [Revised: 05/05/2024] [Accepted: 06/03/2024] [Indexed: 06/29/2024]
Abstract
Automatic mucosal lesion segmentation is a critical component in computer-aided clinical support systems for endoscopic image analysis. Image segmentation networks currently rely mainly on convolutional neural networks (CNNs) and Transformers, which have demonstrated strong performance in various applications. However, they cannot cope with blurred lesion boundaries and lesions of different scales in gastrointestinal endoscopy images. To address these challenges, we propose a new Transformer-based network, named GLGFormer, for the task of mucosal lesion segmentation. Specifically, we design the global guidance module to guide single-scale features patch-wise, enabling them to incorporate global information from the global map without information loss. Furthermore, a partial decoder is employed to fuse these enhanced single-scale features, achieving single-scale to multi-scale enhancement. Additionally, the local guidance module is designed to refocus attention on the neighboring patch, thus enhancing local features and refining lesion boundary segmentation. We conduct experiments on a private atrophic gastritis segmentation dataset and four public gastrointestinal polyp segmentation datasets. Compared to the current lesion segmentation networks, our proposed GLGFormer demonstrates outstanding learning and generalization capabilities. On the public dataset ClinicDB, GLGFormer achieved a mean intersection over union (mIoU) of 91.0% and a mean dice coefficient (mDice) of 95.0%. On the private dataset Gastritis-Seg, GLGFormer achieved an mIoU of 90.6% and an mDice of 94.6%.
Collapse
Affiliation(s)
- Zhiyang Xu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, School of Information and Control Engineering, Advanced Robotics Research Center, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, P. R. China
| | - Yanzi Miao
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, School of Information and Control Engineering, Advanced Robotics Research Center, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, P. R. China.
| | - Guangxia Chen
- Department of Gastroenterology, Xuzhou Municipal Hospital Affiliated to Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
| | - Shiyu Liu
- Department of Gastroenterology, Xuzhou Municipal Hospital Affiliated to Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
| | - Hu Chen
- The First Clinical Medical School of Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
| |
Collapse
|
34
|
Peng C, Qian Z, Wang K, Zhang L, Luo Q, Bi Z, Zhang W. MugenNet: A Novel Combined Convolution Neural Network and Transformer Network with Application in Colonic Polyp Image Segmentation. SENSORS (BASEL, SWITZERLAND) 2024; 24:7473. [PMID: 39686010 DOI: 10.3390/s24237473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/30/2024] [Revised: 11/18/2024] [Accepted: 11/21/2024] [Indexed: 12/18/2024]
Abstract
Accurate polyp image segmentation is of great significance because it can help in the detection of polyps. The convolutional neural network (CNN) is a common automatic segmentation method, but its main disadvantage is the long training time. The Transformer is another method that can be adapted for automatic segmentation; it employs a self-attention mechanism, which essentially assigns a different importance weight to each piece of information, thus achieving high computational efficiency during segmentation. However, a potential drawback of the Transformer is the risk of information loss. The study reported in this paper employed the well-known hybridization principle to combine CNN and Transformer in a way that retains the strengths of both. Specifically, this study applied this method to the early detection of colonic polyps and implemented a model called MugenNet for colonic polyp image segmentation. We conducted a comprehensive experiment to compare MugenNet with other CNN models on five publicly available datasets, along with an ablation experiment on MugenNet. The experimental results show that MugenNet achieves a mean Dice of 0.714 on the ETIS dataset, the optimal performance on this dataset compared to the other models, with an inference speed of 56 FPS. The overall outcome of this study is a method to optimally combine two machine learning methods that are complementary to each other.
Collapse
Affiliation(s)
- Chen Peng
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Zhiqin Qian
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Kunyu Wang
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Lanzhu Zhang
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Qi Luo
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Zhuming Bi
- Department of Engineering, Purdue University, West Lafayette, IN 47907, USA
| | - Wenjun Zhang
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
| |
Collapse
|
35
|
Wei X, Sun J, Su P, Wan H, Ning Z. BCL-Former: Localized Transformer Fusion with Balanced Constraint for polyp image segmentation. Comput Biol Med 2024; 182:109182. [PMID: 39341109 DOI: 10.1016/j.compbiomed.2024.109182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2024] [Revised: 09/18/2024] [Accepted: 09/19/2024] [Indexed: 09/30/2024]
Abstract
Polyp segmentation remains challenging for two reasons: (a) the size and shape of colon polyps are variable and diverse; (b) the distinction between polyps and mucosa is not obvious. To solve these two challenging problems and enhance the generalization ability of segmentation methods, we propose the Localized Transformer Fusion with Balanced Constraint (BCL-Former) for polyp segmentation. In BCL-Former, the Strip Local Enhancement module (SLE module) is proposed to capture enhanced local features. The Progressive Feature Fusion module (PFF module) is presented to make feature aggregation smoother and to eliminate the difference between high-level and low-level features. Moreover, the Tversky-based Appropriate Constrained Loss (TacLoss) is proposed to achieve balance and constraint between true positives and false negatives, improving the ability to generalize across datasets. Extensive experiments are conducted on four benchmark datasets. Results show that our proposed method achieves state-of-the-art performance in both segmentation precision and generalization ability. Moreover, the proposed method is 5%-8% faster than the benchmark method in training and inference. The code is available at: https://github.com/sjc-lbj/BCL-Former.
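The Tversky index that such a loss builds on makes the recall/precision trade-off explicit: alpha weights false negatives, beta weights false positives, and alpha = beta = 0.5 recovers Dice. Below is the plain Tversky loss; TacLoss adds balancing constraints on top, which are not reproduced here.

```python
# Plain Tversky loss; alpha > beta penalises missed polyp pixels harder.
import torch

def tversky_loss(probs, target, alpha=0.7, beta=0.3, smooth=1.0):
    """probs, target: Bx1xHxW tensors in [0,1]."""
    p, t = probs.reshape(-1), target.reshape(-1)
    tp = (p * t).sum()                 # true positives (soft)
    fn = ((1 - p) * t).sum()           # false negatives
    fp = (p * (1 - t)).sum()           # false positives
    tversky = (tp + smooth) / (tp + alpha * fn + beta * fp + smooth)
    return 1.0 - tversky

probs = torch.rand(2, 1, 32, 32)
target = torch.randint(0, 2, (2, 1, 32, 32)).float()
print(tversky_loss(probs, target))
```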
Collapse
Affiliation(s)
- Xin Wei
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
| | - Jiacheng Sun
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
| | - Pengxiang Su
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
| | - Huan Wan
- School of Computer Information Engineering, Jiangxi Normal University, 99 Ziyang Avenue, Nanchang, 330022, China.
| | - Zhitao Ning
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
| |
Collapse
|
36
|
Huang Q, Li G. Knowledge graph based reasoning in medical image analysis: A scoping review. Comput Biol Med 2024; 182:109100. [PMID: 39244959 DOI: 10.1016/j.compbiomed.2024.109100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2024] [Revised: 08/04/2024] [Accepted: 08/31/2024] [Indexed: 09/10/2024]
Abstract
Automated computer-aided diagnosis (CAD) is becoming more significant in the field of medicine due to advancements in computer hardware performance and the progress of artificial intelligence. The knowledge graph is a structure for visually representing knowledge facts. In the last decade, a large body of work based on knowledge graphs has effectively improved the organization and interpretability of large-scale complex knowledge. Introducing knowledge graph inference into CAD is a research direction with significant potential. In this review, we first cover the basic principles and application methods of knowledge graphs. Then, we systematically organize and analyze the research and application of knowledge graphs in medical imaging-assisted diagnosis. We also summarize the shortcomings of the current research, such as medical data barriers and deficiencies, low utilization of multimodal information, and weak interpretability. Finally, we propose future research directions with the potential to address the shortcomings of current approaches.
Collapse
Affiliation(s)
- Qinghua Huang
- School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, 127 West Youyi Road, Beilin District, Xi'an, 710072, Shaanxi, China.
| | - Guanghui Li
- School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, 127 West Youyi Road, Beilin District, Xi'an, 710072, Shaanxi, China; School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, 710129, Shaanxi, China.
| |
Collapse
|
37
|
Cai L, Chen L, Huang J, Wang Y, Zhang Y. Know your orientation: A viewpoint-aware framework for polyp segmentation. Med Image Anal 2024; 97:103288. [PMID: 39096844 DOI: 10.1016/j.media.2024.103288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2023] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 08/05/2024]
Abstract
Automatic polyp segmentation in endoscopic images is critical for the early diagnosis of colorectal cancer. Despite the availability of powerful segmentation models, two challenges still impede the accuracy of polyp segmentation algorithms. Firstly, during a colonoscopy, physicians frequently adjust the orientation of the colonoscope tip to capture underlying lesions, resulting in viewpoint changes in the colonoscopy images. These variations increase the diversity of polyp visual appearance, posing a challenge for learning robust polyp features. Secondly, polyps often exhibit properties similar to the surrounding tissues, leading to indistinct polyp boundaries. To address these problems, we propose a viewpoint-aware framework named VANet for precise polyp segmentation. In VANet, polyps are emphasized as a discriminative feature and thus can be localized by class activation maps in a viewpoint classification process. With these polyp locations, we design a viewpoint-aware Transformer (VAFormer) to alleviate the erosion of attention by the surrounding tissues, thereby inducing better polyp representations. Additionally, to enhance the polyp boundary perception of the network, we develop a boundary-aware Transformer (BAFormer) to encourage self-attention towards uncertain regions. As a consequence, the combination of the two modules is capable of calibrating predictions and significantly improving polyp segmentation performance. Extensive experiments on seven public datasets across six metrics demonstrate the state-of-the-art results of our method, and VANet can handle colonoscopy images in real-world scenarios effectively. The source code is available at https://github.com/1024803482/Viewpoint-Aware-Network.
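The localization step relies on class activation maps, which project the final convolutional features through the classifier weights of the predicted class. The standard recipe (Zhou et al., 2016) looks as follows, with toy dimensions standing in for VANet's actual backbone.

```python
# Class activation map: weight the last conv features by the classifier
# weights of one class, then normalise to [0,1] for visualisation.
import torch
import torch.nn as nn

def class_activation_map(features, fc_weight, class_idx):
    """features: BxCxHxW (last conv features); fc_weight: num_classes x C,
    the weights of a global-average-pool classifier head."""
    cam = torch.einsum('c,bchw->bhw', fc_weight[class_idx], features)
    cam = cam - cam.amin(dim=(1, 2), keepdim=True)
    return cam / cam.amax(dim=(1, 2), keepdim=True).clamp(min=1e-6)

feats = torch.randn(1, 512, 16, 16)
fc = nn.Linear(512, 4)                 # e.g. 4 viewpoint classes (assumed)
cam = class_activation_map(feats, fc.weight, class_idx=2)
print(cam.shape, float(cam.min()), float(cam.max()))   # (1, 16, 16), 0.0, 1.0
```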
Collapse
Affiliation(s)
- Linghan Cai
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China; Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China.
| | - Lijiang Chen
- Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China
| | - Jianhao Huang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
| | - Yifeng Wang
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
| | - Yongbing Zhang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China.
| |
Collapse
|
38
|
Manan MA, Feng J, Yaqub M, Ahmed S, Imran SMA, Chuhan IS, Khan HA. Multi-scale and multi-path cascaded convolutional network for semantic segmentation of colorectal polyps. ALEXANDRIA ENGINEERING JOURNAL 2024; 105:341-359. [DOI: 10.1016/j.aej.2024.06.095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/22/2024]
|
39
|
Xu W, Xu R, Wang C, Li X, Xu S, Guo L. PSTNet: Enhanced Polyp Segmentation With Multi-Scale Alignment and Frequency Domain Integration. IEEE J Biomed Health Inform 2024; 28:6042-6053. [PMID: 38954569 DOI: 10.1109/jbhi.2024.3421550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/04/2024]
Abstract
Accurate segmentation of colorectal polyps in colonoscopy images is crucial for effective diagnosis and management of colorectal cancer (CRC). However, current deep learning-based methods primarily rely on fusing RGB information across multiple scales, leading to limitations in accurately identifying polyps due to restricted RGB domain information and challenges in feature misalignment during multi-scale aggregation. To address these limitations, we propose the Polyp Segmentation Network with Shunted Transformer (PSTNet), a novel approach that integrates both RGB and frequency domain cues present in the images. PSTNet comprises three key modules: the Frequency Characterization Attention Module (FCAM) for extracting frequency cues and capturing polyp characteristics, the Feature Supplementary Alignment Module (FSAM) for aligning semantic information and reducing misalignment noise, and the Cross Perception localization Module (CPM) for synergizing frequency cues with high-level semantics to achieve efficient polyp segmentation. Extensive experiments on challenging datasets demonstrate PSTNet's significant improvement in polyp segmentation accuracy across various metrics, consistently outperforming state-of-the-art methods. The integration of frequency domain cues and the novel architectural design of PSTNet contribute to advancing computer-assisted polyp segmentation, facilitating more accurate diagnosis and management of CRC.
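A minimal way to expose the frequency-domain cues such a module operates on is a radial low-/high-pass split of the 2D FFT: low frequencies carry smooth shape and illumination, high frequencies carry edges and texture. The cutoff below is an assumed hyperparameter, and FCAM's learned frequency attention is more elaborate than this fixed mask.

```python
# Radial low-/high-pass split of an image via the 2D FFT.
import torch

def frequency_split(x, cutoff=0.1):
    """x: Bx1xHxW. Returns (low-pass, high-pass) spatial components."""
    b, c, h, w = x.shape
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.linspace(-0.5, 0.5, h), torch.linspace(-0.5, 0.5, w), indexing='ij')
    low_mask = ((yy**2 + xx**2).sqrt() <= cutoff).float()   # keep centre of spectrum
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * low_mask, dim=(-2, -1))).real
    return low, x - low                                     # residual = high-pass

img = torch.rand(1, 1, 64, 64)
low, high = frequency_split(img)
print(low.shape, high.abs().mean())   # the high-pass part holds fine detail
```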
Collapse
|
40
|
Tahir AM, Guo L, Ward RK, Yu X, Rideout A, Hore M, Wang ZJ. Explainable machine learning for assessing upper respiratory tract of racehorses from endoscopy videos. Comput Biol Med 2024; 181:109030. [PMID: 39173488 DOI: 10.1016/j.compbiomed.2024.109030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2023] [Revised: 06/20/2024] [Accepted: 08/13/2024] [Indexed: 08/24/2024]
Abstract
Laryngeal hemiplegia (LH) is a major upper respiratory tract (URT) complication in racehorses. Endoscopic imaging of the horse's throat is the gold standard for URT assessment. However, current manual assessment faces several challenges, stemming from the poor quality of endoscopy videos and the subjectivity of manual grading. To overcome such limitations, we propose an explainable machine learning (ML)-based solution for efficient URT assessment. Specifically, a cascaded YOLOv8 architecture is utilized to segment the key semantic regions and landmarks per frame. Several spatiotemporal features are then extracted from key landmark points and fed to a decision tree (DT) model to classify LH as Grade 1, 2, 3, or 4, denoting absent, mild, moderate, and severe LH, respectively. The proposed method, validated through 5-fold cross-validation on 107 videos, showed promising performance in classifying different LH grades, with 100%, 91.18%, 94.74% and 100% sensitivity values for Grades 1 to 4, respectively. Further validation on an external dataset of 72 cases confirmed its generalization capability, with 90%, 80.95%, 100%, and 100% sensitivity values for Grades 1 to 4, respectively. We introduced several explainability-related assessment functions, including: (i) visualization of YOLOv8 output to detect landmark estimation errors which can affect the final classification, (ii) time-series visualization to assess video quality, and (iii) backtracking of the DT output to identify borderline cases. We incorporated domain knowledge (e.g., veterinarian diagnostic procedures) into the proposed ML framework. This provides an assistive tool with clinical relevance and explainability that can ease and speed up URT assessment by veterinarians.
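To picture the second stage of such a pipeline, the toy below trains a shallow, inherently interpretable decision tree on synthetic motion-asymmetry features and reads out a grade. All feature names and data are invented for the sketch; only the pipeline shape (landmark statistics in, grade out) mirrors the abstract.

```python
# Toy: spatiotemporal landmark statistics -> decision tree -> LH grade 1..4.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
# invented features per video: [mean left-right angle ratio,
#                               angle-ratio variance, fraction of collapsed frames]
asymmetry = rng.uniform(0.0, 1.0, n)
X = np.c_[1.0 - asymmetry, asymmetry * rng.uniform(0.5, 1.5, n), asymmetry**2]
y = np.digitize(asymmetry, [0.25, 0.5, 0.75]) + 1      # synthetic grades 1..4

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.score(X, y))       # training accuracy on the synthetic data
print(clf.predict(X[:5]))    # predicted LH grades for the first videos
```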
Collapse
Affiliation(s)
- Anas Mohammed Tahir
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada.
| | - Li Guo
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada.
| | - Rabab K Ward
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada.
| | - Xinhui Yu
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada.
| | - Andrew Rideout
- Point To Point Research & Development, Vancouver, BC, Canada.
| | - Michael Hore
- Hagyard Equine Medical Institute, Lexington, KY, USA.
| | - Z Jane Wang
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada.
| |
Collapse
|
41
|
Dai D, Dong C, Yan Q, Sun Y, Zhang C, Li Z, Xu S. I2U-Net: A dual-path U-Net with rich information interaction for medical image segmentation. Med Image Anal 2024; 97:103241. [PMID: 38897032 DOI: 10.1016/j.media.2024.103241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2023] [Revised: 04/27/2024] [Accepted: 06/10/2024] [Indexed: 06/21/2024]
Abstract
Although U-shaped networks have achieved remarkable performance in many medical image segmentation tasks, they rarely model the sequential relationship of hierarchical layers. This weakness makes it difficult for the current layer to effectively utilize the historical information of the previous layer, leading to unsatisfactory segmentation results for lesions with blurred boundaries and irregular shapes. To solve this problem, we propose a novel dual-path U-Net, dubbed I2U-Net. The newly proposed network encourages historical information re-usage and re-exploration through rich information interaction among the dual paths, allowing deep layers to learn more comprehensive features that contain both low-level detail description and high-level semantic abstraction. Specifically, we introduce a multi-functional information interaction module (MFII), which can model cross-path, cross-layer, and cross-path-and-layer information interactions via a unified design, making the proposed I2U-Net behave similarly to an unfolded RNN and enjoy its advantage of modeling time-sequence information. Besides, to further selectively and sensitively integrate the information extracted by the encoders of the dual paths, we propose a holistic information fusion and augmentation module (HIFA), which can efficiently bridge the encoder and the decoder. Extensive experiments on four challenging tasks, including skin lesion, polyp, brain tumor, and abdominal multi-organ segmentation, consistently show that the proposed I2U-Net has superior performance and generalization ability over other state-of-the-art methods. The code is available at https://github.com/duweidai/I2U-Net.
Collapse
Affiliation(s)
- Duwei Dai
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China; Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
| | - Caixia Dong
- Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
| | - Qingsen Yan
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Yongheng Sun
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China
| | - Chunyan Zhang
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
| | - Zongfang Li
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China; Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
| | - Songhua Xu
- Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
| |
Collapse
|
42
|
Paderno A, Bedi N, Rau A, Holsinger CF. Computer Vision and Videomics in Otolaryngology-Head and Neck Surgery: Bridging the Gap Between Clinical Needs and the Promise of Artificial Intelligence. Otolaryngol Clin North Am 2024; 57:703-718. [PMID: 38981809 DOI: 10.1016/j.otc.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/11/2024]
Abstract
This article discusses the role of computer vision in otolaryngology, particularly through endoscopy and surgery. It covers recent applications of artificial intelligence (AI) in nonradiologic imaging within otolaryngology, noting the benefits and challenges, such as improving diagnostic accuracy and optimizing therapeutic outcomes, while also pointing out the necessity for enhanced data curation and standardized research methodologies to advance clinical applications. Technical aspects are also covered, providing a detailed view of the progression from manual feature extraction to more complex AI models, including convolutional neural networks and vision transformers and their potential application in clinical settings.
Collapse
Affiliation(s)
- Alberto Paderno
- IRCCS Humanitas Research Hospital, via Manzoni 56, Rozzano, Milan 20089, Italy; Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan 20072, Italy.
| | - Nikita Bedi
- Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University, Palo Alto, CA, USA
| | - Anita Rau
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
| |
Collapse
|
43
|
Oukdach Y, Garbaz A, Kerkaou Z, El Ansari M, Koutti L, El Ouafdi AF, Salihoun M. UViT-Seg: An Efficient ViT and U-Net-Based Framework for Accurate Colorectal Polyp Segmentation in Colonoscopy and WCE Images. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:2354-2374. [PMID: 38671336 PMCID: PMC11522253 DOI: 10.1007/s10278-024-01124-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 04/01/2024] [Accepted: 04/13/2024] [Indexed: 04/28/2024]
Abstract
Colorectal cancer (CRC) stands out as one of the most prevalent global cancers. The accurate localization of colorectal polyps in endoscopy images is pivotal for timely detection and removal, contributing significantly to CRC prevention. The manual analysis of images generated by gastrointestinal screening technologies poses a tedious task for doctors. Therefore, computer vision-assisted cancer detection could serve as an efficient tool for polyp segmentation. Numerous efforts have been dedicated to automating polyp localization, with the majority of studies relying on convolutional neural networks (CNNs) to learn features from polyp images. Despite their success in polyp segmentation tasks, CNNs exhibit significant limitations in precisely determining polyp location and shape due to their sole reliance on learning local features from images. Since gastrointestinal images manifest significant variation in their features, encompassing both high- and low-level ones, a framework able to learn both types of polyp features is desirable. This paper introduces UViT-Seg, a framework designed for polyp segmentation in gastrointestinal images. Operating on an encoder-decoder architecture, UViT-Seg employs two distinct feature extraction methods. A vision transformer in the encoder section captures long-range semantic information, while a CNN module, integrating squeeze-excitation and dual attention mechanisms, captures low-level features, focusing on critical image regions. Experimental evaluations conducted on five public datasets, including CVC-ClinicDB, CVC-ColonDB, Kvasir-SEG, ETIS-LaribDB, and Kvasir Capsule-SEG, demonstrate UViT-Seg's effectiveness in polyp localization. To confirm its generalization performance, the model is tested on datasets not used in training. Benchmarked against common segmentation methods and state-of-the-art polyp segmentation approaches, the proposed model yields promising results. For instance, it achieves a mean Dice coefficient of 0.915 and a mean intersection over union of 0.902 on the CVC Colon dataset. Furthermore, UViT-Seg has the advantage of being efficient, requiring fewer computational resources for both training and testing. This feature positions it as an optimal choice for real-world deployment scenarios.
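One way to picture the two-branch encoder is the fusion step: ViT patch tokens are reshaped back into a grid, upsampled, and concatenated with the CNN's feature map. Dimensions below are illustrative (a ViT-B/16-like branch), not UViT-Seg's actual configuration.

```python
# Fuse ViT patch tokens with a CNN feature map by grid reshaping + concat.
import torch
import torch.nn.functional as F

def fuse_vit_cnn(tokens, cnn_feat, grid_hw):
    """tokens: B x N x D patch embeddings (N = gh*gw); cnn_feat: B x C x H x W."""
    b, n, d = tokens.shape
    gh, gw = grid_hw
    vit_feat = tokens.transpose(1, 2).reshape(b, d, gh, gw)   # tokens -> grid
    vit_feat = F.interpolate(vit_feat, size=cnn_feat.shape[-2:],
                             mode='bilinear', align_corners=False)
    return torch.cat([vit_feat, cnn_feat], dim=1)             # channel fusion

tokens = torch.randn(2, 14 * 14, 768)      # e.g. a ViT-B/16 on 224x224 input
cnn = torch.randn(2, 256, 56, 56)
print(fuse_vit_cnn(tokens, cnn, (14, 14)).shape)  # (2, 1024, 56, 56)
```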
Collapse
Affiliation(s)
- Yassine Oukdach
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco.
| | - Anass Garbaz
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Zakaria Kerkaou
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Mohamed El Ansari
- Informatics and Applications Laboratory, Department of Computer Sciences, Faculty of Science, Moulay Ismail University, B.P 11201, Meknès, 52000, Morocco
| | - Lahcen Koutti
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Ahmed Fouad El Ouafdi
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Mouna Salihoun
- Faculty of Medicine and Pharmacy of Rabat, Mohammed V University of Rabat, Rabat, 10000, Morocco
| |
Collapse
|
44
|
Tudela Y, Majó M, de la Fuente N, Galdran A, Krenzer A, Puppe F, Yamlahi A, Tran TN, Matuszewski BJ, Fitzgerald K, Bian C, Pan J, Liu S, Fernández-Esparrach G, Histace A, Bernal J. A complete benchmark for polyp detection, segmentation and classification in colonoscopy images. Front Oncol 2024; 14:1417862. [PMID: 39381041 PMCID: PMC11458519 DOI: 10.3389/fonc.2024.1417862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2024] [Accepted: 07/11/2024] [Indexed: 10/10/2024] Open
Abstract
Introduction: Colorectal cancer (CRC) is one of the main causes of death worldwide. Early detection and diagnosis of its precursor lesion, the polyp, is key to reducing mortality and improving procedure efficiency. During the last two decades, several computational methods have been proposed to assist clinicians in detection, segmentation, and classification tasks, but the lack of a common public validation framework makes it difficult to determine which of them is ready to be deployed in the examination room. Methods: This study presents a complete validation framework, and we compare several methodologies for each of the polyp characterization tasks. Results: The results show that the majority of the approaches provide good performance for the detection and segmentation tasks, but that there is room for improvement regarding polyp classification. Discussion: While the studied approaches show promising results in assisting polyp detection and segmentation, further research on the classification task is needed before clinicians can be reliably assisted during the procedure. The presented framework provides a standardized method for evaluating and comparing different approaches, which could facilitate the identification of clinically ready assistance methods.
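For readers unfamiliar with the segmentation metrics this benchmark reports, the sketch below shows how Dice and intersection over union (IoU) are conventionally computed from binary masks. It is illustrative only and is not the benchmark's evaluation code.

import numpy as np

def dice_and_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_and_iou(pred, gt))  # (0.666..., 0.5); Dice >= IoU always holds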
Collapse
Affiliation(s)
- Yael Tudela
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| | - Mireia Majó
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| | - Neil de la Fuente
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| | - Adrian Galdran
- Department of Information and Communication Technologies, SymBioSys Research Group, BCNMedTech, Barcelona, Spain
| | - Adrian Krenzer
- Artificial Intelligence and Knowledge Systems, Institute for Computer Science, Julius-Maximilians University of Würzburg, Würzburg, Germany
| | - Frank Puppe
- Artificial Intelligence and Knowledge Systems, Institute for Computer Science, Julius-Maximilians University of Würzburg, Würzburg, Germany
| | - Amine Yamlahi
- Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Thuy Nuong Tran
- Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Bogdan J. Matuszewski
- Computer Vision and Machine Learning (CVML) Research Group, University of Central Lancashire (UCLan), Preston, United Kingdom
| | - Kerr Fitzgerald
- Computer Vision and Machine Learning (CVML) Research Group, University of Central Lancashire (UCLan), Preston, United Kingdom
| | - Cheng Bian
- Hebei University of Technology, Baoding, China
| | | | - Shijie Liu
- Hebei University of Technology, Baoding, China
| | | | - Aymeric Histace
- ETIS UMR 8051, École Nationale Supérieure de l'Électronique et de ses Applications (ENSEA), Centre national de la recherche scientifique (CNRS), CY Cergy Paris University, Cergy, France
| | - Jorge Bernal
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| |
Collapse
|
45
|
Meng L, Li Y, Duan W. Three-stage polyp segmentation network based on reverse attention feature purification with Pyramid Vision Transformer. Comput Biol Med 2024; 179:108930. [PMID: 39067285 DOI: 10.1016/j.compbiomed.2024.108930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2024] [Revised: 06/30/2024] [Accepted: 07/18/2024] [Indexed: 07/30/2024]
Abstract
Colorectal polyps serve as potential precursors of colorectal cancer, and automating polyp segmentation aids physicians in accurately identifying potential polyp regions, thereby reducing misdiagnoses and missed diagnoses. However, existing models often fall short in accurately segmenting polyps because polyp regions closely resemble the surrounding tissue in color, texture, and shape. To address this challenge, this study proposes a novel three-stage polyp segmentation network, named Reverse Attention Feature Purification with Pyramid Vision Transformer (RAFPNet), which adopts an iterative feedback UNet architecture to refine polyp saliency maps for precise segmentation. Initially, a Multi-Scale Feature Aggregation (MSFA) module is introduced to generate preliminary polyp saliency maps. Subsequently, a Reverse Attention Feature Purification (RAFP) module is devised to effectively suppress low-level surrounding-tissue features while enhancing high-level semantic polyp information based on the preliminary saliency maps. Finally, the UNet architecture is leveraged to further refine the feature maps in a coarse-to-fine manner. Extensive experiments conducted on five widely used polyp segmentation datasets and three video polyp segmentation datasets demonstrate the superior performance of RAFPNet over state-of-the-art models across multiple evaluation metrics.
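Reverse attention, as named in this abstract, is commonly implemented by inverting a coarse saliency map so that later stages attend to regions the network has not yet explained. A minimal sketch of that general idea follows; the function name and tensor shapes are illustrative assumptions, not RAFPNet's implementation.

import torch

def reverse_attention(features: torch.Tensor, saliency_logits: torch.Tensor) -> torch.Tensor:
    # saliency_logits: (B, 1, H, W) coarse polyp prediction from an earlier stage
    reverse_map = 1.0 - torch.sigmoid(saliency_logits)  # high where polyp is NOT yet found
    return features * reverse_map                       # suppress confident foreground,
                                                        # emphasise residual/boundary regions

feats = torch.randn(2, 32, 44, 44)
coarse = torch.randn(2, 1, 44, 44)
print(reverse_attention(feats, coarse).shape)           # torch.Size([2, 32, 44, 44])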
Collapse
Affiliation(s)
- Lingbing Meng
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China
| | - Yuting Li
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China
| | - Weiwei Duan
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China.
| |
Collapse
|
46
|
Wang Z, Liu M, Jiang J, Qu X. Colorectal polyp segmentation with denoising diffusion probabilistic models. Comput Biol Med 2024; 180:108981. [PMID: 39146839 DOI: 10.1016/j.compbiomed.2024.108981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 07/13/2024] [Accepted: 08/01/2024] [Indexed: 08/17/2024]
Abstract
Early detection of polyps is essential to decrease colorectal cancer (CRC) incidence. Therefore, developing an efficient and accurate polyp segmentation technique is crucial for clinical CRC prevention. In this paper, we propose an end-to-end training approach for polyp segmentation that employs a diffusion model. The images are treated as priors, and segmentation is formulated as a mask generation process. In the sampling process, multiple predictions are generated for each input image using the trained model, and significant performance enhancements are achieved through a majority-vote strategy. Four public datasets and one in-house dataset are used to train and test the model's performance. The proposed method achieves mDice scores of 0.934 and 0.967 on the Kvasir-SEG and CVC-ClinicDB datasets, respectively. Furthermore, cross-validation is applied to test the generalization of the proposed model, and to the best of our knowledge the proposed method outperforms previous state-of-the-art (SOTA) models. The proposed method also significantly improves segmentation accuracy and has strong generalization capability.
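The majority-vote step described above can be realized by binarizing each sampled mask and keeping the per-pixel mode. A minimal sketch under that assumption follows; the diffusion sampling itself is omitted and the random masks stand in for repeated sampling runs.

import numpy as np

def majority_vote(masks: np.ndarray) -> np.ndarray:
    # masks: (n_samples, H, W) binary predictions from repeated sampling runs
    return (masks.mean(axis=0) >= 0.5).astype(np.uint8)

samples = np.stack([np.random.rand(256, 256) > 0.5 for _ in range(9)]).astype(np.uint8)
fused = majority_vote(samples)  # a pixel is kept only if >= 5 of the 9 samples agree
print(fused.shape, fused.dtype)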
Collapse
Affiliation(s)
- Zenan Wang
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China.
| | - Ming Liu
- Hunan Key Laboratory of Nonferrous Resources and Geological Hazard Exploration, Changsha, China
| | - Jue Jiang
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY, United States
| | - Xiaolei Qu
- School of Instrumentation and Optoelectronics Engineering, Beihang University, Beijing, China
| |
Collapse
|
47
|
Liu J, Jiao G. Cross-domain additive learning of new knowledge rather than replacement. Biomed Eng Lett 2024; 14:1137-1146. [PMID: 39220031 PMCID: PMC11362399 DOI: 10.1007/s13534-024-00399-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2023] [Revised: 01/10/2024] [Accepted: 05/27/2024] [Indexed: 09/04/2024] Open
Abstract
In clinical scenarios, for reasons such as patient privacy, information protection, and data migration, the source-domain data is often inaccessible when domain adaptation is needed for real deployment, and only the model pre-trained on the source domain is available. Existing solutions to this type of problem tend to forget the rich task experience previously learned on the source domain after adapting: the model simply overfits the target-domain data and does not learn robust features that facilitate real task decisions. We address this problem by exploring the particular application of source-free domain adaptation to medical image segmentation and propose a two-stage additive source-free adaptation framework. We generalize domain-invariant features by constraining the core pathological structure and enforcing semantic consistency between different perspectives, and we reduce segmentation errors by locating and filtering potentially erroneous elements through Monte-Carlo uncertainty estimation. We conduct comparison experiments with other methods on a cross-device polyp segmentation dataset and a cross-modal brain tumor segmentation dataset; the results in both the target and source domains verify that the proposed method can effectively address the domain shift problem while the model retains its performance on the source domain after learning new knowledge of the target domain. This work provides a valuable exploration of achieving additive learning on the target and source domains in the absence of source data and offers new ideas and methods for adaptation research in the field of medical image segmentation.
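Monte-Carlo uncertainty estimation of the kind mentioned here is often approximated with test-time dropout: several stochastic forward passes are run, and pixels whose predictions disagree too much are filtered out. The sketch below illustrates that general recipe; the tiny demo model, pass count, and variance threshold are assumptions, not the authors' settings.

import torch

@torch.no_grad()
def mc_uncertainty_filter(model, image: torch.Tensor, n_passes: int = 8, thresh: float = 0.05):
    model.train()                                # keep dropout layers stochastic at test time
    probs = torch.stack([torch.sigmoid(model(image)) for _ in range(n_passes)])
    mean, var = probs.mean(dim=0), probs.var(dim=0)
    pseudo_mask = (mean > 0.5) & (var < thresh)  # keep pixels that are confident AND consistent
    return pseudo_mask, var

net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.Dropout2d(0.5),                     # the source of Monte-Carlo randomness
    torch.nn.Conv2d(8, 1, 1),
)
mask, var = mc_uncertainty_filter(net, torch.randn(1, 3, 64, 64))
print(mask.shape, var.mean())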
Collapse
Affiliation(s)
- Jiahao Liu
- College of Computer Science, Hengyang Normal University, Hengyang, 421008 China
| | - Ge Jiao
- College of Computer Science, Hengyang Normal University, Hengyang, 421008 China
| |
Collapse
|
48
|
Tang S, Ran H, Yang S, Wang Z, Li W, Li H, Meng Z. A frequency selection network for medical image segmentation. Heliyon 2024; 10:e35698. [PMID: 39220902 PMCID: PMC11365330 DOI: 10.1016/j.heliyon.2024.e35698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2024] [Revised: 07/18/2024] [Accepted: 08/01/2024] [Indexed: 09/04/2024] Open
Abstract
Existing medical image segmentation methods may consider feature extraction and information processing only in the spatial domain, lack a design for interaction between frequency information and spatial information, or ignore the semantic gaps between shallow and deep features, leading to inaccurate segmentation results. In this paper, we therefore propose a novel frequency selection segmentation network (FSSN), which achieves more accurate lesion segmentation by fusing local spatial features with global frequency information, designing better feature interactions, and suppressing low-correlation frequency components to mitigate semantic gaps. Firstly, we propose a global-local feature aggregation module (GLAM) that simultaneously captures multi-scale local features in the spatial domain and exploits global frequency information in the frequency domain, achieving a complementary fusion of local detail features and global frequency information. Secondly, we propose a feature filter module (FFM) that mitigates semantic gaps during cross-level feature fusion and lets FSSN discriminatively determine which frequency information should be preserved for accurate lesion segmentation. Finally, to make better use of local information, especially the boundary of the lesion region, we employ deformable convolution (DC) to extract pertinent features in the local range, enabling FSSN to better focus on relevant image content. Extensive experiments on two public benchmark datasets show that, compared with representative medical image segmentation methods, FSSN obtains more accurate lesion segmentation results in terms of both objective evaluation indicators and subjective visual effects, with fewer parameters and lower computational complexity.
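The frequency-selection idea sketched in this abstract (suppressing low-correlation frequency components) can be illustrated with a learnable gate applied in the Fourier domain. The toy module below is one such sketch under that assumption; it is not FSSN's actual GLAM or FFM, and the gating scheme and shapes are illustrative.

import torch
import torch.nn as nn

class FrequencyGate(nn.Module):
    def __init__(self, channels: int, h: int, w: int):
        super().__init__()
        # one learnable gate per channel and rfft2 frequency bin
        self.gate = nn.Parameter(torch.ones(channels, h, w // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")   # (B, C, H, W//2+1) complex spectrum
        spec = spec * torch.sigmoid(self.gate)    # attenuate low-relevance frequency bins
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

x = torch.randn(2, 16, 32, 32)
print(FrequencyGate(16, 32, 32)(x).shape)         # torch.Size([2, 16, 32, 32])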
Collapse
Affiliation(s)
- Shu Tang
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| | - Haiheng Ran
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| | - Shuli Yang
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| | - Zhaoxia Wang
- Chongqing Emergency Medical Center, Chongqing University Central Hospital, School of Medicine, Chongqing University, Chongqing, China
| | - Wei Li
- Children’s Hospital of Chongqing Medical University, China
| | - Haorong Li
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| | - Zihao Meng
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| |
Collapse
|
49
|
Chang Q, Ahmad D, Toth J, Bascom R, Higgins WE. ESFPNet: Efficient Stage-Wise Feature Pyramid on Mix Transformer for Deep Learning-Based Cancer Analysis in Endoscopic Video. J Imaging 2024; 10:191. [PMID: 39194980 DOI: 10.3390/jimaging10080191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2024] [Revised: 07/19/2024] [Accepted: 08/01/2024] [Indexed: 08/29/2024] Open
Abstract
For patients at risk of developing either lung cancer or colorectal cancer, the identification of suspect lesions in endoscopic video is an important procedure. The physician performs an endoscopic exam by navigating an endoscope through the organ of interest, be it the lungs or intestinal tract, and performs a visual inspection of the endoscopic video stream to identify lesions. Unfortunately, this entails a tedious, error-prone search over a lengthy video sequence. We propose a deep learning architecture that enables the real-time detection and segmentation of lesion regions from endoscopic video, with our experiments focused on autofluorescence bronchoscopy (AFB) for the lungs and colonoscopy for the intestinal tract. Our architecture, dubbed ESFPNet, draws on a pretrained Mix Transformer (MiT) encoder and a decoder structure that incorporates a new Efficient Stage-Wise Feature Pyramid (ESFP) to promote accurate lesion segmentation. In comparison to existing deep learning models, the ESFPNet model gave superior lesion segmentation performance for an AFB dataset. It also produced superior segmentation results for three widely used public colonoscopy databases and nearly the best results for two other public colonoscopy databases. In addition, the lightweight ESFPNet architecture requires fewer model parameters and less computation than other competing models, enabling the real-time analysis of input video frames. Overall, these studies point to the combined superior analysis performance and architectural efficiency of the ESFPNet for endoscopic video analysis. Lastly, additional experiments with the public colonoscopy databases demonstrate the learning ability and generalizability of ESFPNet, implying that the model could be effective for region segmentation in other domains.
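A stage-wise feature pyramid of the kind described here typically projects each encoder stage to a common width and fuses the stages progressively from deep to shallow. The following is an illustrative decoder in that spirit; the channel widths follow typical Mix-Transformer backbones but are assumptions, not ESFPNet's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StageWisePyramid(nn.Module):
    def __init__(self, in_chs=(64, 128, 320, 512), width: int = 64):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, width, 1) for c in in_chs])
        self.fuse = nn.ModuleList([nn.Conv2d(2 * width, width, 3, padding=1)
                                   for _ in in_chs[:-1]])
        self.head = nn.Conv2d(width, 1, 1)       # binary lesion logits

    def forward(self, stages):                   # stages: shallow/high-res first, deepest last
        feats = [p(s) for p, s in zip(self.proj, stages)]
        x = feats[-1]                            # start from the deepest stage
        for f, fuse in zip(reversed(feats[:-1]), reversed(self.fuse)):
            x = F.interpolate(x, size=f.shape[-2:], mode="bilinear", align_corners=False)
            x = fuse(torch.cat([x, f], dim=1))   # merge with the next shallower stage
        return self.head(x)

s = [torch.randn(1, c, r, r) for c, r in zip((64, 128, 320, 512), (88, 44, 22, 11))]
print(StageWisePyramid()(s).shape)               # torch.Size([1, 1, 88, 88])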
Collapse
Affiliation(s)
- Qi Chang
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
| | - Danish Ahmad
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
| | - Jennifer Toth
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
| | - Rebecca Bascom
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
| | - William E Higgins
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
| |
Collapse
|
50
|
Lin Q, Tan W, Cai S, Yan B, Li J, Zhong Y. Lesion-Decoupling-Based Segmentation With Large-Scale Colon and Esophageal Datasets for Early Cancer Diagnosis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11142-11156. [PMID: 37028330 DOI: 10.1109/tnnls.2023.3248804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Lesions of early cancers often appear flat, small, and isochromatic in medical endoscopy images, which makes them difficult to capture. By analyzing the differences between the internal and external features of the lesion area, we propose a lesion-decoupling-based segmentation (LDS) network for assisting early cancer diagnosis. We introduce a plug-and-play module called the self-sampling similar feature disentangling module (FDM) to obtain accurate lesion boundaries. We then propose a feature separation loss (FSL) function to separate pathological features from normal ones. Moreover, since physicians make diagnoses with multimodal data, we propose a multimodal cooperative segmentation network that takes two different modal images as input: white-light images (WLIs) and narrowband images (NBIs). Our FDM and FSL show good performance for both single-modal and multimodal segmentation. Extensive experiments on five backbones prove that FDM and FSL can be easily applied to different backbones for a significant improvement in lesion segmentation accuracy, with a maximum mean Intersection over Union (mIoU) increase of 4.58. For colonoscopy, we achieve an mIoU of up to 91.49 on our Dataset A and 84.41 on the three public datasets. For esophagoscopy, the best mIoU achieved is 64.32 on the WLI dataset and 66.31 on the NBI dataset.
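A feature-separation objective like the FSL named above can be sketched as pushing apart the mean embeddings of lesion and normal pixels. The cosine formulation below is an assumption for illustration, not the paper's exact loss.

import torch
import torch.nn.functional as F

def feature_separation_loss(feats: torch.Tensor, mask: torch.Tensor, eps: float = 1e-8):
    # feats: (B, C, H, W) decoder features; mask: (B, 1, H, W) binary lesion mask
    m = mask.float()
    lesion = (feats * m).sum(dim=(2, 3)) / (m.sum(dim=(2, 3)) + eps)          # (B, C) prototype
    normal = (feats * (1 - m)).sum(dim=(2, 3)) / ((1 - m).sum(dim=(2, 3)) + eps)
    # minimising the cosine similarity pushes the two prototypes apart
    return F.cosine_similarity(lesion, normal, dim=1).mean()

feats = torch.randn(2, 32, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.7).long()
print(feature_separation_loss(feats, mask))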
Collapse
|