1
Kang X, Ma Z, Liu K, Li Y, Miao Q. Modeling multi-scale uncertainty with evidence integration for reliable polyp segmentation. Neural Netw 2025; 189:107553. [PMID: 40409011 DOI: 10.1016/j.neunet.2025.107553]
Abstract
Polyp segmentation is critical in medical image analysis. Traditional methods, while capable of producing precise outputs in well-defined regions, often struggle with blurry or ambiguous areas in medical images, which can lead to errors in clinical decision-making. Additionally, these methods typically generate only a single deterministic segmentation result, failing to account for the inherent uncertainty in the segmentation process. This limitation undermines the reliability of segmentation models in clinical practice, as they lack the ability to provide insights into the confidence or certainty of their predictions, leaving clinicians skeptical of their utility. To address these challenges, we propose a novel multi-scale uncertainty modeling framework for polyp segmentation, grounded in evidence theory. Our approach leverages the Dirichlet distribution to classify pixels within polyp images while integrating uncertainty across different scales. We first employ an Uncertainty Region Enhancement Process (UREP) to refine uncertain regions and an Integrated Balance Module (IBM) to dynamically balance the weights between different feature maps for generating semantic fusion feature maps. Subsequently, we utilize two feature extraction sub-networks to learn feature representations from the original images and the semantic fusion feature maps. We further develop a Multi-scale Evidence Integration Network (MEIN) to robustly model uncertainty through subjective logic, merging the results of the two sub-networks to ensure a comprehensive understanding of uncertainty and produce reliable segmentation results. In contrast to most existing methods, our approach not only generates segmentation results but also provides uncertainty estimates, offering clinicians valuable insights into the reliability of the predictions. Experimental results on five polyp segmentation datasets demonstrate that our proposed method remains competitive and generates effective uncertainty estimations compared to existing representative methods. The code is available at https://github.com/q1216355254/MEIN.
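The Dirichlet/subjective-logic machinery named in this abstract follows the standard evidential deep learning recipe: non-negative evidence parameterizes a Dirichlet distribution whose strength yields both class probabilities and a per-pixel uncertainty mass. A minimal PyTorch sketch under an assumed softplus evidence head (not necessarily the authors' exact MEIN head):

```python
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits):
    """Per-pixel Dirichlet belief and uncertainty under subjective logic.

    logits: (B, K, H, W) raw outputs for K classes (K = 2 for polyp vs.
    background). The softplus evidence head is an assumed choice.
    """
    evidence = F.softplus(logits)        # non-negative evidence e_k
    alpha = evidence + 1.0               # Dirichlet parameters a_k = e_k + 1
    S = alpha.sum(dim=1, keepdim=True)   # Dirichlet strength
    belief = evidence / S                # belief mass b_k = e_k / S
    K = logits.shape[1]
    uncertainty = K / S                  # vacuity u = K / S; sum(b_k) + u = 1
    prob = alpha / S                     # expected class probabilities
    return prob, belief, uncertainty
```

High vacuity flags pixels where the network has accumulated little evidence, which is exactly the uncertainty map such frameworks hand to clinicians alongside the segmentation.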
Affiliation(s)
- Xiaolu Kang
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China
- Zhuoqi Ma
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China
- Kang Liu
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China
- Yunan Li
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China
- Qiguang Miao
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China; Xi'an Key Laboratory of Big Data and Intelligent Vision, Xi'an, 710071, Shaanxi, China.
2
Song Y, Du S, Wang R, Liu F, Lin X, Chen J, Li Z, Li Z, Yang L, Zhang Z, Yan H, Zhang Q, Qian D, Li X. Polyp-Size: A Precise Endoscopic Dataset for AI-Driven Polyp Sizing. Sci Data 2025; 12:918. [PMID: 40450075 DOI: 10.1038/s41597-025-05251-x]
Abstract
Colorectal cancer often arises from precancerous polyps, where accurate size assessment is vital for clinical decisions but challenged by subjective methods. While artificial intelligence (AI) has shown promise in improving the accuracy of polyp size estimation, its development depends on large, meticulously annotated datasets. We present Polyp-Size, a dataset of 42 high-resolution white-light colonoscopy videos with polyp sizes precisely measured post-resection using vernier calipers to submillimeter precision. Unlike existing datasets primarily focused on polyp detection or segmentation, Polyp-Size offers validated size annotations, diverse polyp features (Paris classification, anatomical location and histological type), and standardized video formats, enabling robust AI models for size estimation. By making this resource publicly available, we aim to foster research collaboration and innovation in automated polyp measurement to ultimately improve clinical practice.
Affiliation(s)
- Yiming Song
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, NHC Key Laboratory of Digestive Diseases, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Sijia Du
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Ruilan Wang
- Department of Gastroenterology, Armed Police Forces Hospital of Sichuan, Leshan, Sichuan Province, China
- Fei Liu
- Department of Gastroenterology, Nine Division Hospital of Xinjiang Production and Construction Corps, Tacheng, Xinjiang Uygur Autonomous Region, China
- Xiaolu Lin
- Department of Digestive Endoscopy Center, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, Fujian, China
- Jinnan Chen
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, NHC Key Laboratory of Digestive Diseases, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zeyu Li
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, NHC Key Laboratory of Digestive Diseases, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhao Li
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, NHC Key Laboratory of Digestive Diseases, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Liuyi Yang
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, NHC Key Laboratory of Digestive Diseases, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhengjie Zhang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Hao Yan
- The Second Clinical Medical College, Harbin Medical University, Harbin, 150081, China
- Qingwei Zhang
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, NHC Key Laboratory of Digestive Diseases, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- Dahong Qian
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Xiaobo Li
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, NHC Key Laboratory of Digestive Diseases, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
3
Sasmal P, Kumar Panigrahi S, Panda SL, Bhuyan MK. Attention-guided deep framework for polyp localization and subsequent classification via polyp local and Siamese feature fusion. Med Biol Eng Comput 2025. [PMID: 40314710 DOI: 10.1007/s11517-025-03369-z]
Abstract
Colorectal cancer (CRC) is one of the leading causes of death worldwide. This paper proposes an automated diagnostic technique to detect, localize, and classify polyps in colonoscopy video frames. The proposed model adopts the deep YOLOv4 model that incorporates both spatial and contextual information in the form of spatial attention and channel attention blocks, respectively for better localization of polyps. Finally, leveraging a fusion of deep and handcrafted features, the detected polyps are classified as adenoma or non-adenoma. Polyp shape and texture are essential features in discriminating polyp types. Therefore, the proposed work utilizes a pyramid histogram of oriented gradient (PHOG) and embedding features learned via triplet Siamese architecture to extract these features. The PHOG extracts local shape information from each polyp class, whereas the Siamese network extracts intra-polyp discriminating features. The individual and cross-database performances on two databases suggest the robustness of our method in polyp localization. The competitive analysis based on significant clinical parameters with current state-of-the-art methods confirms that our method can be used for automated polyp localization in both real-time and offline colonoscopic video frames. Our method provides an average precision of 0.8971 and 0.9171 and an F1 score of 0.8869 and 0.8812 for the Kvasir-SEG and SUN databases. Similarly, the proposed classification framework for the detected polyps yields a classification accuracy of 96.66% on a publicly available UCI colonoscopy video dataset. Moreover, the classification framework provides an F1 score of 96.54% that validates the potential of the proposed framework in polyp localization and classification.
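The PHOG descriptor used above concatenates gradient-orientation histograms over an increasingly fine spatial grid, capturing local shape at multiple scales. A rough NumPy sketch under assumed defaults (3 pyramid levels, 8 unsigned orientation bins; the authors' configuration is not stated here):

```python
import numpy as np

def phog(gray, levels=3, bins=8):
    """Pyramid histogram of oriented gradients over a grayscale patch."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    feats = []
    for level in range(levels):
        cells = 2 ** level                    # 1x1, 2x2, 4x4 grid ...
        h_step = gray.shape[0] // cells
        w_step = gray.shape[1] // cells
        for i in range(cells):
            for j in range(cells):
                m = mag[i*h_step:(i+1)*h_step, j*w_step:(j+1)*w_step]
                a = ang[i*h_step:(i+1)*h_step, j*w_step:(j+1)*w_step]
                # magnitude-weighted orientation histogram for this cell
                hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
                feats.append(hist)
    f = np.concatenate(feats)
    return f / (np.linalg.norm(f) + 1e-8)     # L2-normalised descriptor
```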
Affiliation(s)
- Pradipta Sasmal
- Department of Electrical Engineering, Indian Institute of Technology, Kharagpur, West Bengal, 721302, India.
- Susant Kumar Panigrahi
- Department of Electrical Engineering, Indian Institute of Technology, Kharagpur, West Bengal, 721302, India
- Swarna Laxmi Panda
- Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, Odisha, 769008, India
- M K Bhuyan
- Department of Electronics and Electrical Engineering, Indian Institute of Technology, Guwahati, Assam, 781039, India
4
Huang K, Zhou T, Fu H, Zhang Y, Zhou Y, Gong C, Liang D. Learnable Prompting SAM-Induced Knowledge Distillation for Semi-Supervised Medical Image Segmentation. IEEE Trans Med Imaging 2025; 44:2295-2306. [PMID: 40030924 DOI: 10.1109/tmi.2025.3530097]
Abstract
The limited availability of labeled data has driven advancements in semi-supervised learning for medical image segmentation. Modern large-scale models tailored for general segmentation, such as the Segment Anything Model (SAM), have revealed robust generalization capabilities. However, applying these models directly to medical image segmentation still leads to performance degradation. In this paper, we propose a learnable prompting SAM-induced Knowledge distillation framework (KnowSAM) for semi-supervised medical image segmentation. Firstly, we propose a Multi-view Co-training (MC) strategy that trains two distinct sub-networks in a co-teaching paradigm, resulting in more robust outcomes. Secondly, we present a Learnable Prompt Strategy (LPS) to dynamically produce dense prompts and integrate an adapter to fine-tune SAM specifically for medical image segmentation tasks. Moreover, we propose SAM-induced Knowledge Distillation (SKD) to transfer useful knowledge from SAM to the two sub-networks, enabling them to learn from SAM's predictions and alleviating the effects of incorrect pseudo-labels during training. Notably, the predictions generated by our sub-networks are used to produce mask prompts for SAM, facilitating effective inter-module information exchange. Extensive experimental results on various medical segmentation tasks demonstrate that our model outperforms state-of-the-art semi-supervised segmentation approaches. Crucially, our SAM distillation framework can be seamlessly integrated into other semi-supervised segmentation methods to enhance performance. The code will be released upon acceptance of this manuscript at https://github.com/taozh2017/KnowSAM.
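The SAM-induced distillation step can be read as standard soft-label distillation: the sub-networks are pushed toward SAM's softened per-pixel predictions. A hedged sketch of such a loss (the temperature and reduction are assumptions, not the paper's reported settings):

```python
import torch.nn.functional as F

def skd_loss(student_logits, sam_logits, T=2.0):
    """Soft-label distillation from SAM's per-pixel predictions.

    Both tensors: (B, C, H, W). T is an assumed softening temperature.
    """
    p_teacher = F.softmax(sam_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # pixel-wise KL divergence, scaled by T^2 as in standard distillation
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```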
5
Wang Z, Li T, Liu M, Jiang J, Liu X. DCATNet: polyp segmentation with deformable convolution and contextual-aware attention network. BMC Med Imaging 2025; 25:120. [PMID: 40229681 PMCID: PMC11998341 DOI: 10.1186/s12880-025-01661-w]
Abstract
Polyp segmentation is crucial in computer-aided diagnosis but remains challenging due to the complexity of medical images and anatomical variations. Current state-of-the-art methods struggle with accurate polyp segmentation due to the variability in size, shape, and texture. These factors make boundary detection challenging, often resulting in incomplete or inaccurate segmentation. To address these challenges, we propose DCATNet, a novel deep learning architecture specifically designed for polyp segmentation. DCATNet is a U-shaped network that combines ResNetV2-50 as an encoder for capturing local features and a Transformer for modeling long-range dependencies. It integrates three key components: the Geometry Attention Module (GAM), the Contextual Attention Gate (CAG), and the Multi-scale Feature Extraction (MSFE) block. We evaluated DCATNet on five public datasets. On Kvasir-SEG and CVC-ClinicDB, the model achieved mean dice scores of 0.9351 and 0.9444, respectively, outperforming previous state-of-the-art (SOTA) methods. Cross-validation further demonstrated its superior generalization capability. Ablation studies confirmed the effectiveness of each component in DCATNet. Integrating GAM, CAG, and MSFE effectively improves feature representation and fusion, leading to precise and reliable segmentation results. These findings underscore DCATNet's potential for clinical application and can be used for a wide range of medical image segmentation tasks.
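For reference, the mean Dice score reported throughout these studies is computed per image as twice the mask overlap divided by the summed mask areas; a minimal NumPy version:

```python
import numpy as np

def dice_score(pred_mask, gt_mask, eps=1e-8):
    """Dice similarity coefficient between two binary masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```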
Affiliation(s)
- Zenan Wang
- Department of Gastroenterology, Beijing Chaoyang Hospital, The Third Clinical Medical College of Capital Medical University, Beijing, China
- Tianshu Li
- Department of Gastroenterology, Beijing Chaoyang Hospital, The Third Clinical Medical College of Capital Medical University, Beijing, China
- Ming Liu
- Hunan Key Laboratory of Nonferrous Resources and Geological Hazard Exploration, Changsha, China
- Jue Jiang
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
- Xinjuan Liu
- Department of Gastroenterology, Beijing Chaoyang Hospital, The Third Clinical Medical College of Capital Medical University, Beijing, China.
6
Xing H, Sun R, Ren J, Wei J, Feng CM, Ding X, Guo Z, Wang Y, Hu Y, Wei W, Ban X, Xie C, Tan Y, Liu X, Cui S, Duan X, Li Z. Achieving flexible fairness metrics in federated medical imaging. Nat Commun 2025; 16:3342. [PMID: 40199877 PMCID: PMC11978761 DOI: 10.1038/s41467-025-58549-0]
Abstract
The rapid adoption of Artificial Intelligence (AI) in medical imaging raises fairness and privacy concerns across demographic groups, especially in diagnosis and treatment decisions. While federated learning (FL) offers decentralized privacy preservation, current frameworks often prioritize collaboration fairness over group fairness, risking healthcare disparities. Here we present FlexFair, an innovative FL framework designed to address both fairness and privacy challenges. FlexFair incorporates a flexible regularization term to facilitate the integration of multiple fairness criteria, including equal accuracy, demographic parity, and equal opportunity. Evaluated across four clinical applications (polyp segmentation, fundus vascular segmentation, cervical cancer segmentation, and skin disease diagnosis), FlexFair outperforms state-of-the-art methods in both fairness and accuracy. Moreover, we curate a multi-center dataset for cervical cancer segmentation that includes 678 patients from four hospitals. This diverse dataset allows for a more comprehensive analysis of model performance across different population groups, ensuring the findings are applicable to a broader range of patients.
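FlexFair's flexible regularization term is not spelled out in the abstract, but a demographic-parity-style penalty of the kind it integrates can be sketched as the spread of per-group positive prediction rates (the grouping and uniform weighting here are illustrative assumptions):

```python
import torch

def demographic_parity_penalty(probs, group_ids):
    """Spread of per-group mean positive prediction rates.

    probs: (N,) predicted foreground probabilities (e.g., mean mask
    probability per image); group_ids: (N,) integer demographic/site labels.
    """
    rates = torch.stack([probs[group_ids == g].mean()
                         for g in torch.unique(group_ids)])
    return ((rates - rates.mean()) ** 2).mean()  # zero when all groups match
```

Added to the task loss with a tunable weight, such a term trades a little accuracy for smaller inter-group gaps, which matches the equal accuracy / demographic parity / equal opportunity menu the abstract describes.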
Grants
- This work was supported by Shenzhen-Hong Kong Joint Funding No. SGDX20211123112401002, by NSFC with Grant No. 62293482, by the Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen-HK S&T Cooperation Zone, by Shenzhen General Program No. JCYJ20220530143600001, by the Shenzhen Outstanding Talents Training Fund 202002, by Guangdong Research Projects No. 2017ZT07X152 and No. 2019CX01X104, by the Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001), by the Guangdong Provincial Key Laboratory of Big Data Computing, CUHK-Shenzhen, by NSFC 61931024 & 12326610, by the Key Area R&D Program of Guangdong Province with Grant No. 2018B030338001, by the Shenzhen Key Laboratory of Big Data and Artificial Intelligence (Grant No. ZDSYS201707251409055), and by the Tencent & Huawei Open Fund.
Affiliation(s)
- Huijun Xing
- Shenzhen Future Network of Intelligence Institute and Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- Rui Sun
- Shenzhen Future Network of Intelligence Institute and Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- Jinke Ren
- Shenzhen Future Network of Intelligence Institute and Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- Jun Wei
- Shenzhen Future Network of Intelligence Institute and Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- Chun-Mei Feng
- Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore, Singapore
- Xuan Ding
- Department of Statistics, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai, Guangdong, China
- Zilu Guo
- Shenzhen Future Network of Intelligence Institute and Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- Yu Wang
- Department of Radiology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
- Yudong Hu
- Aberdeen Institute of Data Science and Artificial Intelligence, South China Normal University, Foshan, Guangdong, China
- Wei Wei
- Department of Gynecologic Oncology, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong, China
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, Guangdong, China
- Xiaohua Ban
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, Guangdong, China
- Department of Radiology, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong, China
- Chuanlong Xie
- Department of Statistics, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai, Guangdong, China.
- Yu Tan
- Department of Radiology, Guangdong Women and Children Hospital, Guangzhou, China
- Xian Liu
- Radiology Department, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong, China
- Shuguang Cui
- Shenzhen Future Network of Intelligence Institute and Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
- Xiaohui Duan
- Department of Radiology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China.
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Medical Research Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China.
- Zhen Li
- Shenzhen Future Network of Intelligence Institute and Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China.
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China.
7
Wang Z, Guo L, Zhao S, Zhang S, Zhao X, Fang J, Wang G, Lu H, Yu J, Tian Q. Multi-Scale Group Agent Attention-Based Graph Convolutional Decoding Networks for 2D Medical Image Segmentation. IEEE J Biomed Health Inform 2025; 29:2718-2730. [PMID: 40030822 DOI: 10.1109/jbhi.2024.3523112]
Abstract
Automated medical image segmentation plays a crucial role in assisting doctors in diagnosing diseases. Feature decoding is a critical yet challenging issue for medical image segmentation. To address this issue, this work proposes a novel feature decoding network, called multi-scale group agent attention-based graph convolutional decoding networks (MSGAA-GCDN), to learn local-global features in graph structures for 2D medical image segmentation. The proposed MSGAA-GCDN combines a graph convolutional network (GCN) and a lightweight multi-scale group agent attention (MSGAA) mechanism to represent features globally and locally within a graph structure. Moreover, a simple yet efficient attention-based upsampling convolution fusion (AUCF) module is designed in the skip connections to enhance encoder-decoder feature fusion in both channel and spatial dimensions. Extensive experiments are conducted on three typical medical image segmentation tasks, namely Synapse abdominal multi-organ, cardiac organ, and polyp lesion segmentation. Experimental results demonstrate that the proposed MSGAA-GCDN outperforms the state-of-the-art methods, and the designed MSGAA is a lightweight yet effective attention architecture. The proposed MSGAA-GCDN can be easily taken as a plug-and-play decoder cascaded with other encoders for general medical image segmentation tasks.
8
Zhang Z, Jiang Y, Wang Y, Xie B, Zhang W, Li Y, Chen Z, Jin X, Zeng W. Exploring Contrastive Pre-Training for Domain Connections in Medical Image Segmentation. IEEE Trans Med Imaging 2025; 44:1686-1698. [PMID: 40030864 DOI: 10.1109/tmi.2024.3525095]
Abstract
Unsupervised domain adaptation (UDA) in medical image segmentation aims to improve the generalization of deep models by alleviating domain gaps caused by inconsistency across equipment, imaging protocols, and patient conditions. However, existing UDA methods remain insufficiently explored and present notable limitations: 1) they exhibit cumbersome designs that prioritize aligning statistical metrics and distributions, which limits flexibility and generalization while overlooking the potential knowledge embedded in unlabeled data; 2) they are tailored to a particular type of domain shift and lack the generalization capability to handle the diverse shifts encountered in clinical scenarios. To overcome these limitations, we introduce MedCon, a unified framework that leverages general unsupervised contrastive pre-training to establish domain connections, effectively handling diverse domain shifts without tailored adjustments. Specifically, it initially explores a general contrastive pre-training to establish domain connections by leveraging the rich prior knowledge from unlabeled images. Thereafter, the pre-trained backbone is fine-tuned using source-based images to ultimately identify per-pixel semantic categories. To capture both intra- and inter-domain connections of anatomical structures, we construct positive-negative pairs from a hybrid aspect of both local and global scales. In this regard, a shared-weight encoder-decoder is employed to generate pixel-level representations, which are then mapped into hyper-spherical space using a non-learnable projection head to facilitate positive pair matching. Comprehensive experiments on diverse medical image datasets confirm that MedCon outperforms previous methods by effectively managing a wide range of domain shifts and showcasing superior generalization capabilities.
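The positive-negative pair matching in hyper-spherical space described above is the standard contrastive (InfoNCE) setup. A compact sketch for a single pixel-level anchor (the temperature and negative-bank construction are assumptions):

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.1):
    """Contrastive loss for one anchor embedding against its positive
    and a bank of negatives; embeddings are L2-normalised onto the
    hypersphere first.

    anchor, positive: (D,); negatives: (M, D).
    """
    anchor = F.normalize(anchor, dim=0)
    positive = F.normalize(positive, dim=0)
    negatives = F.normalize(negatives, dim=1)
    l_pos = (anchor @ positive) / tau            # similarity to the positive
    l_neg = negatives @ anchor / tau             # (M,) similarities to negatives
    logits = torch.cat([l_pos.view(1), l_neg])   # positive sits at index 0
    return F.cross_entropy(logits.view(1, -1), torch.zeros(1, dtype=torch.long))
```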
9
Peng L, Liu W, Xie S, Ye L, Ye P, Xiao F, Bian L. Uncertainty-Driven Parallel Transformer-Based Segmentation for Oral Disease Dataset. IEEE Trans Image Process 2025; 34:1632-1644. [PMID: 40036515 DOI: 10.1109/tip.2025.3544139]
Abstract
Accurate oral disease segmentation is a challenging task, for three major reasons: 1) The same type of oral disease has a diversity of size, color and texture; 2) The boundary between oral lesions and their surrounding mucosa is not sharp; 3) There is a lack of public large-scale oral disease segmentation datasets. To address these issues, we first report an oral disease segmentation network termed Oralformer, which enables to tackle multiple oral diseases. Specifically, we use a parallel design to combine local-window self-attention (LWSA) with channel-wise convolution (CWC), modeling cross-window connections to enlarge the receptive fields while maintaining linear complexity. Meanwhile, we connect these two branches with bi-directional interactions to form a basic parallel Transformer block namely LC-block. We insert the LC-block as the main building block in a U-shape encoder-decoder architecture to form Oralformer. Second, we introduce an uncertainty-driven self-adaptive loss function which can reinforce the network's attention on the lesion's edge regions that are easily confused, thus improving the segmentation accuracy of these regions. Third, we construct a large-scale oral disease segmentation (ODS) dataset containing 2602 image pairs. It covers three common oral diseases (including dental plaque, calculus and caries) and all age groups, which we hope will advance the field. Extensive experiments on six challenging datasets show that our Oralformer achieves state-of-the-art segmentation accuracy, and presents advantages in terms of generalizability and real-time segmentation efficiency (35fps). The code and ODS dataset will be publicly available at https://github.com/LintaoPeng/Oralformer.
10
Elamin S, Johri S, Rajpurkar P, Geisler E, Berzin TM. From data to artificial intelligence: evaluating the readiness of gastrointestinal endoscopy datasets. J Can Assoc Gastroenterol 2025; 8:S81-S86. [PMID: 39990508 PMCID: PMC11842897 DOI: 10.1093/jcag/gwae041]
Abstract
The incorporation of artificial intelligence (AI) into gastrointestinal (GI) endoscopy represents a promising advancement in gastroenterology. With over 40 published randomized controlled trials and numerous ongoing clinical trials, gastroenterology leads other medical disciplines in AI research. Computer-aided detection algorithms for identifying colorectal polyps have achieved regulatory approval and are in routine clinical use, while other AI applications for GI endoscopy are in advanced development stages. Near-term opportunities include the potential for computer-aided diagnosis to replace conventional histopathology for diagnosing small colon polyps and increased AI automation in capsule endoscopy. Despite significant development in research settings, the generalizability and robustness of AI models in real clinical practice remain inconsistent. The GI field lags behind other medical disciplines in the breadth of novel AI algorithms, with only 13 out of 882 Food and Drug Administration (FDA)-approved AI models focussed on GI endoscopy as of June 2024. Additionally, existing GI endoscopy image databases are disproportionately focussed on colon polyps, lacking representation of the diversity of other endoscopic findings. High-quality datasets, encompassing a wide range of patient demographics, endoscopic equipment types, and disease states, are crucial for developing effective AI models for GI endoscopy. This article reviews the current state of GI endoscopy datasets, barriers to progress, including dataset size, data diversity, annotation quality, and ethical issues in data collection and usage, and future needs for advancing AI in GI endoscopy.
Affiliation(s)
- Sami Elamin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Shreya Johri
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Pranav Rajpurkar
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Enrik Geisler
- Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA
- Tyler M Berzin
- Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA
11
Ke X, Chen G, Liu H, Guo W. MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation. Comput Biol Med 2025; 186:109601. [PMID: 39740513 DOI: 10.1016/j.compbiomed.2024.109601]
Abstract
Accurate polyp segmentation is crucial for early diagnosis and treatment of colorectal cancer. This is a challenging task for three main reasons: (i) the problem of model overfitting and weak generalization due to the multi-center distribution of data; (ii) the problem of interclass ambiguity caused by motion blur and overexposure to endoscopic light; and (iii) the problem of intraclass inconsistency caused by the variety of morphologies and sizes of the same type of polyps. To address these challenges, we propose a new high-precision polyp segmentation framework, MEFA-Net, which consists of three modules, including the plug-and-play Mask Enhancement Module (MEG), Separable Path Attention Enhancement Module (SPAE), and Dynamic Global Attention Pool Module (DGAP). Specifically, firstly, the MEG module regionally masks the high-energy regions of the environment and polyps through a mask, which guides the model to rely on only a small amount of information to distinguish between polyps and background features, avoiding the model from overfitting the environmental information, and improving the robustness of the model. At the same time, this module can effectively counteract the "dark corner phenomenon" in the dataset and further improve the generalization performance of the model. Next, the SPAE module can effectively alleviate the inter-class fuzzy problem by strengthening the feature expression. Then, the DGAP module solves the intra-class inconsistency problem by extracting the invariance of scale, shape and position. Finally, we propose a new evaluation metric, MultiColoScore, for comprehensively evaluating the segmentation performance of the model on five datasets with different domains. We evaluated the new method quantitatively and qualitatively on five datasets using four metrics. Experimental results show that MEFA-Net significantly improves the accuracy of polyp segmentation and outperforms current state-of-the-art algorithms. Code posted on https://github.com/847001315/MEFA-Net.
Affiliation(s)
- Xiao Ke
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Guanhong Chen
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Hao Liu
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Wenzhong Guo
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China.
12
Lin L, Liu Y, Wu J, Cheng P, Cai Z, Wong KKY, Tang X. FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-Supervised Medical Image Segmentation. IEEE Trans Med Imaging 2025; 44:1127-1139. [PMID: 39423080 DOI: 10.1109/tmi.2024.3483221]
Abstract
Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing imperative to curtail annotation costs has amplified the importance of weakly-supervised techniques which utilize sparse annotations such as points, scribbles, etc. A pragmatic FL paradigm should accommodate diverse annotation formats across different sites, a research topic that remains under-investigated. In this context, we propose a novel personalized FL framework with learnable prompt and aggregation (FedLPPA) to uniformly leverage heterogeneous weak supervision for medical image segmentation. In FedLPPA, a learnable universal knowledge prompt is maintained, complemented by multiple learnable personalized data distribution prompts and prompts representing the supervision sparsity. Integrated with sample features through a dual-attention mechanism, those prompts empower each local task decoder to adeptly adjust to both the local distribution and the supervision form. Concurrently, a dual-decoder strategy, predicated on prompt similarity, is introduced to enhance the generation of pseudo-labels in weakly-supervised learning, alleviating the overfitting and noise accumulation inherent to local data, while an adaptable aggregation method is employed to customize the task decoder on a parameter-wise basis. Extensive experiments on four distinct medical image segmentation tasks involving different modalities underscore the superiority of FedLPPA, with its efficacy closely paralleling that of fully supervised centralized training. Our code and data will be available at https://github.com/llmir/FedLPPA.
13
Ovi TB, Bashree N, Nyeem H, Wahed MA. FocusU2Net: Pioneering dual attention with gated U-Net for colonoscopic polyp segmentation. Comput Biol Med 2025; 186:109617. [PMID: 39793349 DOI: 10.1016/j.compbiomed.2024.109617]
Abstract
The detection and excision of colorectal polyps, precursors to colorectal cancer (CRC), can improve survival rates by up to 90%. Automated polyp segmentation in colonoscopy images expedites diagnosis and aids in the precise identification of adenomatous polyps, thus mitigating the burden of manual image analysis. This study introduces FocusU2Net, an innovative bi-level nested U-structure integrated with a dual-attention mechanism. The model integrates Focus Gate (FG) modules for spatial and channel-wise attention and Residual U-blocks (RSU) with multi-scale receptive fields for capturing diverse contextual information. Comprehensive evaluations on five benchmark datasets - Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETISLarib, and EndoScene - demonstrate Dice score improvements of 3.14% to 43.59% over state-of-the-art models, with an 85% success rate in cross-dataset validations, significantly surpassing prior competing models with sub-5% success rates. The model combines high segmentation accuracy with computational efficiency, featuring 46.64 million parameters, 78.09 GFLOPs, and 39.02 GMacs, making it suitable for real-time applications. Enhanced with Explainable AI techniques, FocusU2Net provides clear insights into its decision-making process, improving interpretability. This combination of high performance, efficiency, and transparency positions FocusU2Net as a powerful, scalable solution for automated polyp segmentation in clinical practice, advancing medical image analysis and computer-aided diagnosis.
Affiliation(s)
- Tareque Bashar Ovi
- Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka, 1216, Bangladesh.
- Nomaiya Bashree
- Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka, 1216, Bangladesh.
- Hussain Nyeem
- Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka, 1216, Bangladesh.
- Md Abdul Wahed
- Department of EECE, Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka, 1216, Bangladesh.
14
Li W, Zhang Y, Zhou H, Yang W, Xie Z, He Y. CLMS: Bridging domain gaps in medical imaging segmentation with source-free continual learning for robust knowledge transfer and adaptation. Med Image Anal 2025; 100:103404. [PMID: 39616943 DOI: 10.1016/j.media.2024.103404]
Abstract
Deep learning shows promise for medical image segmentation but suffers performance declines when applied to diverse healthcare sites due to data discrepancies among the different sites. Translating deep learning models to new clinical environments is challenging, especially when the original source data used for training is unavailable due to privacy restrictions. Source-free domain adaptation (SFDA) aims to adapt models to new unlabeled target domains without requiring access to the original source data. However, existing SFDA methods face challenges such as error propagation, misalignment of visual and structural features, and inability to preserve source knowledge. This paper introduces Continual Learning Multi-Scale domain adaptation (CLMS), an end-to-end SFDA framework integrating multi-scale reconstruction, continual learning, and style alignment to bridge domain gaps across medical sites using only unlabeled target data or publicly available data. Compared to the current state-of-the-art methods, CLMS consistently and significantly achieved top performance for different tasks, including prostate MRI segmentation (improved Dice of 10.87 %), colonoscopy polyp segmentation (improved Dice of 17.73 %), and plus disease classification from retinal images (improved AUC of 11.19 %). Crucially, CLMS preserved source knowledge for all the tasks, avoiding catastrophic forgetting. CLMS demonstrates a promising solution for translating deep learning models to new clinical imaging domains towards safe, reliable deployment across diverse healthcare settings.
Affiliation(s)
- Weilu Li
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yun Zhang
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Hao Zhou
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Wenhan Yang
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
- Yao He
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
15
Mao X, Li H, Li X, Bai C, Ming W. C2E-Net: Cascade attention and context-aware cross-level fusion network via edge learning guidance for polyp segmentation. Comput Biol Med 2025; 185:108770. [PMID: 39653624 DOI: 10.1016/j.compbiomed.2024.108770]
Abstract
Colorectal polyps are one of the most direct causes of colorectal cancer. Polypectomy can effectively block the progression to colorectal cancer, but accurate polyp segmentation methods are required as an auxiliary means. However, there are several challenges associated with achieving accurate polyp segmentation, such as the large semantic gap between the encoder and decoder, incomplete edges, and the potential confusion between folds in uncertain areas and target objects. To address the aforementioned challenges, an advanced polyp segmentation network (C2E-Net) is proposed, leveraging a cascaded attention mechanism and context-aware cross-level fusion guided by edge learning. Firstly, a cascade attention (CA) module is proposed to capture local feature details and increase the receptive field by setting different dilation rates in different convolutional layers, combined with a criss-cross attention mechanism to bridge the semantic gap between codecs. Subsequently, an edge learning guidance (ELG) module is designed that employs parallel axial attention operations to capture complementary edge information with sufficient detail to enrich feature details and edge features. Ultimately, to effectively integrate cross-level features and obtain rich global contextual information, a context-aware cross-level fusion (CCF) module is introduced through a multi-scale channel attention mechanism to minimize potential confusion between folds in uncertain areas and target objects. Extensive experimental results show that C2E-Net is superior to state-of-the-art methods, with average Dice coefficients on five polyp datasets of 94.54 %, 92.23 %, 82.24 %, 79.53 % and 89.84 %.
Affiliation(s)
- Xu Mao
- School of Information, Yunnan University, Kunming, 650504, China
- Haiyan Li
- School of Information, Yunnan University, Kunming, 650504, China.
- Xiangxian Li
- School of Software, Shandong University, Jinan, 250101, China
- Chongbin Bai
- Otolaryngology Department, Honghe Prefecture Second People's Hospital, Jianshui, 654300, China
- Wenjun Ming
- The Primary School Affiliated to Yunnan University, Kunming, 650000, China
16
Du Y, Jiang Y, Tan S, Liu SQ, Li Z, Li G, Wan X. Highlighted Diffusion Model as Plug-In Priors for Polyp Segmentation. IEEE J Biomed Health Inform 2025; 29:1209-1220. [PMID: 39446534 DOI: 10.1109/jbhi.2024.3485767]
Abstract
Automated polyp segmentation from colonoscopy images is crucial for colorectal cancer diagnosis. The accuracy of such segmentation, however, is challenged by two main factors. First, the variability in polyps' size, shape, and color, coupled with the scarcity of well-annotated data due to the need for specialized manual annotation, hampers the efficacy of existing deep learning methods. Second, concealed polyps often blend with adjacent intestinal tissues, leading to poor contrast that challenges segmentation models. Recently, diffusion models have been explored and adapted for polyp segmentation tasks. However, the significant domain gap between RGB-colonoscopy images and grayscale segmentation masks, along with the low efficiency of the diffusion generation process, hinders the practical implementation of these models. To mitigate these challenges, we introduce the Highlighted Diffusion Model Plus (HDM+), a two-stage polyp segmentation framework. This framework incorporates the Highlighted Diffusion Model (HDM) to provide explicit semantic guidance, thereby enhancing segmentation accuracy. In the initial stage, the HDM is trained using highlighted ground-truth data, which emphasizes polyp regions while suppressing the background in the images. This approach reduces the domain gap by focusing on the image itself rather than on the segmentation mask. In the subsequent second stage, we employ the highlighted features from the trained HDM's U-Net model as plug-in priors for polyp segmentation, rather than generating highlighted images, thereby increasing efficiency. Extensive experiments conducted on six polyp segmentation benchmarks demonstrate the effectiveness of our approach.
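The "highlighted ground-truth" used to train the HDM keeps polyp pixels prominent while suppressing the background. A simple sketch of one plausible highlighting operation (the suppression factor is an assumed illustration, not the paper's value):

```python
import numpy as np

def highlight_polyp(image, mask, bg_scale=0.2):
    """Build a highlighted training target: keep the polyp region at
    full intensity and dim the background by bg_scale.

    image: (H, W, 3) float array in [0, 1]; mask: (H, W) binary polyp mask.
    bg_scale is an assumed suppression factor.
    """
    m = mask.astype(np.float32)[..., None]
    return image * m + bg_scale * image * (1.0 - m)
```

Because the diffusion model then learns to generate images rather than grayscale masks, the RGB-to-mask domain gap the abstract describes is sidestepped.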
17
Liu J, Shi Y, Huang D, Qu J. Neural Radiance Fields for High-Fidelity Soft Tissue Reconstruction in Endoscopy. Sensors (Basel) 2025; 25:565. [PMID: 39860938 PMCID: PMC11769054 DOI: 10.3390/s25020565]
Abstract
The advancement of neural radiance fields (NeRFs) has facilitated the high-quality 3D reconstruction of complex scenes. However, for most NeRFs, reconstructing 3D tissues from endoscopy images poses significant challenges due to the occlusion of soft tissue regions by invalid pixels, deformations in soft tissue, and poor image quality, which severely limits their application in endoscopic scenarios. To address the above issues, we propose a novel framework to reconstruct high-fidelity soft tissue scenes from low-quality endoscopic images. We first construct an EndoTissue dataset of soft tissue regions in endoscopic images and fine-tune the Segment Anything Model (SAM) based on EndoTissue to obtain a potent segmentation network. Given a sequence of monocular endoscopic images, this segmentation network can quickly obtain the tissue mask images. Additionally, we incorporate tissue masks into a dynamic scene reconstruction method called Tensor4D to effectively guide the reconstruction of 3D deformable soft tissues. Finally, we propose adopting the image enhancement model EDAU-Net to improve the quality of the rendered views. The experimental results show that our method can effectively focus on the soft tissue regions in the image, achieving higher fidelity in detail and geometric structural integrity in reconstruction compared to state-of-the-art algorithms. Feedback from the user study indicates high participant scores for our method.
Affiliation(s)
- Jinhua Liu
- Shanghai Film Academy, Shanghai University, Shanghai 200072, China
- Yongsheng Shi
- Shanghai Film Academy, Shanghai University, Shanghai 200072, China
- Dongjin Huang
- Shanghai Film Academy, Shanghai University, Shanghai 200072, China
- Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai 200072, China
- Jiantao Qu
- Shanghai Film Academy, Shanghai University, Shanghai 200072, China
18
Oukdach Y, Garbaz A, Kerkaou Z, Ansari ME, Koutti L, Ouafdi AFE, Salihoun M. InCoLoTransNet: An Involution-Convolution and Locality Attention-Aware Transformer for Precise Colorectal Polyp Segmentation in GI Images. J Imaging Inform Med 2025. [PMID: 39825142 DOI: 10.1007/s10278-025-01389-7]
Abstract
Gastrointestinal (GI) disease examination presents significant challenges to doctors due to the intricate structure of the human digestive system. Colonoscopy and wireless capsule endoscopy are the most commonly used tools for GI examination. However, the large amount of data generated by these technologies requires the expertise and intervention of doctors for disease identification, making manual analysis a very time-consuming task. Thus, the development of a computer-assisted system is highly desirable to assist clinical professionals in making decisions in a low-cost and effective way. In this paper, we introduce a novel framework called InCoLoTransNet, designed for polyp segmentation. The study is based on a transformer and convolution-involution neural network, following the encoder-decoder architecture. We employed the vision transformer in the encoder section to focus on the global context, while the decoder involves a convolution-involution collaboration for resampling the polyp features. Involution enhances the model's ability to adaptively capture spatial and contextual information, while convolution focuses on local information, leading to more accurate feature extraction. The essential features captured by the transformer encoder are passed to the decoder through two skip connection pathways. The CBAM module refines the features and passes them to the convolution block, leveraging attention mechanisms to emphasize relevant information. Meanwhile, locality self-attention is employed to pass essential features to the involution block, reinforcing the model's ability to capture more global features in the polyp regions. Experiments were conducted on five public datasets: CVC-ClinicDB, CVC-ColonDB, Kvasir-SEG, Etis-LaribPolypDB, and CVC-300. The results obtained by InCoLoTransNet are optimal when compared with 15 state-of-the-art methods for polyp segmentation, achieving the highest mean dice score of 93% on CVC-ColonDB and 90% on mean intersection over union, outperforming the state-of-the-art methods. Additionally, InCoLoTransNet distinguishes itself in terms of polyp segmentation generalization performance. It achieved high scores in mean dice coefficient and mean intersection over union on unseen datasets as follows: 85% and 79% on CVC-ColonDB, 91% and 87% on CVC-300, and 79% and 70% on Etis-LaribPolypDB, respectively.
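The involution operator paired with convolution in the decoder generates a spatial kernel from each pixel's own features and applies it over the local neighborhood, which is what lets it adapt to spatial context where convolution stays content-agnostic. A minimal PyTorch sketch of a 2D involution (the channel-reduction ratio and grouping are assumptions; channels must divide evenly by both):

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    def __init__(self, channels, kernel_size=3, groups=1, reduction=4):
        super().__init__()
        self.k, self.groups = kernel_size, groups
        # small bottleneck that predicts a kernel per spatial position
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction,
                              kernel_size * kernel_size * groups, 1)
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        # per-pixel dynamic kernel generated from the input itself
        kernel = self.span(self.reduce(x))               # (B, K*K*G, H, W)
        kernel = kernel.view(b, self.groups, 1, self.k * self.k, h, w)
        # unfold local neighborhoods and apply the dynamic kernel
        patches = self.unfold(x).view(
            b, self.groups, c // self.groups, self.k * self.k, h, w)
        out = (kernel * patches).sum(dim=3)              # weighted window sum
        return out.view(b, c, h, w)
```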
Affiliation(s)
- Yassine Oukdach
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco.
- Anass Garbaz
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Zakaria Kerkaou
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Mohamed El Ansari
- Informatics and Applications Laboratory, Department of Computer Sciences, Faculty of Science, Moulay Ismail University, B.P 11201, Meknès, 52000, Morocco
- Lahcen Koutti
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Ahmed Fouad El Ouafdi
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Mouna Salihoun
- Faculty of Medicine and Pharmacy of Rabat, Mohammed V University of Rabat, Rabat, 10000, Morocco
19
Du X, Xu X, Chen J, Zhang X, Li L, Liu H, Li S. UM-Net: Rethinking ICGNet for polyp segmentation with uncertainty modeling. Med Image Anal 2025; 99:103347. [PMID: 39316997 DOI: 10.1016/j.media.2024.103347]
Abstract
Automatic segmentation of polyps from colonoscopy images plays a critical role in the early diagnosis and treatment of colorectal cancer. Nevertheless, some bottlenecks still exist. In our previous work, we mainly focused on polyps with intra-class inconsistency and low contrast, using ICGNet to solve them. Due to the different equipment, specific locations and properties of polyps, the color distribution of the collected images is inconsistent. ICGNet was designed primarily with reverse-contour guide information and local-global context information, ignoring this inconsistent color distribution, which leads to overfitting problems and makes it difficult to focus only on beneficial image content. In addition, a trustworthy segmentation model should not only produce high-precision results but also provide a measure of uncertainty to accompany its predictions so that physicians can make informed decisions. However, ICGNet only gives the segmentation result and lacks the uncertainty measure. To cope with these novel bottlenecks, we further extend the original ICGNet to a comprehensive and effective network (UM-Net) with two main contributions that have been proved by experiments to have substantial practical value. Firstly, we employ a color transfer operation to weaken the relationship between color and polyps, making the model more concerned with the shape of the polyps. Secondly, we provide the uncertainty to represent the reliability of the segmentation results and use variance to rectify uncertainty. Our improved method is evaluated on five polyp datasets, which shows competitive results compared to other advanced methods in both learning ability and generalization capability. The source code is available at https://github.com/dxqllp/UM-Net.
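The color transfer operation used above to decouple polyp appearance from color statistics can be approximated by matching per-channel means and standard deviations to a reference image drawn from another source; a simplified NumPy sketch (the paper's exact operator, and its color space, may differ):

```python
import numpy as np

def color_transfer(source, reference):
    """Shift source's per-channel statistics to match a reference image.

    source, reference: (H, W, 3) float arrays in [0, 1].
    Per-channel mean/std matching is a stand-in for the paper's operator.
    """
    out = np.empty_like(source, dtype=np.float64)
    for c in range(3):
        s, r = source[..., c], reference[..., c]
        out[..., c] = (s - s.mean()) / (s.std() + 1e-8) * r.std() + r.mean()
    return np.clip(out, 0.0, 1.0)
```

Training on such recolored copies weakens any shortcut between color and the polyp label, pushing the model toward shape cues, as the abstract intends.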
Collapse
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China
| | - Xuebin Xu
- School of Computer Science and Technology, Anhui University, Hefei, China
| | - Jiajia Chen
- School of Computer Science and Technology, Anhui University, Hefei, China
| | - Xuejun Zhang
- School of Computer Science and Technology, Anhui University, Hefei, China
| | - Lei Li
- Department of Neurology, Shuyang Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Suqian, China.
| | - Heng Liu
- Department of Gastroenterology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
| | - Shuo Li
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, USA
| |
Collapse
|
20
|
Gao J, Lao Q, Kang Q, Liu P, Du C, Li K, Zhang L. Boosting Your Context by Dual Similarity Checkup for In-Context Learning Medical Image Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:310-319. [PMID: 39115986 DOI: 10.1109/tmi.2024.3440311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/10/2024]
Abstract
The recent advent of in-context learning (ICL) capabilities in large pre-trained models has yielded significant advancements in the generalization of segmentation models. By supplying domain-specific image-mask pairs, the ICL model can be effectively guided to produce optimal segmentation outcomes, eliminating the necessity for model fine-tuning or interactive prompting. However, existing ICL-based segmentation models exhibit significant limitations when applied to medical segmentation datasets with substantial diversity. To address this issue, we propose a dual similarity checkup approach to guarantee the effectiveness of selected in-context samples so that their guidance can be maximally leveraged during inference. We first employ large pre-trained vision models to extract strong semantic representations from input images and construct a feature embedding memory bank for the semantic similarity checkup during inference. Having ensured similarity in the input semantic space, we then minimize the discrepancy in the mask appearance distribution between the support set and the estimated mask appearance prior through similarity-weighted sampling and augmentation. We validate our proposed dual similarity checkup approach on eight publicly available medical segmentation datasets, and extensive experimental results demonstrate that our proposed method significantly improves the performance metrics of existing ICL-based segmentation models, particularly when applied to medical image datasets characterized by substantial diversity.
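A minimal sketch of the semantic-similarity checkup (a memory bank of pooled support embeddings queried by cosine similarity) might look as follows; the class and parameter names are illustrative, not the authors' API:

```python
import torch
import torch.nn.functional as F

class EmbeddingBank:
    """Toy memory bank: stores one pooled feature vector per support
    image-mask pair and returns the top-k most similar pairs at inference."""
    def __init__(self):
        self.keys, self.pairs = [], []

    def add(self, feat: torch.Tensor, pair_id: int):
        self.keys.append(F.normalize(feat.flatten(), dim=0))
        self.pairs.append(pair_id)

    def topk(self, query: torch.Tensor, k: int = 4):
        q = F.normalize(query.flatten(), dim=0)
        sims = torch.stack([key @ q for key in self.keys])  # cosine similarities
        scores, idx = sims.topk(min(k, len(self.keys)))
        return [(self.pairs[i], s.item()) for i, s in zip(idx.tolist(), scores)]

bank = EmbeddingBank()
for i in range(10):
    bank.add(torch.randn(256), pair_id=i)   # pooled backbone features
print(bank.topk(torch.randn(256), k=3))    # best in-context candidates
```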
Collapse
|
21
|
Nguyen DC, Nguyen HL. ColonNeXt: Fully Convolutional Attention for Polyp Segmentation. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01342-0. [PMID: 39658740 DOI: 10.1007/s10278-024-01342-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/02/2024] [Revised: 10/21/2024] [Accepted: 11/09/2024] [Indexed: 12/12/2024]
Abstract
This study introduces ColonNeXt, a novel fully convolutional attention-based model for polyp segmentation from colonoscopy images, aimed at enhancing the early detection of colorectal cancer. Utilizing a purely convolutional neural network (CNN), ColonNeXt integrates an encoder-decoder structure with a hierarchical multi-scale context-aware network (MSCAN) in the encoder and a convolutional block attention module (CBAM) in the decoder. The decoder further includes a proposed CNN-based feature attention mechanism for selective feature enhancement, ensuring precise segmentation. A new refinement module effectively improves boundary accuracy, addressing challenges such as variable polyp size, complex textures, and inconsistent illumination. Evaluations on standard datasets show that ColonNeXt achieves high accuracy and efficiency, significantly outperforming competing methods. These results confirm its robustness and precision, establishing ColonNeXt as a state-of-the-art model for polyp segmentation. The code is available at: https://github.com/long-nguyen12/colonnext-pytorch .
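CBAM itself (Woo et al., ECCV 2018) is compact enough to sketch. The minimal PyTorch version below shows the channel-then-spatial gating the decoder relies on; it is illustrative, not the ColonNeXt implementation:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal CBAM: channel attention (shared MLP over avg- and
    max-pooled vectors) followed by spatial attention (7x7 conv)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)        # channel gate
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(pooled))          # spatial gate

print(CBAM(64)(torch.randn(2, 64, 32, 32)).shape)
```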
Collapse
Affiliation(s)
- Dinh Cong Nguyen
- Hong Duc University, 565 Quang Trung, Dong Ve Ward, Thanh Hoa, 40000, Thanh Hoa, Viet Nam.
| | - Hoang Long Nguyen
- Hong Duc University, 565 Quang Trung, Dong Ve Ward, Thanh Hoa, 40000, Thanh Hoa, Viet Nam.
| |
Collapse
|
22
|
Song Z, Kang X, Wei X, Li S. Pixel-Centric Context Perception Network for Camouflaged Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:18576-18589. [PMID: 37819817 DOI: 10.1109/tnnls.2023.3319323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/13/2023]
Abstract
Camouflaged object detection (COD) aims to identify object pixels visually embedded in the background environment. Existing deep learning methods fail to utilize the context information around different pixels adequately and efficiently. To solve this problem, a novel pixel-centric context perception network (PCPNet) is proposed, the core of which is to customize a personalized context for each pixel based on an automatic estimation of its surroundings. Specifically, PCPNet first employs an encoder equipped with the designed vital component generation (VCG) module to obtain a set of compact features rich in low-level spatial and high-level semantic information across multiple subspaces. Then, we present a parameter-free pixel importance estimation (PIE) function based on multiwindow information fusion. Object pixels with complex backgrounds are assigned higher PIE values. Subsequently, PIE is utilized to regularize the optimization loss. In this way, the network can pay more attention to pixels with higher PIE values in the decoding stage. Finally, a local continuity refinement module (LCRM) is used to refine the detection results. Extensive experiments on four COD benchmarks, five salient object detection (SOD) benchmarks, and five polyp segmentation benchmarks demonstrate the superiority of PCPNet with respect to other state-of-the-art methods.
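The paper's exact parameter-free PIE function is not reproduced here. As a hedged stand-in that captures the same idea, the sketch below scores pixels by multi-window local intensity variance and uses the result to up-weight a BCE loss, so pixels with complex surroundings count more:

```python
import torch
import torch.nn.functional as F

def pixel_importance(img: torch.Tensor, windows=(3, 7, 15)) -> torch.Tensor:
    """Stand-in for PIE: average local intensity variance over several
    window sizes, so pixels with complex surroundings score higher."""
    gray = img.mean(dim=1, keepdim=True)
    maps = []
    for w in windows:
        mu = F.avg_pool2d(gray, w, stride=1, padding=w // 2)
        var = F.avg_pool2d(gray**2, w, stride=1, padding=w // 2) - mu**2
        maps.append(var)
    pie = torch.stack(maps).mean(0)
    return pie / (pie.amax(dim=(2, 3), keepdim=True) + 1e-8)

def pie_weighted_bce(logits, target, pie, alpha=1.0):
    # Up-weight pixels whose surroundings are complex/ambiguous.
    w = 1.0 + alpha * pie
    return F.binary_cross_entropy_with_logits(logits, target, weight=w)
```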
Collapse
|
23
|
He C, Li K, Xu G, Yan J, Tang L, Zhang Y, Wang Y, Li X. HQG-Net: Unpaired Medical Image Enhancement With High-Quality Guidance. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:18404-18418. [PMID: 37796672 DOI: 10.1109/tnnls.2023.3315307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/07/2023]
Abstract
Unpaired medical image enhancement (UMIE) aims to transform a low-quality (LQ) medical image into a high-quality (HQ) one without relying on paired images for training. While most existing approaches are based on Pix2Pix/CycleGAN and are effective to some extent, they fail to explicitly use HQ information to guide the enhancement process, which can lead to undesired artifacts and structural distortions. In this article, we propose a novel UMIE approach that avoids this limitation by directly encoding HQ cues into the LQ enhancement process in a variational fashion, thus modeling the UMIE task under the joint distribution of the LQ and HQ domains. Specifically, we extract features from an HQ image and explicitly insert these features, which are expected to encode HQ cues, into the enhancement network to guide the LQ enhancement via a variational normalization module. We train the enhancement network adversarially with a discriminator to ensure the generated HQ image falls into the HQ domain. We further propose a content-aware loss to guide the enhancement process with wavelet-based pixel-level and multiencoder-based feature-level constraints. Additionally, as a key motivation for performing image enhancement is to make the enhanced images serve downstream tasks better, we propose a bi-level learning scheme to optimize the UMIE task and downstream tasks cooperatively, helping generate HQ images that are both visually appealing and favorable for downstream tasks. Experiments on three medical datasets verify that our method outperforms existing techniques in terms of both enhancement quality and downstream task performance. The code and the newly collected datasets are publicly available at https://github.com/ChunmingHe/HQG-Net.
Collapse
|
24
|
Wei X, Sun J, Su P, Wan H, Ning Z. BCL-Former: Localized Transformer Fusion with Balanced Constraint for polyp image segmentation. Comput Biol Med 2024; 182:109182. [PMID: 39341109 DOI: 10.1016/j.compbiomed.2024.109182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2024] [Revised: 09/18/2024] [Accepted: 09/19/2024] [Indexed: 09/30/2024]
Abstract
Polyp segmentation remains challenging for two reasons: (a) the size and shape of colon polyps are variable and diverse; (b) the distinction between polyps and mucosa is not obvious. To solve these two challenging problems and enhance the generalization ability of the segmentation method, we propose the Localized Transformer Fusion with Balanced Constraint (BCL-Former) for polyp segmentation. In BCL-Former, the Strip Local Enhancement module (SLE module) is proposed to capture enhanced local features. The Progressive Feature Fusion module (PFF module) is presented to make feature aggregation smoother and eliminate the difference between high-level and low-level features. Moreover, the Tversky-based Appropriate Constrained Loss (TacLoss) is proposed to balance and constrain true positives and false negatives, improving the ability to generalize across datasets. Extensive experiments are conducted on four benchmark datasets. Results show that our proposed method achieves state-of-the-art performance in both segmentation precision and generalization ability. The proposed method is also 5%-8% faster than the benchmark method in training and inference. The code is available at: https://github.com/sjc-lbj/BCL-Former.
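TacLoss builds on the standard Tversky loss, which is sketched below; the authors' additional "appropriate constraint" is not reproduced, and the alpha/beta values are illustrative:

```python
import torch

def tversky_loss(logits, target, alpha=0.7, beta=0.3, eps=1e-6):
    """Tversky loss: alpha weights false negatives, beta false positives.
    With alpha > beta, missed polyp pixels are penalized more, favoring
    recall; alpha = beta = 0.5 reduces to the Dice loss."""
    prob = torch.sigmoid(logits).flatten(1)
    target = target.flatten(1)
    tp = (prob * target).sum(1)
    fn = ((1 - prob) * target).sum(1)
    fp = (prob * (1 - target)).sum(1)
    return (1 - (tp + eps) / (tp + alpha * fn + beta * fp + eps)).mean()

loss = tversky_loss(torch.randn(2, 1, 64, 64), torch.randint(0, 2, (2, 1, 64, 64)).float())
```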
Collapse
Affiliation(s)
- Xin Wei
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
| | - Jiacheng Sun
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
| | - Pengxiang Su
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
| | - Huan Wan
- School of Computer Information Engineering, Jiangxi Normal University, 99 Ziyang Avenue, Nanchang, 330022, China.
| | - Zhitao Ning
- School of Software, Nanchang University, 235 East Nanjing Road, Nanchang, 330047, China
| |
Collapse
|
25
|
Xu W, Xu R, Wang C, Li X, Xu S, Guo L. PSTNet: Enhanced Polyp Segmentation With Multi-Scale Alignment and Frequency Domain Integration. IEEE J Biomed Health Inform 2024; 28:6042-6053. [PMID: 38954569 DOI: 10.1109/jbhi.2024.3421550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/04/2024]
Abstract
Accurate segmentation of colorectal polyps in colonoscopy images is crucial for effective diagnosis and management of colorectal cancer (CRC). However, current deep learning-based methods primarily rely on fusing RGB information across multiple scales, leading to limitations in accurately identifying polyps due to restricted RGB domain information and challenges in feature misalignment during multi-scale aggregation. To address these limitations, we propose the Polyp Segmentation Network with Shunted Transformer (PSTNet), a novel approach that integrates both RGB and frequency domain cues present in the images. PSTNet comprises three key modules: the Frequency Characterization Attention Module (FCAM) for extracting frequency cues and capturing polyp characteristics, the Feature Supplementary Alignment Module (FSAM) for aligning semantic information and reducing misalignment noise, and the Cross Perception localization Module (CPM) for synergizing frequency cues with high-level semantics to achieve efficient polyp segmentation. Extensive experiments on challenging datasets demonstrate PSTNet's significant improvement in polyp segmentation accuracy across various metrics, consistently outperforming state-of-the-art methods. The integration of frequency domain cues and the novel architectural design of PSTNet contribute to advancing computer-assisted polyp segmentation, facilitating more accurate diagnosis and management of CRC.
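One simple way to expose the frequency-domain cues the abstract refers to is a low/high-frequency split via a circular mask in the Fourier domain, sketched below. This is illustrative only; PSTNet's actual Frequency Characterization Attention Module is more elaborate:

```python
import torch

def frequency_split(x: torch.Tensor, radius: float = 0.1):
    """Split an image batch into low- and high-frequency components with
    a circular low-pass mask applied in the shifted Fourier domain."""
    f = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, h),
                            torch.linspace(-0.5, 0.5, w), indexing="ij")
    mask = ((xx**2 + yy**2).sqrt() <= radius).to(x.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(f * mask, dim=(-2, -1))).real
    high = x - low                      # residual carries edges and texture
    return low, high

low, high = frequency_split(torch.randn(2, 3, 64, 64))
```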
Collapse
|
26
|
Manan MA, Feng J, Yaqub M, Ahmed S, Imran SMA, Chuhan IS, Khan HA. Multi-scale and multi-path cascaded convolutional network for semantic segmentation of colorectal polyps. ALEXANDRIA ENGINEERING JOURNAL 2024; 105:341-359. [DOI: 10.1016/j.aej.2024.06.095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/22/2024]
|
27
|
Dai D, Dong C, Yan Q, Sun Y, Zhang C, Li Z, Xu S. I2U-Net: A dual-path U-Net with rich information interaction for medical image segmentation. Med Image Anal 2024; 97:103241. [PMID: 38897032 DOI: 10.1016/j.media.2024.103241] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2023] [Revised: 04/27/2024] [Accepted: 06/10/2024] [Indexed: 06/21/2024]
Abstract
Although U-shaped networks have achieved remarkable performance in many medical image segmentation tasks, they rarely model the sequential relationship of hierarchical layers. This weakness makes it difficult for the current layer to effectively utilize the historical information of the previous layer, leading to unsatisfactory segmentation results for lesions with blurred boundaries and irregular shapes. To solve this problem, we propose a novel dual-path U-Net, dubbed I2U-Net. The newly proposed network encourages historical information re-usage and re-exploration through rich information interaction among the dual paths, allowing deep layers to learn more comprehensive features that contain both low-level detail description and high-level semantic abstraction. Specifically, we introduce a multi-functional information interaction module (MFII), which can model cross-path, cross-layer, and cross-path-and-layer information interactions via a unified design, making the proposed I2U-Net behave similarly to an unfolded RNN and enjoy its advantage in modeling time-sequence information. Besides, to further selectively and sensitively integrate the information extracted by the encoders of the dual paths, we propose a holistic information fusion and augmentation module (HIFA), which can efficiently bridge the encoder and the decoder. Extensive experiments on four challenging tasks, including skin lesion, polyp, brain tumor, and abdominal multi-organ segmentation, consistently show that the proposed I2U-Net has superior performance and generalization ability over other state-of-the-art methods. The code is available at https://github.com/duweidai/I2U-Net.
Collapse
Affiliation(s)
- Duwei Dai
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China; Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
| | - Caixia Dong
- Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
| | - Qingsen Yan
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Yongheng Sun
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China
| | - Chunyan Zhang
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
| | - Zongfang Li
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China; Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
| | - Songhua Xu
- Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
| |
Collapse
|
28
|
Bhattacharya D, Reuter K, Behrendt F, Maack L, Grube S, Schlaefer A. PolypNextLSTM: a lightweight and fast polyp video segmentation network using ConvNext and ConvLSTM. Int J Comput Assist Radiol Surg 2024; 19:2111-2119. [PMID: 39115609 PMCID: PMC11442634 DOI: 10.1007/s11548-024-03244-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Accepted: 07/18/2024] [Indexed: 10/02/2024]
Abstract
PURPOSE Commonly employed in polyp segmentation, single-image UNet architectures lack the temporal insight clinicians gain from video data in diagnosing polyps. To mirror clinical practice more faithfully, our proposed solution, PolypNextLSTM, leverages video-based deep learning, harnessing temporal information for superior segmentation performance with the least parameter overhead, making it potentially suitable for edge devices. METHODS PolypNextLSTM employs a UNet-like structure with ConvNext-Tiny as its backbone, strategically omitting the last two layers to reduce parameter overhead. Our temporal fusion module, a Convolutional Long Short Term Memory (ConvLSTM), effectively exploits temporal features. Our primary novelty lies in PolypNextLSTM, which stands out as the leanest-in-parameters and fastest model, surpassing the performance of five state-of-the-art image- and video-based deep learning models. The evaluation on the SUN-SEG dataset spans easy-to-detect and hard-to-detect polyp scenarios, along with videos containing challenging artefacts like fast motion and occlusion. RESULTS Comparison against 5 image-based and 5 video-based models demonstrates PolypNextLSTM's superiority, achieving a Dice score of 0.7898 on the hard-to-detect polyp test set, surpassing image-based PraNet (0.7519) and video-based PNS+ (0.7486). Notably, our model excels in videos featuring complex artefacts such as ghosting and occlusion. CONCLUSION PolypNextLSTM, integrating a pruned ConvNext-Tiny with ConvLSTM for temporal fusion, not only exhibits superior segmentation performance but also maintains the highest frames per second among the evaluated models. Code can be found here: https://github.com/mtec-tuhh/PolypNextLSTM .
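The temporal fusion module is a ConvLSTM; a minimal single-cell sketch follows, with illustrative hyper-parameters rather than the authors' configuration:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Single ConvLSTM cell: the four gates are computed with one
    convolution over the concatenated input and hidden state."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state=None):
        b, _, h, w = x.shape
        if state is None:
            state = (x.new_zeros(b, self.hid_ch, h, w),
                     x.new_zeros(b, self.hid_ch, h, w))
        h_prev, c_prev = state
        i, f, o, g = self.gates(torch.cat([x, h_prev], 1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c)
        return h_new, (h_new, c)

# Fuse a 5-frame clip of backbone features frame by frame.
cell, state, fused = ConvLSTMCell(96, 96), None, None
for t in range(5):
    fused, state = cell(torch.randn(1, 96, 28, 28), state)
```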
Collapse
Affiliation(s)
- Debayan Bhattacharya
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany
| | - Konrad Reuter
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany.
| | - Finn Behrendt
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany
| | - Lennart Maack
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany
| | - Sarah Grube
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany
| | - Alexander Schlaefer
- Institute of Medical Technology and Intelligent Systems, Technische Universitaet Hamburg, Hamburg, Germany
| |
Collapse
|
29
|
Tudela Y, Majó M, de la Fuente N, Galdran A, Krenzer A, Puppe F, Yamlahi A, Tran TN, Matuszewski BJ, Fitzgerald K, Bian C, Pan J, Liu S, Fernández-Esparrach G, Histace A, Bernal J. A complete benchmark for polyp detection, segmentation and classification in colonoscopy images. Front Oncol 2024; 14:1417862. [PMID: 39381041 PMCID: PMC11458519 DOI: 10.3389/fonc.2024.1417862] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2024] [Accepted: 07/11/2024] [Indexed: 10/10/2024] Open
Abstract
Introduction Colorectal cancer (CRC) is one of the main causes of death worldwide. Early detection and diagnosis of its precursor lesion, the polyp, is key to reducing its mortality and improving procedure efficiency. During the last two decades, several computational methods have been proposed to assist clinicians in detection, segmentation, and classification tasks, but the lack of a common public validation framework makes it difficult to determine which of them is ready to be deployed in the examination room. Methods This study presents a complete validation framework, and we compare several methodologies for each of the polyp characterization tasks. Results Results show that the majority of the approaches are able to provide good performance for the detection and segmentation tasks, but that there is room for improvement regarding polyp classification. Discussion While the studied methods show promising results in assisting polyp detection and segmentation tasks, further research should be done on the classification task to obtain results reliable enough to assist clinicians during the procedure. The presented framework provides a standardized method for evaluating and comparing different approaches, which could facilitate the identification of clinically ready assistive methods.
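For reference, the two headline metrics such benchmarks report for the segmentation task, Dice and IoU, can be computed for binary masks as follows:

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps=1e-8):
    """Dice coefficient and IoU for binary masks, the standard
    segmentation metrics used by polyp benchmarks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

d, j = dice_iou(np.ones((8, 8)), np.eye(8))
```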
Collapse
Affiliation(s)
- Yael Tudela
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain.
| | - Mireia Majó
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| | - Neil de la Fuente
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| | - Adrian Galdran
- Department of Information and Communication Technologies, SymBioSys Research Group, BCNMedTech, Barcelona, Spain
| | - Adrian Krenzer
- Artificial Intelligence and Knowledge Systems, Institute for Computer Science, Julius-Maximilians University of Würzburg, Würzburg, Germany
| | - Frank Puppe
- Artificial Intelligence and Knowledge Systems, Institute for Computer Science, Julius-Maximilians University of Würzburg, Würzburg, Germany
| | - Amine Yamlahi
- Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Thuy Nuong Tran
- Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Bogdan J. Matuszewski
- Computer Vision and Machine Learning (CVML) Research Group, University of Central Lancashire (UCLan), Preston, United Kingdom
| | - Kerr Fitzgerald
- Computer Vision and Machine Learning (CVML) Research Group, University of Central Lancashire (UCLan), Preston, United Kingdom
| | - Cheng Bian
- Hebei University of Technology, Baoding, China
| | | | - Shijie Liu
- Hebei University of Technology, Baoding, China
| | | | - Aymeric Histace
- ETIS UMR 8051, École Nationale Supérieure de l'Électronique et de ses Applications (ENSEA), Centre national de la recherche scientifique (CNRS), CY Cergy Paris University, Cergy, France
| | - Jorge Bernal
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| |
Collapse
|
30
|
Meng L, Li Y, Duan W. Three-stage polyp segmentation network based on reverse attention feature purification with Pyramid Vision Transformer. Comput Biol Med 2024; 179:108930. [PMID: 39067285 DOI: 10.1016/j.compbiomed.2024.108930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2024] [Revised: 06/30/2024] [Accepted: 07/18/2024] [Indexed: 07/30/2024]
Abstract
Colorectal polyps serve as potential precursors of colorectal cancer and automating polyp segmentation aids physicians in accurately identifying potential polyp regions, thereby reducing misdiagnoses and missed diagnoses. However, existing models often fall short in accurately segmenting polyps due to the high degree of similarity between polyp regions and surrounding tissue in terms of color, texture, and shape. To address this challenge, this study proposes a novel three-stage polyp segmentation network, named Reverse Attention Feature Purification with Pyramid Vision Transformer (RAFPNet), which adopts an iterative feedback UNet architecture to refine polyp saliency maps for precise segmentation. Initially, a Multi-Scale Feature Aggregation (MSFA) module is introduced to generate preliminary polyp saliency maps. Subsequently, a Reverse Attention Feature Purification (RAFP) module is devised to effectively suppress low-level surrounding tissue features while enhancing high-level semantic polyp information based on the preliminary saliency maps. Finally, the UNet architecture is leveraged to further refine the feature maps in a coarse-to-fine approach. Extensive experiments conducted on five widely used polyp segmentation datasets and three video polyp segmentation datasets demonstrate the superior performance of RAFPNet over state-of-the-art models across multiple evaluation metrics.
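Reverse attention, the core mechanism here (as popularized by PraNet), weights side features by the complement of the coarse prediction so that refinement focuses on regions the current saliency map misses, typically the boundaries. A minimal sketch with illustrative names:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttention(nn.Module):
    """Weight features by 1 - sigmoid(coarse map), then predict a
    residual correction to the coarse map."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, feat, coarse_map):
        coarse = F.interpolate(coarse_map, size=feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        att = 1.0 - torch.sigmoid(coarse)        # attend to the missed residue
        return coarse + self.refine(feat * att)  # residual prediction update

ra = ReverseAttention(64)
out = ra(torch.randn(2, 64, 44, 44), torch.randn(2, 1, 11, 11))
```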
Collapse
Affiliation(s)
- Lingbing Meng
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China
| | - Yuting Li
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China
| | - Weiwei Duan
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China.
| |
Collapse
|
31
|
Liu J, Jiao G. Cross-domain additive learning of new knowledge rather than replacement. Biomed Eng Lett 2024; 14:1137-1146. [PMID: 39220031 PMCID: PMC11362399 DOI: 10.1007/s13534-024-00399-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2023] [Revised: 01/10/2024] [Accepted: 05/27/2024] [Indexed: 09/04/2024] Open
Abstract
In medical and clinical scenarios, for reasons such as patient privacy, information protection, and data migration, the source-domain data is often inaccessible when domain adaptation is needed in real settings, and only a model pre-trained on the source domain is available. Existing solutions to this type of problem tend to forget the rich task experience previously learned on the source domain after adapting: the model simply overfits the target-domain data and does not learn robust features that facilitate real task decisions. We address this problem by exploring the particular application of source-free domain adaptation in medical image segmentation and propose a two-stage additive source-free adaptation framework. We generalize domain-invariant features by constraining the core pathological structure and the semantic consistency between different perspectives, and we reduce segmentation errors by locating and filtering potentially erroneous elements through Monte-Carlo uncertainty estimation. We conduct comparison experiments with other methods on a cross-device polyp segmentation dataset and a cross-modal brain tumor segmentation dataset; the results in both the target and source domains verify that the proposed method can effectively solve the domain shift problem and that the model retains its performance on the source domain after learning new knowledge of the target domain. This work provides a valuable exploration of achieving additive learning on the target and source domains in the absence of source data and offers new ideas and methods for adaptation research in the field of medical image segmentation.
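Monte-Carlo uncertainty estimation of the kind used to filter error-prone elements can be sketched with test-time dropout. The sketch assumes the model contains dropout layers, and the gating rule at the end is a crude illustration rather than the paper's filtering criterion:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mc_uncertainty(model: nn.Module, x: torch.Tensor, passes: int = 8):
    """Keep dropout active at test time, run several stochastic passes,
    and use the per-pixel variance to flag unreliable predictions."""
    model.eval()
    for m in model.modules():                 # re-enable dropout only
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()
    probs = torch.stack([torch.sigmoid(model(x)) for _ in range(passes)])
    mean, var = probs.mean(0), probs.var(0)
    reliable = var < var.mean()               # crude confidence gate
    return mean, var, reliable
```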
Collapse
Affiliation(s)
- Jiahao Liu
- College of Computer Science, Hengyang Normal University, Hengyang, 421008 China
| | - Ge Jiao
- College of Computer Science, Hengyang Normal University, Hengyang, 421008 China
| |
Collapse
|
32
|
Arsa DMS, Ilyas T, Park SH, Chua L, Kim H. Efficient multi-stage feedback attention for diverse lesion in cancer image segmentation. Comput Med Imaging Graph 2024; 116:102417. [PMID: 39067303 DOI: 10.1016/j.compmedimag.2024.102417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Revised: 04/11/2024] [Accepted: 07/10/2024] [Indexed: 07/30/2024]
Abstract
In the domain of Computer-Aided Diagnosis (CAD) systems, the accurate identification of cancer lesions is paramount, given the life-threatening nature of cancer and the complexities inherent in its manifestation. This task is particularly arduous due to the often vague boundaries of cancerous regions, compounded by the presence of noise and the heterogeneity in the appearance of lesions, making precise segmentation a critical yet challenging endeavor. This study introduces an innovative iterative feedback mechanism tailored for the nuanced detection of cancer lesions in a variety of medical imaging modalities, offering a refining phase to adjust detection results. The core of our approach is the elimination of the need for an initial segmentation mask, a common limitation of iterative segmentation methods. Instead, we utilize a novel system where the feedback for refining segmentation is derived directly from the encoder-decoder architecture of our neural network model. This shift allows for more dynamic and accurate lesion identification. To further enhance the accuracy of our CAD system, we employ a multi-scale feedback attention mechanism to guide and refine the predicted mask over subsequent iterations. In parallel, we introduce a sophisticated weighted feedback loss function that synergistically combines global and iteration-specific loss considerations, thereby refining parameter estimation and improving the overall precision of the segmentation. We conducted comprehensive experiments across three distinct categories of medical imaging: colonoscopy, ultrasonography, and dermoscopic images. The experimental results demonstrate that our method not only competes favorably with but also surpasses current state-of-the-art methods in various scenarios, including both standard and challenging out-of-domain tasks. This evidences the robustness and versatility of our approach in accurately identifying cancer lesions across a spectrum of medical imaging contexts. Our source code can be found at https://github.com/dewamsa/EfficientFeedbackNetwork.
Collapse
Affiliation(s)
- Dewa Made Sri Arsa
- Division of Electronics and Information Engineering, Jeonbuk National University, Republic of Korea; Department of Information Technology, Universitas Udayana, Indonesia; Core Research Institute of Intelligent Robots, Jeonbuk National University, Republic of Korea.
| | - Talha Ilyas
- Division of Electronics and Information Engineering, Jeonbuk National University, Republic of Korea; Core Research Institute of Intelligent Robots, Jeonbuk National University, Republic of Korea.
| | - Seok-Hwan Park
- Division of Electronic Engineering, Jeonbuk National University, Republic of Korea.
| | - Leon Chua
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA.
| | - Hyongsuk Kim
- Core Research Institute of Intelligent Robots, Jeonbuk National University, Republic of Korea.
| |
Collapse
|
33
|
Yan S, Yang B, Chen A. A differential network with multiple gated reverse attention for medical image segmentation. Sci Rep 2024; 14:20274. [PMID: 39217265 PMCID: PMC11365968 DOI: 10.1038/s41598-024-71194-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024] Open
Abstract
UNet architecture has achieved great success in medical image segmentation applications. However, these models still encounter several challenges. One is the loss of pixel-level information caused by multiple down-sampling steps. Additionally, the addition or concatenation method used in the decoder can generate redundant information. These limitations affect localization ability, weaken the complementarity of features at different levels, and can lead to blurred boundaries. Differential features can effectively compensate for these shortcomings and significantly enhance the performance of image segmentation. Therefore, we propose MGRAD-UNet (multi-gated reverse attention multi-scale differential UNet), based on UNet. We utilize a multi-scale differential decoder to generate abundant differential features at both the pixel level and the structure level. These features, which serve as gate signals, are transmitted to the gate controller and forwarded to the other differential decoder. To enhance the focus on important regions, the other differential decoder is equipped with reverse attention. The features obtained by the two differential decoders are differentiated a second time. The resulting differential feature is sent back to the controller as a control signal and then transmitted to the encoder for learning the differential features produced by the two differential decoders. The core design of MGRAD-UNet lies in extracting comprehensive and accurate features through caching of overall differential features and multi-scale differential processing, enabling iterative learning from diverse information. We evaluate MGRAD-UNet against state-of-the-art (SOTA) methods on two public datasets. Our method surpasses competitors and provides a new approach for the design of UNet.
Collapse
Affiliation(s)
- Shun Yan
- School of Electronic and Information Engineering, Taizhou University, Taizhou, 318000, Zhejiang, China
| | - Benquan Yang
- School of Electronic and Information Engineering, Taizhou University, Taizhou, 318000, Zhejiang, China.
| | - Aihua Chen
- School of Electronic and Information Engineering, Taizhou University, Taizhou, 318000, Zhejiang, China.
| |
Collapse
|
34
|
Tang S, Ran H, Yang S, Wang Z, Li W, Li H, Meng Z. A frequency selection network for medical image segmentation. Heliyon 2024; 10:e35698. [PMID: 39220902 PMCID: PMC11365330 DOI: 10.1016/j.heliyon.2024.e35698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2024] [Revised: 07/18/2024] [Accepted: 08/01/2024] [Indexed: 09/04/2024] Open
Abstract
Existing medical image segmentation methods may consider feature extraction and information processing only in the spatial domain, lack an explicit design for the interaction between frequency and spatial information, or ignore the semantic gaps between shallow and deep features, leading to inaccurate segmentation results. Therefore, in this paper, we propose a novel frequency selection segmentation network (FSSN), which achieves more accurate lesion segmentation by fusing local spatial features with global frequency information, designing better feature interactions, and suppressing low-correlation frequency components to mitigate semantic gaps. Firstly, we propose a global-local feature aggregation module (GLAM) that simultaneously captures multi-scale local features in the spatial domain and exploits global frequency information in the frequency domain, achieving a complementary fusion of local detail features and global frequency information. Secondly, we propose a feature filter module (FFM) to mitigate semantic gaps during cross-level feature fusion, letting FSSN discriminatively determine which frequency information should be preserved for accurate lesion segmentation. Finally, to make better use of local information, especially the boundaries of lesion regions, we employ deformable convolution (DC) to extract pertinent features in the local range, enabling FSSN to focus better on relevant image content. Extensive experiments on two public benchmark datasets show that, compared with representative medical image segmentation methods, FSSN obtains more accurate lesion segmentation results in terms of both objective evaluation indicators and subjective visual effects, with fewer parameters and lower computational complexity.
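Deformable convolution is available directly in torchvision. A minimal block in which a plain convolution predicts the per-pixel sampling offsets might look like this; it is illustrative and not the FSSN configuration:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """A plain conv predicts 2*k*k per-pixel offsets, letting the k x k
    kernel's sampling grid bend along curved lesion boundaries."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset(x))

print(DeformBlock(32, 64)(torch.randn(1, 32, 56, 56)).shape)
```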
Collapse
Affiliation(s)
- Shu Tang
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nan'an District, 400000, Chongqing, China
| | - Haiheng Ran
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nan'an District, 400000, Chongqing, China
| | - Shuli Yang
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nan'an District, 400000, Chongqing, China
| | - Zhaoxia Wang
- Chongqing Emergency Medical Center, Chongqing University Central Hospital, School of Medicine, Chongqing University, Chongqing, China
| | - Wei Li
- Children’s Hospital of Chongqing Medical University, China
| | - Haorong Li
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nan'an District, 400000, Chongqing, China
| | - Zihao Meng
- Chongqing University of Posts and Telecommunications, No. 2 Road of Chongwen, Nan'an District, 400000, Chongqing, China
| |
Collapse
|
35
|
ELKarazle K, Raman V, Chua C, Then P. A Hessian-Based Technique for Specular Reflection Detection and Inpainting in Colonoscopy Images. IEEE J Biomed Health Inform 2024; 28:4724-4736. [PMID: 38787660 DOI: 10.1109/jbhi.2024.3404955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2024]
Abstract
In the field of Computer-Aided Detection (CADx), the use of AI-based algorithms for disease detection in endoscopy images, especially colonoscopy images, is on the rise. However, these algorithms often encounter performance issues due to obstructions like specular reflection, resulting in false positives. This paper presents a novel algorithm specifically designed to tackle the challenges posed by high specular reflection regions in colonoscopy images. The proposed algorithm identifies these regions and applies precise inpainting for restoration. The process entails converting the input image from RGB to HSV color space and focusing on the Saturation (S) component in convex regions detected using a Hessian-based method. This step creates a binary mask that pinpoints areas of specular reflection. The inpainting function then uses this mask to guide the restoration of these identified regions and their borders. To ensure a seamless blend of the restored regions with the background and adjacent pixels, a feathering process is applied to the repaired regions, enhancing both the accuracy and aesthetic coherence of the inpainted images. The performance of our algorithm was rigorously tested on five unique colonoscopy datasets and on various endoscopy images from the Kvasir dataset using an extensive set of evaluation metrics, and a comparative analysis with existing methods consistently highlighted its superior performance.
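A heavily simplified version of such a pipeline can be sketched with OpenCV, substituting a saturation/value threshold for the paper's Hessian-based convex-region detector; the threshold values and kernel sizes below are illustrative:

```python
import cv2
import numpy as np

def remove_specular(bgr: np.ndarray) -> np.ndarray:
    """Simplified: specular pixels are bright and desaturated in HSV.
    Threshold them, dilate to cover borders, inpaint, then feather."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    s, v = hsv[..., 1], hsv[..., 2]
    mask = ((s < 40) & (v > 220)).astype(np.uint8) * 255
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8))
    restored = cv2.inpaint(bgr, mask, 5, cv2.INPAINT_TELEA)
    # Feather: blend inpainted and original with a blurred mask.
    alpha = cv2.GaussianBlur(mask, (15, 15), 0)[..., None] / 255.0
    return (alpha * restored + (1 - alpha) * bgr).astype(np.uint8)
```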
Collapse
|
36
|
Rajasekar D, Theja G, Prusty MR, Chinara S. Efficient colorectal polyp segmentation using wavelet transformation and AdaptUNet: A hybrid U-Net. Heliyon 2024; 10:e33655. [PMID: 39040380 PMCID: PMC11261057 DOI: 10.1016/j.heliyon.2024.e33655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2024] [Revised: 03/06/2024] [Accepted: 06/25/2024] [Indexed: 07/24/2024] Open
Abstract
The prevalence of colorectal cancer, primarily emerging from polyps, underscores the importance of their early detection in colonoscopy images. Due to the inherent complexity and variability of polyp appearances, the task remains difficult despite recent advances in medical technology. To tackle these challenges, a deep learning model featuring a customized U-Net architecture, AdaptUNet, is proposed. Attention mechanisms and skip connections facilitate the effective combination of low-level details and high-level contextual information for accurate polyp segmentation. Further, wavelet transformations are used to extract useful features overlooked in conventional image processing. The model achieves benchmark results with a Dice coefficient of 0.9104, an Intersection over Union (IoU) coefficient of 0.8368, and a Balanced Accuracy of 0.9880 on the CVC-300 dataset. Additionally, it shows exceptional performance on other datasets, including Kvasir-SEG and Etis-LaribDB. Training was performed using the Hyper-Kvasir segmented images dataset, further evidencing the model's ability to handle diverse data inputs. The proposed method offers a comprehensive and efficient implementation for polyp detection without compromising performance, thus promising improved precision and a reduction in manual labour for colorectal polyp detection.
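The wavelet feature extraction can be illustrated with PyWavelets: a one-level 2-D DWT yields one approximation band and three detail bands that can be stacked as auxiliary input channels. This is a sketch of the general technique, not the AdaptUNet pipeline:

```python
import numpy as np
import pywt

def wavelet_features(gray: np.ndarray, wavelet: str = "haar"):
    """One-level 2-D DWT: the approximation keeps coarse shape, while the
    horizontal/vertical/diagonal detail bands expose edges and texture
    that plain convolutional pipelines can under-use."""
    cA, (cH, cV, cD) = pywt.dwt2(gray, wavelet)
    return np.stack([cA, cH, cV, cD], axis=0)   # 4 x H/2 x W/2

feats = wavelet_features(np.random.rand(256, 256).astype(np.float32))
print(feats.shape)  # (4, 128, 128)
```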
Collapse
Affiliation(s)
- Devika Rajasekar
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
| | - Girish Theja
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
| | - Manas Ranjan Prusty
- Centre for Cyber Physical Systems, Vellore Institute of Technology, Chennai, India
| | - Suchismita Chinara
- Department of Computer Science and Engineering, National Institute of Technology, Rourkela, India
| |
Collapse
|
37
|
Huang C, Shi Y, Zhang B, Lyu K. Uncertainty-aware prototypical learning for anomaly detection in medical images. Neural Netw 2024; 175:106284. [PMID: 38593560 DOI: 10.1016/j.neunet.2024.106284] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 03/14/2024] [Accepted: 03/29/2024] [Indexed: 04/11/2024]
Abstract
Anomalous object detection (AOD) in medical images aims to recognize anomalous lesions and is crucial for the early clinical diagnosis of various cancers. However, it is a difficult task for two reasons: (1) the diversity of anomalous lesions and (2) the ambiguity of the boundary between anomalous lesions and their normal surroundings. Unlike existing single-modality AOD models based on deterministic mapping, we constructed a probabilistic and deterministic AOD model. Specifically, we designed an uncertainty-aware prototype learning framework, which considers the diversity and ambiguity of anomalous lesions. A prototypical learning transformer (Pformer) is established to extract and store the prototype features of different anomalous lesions. Moreover, a Bayesian neural uncertainty quantizer, a probabilistic model, is designed to model distributions over the outputs of the model, measuring the uncertainty of the model's detection result for each pixel. Essentially, the uncertainty of the model's anomaly detection result for a pixel reflects the anomalous ambiguity of that pixel. Furthermore, an uncertainty-guided reasoning transformer (Uformer) is devised to exploit this anomalous ambiguity, encouraging the proposed model to focus on pixels with high uncertainty. Notably, the prototypical representations stored in Pformer are also utilized in anomaly reasoning, enabling the model to perceive the diversity of anomalous objects. Extensive experiments on five benchmark datasets demonstrate the superiority of our proposed method. The source code will be available at github.com/umchaohuang/UPformer.
Collapse
Affiliation(s)
- Chao Huang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China; Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
| | - Yushu Shi
- Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
| | - Bob Zhang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China.
| | - Ke Lyu
- School of Engineering Sciences, University of the Chinese Academy of Sciences, Beijing, 100049, China; Pengcheng Laboratory, Shenzhen, 518055, China
| |
Collapse
|
38
|
Li Z, Yi M, Uneri A, Niu S, Jones C. RTA-Former: Reverse Transformer Attention for Polyp Segmentation. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2024; 2024:1-5. [PMID: 40031481 DOI: 10.1109/embc53108.2024.10782181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Polyp segmentation is a key aspect of colorectal cancer prevention, enabling early detection and guiding subsequent treatments. Intelligent diagnostic tools, including deep learning solutions, are widely explored to streamline and potentially automate this process. However, even with many powerful network architectures, producing accurate edge segmentation remains a problem. In this paper, we introduce a novel network, namely RTA-Former, that employs a transformer model as the encoder backbone and innovatively adapts Reverse Attention (RA) with a transformer stage in the decoder for enhanced edge segmentation. The results of the experiments illustrate that RTA-Former achieves state-of-the-art (SOTA) performance on five polyp segmentation datasets. The strong capability of RTA-Former holds promise for improving the accuracy of Transformer-based polyp segmentation, potentially leading to better clinical decisions and patient outcomes. Our code is publicly available on GitHub.
Collapse
|
39
|
Wan L, Chen Z, Xiao Y, Zhao J, Feng W, Fu H. Iterative feedback-based models for image and video polyp segmentation. Comput Biol Med 2024; 177:108569. [PMID: 38781640 DOI: 10.1016/j.compbiomed.2024.108569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2023] [Revised: 03/27/2024] [Accepted: 05/05/2024] [Indexed: 05/25/2024]
Abstract
Accurate segmentation of polyps in colonoscopy images has gained significant attention in recent years, given its crucial role in automated colorectal cancer diagnosis. Many existing deep learning-based methods follow a one-stage processing pipeline, often involving feature fusion across different levels or utilizing boundary-related attention mechanisms. Drawing on the success of applying Iterative Feedback Units (IFU) in image polyp segmentation, this paper proposes FlowICBNet, which extends the IFU to the domain of video polyp segmentation. By harnessing the unique capability of IFU to propagate and refine past segmentation results, our method proves effective in mitigating challenges linked to the inherent limitations of endoscopic imaging, notably frequent camera shake and frame defocusing. Furthermore, in FlowICBNet we introduce two pivotal modules: Reference Frame Selection (RFS) and Flow Guided Warping (FGW). These modules play a crucial role in filtering and selecting the most suitable historical reference frames for the task at hand. The experimental results on a large video polyp segmentation dataset demonstrate that our method can significantly outperform state-of-the-art methods by notable margins, achieving an average metric improvement of 7.5% on SUN-SEG-Easy and 7.4% on SUN-SEG-Hard. Our code is available at https://github.com/eraserNut/ICBNet.
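Flow-guided warping of a previous frame's prediction or features is typically implemented with grid_sample. A minimal sketch follows; it assumes a dense flow field in pixel units and is not the authors' FGW module:

```python
import torch
import torch.nn.functional as F

def flow_warp(prev: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a previous frame's prediction/features to the current frame
    using a dense optical-flow field of shape (B, 2, H, W), in pixels."""
    b, _, h, w = flow.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xx, yy], dim=0).to(flow)          # identity grid, 2 x H x W
    coords = base.unsqueeze(0) + flow                     # shifted sampling positions
    # Normalize to [-1, 1] for grid_sample (x against W, y against H).
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = coords.permute(0, 2, 3, 1)                     # B x H x W x 2
    return F.grid_sample(prev, grid, align_corners=True)

# Zero flow reproduces the input exactly (identity warp).
warped = flow_warp(torch.randn(1, 1, 64, 64), torch.zeros(1, 2, 64, 64))
```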
Collapse
Affiliation(s)
- Liang Wan
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Zhihao Chen
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Yefan Xiao
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Junting Zhao
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Wei Feng
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Huazhu Fu
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore, 138632, Republic of Singapore.
| |
Collapse
|
40
|
Cao J, Wang X, Qu Z, Zhuo L, Li X, Zhang H, Yang Y, Wei W. WDFF-Net: Weighted Dual-Branch Feature Fusion Network for Polyp Segmentation With Object-Aware Attention Mechanism. IEEE J Biomed Health Inform 2024; 28:4118-4131. [PMID: 38536686 DOI: 10.1109/jbhi.2024.3381891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/03/2024]
Abstract
Colon polyps in colonoscopy images exhibit significant differences in color, size, shape, appearance, and location, posing significant challenges to accurate polyp segmentation. In this paper, a Weighted Dual-branch Feature Fusion Network for polyp segmentation, named WDFF-Net, is proposed, which adopts HarDNet68 as the backbone network. First, a dual-branch feature fusion network architecture is constructed, which includes a shared feature extractor and two feature fusion branches: a Progressive Feature Fusion (PFF) branch and a Scale-aware Feature Fusion (SFF) branch. The branches fuse the deep features of multiple layers for different purposes and in different ways. The PFF branch addresses the under- and over-segmentation of flat polyps with low edge contrast by iteratively fusing features from low, medium, and high layers. The SFF branch tackles the drastic variations in polyp size and shape, especially the missed segmentation of small polyps. The two branches are complementary and play different roles in improving segmentation accuracy. Second, an Object-aware Attention Mechanism (OAM) is proposed to enhance the features of target regions and suppress those of background regions that would interfere with segmentation performance. Third, a weighted dual-branch segmentation loss function is specifically designed, which dynamically assigns weight factors to the loss functions of the two branches to optimize their collaborative training. Experimental results on five public colon polyp datasets demonstrate that the proposed WDFF-Net achieves superior segmentation performance with lower model complexity and faster inference speed, while maintaining good generalization ability.
Collapse
|
41
|
Huang X, Gong H, Zhang J. HST-MRF: Heterogeneous Swin Transformer With Multi-Receptive Field for Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:4048-4061. [PMID: 38709610 DOI: 10.1109/jbhi.2024.3397047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/08/2024]
Abstract
The Transformer has been successfully used in medical image segmentation due to its excellent long-range modeling capabilities. However, patch partitioning is necessary when building a Transformer-based model, and this process ignores the tissue-structure features within each patch, resulting in the loss of shallow representation information. In this study, we propose a Heterogeneous Swin Transformer with Multi-Receptive Field (HST-MRF) model that fuses patch information from different receptive fields to solve the problem of feature-information loss caused by patch partitioning. The heterogeneous Swin Transformer (HST) is the core module; it achieves the interaction of multi-receptive-field patch information through heterogeneous attention and passes it to the next stage for progressive learning, thus complementing the patch structure information. We also designed a two-stage fusion module, multimodal bilinear pooling (MBP), to assist HST in further fusing multi-receptive-field information and combining low-level and high-level semantic information for accurate localization of lesion regions. In addition, we developed adaptive patch embedding (APE) and soft channel attention (SCA) modules to retain more valuable information when acquiring patch embeddings and filtering channel features, respectively, thereby improving model segmentation quality. We evaluated HST-MRF on multiple datasets for polyp, skin lesion, and breast ultrasound segmentation tasks. Experimental results show that our proposed method outperforms state-of-the-art models. Furthermore, we verified the effectiveness of each module and the benefits of multi-receptive-field segmentation in reducing the loss of structural information through ablation experiments and qualitative analysis.
Collapse
|
42
|
Liu J, Zhang W, Liu Y, Zhang Q. Polyp segmentation based on implicit edge-guided cross-layer fusion networks. Sci Rep 2024; 14:11678. [PMID: 38778219 PMCID: PMC11111678 DOI: 10.1038/s41598-024-62331-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Accepted: 05/15/2024] [Indexed: 05/25/2024] Open
Abstract
Polyps are abnormal tissue clumps growing primarily on the inner linings of the gastrointestinal tract. While such clumps are generally harmless, they can potentially evolve into pathological tumors and thus require long-term observation and monitoring. Polyp segmentation in gastrointestinal endoscopy images is an important stage for polyp monitoring and subsequent treatment. However, this segmentation task faces multiple challenges: the low contrast of polyp boundaries, the varied appearance of polyps, and the co-occurrence of multiple polyps. In this paper, an implicit edge-guided cross-layer fusion network (IECFNet) is therefore proposed for polyp segmentation. An encoder-decoder pair is used to generate an initial saliency map, the implicit edge-enhanced context attention module aggregates the feature maps output by the encoder and decoder to generate a rough prediction, and the multi-scale feature reasoning module generates the final predictions. Polyp segmentation experiments have been conducted on five popular polyp image datasets (Kvasir, CVC-ClinicDB, ETIS, CVC-ColonDB, and CVC-300), and the experimental results show that the proposed method significantly outperforms a conventional method, most notably with an accuracy margin of 7.9% on the ETIS dataset.
Collapse
Affiliation(s)
- Junqing Liu, Weiwei Zhang, Yong Liu, Qinghe Zhang: Hubei Engineering and Technology Research Center for Construction Quality Inspection Equipment, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
43
Daneshpajooh V, Ahmad D, Toth J, Bascom R, Higgins WE. Automatic lesion detection for narrow-band imaging bronchoscopy. J Med Imaging (Bellingham) 2024; 11:036002. [PMID: 38827776 PMCID: PMC11138083 DOI: 10.1117/1.jmi.11.3.036002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Revised: 04/04/2024] [Accepted: 05/14/2024] [Indexed: 06/05/2024] Open
Abstract
Purpose: Early detection of cancer is crucial for lung cancer patients, as it determines disease prognosis. Lung cancer typically starts as bronchial lesions along the airway walls. Recent research has indicated that narrow-band imaging (NBI) bronchoscopy enables more effective bronchial lesion detection than other bronchoscopic modalities. Unfortunately, NBI video can be hard to interpret because physicians currently are forced to perform a time-consuming subjective visual search to detect bronchial lesions in a long airway-exam video. As a result, NBI bronchoscopy is not regularly used in practice. To alleviate this problem, we propose an automatic two-stage real-time method for bronchial lesion detection in NBI video and perform a first-of-its-kind pilot study of the method using NBI airway exam video collected at our institution.
Approach: Given a patient's NBI video, the first method stage entails a deep-learning-based object detection network coupled with a multiframe abnormality measure to locate candidate lesions on each video frame. The second method stage then draws upon a Siamese network and a Kalman filter to track candidate lesions over multiple frames to arrive at final lesion decisions.
Results: Tests drawing on 23 patient NBI airway exam videos indicate that the method can process an incoming video stream at a real-time frame rate, thereby making the method viable for real-time inspection during a live bronchoscopic airway exam. Furthermore, our studies showed a 93% sensitivity and 86% specificity for lesion detection; this compares favorably to a sensitivity and specificity of 80% and 84% achieved over a series of recent pooled clinical studies using the current time-consuming subjective clinical approach.
Conclusion: The method shows potential for robust lesion detection in NBI video at a real-time frame rate. Therefore, it could help enable more common use of NBI bronchoscopy for bronchial lesion detection.
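The second-stage tracking idea (confirm a lesion only when a candidate persists across frames) can be illustrated with a minimal constant-velocity Kalman filter over bounding-box centers. This sketch omits the Siamese appearance matching; the noise settings and confirmation threshold are illustrative assumptions:

```python
import numpy as np

class Track:
    """Track one candidate lesion; confirm it after several consistent frames."""
    def __init__(self, cx, cy, confirm_after=5):
        self.x = np.array([cx, cy, 0.0, 0.0])       # state: [cx, cy, vx, vy]
        self.P = np.eye(4) * 10.0                   # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0           # constant-velocity transition
        self.H = np.eye(2, 4)                       # we observe position only
        self.Q = np.eye(4) * 0.01                   # process noise
        self.R = np.eye(2) * 1.0                    # measurement noise
        self.hits, self.confirm_after = 0, confirm_after

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                           # predicted center

    def update(self, cx, cy):
        z = np.array([cx, cy])
        y = z - self.H @ self.x                     # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        self.hits += 1

    @property
    def confirmed(self):
        return self.hits >= self.confirm_after
```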
Affiliation(s)
- Vahid Daneshpajooh, William E. Higgins: The Pennsylvania State University, School of Electrical Engineering and Computer Science, University Park, Pennsylvania, United States
- Danish Ahmad, Jennifer Toth, Rebecca Bascom: The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States
44
Su D, Luo J, Fei C. An Efficient and Rapid Medical Image Segmentation Network. IEEE J Biomed Health Inform 2024; 28:2979-2990. [PMID: 38457317 DOI: 10.1109/jbhi.2024.3374780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/10/2024]
Abstract
Accurate medical image segmentation is an essential part of the medical image analysis process and provides detailed quantitative metrics. In recent years, extensions of classical networks such as UNet have achieved state-of-the-art performance on medical image segmentation tasks. However, the high model complexity of these networks limits their applicability to devices with constrained computational resources. To alleviate this problem, we propose a shallow hierarchical Transformer for medical image segmentation, called SHFormer. By decreasing the number of Transformer blocks, the model complexity of SHFormer is reduced to an acceptable level. To improve the learned attention while keeping the structure lightweight, we propose a spatial-channel connection module. This module learns attention separately in the spatial and channel dimensions of the feature map while interconnecting the two branches to produce more focused attention. To keep the decoder lightweight, an MLP-D module is proposed that progressively fuses multi-scale features, aligning channels with a multi-layer perceptron (MLP) and fusing spatial information with convolutional blocks. We first validated SHFormer on the ISIC-2018 dataset: compared with the latest state-of-the-art network, it exhibits comparable performance with 15 times fewer parameters, 30 times lower computational complexity, and 5 times higher inference efficiency. To test its generalizability, we additionally evaluated SHFormer on a polyp dataset, where it achieves comparable segmentation accuracy to the latest network with lower computational overhead.
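A minimal sketch of separately learned spatial and channel attention that are then interconnected, in the spirit of the spatial-channel connection module described above; the exact wiring below is our assumption, not the published design:

```python
import torch
import torch.nn as nn

class SpatialChannelConnection(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.channel = nn.Sequential(                # squeeze-and-excitation style
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(                # single-channel spatial map
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        ca = self.channel(x)           # (B, C, 1, 1) channel weights
        sa = self.spatial(x)           # (B, 1, H, W) spatial weights
        # Interconnect the two attentions multiplicatively; the residual path
        # keeps the module lightweight and stable to train.
        return x * ca * sa + x
```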
45
Zhang K, Hu D, Li X, Wang X, Hu X, Wang C, Yang J, Rao N. BFE-Net: bilateral fusion enhanced network for gastrointestinal polyp segmentation. Biomed Opt Express 2024; 15:2977-2999. [PMID: 38855696 PMCID: PMC11161362 DOI: 10.1364/boe.522441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Revised: 03/17/2024] [Accepted: 03/17/2024] [Indexed: 06/11/2024]
Abstract
Accurate segmentation of polyp regions in gastrointestinal endoscopic images is pivotal for diagnosis and treatment. Despite advancements, challenges persist, such as accurately segmenting small polyps and maintaining accuracy when polyps resemble surrounding tissues. Recent studies show the effectiveness of the pyramid vision transformer (PVT) in capturing global context, yet it may lack detailed information; conversely, U-Net excels in semantic extraction. Hence, we propose the bilateral fusion enhanced network (BFE-Net) to address these challenges. Our model integrates U-Net and PVT features via a deep feature enhancement fusion module (FEF) and an attention decoder module (AD). Experimental results demonstrate significant improvements, validating the model's effectiveness across various datasets and modalities and promising advancements in gastrointestinal polyp diagnosis and treatment.
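Conceptually, fusing a detail-rich CNN feature with a context-rich transformer feature can look like the following sketch; the projection and fusion layers are assumptions for illustration, not the FEF module itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilateralFusion(nn.Module):
    """Fuse a CNN branch feature with a transformer branch feature."""
    def __init__(self, cnn_ch, vit_ch, out_ch):
        super().__init__()
        self.proj_cnn = nn.Conv2d(cnn_ch, out_ch, 1)
        self.proj_vit = nn.Conv2d(vit_ch, out_ch, 1)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, cnn_feat, vit_feat):
        # Project both branches to a shared width, then match spatial sizes.
        c = self.proj_cnn(cnn_feat)
        v = self.proj_vit(vit_feat)
        v = F.interpolate(v, size=c.shape[-2:], mode="bilinear",
                          align_corners=False)
        return self.fuse(torch.cat([c, v], dim=1))
```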
Affiliation(s)
- Kaixuan Zhang, Dingcan Hu, Xiang Li, Xiaotong Wang, Xiaoming Hu, Chunyang Wang, Nini Rao: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jinlin Yang: Digestive Endoscopic Center of West China Hospital, Sichuan University, Chengdu 610017, China
46
Li B, Xu Y, Wang Y, Zhang B. DECTNet: Dual Encoder Network combined convolution and Transformer architecture for medical image segmentation. PLoS One 2024; 19:e0301019. [PMID: 38573957 PMCID: PMC10994332 DOI: 10.1371/journal.pone.0301019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Accepted: 03/09/2024] [Indexed: 04/06/2024] Open
Abstract
Automatic and accurate segmentation of medical images plays an essential role in disease diagnosis and treatment planning. Convolutional neural networks have achieved remarkable results in medical image segmentation in the past decade, and deep learning models based on the Transformer architecture have also succeeded tremendously in this domain. However, due to the ambiguity of medical image boundaries and the high complexity of anatomical structures, effective structure extraction and accurate segmentation remain open problems. In this paper, we propose a novel dual encoder network named DECTNet to alleviate this problem. Specifically, DECTNet comprises four components: a convolution-based encoder, a Transformer-based encoder, a feature fusion decoder, and a deep supervision module. The convolutional encoder extracts fine spatial contextual details, while the Transformer encoder, designed with a hierarchical Swin Transformer architecture, models global contextual information. The novel feature fusion decoder integrates the multi-scale representations from the two encoders and selects task-relevant features through a channel attention mechanism. Further, a deep supervision module is used to accelerate the convergence of the proposed method. Extensive experiments demonstrate that, compared to seven other models, the proposed method achieves state-of-the-art results on four segmentation tasks: skin lesion segmentation, polyp segmentation, Covid-19 lesion segmentation, and MRI cardiac segmentation.
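The deep supervision component can be illustrated independently of the architecture: auxiliary heads at intermediate decoder stages each contribute a down-weighted loss, which speeds convergence. The weights, stage channel counts, and helper names below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def deeply_supervised_loss(stage_feats, mask, heads, weights=(0.25, 0.5, 1.0)):
    """stage_feats: decoder features ordered coarse to fine.
    heads: one 1x1-conv head per stage producing 1-channel logits.
    mask: (B, 1, H, W) float ground-truth mask."""
    total = 0.0
    for feat, head, w in zip(stage_feats, heads, weights):
        logits = head(feat)
        # Upsample each auxiliary prediction to ground-truth resolution.
        logits = F.interpolate(logits, size=mask.shape[-2:], mode="bilinear",
                               align_corners=False)
        total = total + w * F.binary_cross_entropy_with_logits(logits, mask)
    return total

# Usage sketch (channel widths are assumptions):
# heads = nn.ModuleList(nn.Conv2d(c, 1, 1) for c in (256, 128, 64))
# loss = deeply_supervised_loss([f3, f2, f1], gt_mask, heads)
```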
Affiliation(s)
- Boliang Li, Yaming Xu, Yan Wang: Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Bo Zhang: Sergeant Schools of Army Academy of Armored Forces, Changchun, Jilin, China
47
Goceri E. Polyp Segmentation Using a Hybrid Vision Transformer and a Hybrid Loss Function. J Imaging Inform Med 2024; 37:851-863. [PMID: 38343250 PMCID: PMC11031515 DOI: 10.1007/s10278-023-00954-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 09/16/2023] [Accepted: 10/02/2023] [Indexed: 04/20/2024]
Abstract
Accurate and early detection of precursor adenomatous polyps and their removal at an early stage can significantly decrease mortality and disease occurrence, since most colorectal cancers evolve from adenomatous polyps. However, accurate detection and segmentation of polyps by doctors is difficult mainly due to these factors: (i) the quality of polyp screening with colonoscopy depends on the imaging quality and the experience of the doctor; (ii) visual inspection by doctors is time-consuming, burdensome, and tiring; (iii) prolonged visual inspection can lead to polyps being missed even when the physician is experienced. To overcome these problems, computer-aided methods have been proposed; however, they have some disadvantages or limitations. Therefore, in this work, a new architecture based on residual transformer layers has been designed and used for polyp segmentation. The proposed segmentation utilizes both high-level semantic features and low-level spatial features. Also, a novel hybrid loss function has been proposed: designed from focal Tversky loss, binary cross-entropy, and the Jaccard index, it reduces image-wise and pixel-wise differences as well as improves regional consistency. Experiments have indicated the effectiveness of the proposed approach in terms of Dice similarity (0.9048), recall (0.9041), precision (0.9057), and F2 score (0.8993). Comparisons with state-of-the-art methods have shown its better performance.
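Since the described hybrid loss combines three standard terms, it can be sketched directly; the alpha/beta/gamma values and the equal weighting below are common defaults, not necessarily the paper's settings:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    """Focal Tversky + binary cross-entropy + Jaccard (IoU) loss.
    logits, target: (B, 1, H, W); target is a float mask in {0, 1}."""
    prob = torch.sigmoid(logits).flatten(1)
    tgt = target.flatten(1)

    # Tversky index: TP / (TP + alpha*FN + beta*FP); alpha > beta penalizes
    # missed polyp pixels more than false alarms.
    tp = (prob * tgt).sum(1)
    fp = (prob * (1 - tgt)).sum(1)
    fn = ((1 - prob) * tgt).sum(1)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    focal_tversky = ((1 - tversky) ** gamma).mean()

    # Pixel-wise term.
    bce = F.binary_cross_entropy_with_logits(logits, target)

    # Region-overlap (Jaccard) term.
    inter = (prob * tgt).sum(1)
    union = prob.sum(1) + tgt.sum(1) - inter
    jaccard = (1 - (inter + eps) / (union + eps)).mean()

    return focal_tversky + bce + jaccard
```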
48
Li F, Huang Z, Zhou L, Chen Y, Tang S, Ding P, Peng H, Chu Y. Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network. Biomed Opt Express 2024; 15:2590-2621. [PMID: 38633077 PMCID: PMC11019695 DOI: 10.1364/boe.510908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 02/26/2024] [Accepted: 03/08/2024] [Indexed: 04/19/2024]
Abstract
Automatic and precise polyp segmentation in colonoscopy images is highly valuable for early diagnosis and surgery of colorectal cancer. Nevertheless, it still poses a major challenge due to variations in polyp size, intricate morphological characteristics, and the indistinct demarcation between polyps and mucosa. To alleviate these challenges, we propose an improved dual-aggregation polyp segmentation network, dubbed Dua-PSNet, for automatic and accurate full-size polyp prediction, which combines a transformer branch and a fully convolutional network (FCN) branch in parallel. In the transformer branch, we adopt the B3 variant of pyramid vision transformer v2 (PVTv2-B3) as an image encoder to capture multi-scale global features and model long-distance interdependencies between them, and we design a multi-stage feature aggregation decoder (MFAD) to highlight critical local feature details and effectively integrate them into the global features. In the decoder, an adaptive feature aggregation (AFA) block fuses high-level feature representations of different scales generated by the PVTv2-B3 encoder in a stepwise adaptive manner to refine global semantic information, while a ResidualBlock module mines detailed boundary cues hidden in low-level features. With the assistance of a selective global-to-local fusion head (SGLFH) module, the resulting boundary details are selectively aggregated with the global semantic features, strengthening the hierarchical features to cope with scale variations of polyps. The FCN branch, built on the designed ResidualBlock module, encourages extraction of highly merged fine features to match the outputs of the transformer branch into full-size segmentation maps. In this way, the two branches influence and complement each other, enhancing the discrimination of polyp features and enabling more accurate prediction of full-size segmentation maps. Extensive experiments on five challenging polyp segmentation benchmarks demonstrate that Dua-PSNet has strong learning and generalization ability and achieves state-of-the-art segmentation performance among existing cutting-edge methods. These results suggest that Dua-PSNet is a promising solution for practical polyp segmentation tasks in which wide variations in data typically occur.
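Stepwise adaptive aggregation of multi-scale encoder features, as the AFA block is described, can be sketched with learned scalar gates; the gating form and channel widths below are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StepwiseAggregation(nn.Module):
    """Merge encoder features from coarse to fine through learned gates."""
    def __init__(self, chans=(512, 320, 128), out_ch=128):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in chans)
        # One learnable scalar gate per merge step, squashed to (0, 1).
        self.gates = nn.Parameter(torch.zeros(len(chans) - 1))

    def forward(self, feats):
        # feats: encoder outputs ordered coarse (small map) to fine (large map).
        x = self.proj[0](feats[0])
        for i, f in enumerate(feats[1:]):
            f = self.proj[i + 1](f)
            x = F.interpolate(x, size=f.shape[-2:], mode="bilinear",
                              align_corners=False)
            g = torch.sigmoid(self.gates[i])
            x = g * x + (1 - g) * f          # adaptive blend of coarse and fine
        return x
```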
Affiliation(s)
- Feng Li, Zetao Huang, Yuyang Chen, Shiqing Tang, Pengchao Ding: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Lu Zhou, Haixia Peng, Yimin Chu: Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China
49
Li G, Xie J, Zhang L, Sun M, Li Z, Sun Y. MCAFNet: multiscale cross-layer attention fusion network for honeycomb lung lesion segmentation. Med Biol Eng Comput 2024; 62:1121-1137. [PMID: 38150110 DOI: 10.1007/s11517-023-02995-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2023] [Accepted: 12/07/2023] [Indexed: 12/28/2023]
Abstract
Accurate segmentation of honeycomb lung lesions from lung CT images plays a crucial role in the diagnosis and treatment of various lung diseases. However, the availability of algorithms for automatic segmentation of honeycomb lung lesions remains limited. In this study, we propose a novel multi-scale cross-layer attention fusion network (MCAFNet) specifically designed for the segmentation of honeycomb lung lesions, taking into account their shape specificity and similarity to surrounding vascular shadows. MCAFNet incorporates several key modules to enhance segmentation performance. First, a multiscale aggregation (MIA) module is introduced in the input part to preserve spatial information during downsampling. Second, a cross-layer attention fusion (CAF) module is proposed to capture multiscale features by integrating channel and spatial information from different layers of the feature maps. Last, a bidirectional attention gate (BAG) module is constructed within the skip connection to enhance the model's ability to filter out background information and focus on the segmentation target. Experimental results demonstrate the effectiveness of the proposed MCAFNet. On the honeycomb lung segmentation dataset, the network achieves an Intersection over Union (IoU) of 0.895, mean IoU (mIoU) of 0.921, and mean Dice coefficient (mDice) of 0.949, outperforming existing medical image segmentation algorithms. Furthermore, experiments conducted on additional datasets confirm the generalizability and robustness of the proposed model. The contribution of this study lies in the development of MCAFNet, which addresses the lack of automated segmentation algorithms for honeycomb lung lesions and demonstrates superior performance in segmenting them, thereby facilitating the diagnosis and treatment of lung diseases. This work contributes to the existing literature by presenting a novel approach that effectively combines multi-scale features and attention mechanisms for lung lesion segmentation. The code is available at https://github.com/Oran9er/MCAFNet.
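An attention gate acting on the skip connection in both directions, as the BAG module is described, might be sketched as below; the symmetric form and layer choices are our assumptions, not the published design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalAttentionGate(nn.Module):
    """Gate the encoder skip with the decoder signal and vice versa."""
    def __init__(self, enc_ch, dec_ch, inter_ch):
        super().__init__()
        self.we = nn.Conv2d(enc_ch, inter_ch, 1)
        self.wd = nn.Conv2d(dec_ch, inter_ch, 1)
        self.psi_e = nn.Conv2d(inter_ch, 1, 1)   # gate applied to encoder skip
        self.psi_d = nn.Conv2d(inter_ch, 1, 1)   # gate applied to decoder path

    def forward(self, enc, dec):
        dec_up = F.interpolate(dec, size=enc.shape[-2:], mode="bilinear",
                               align_corners=False)
        joint = F.relu(self.we(enc) + self.wd(dec_up))
        enc_gated = enc * torch.sigmoid(self.psi_e(joint))      # filter background
        dec_gated = dec_up * torch.sigmoid(self.psi_d(joint))
        # Output has enc_ch + dec_ch channels for the decoder block to consume.
        return torch.cat([enc_gated, dec_gated], dim=1)
```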
Affiliation(s)
- Gang Li, Jinjie Xie, Ling Zhang, Mengxia Sun, Zhichao Li, Yuanjin Sun: Taiyuan University of Technology Software College, Taiyuan, China
50
Du H, Wang J, Liu M, Wang Y, Meijering E. SwinPA-Net: Swin Transformer-Based Multiscale Feature Pyramid Aggregation Network for Medical Image Segmentation. IEEE Trans Neural Netw Learn Syst 2024; 35:5355-5366. [PMID: 36121961 DOI: 10.1109/tnnls.2022.3204090] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
The precise segmentation of medical images is one of the key challenges in pathology research and clinical practice. However, many medical image segmentation tasks suffer from large differences between lesion types and from lesions that resemble surrounding tissues in shape and color, which seriously limits segmentation accuracy. In this article, a novel method called the Swin Pyramid Aggregation network (SwinPA-Net) is proposed, combining two designed modules with the Swin Transformer to learn more powerful and robust features. The two modules, a dense multiplicative connection (DMC) module and a local pyramid attention (LPA) module, aggregate the multiscale context information of medical images. The DMC module cascades multiscale semantic feature information through dense multiplicative feature fusion, which minimizes the interference of shallow background noise, improves feature expression, and addresses excessive variation in lesion size and type. The LPA module guides the network to focus on the region of interest by merging global and local attention, which helps address the resemblance between lesions and surrounding tissue. The proposed network is evaluated on two public benchmark datasets for polyp segmentation and skin lesion segmentation, as well as a private clinical dataset for laparoscopic image segmentation. Compared with existing state-of-the-art (SOTA) methods, SwinPA-Net achieves the best performance, outperforming the second-best method on mean Dice score by 1.68%, 0.8%, and 1.2% on the three tasks, respectively.
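Dense multiplicative fusion can be illustrated with a short sketch: multiscale semantic maps are projected to a common width and combined through element-wise products, which suppresses background responses that are not active at every scale. Channel sizes and the residual form below are illustrative assumptions, not the DMC module itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMultiplicativeConnection(nn.Module):
    def __init__(self, chans=(96, 192, 384), out_ch=96):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in chans)

    def forward(self, feats):
        # feats ordered fine to coarse; align everything to the finest grid.
        size = feats[0].shape[-2:]
        maps = [F.interpolate(p(f), size=size, mode="bilinear",
                              align_corners=False)
                for p, f in zip(self.proj, feats)]
        fused = maps[0]
        for m in maps[1:]:
            # Multiplicative gating by coarser maps damps noise that only
            # fires at one scale; the residual keeps gradients flowing.
            fused = fused * torch.sigmoid(m) + fused
        return fused
```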