1
Gupta A, Bajaj S, Nema P, Purohit A, Kashaw V, Soni V, Kashaw SK. Potential of AI and ML in oncology research including diagnosis, treatment and future directions: A comprehensive prospective. Comput Biol Med 2025; 189:109918. [PMID: 40037170] [DOI: 10.1016/j.compbiomed.2025.109918]
Abstract
Artificial intelligence (AI) and machine learning (ML) have emerged as transformative tools in cancer research, offering the ability to process vast amounts of data rapidly and support precise therapeutic decisions. Over the last decade, AI, particularly deep learning (DL) and machine learning (ML), has significantly enhanced cancer prediction, diagnosis, and treatment by leveraging algorithms such as convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs). These technologies provide reliable, efficient solutions for managing aggressive diseases like cancer, which have high recurrence and mortality rates. This prospective review highlights the applications of AI in oncology, along with FDA-approved technologies like the EFAI RTSuite CT HN-Segmentation System, Quantib Prostate, and Paige Prostate, and explores their role in advancing cancer detection, personalized care, and treatment. Furthermore, we explore broader applications of AI in healthcare, addressing challenges, limitations, regulatory considerations, and ethical implications. By presenting these advancements, we underscore AI's potential to revolutionize cancer care, management, and treatment.
Affiliation(s)
- Akanksha Gupta
- Integrated Drug Discovery Research Laboratory, Department of Pharmaceutical Sciences, Dr. Harisingh Gour University (A Central University), Sagar, Madhya Pradesh, 470003, India
- Samyak Bajaj
- Integrated Drug Discovery Research Laboratory, Department of Pharmaceutical Sciences, Dr. Harisingh Gour University (A Central University), Sagar, Madhya Pradesh, 470003, India
- Priyanshu Nema
- Integrated Drug Discovery Research Laboratory, Department of Pharmaceutical Sciences, Dr. Harisingh Gour University (A Central University), Sagar, Madhya Pradesh, 470003, India
- Arpana Purohit
- Integrated Drug Discovery Research Laboratory, Department of Pharmaceutical Sciences, Dr. Harisingh Gour University (A Central University), Sagar, Madhya Pradesh, 470003, India
- Varsha Kashaw
- Sagar Institute of Pharmaceutical Sciences, Sagar, M.P., India
- Vandana Soni
- Integrated Drug Discovery Research Laboratory, Department of Pharmaceutical Sciences, Dr. Harisingh Gour University (A Central University), Sagar, Madhya Pradesh, 470003, India
- Sushil K Kashaw
- Integrated Drug Discovery Research Laboratory, Department of Pharmaceutical Sciences, Dr. Harisingh Gour University (A Central University), Sagar, Madhya Pradesh, 470003, India
2
Ren X, Zhou W, Yuan N, Li F, Ruan Y, Zhou H. Prompt-based polyp segmentation during endoscopy. Med Image Anal 2025; 102:103510. [PMID: 40073580] [DOI: 10.1016/j.media.2025.103510]
Abstract
Accurate judgment and identification of polyp size is crucial in endoscopic diagnosis. However, the indistinct boundaries of polyps lead to missegmentation and missed cancer diagnoses. In this paper, a prompt-based polyp segmentation method (PPSM) is proposed to assist in early-stage cancer diagnosis during endoscopy. It combines endoscopists' experience and artificial intelligence technology. Firstly, a prompt-based polyp segmentation network (PPSN) is presented, which contains the prompt encoding module (PEM), the feature extraction encoding module (FEEM), and the mask decoding module (MDM). The PEM encodes prompts to guide the FEEM in feature extraction and the MDM in mask generation, so that the PPSN can segment polyps efficiently. Secondly, endoscopists' ocular attention data (gazes) are used as prompts, which enhance the PPSN's accuracy for segmenting polyps and can be obtained effectively in real-world settings. To reinforce the PPSN's stability, non-uniform dot-matrix prompts are generated to compensate for frame loss during eye tracking. Moreover, a data augmentation method based on the segment anything model (SAM) is introduced to enrich the prompt dataset and improve the PPSN's adaptability. Experiments demonstrate the PPSM's accuracy and real-time capability. The results from cross-training and cross-testing on four datasets show the PPSM's generalization. Based on the research results, a disposable electronic endoscope with a real-time auxiliary diagnosis function for early cancer, together with an image processor, has been developed. Part of the code and the method for generating the prompt dataset are available at https://github.com/XinZhenRen/PPSM.
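A minimal sketch of how a gaze-point prompt might steer a segmentation encoder, loosely mirroring the PEM-to-FEEM coupling described above; the module name, Gaussian rasterization, and fusion-by-addition are illustrative assumptions, not the released PPSM code.

```python
import torch
import torch.nn as nn

class GazePromptEncoder(nn.Module):
    """Rasterizes gaze points into a soft spatial prior and embeds it."""
    def __init__(self, channels: int = 64, sigma: float = 0.05):
        super().__init__()
        self.sigma = sigma
        self.embed = nn.Conv2d(1, channels, kernel_size=1)

    def forward(self, gaze_xy: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # gaze_xy: (B, N, 2) normalized coordinates in [0, 1]
        ys = torch.linspace(0, 1, h, device=gaze_xy.device)
        xs = torch.linspace(0, 1, w, device=gaze_xy.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack([gx, gy], dim=-1)                         # (H, W, 2)
        d2 = ((grid[None, None] - gaze_xy[:, :, None, None]) ** 2).sum(-1)
        heat = torch.exp(-d2 / (2 * self.sigma ** 2)).amax(dim=1)   # (B, H, W)
        return self.embed(heat.unsqueeze(1))                        # (B, C, H, W)

# Usage: add the prompt embedding to image features before mask decoding.
feats = torch.randn(2, 64, 32, 32)                    # assumed FEEM output
prompt = GazePromptEncoder(64)(torch.rand(2, 3, 2), 32, 32)
fused = feats + prompt                                # prompt-guided features
```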
Affiliation(s)
- Xinzhen Ren
- Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
- Wenju Zhou
- Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
- Naitong Yuan
- Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
- Fang Li
- Department of Obstetrics and Gynecology, Shanghai East Hospital, School of Medicine, Tongji University, Shanghai 200120, China
- Yetian Ruan
- Department of Obstetrics and Gynecology, Shanghai East Hospital, School of Medicine, Tongji University, Shanghai 200120, China
- Huiyu Zhou
- School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
3
Du X, Zhang X, Chen J, Li L. Boosting polyp screening with improved point-teacher weakly semi-supervised. Comput Biol Med 2025; 191:109998. [PMID: 40198989] [DOI: 10.1016/j.compbiomed.2025.109998]
Abstract
Polyps, like a silent time bomb in the gut, are always lurking and can explode into deadly colorectal cancer at any time. Many methods attempt to maximize the early detection of colon polyps through screening; however, several challenges remain: (i) the scarcity of per-pixel annotation data and clinical features such as the blurred boundary and low contrast of polyps result in poor performance; (ii) existing weakly semi-supervised methods that directly use pseudo-labels to supervise the student tend to ignore the value of intermediate features in the teacher. To adapt the point-prompt teacher model to the challenging scenarios of complex medical images and limited annotation data, we creatively leverage the diverse inductive biases of CNNs and Transformers to extract robust and complementary representations of polyp features (boundary and context). At the same time, a newly designed teacher-student intermediate feature distillation method is introduced, rather than just using pseudo-labels to guide student learning. Comprehensive experiments demonstrate that our proposed method effectively handles scenarios with limited annotations and exhibits good segmentation performance. All code is available at https://github.com/dxqllp/WSS-Polyp.
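A hedged sketch of the combined objective such a scheme implies: pseudo-label supervision plus distillation of the teacher's intermediate features into the student. The loss weighting and stage pairing are assumptions, not the released WSS-Polyp code.

```python
import torch
import torch.nn.functional as F

def weakly_semi_supervised_loss(student_logits, pseudo_labels,
                                student_feats, teacher_feats,
                                alpha: float = 0.5):
    # Pseudo-label supervision from the point-prompted teacher.
    seg_loss = F.binary_cross_entropy_with_logits(student_logits, pseudo_labels)
    # Intermediate feature distillation: match each paired stage.
    distill = sum(F.mse_loss(s, t.detach())
                  for s, t in zip(student_feats, teacher_feats))
    return seg_loss + alpha * distill

logits = torch.randn(2, 1, 64, 64)
pseudo = torch.rand(2, 1, 64, 64).round()
s_feats = [torch.randn(2, 32, 16, 16)]
t_feats = [torch.randn(2, 32, 16, 16)]
loss = weakly_semi_supervised_loss(logits, pseudo, s_feats, t_feats)
```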
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China
- Xuejun Zhang
- School of Computer Science and Technology, Anhui University, Hefei, China
- Jiajia Chen
- School of Computer Science and Technology, Anhui University, Hefei, China
- Lei Li
- Department of Neurology, Shuyang Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Suqian, China
4
Wang H, Wang KN, Hua J, Tang Y, Chen Y, Zhou GQ, Li S. Dynamic spectrum-driven hierarchical learning network for polyp segmentation. Med Image Anal 2025; 101:103449. [PMID: 39847953] [DOI: 10.1016/j.media.2024.103449]
Abstract
Accurate automatic polyp segmentation in colonoscopy is crucial for the prompt prevention of colorectal cancer. However, the heterogeneous nature of polyps and differences in lighting and visibility conditions present significant challenges in achieving reliable and consistent segmentation across different cases. Therefore, this study proposes a novel dynamic spectrum-driven hierarchical learning model (DSHNet), the first to specifically leverage image frequency-domain information to explore region-level salience differences among and within polyps for precise segmentation. A novel spectral decoupler is introduced to separate low-frequency and high-frequency components, leveraging their distinct characteristics to guide the model in learning valuable frequency features without bias through automatic masking. The low-frequency-driven region-level saliency modeling then generates dynamic convolution kernels with individual frequency-aware features, which regulate region-level saliency modeling together with the supervision of the hierarchy of labels, thus enabling adaptation to polyp heterogeneity and illumination variation simultaneously. Meanwhile, the high-frequency attention module is designed to preserve detailed information at the skip connections, which complements the focus on spatial features at various stages. Experimental results demonstrate that the proposed method outperforms other state-of-the-art polyp segmentation techniques, achieving robust and superior results on five diverse datasets. Codes are available at https://github.com/gardnerzhou/DSHNet.
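A minimal sketch of the kind of frequency split a spectral decoupler relies on: an FFT decomposition with a radial low-pass mask, where the high-frequency component is the residual. The cutoff value and hard masking scheme are assumptions, not DSHNet's exact design.

```python
import torch

def decouple_frequencies(x: torch.Tensor, cutoff: float = 0.1):
    # x: (B, C, H, W); returns (low_freq, high_freq) spatial components.
    b, c, h, w = x.shape
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    fy = torch.linspace(-0.5, 0.5, h, device=x.device)[:, None]
    fx = torch.linspace(-0.5, 0.5, w, device=x.device)[None, :]
    low_mask = ((fy ** 2 + fx ** 2).sqrt() <= cutoff).to(x.dtype)
    low_spec = spec * low_mask
    low = torch.fft.ifft2(torch.fft.ifftshift(low_spec, dim=(-2, -1))).real
    return low, x - low   # global structure vs. edges/texture

low, high = decouple_frequencies(torch.randn(1, 3, 256, 256))
```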
Affiliation(s)
- Haolin Wang
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Kai-Ni Wang
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Jie Hua
- The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yi Tang
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Yang Chen
- Laboratory of Image Science and Technology, Southeast University, Nanjing, China; Key Laboratory of Computer Network and Information Integration, Southeast University, Nanjing, China
- Guang-Quan Zhou
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Shuo Li
- Department of Computer and Data Science and Department of Biomedical Engineering, Case Western Reserve University, Cleveland, USA
5
Wang Z, Guo L, Zhao S, Zhang S, Zhao X, Fang J, Wang G, Lu H, Yu J, Tian Q. Multi-Scale Group Agent Attention-Based Graph Convolutional Decoding Networks for 2D Medical Image Segmentation. IEEE J Biomed Health Inform 2025; 29:2718-2730. [PMID: 40030822] [DOI: 10.1109/jbhi.2024.3523112]
Abstract
Automated medical image segmentation plays a crucial role in assisting doctors in diagnosing diseases. Feature decoding is a critical yet challenging issue for medical image segmentation. To address this issue, this work proposes a novel feature decoding network, called multi-scale group agent attention-based graph convolutional decoding networks (MSGAA-GCDN), to learn local-global features in graph structures for 2D medical image segmentation. The proposed MSGAA-GCDN combines graph convolutional network (GCN) and a lightweight multi-scale group agent attention (MSGAA) mechanism to represent features globally and locally within a graph structure. Moreover, in skip connections a simple yet efficient attention-based upsampling convolution fusion (AUCF) module is designed to enhance encoder-decoder feature fusion in both channel and spatial dimensions. Extensive experiments are conducted on three typical medical image segmentation tasks, namely Synapse abdominal multi-organs, Cardiac organs, and Polyp lesions. Experimental results demonstrate that the proposed MSGAA-GCDN outperforms the state-of-the-art methods, and the designed MSGAA is a lightweight yet effective attention architecture. The proposed MSGAA-GCDN can be easily taken as a plug-and-play decoder cascaded with other encoders for general medical image segmentation tasks.
6
Tong Y, Chai J, Chen Z, Zhou Z, Hu Y, Li X, Qiao X, Hu K. Dynamic Frequency-Decoupled Refinement Network for Polyp Segmentation. Bioengineering (Basel) 2025; 12:277. [PMID: 40150740] [PMCID: PMC11939780] [DOI: 10.3390/bioengineering12030277]
Abstract
Polyp segmentation is crucial for early colorectal cancer detection, but accurately delineating polyps is challenging due to their variations in size, shape, and texture and low contrast with surrounding tissues. Existing methods often rely solely on spatial-domain processing, which struggles to separate high-frequency features (edges, textures) from low-frequency ones (global structures), leading to suboptimal segmentation performance. We propose the Dynamic Frequency-Decoupled Refinement Network (DFDRNet), a novel segmentation framework that integrates frequency-domain and spatial-domain processing. DFDRNet introduces the Frequency Adaptive Decoupling (FAD) module, which dynamically separates high- and low-frequency components, and the Frequency Adaptive Refinement (FAR) module, which refines these components before fusing them with spatial features to enhance segmentation accuracy. Embedded within a U-shaped encoder-decoder framework, DFDRNet achieves state-of-the-art performance across three benchmark datasets, demonstrating superior robustness and efficiency. Our extensive evaluations and ablation studies confirm the effectiveness of DFDRNet in balancing segmentation accuracy with computational efficiency.
Affiliation(s)
- Yao Tong
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jiangsu Province Engineering Research Center of TCM Intelligence Health Service, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jingxian Chai
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Ziqi Chen
- Vanke School of Public Health, Tsinghua University, Beijing 100084, China
- Zuojian Zhou
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jiangsu Province Engineering Research Center of TCM Intelligence Health Service, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yun Hu
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jiangsu Province Engineering Research Center of TCM Intelligence Health Service, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Xin Li
- College of Computer Science and Software Engineering, Hohai University, Nanjing 211100, China
- Xuebin Qiao
- Jiangsu Province Engineering Research Center of TCM Intelligence Health Service, Nanjing University of Chinese Medicine, Nanjing 210023, China
- School of Elderly Care Services and Management, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Kongfa Hu
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jiangsu Province Engineering Research Center of TCM Intelligence Health Service, Nanjing University of Chinese Medicine, Nanjing 210023, China
7
Elamin S, Johri S, Rajpurkar P, Geisler E, Berzin TM. From data to artificial intelligence: evaluating the readiness of gastrointestinal endoscopy datasets. J Can Assoc Gastroenterol 2025; 8:S81-S86. [PMID: 39990508] [PMCID: PMC11842897] [DOI: 10.1093/jcag/gwae041]
Abstract
The incorporation of artificial intelligence (AI) into gastrointestinal (GI) endoscopy represents a promising advancement in gastroenterology. With over 40 published randomized controlled trials and numerous ongoing clinical trials, gastroenterology leads other medical disciplines in AI research. Computer-aided detection algorithms for identifying colorectal polyps have achieved regulatory approval and are in routine clinical use, while other AI applications for GI endoscopy are in advanced development stages. Near-term opportunities include the potential for computer-aided diagnosis to replace conventional histopathology for diagnosing small colon polyps and increased AI automation in capsule endoscopy. Despite significant development in research settings, the generalizability and robustness of AI models in real clinical practice remain inconsistent. The GI field lags behind other medical disciplines in the breadth of novel AI algorithms, with only 13 out of 882 Food and Drug Administration (FDA)-approved AI models focussed on GI endoscopy as of June 2024. Additionally, existing GI endoscopy image databases are disproportionately focussed on colon polyps, lacking representation of the diversity of other endoscopic findings. High-quality datasets, encompassing a wide range of patient demographics, endoscopic equipment types, and disease states, are crucial for developing effective AI models for GI endoscopy. This article reviews the current state of GI endoscopy datasets, barriers to progress, including dataset size, data diversity, annotation quality, and ethical issues in data collection and usage, and future needs for advancing AI in GI endoscopy.
Affiliation(s)
- Sami Elamin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Shreya Johri
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Pranav Rajpurkar
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Enrik Geisler
- Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA
- Tyler M Berzin
- Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA
8
Ke X, Chen G, Liu H, Guo W. MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation. Comput Biol Med 2025; 186:109601. [PMID: 39740513] [DOI: 10.1016/j.compbiomed.2024.109601]
Abstract
Accurate polyp segmentation is crucial for early diagnosis and treatment of colorectal cancer. This is a challenging task for three main reasons: (i) the problem of model overfitting and weak generalization due to the multi-center distribution of data; (ii) the problem of interclass ambiguity caused by motion blur and overexposure to endoscopic light; and (iii) the problem of intraclass inconsistency caused by the variety of morphologies and sizes of the same type of polyps. To address these challenges, we propose a new high-precision polyp segmentation framework, MEFA-Net, which consists of three modules: the plug-and-play Mask Enhancement Module (MEG), the Separable Path Attention Enhancement Module (SPAE), and the Dynamic Global Attention Pool Module (DGAP). Specifically, the MEG module regionally masks the high-energy regions of the environment and polyps, which guides the model to rely on only a small amount of information to distinguish polyps from background features, preventing the model from overfitting to environmental information and improving its robustness. At the same time, this module can effectively counteract the "dark corner phenomenon" in the dataset and further improve the generalization performance of the model. Next, the SPAE module can effectively alleviate the inter-class ambiguity problem by strengthening feature expression. Then, the DGAP module solves the intra-class inconsistency problem by extracting the invariance of scale, shape and position. Finally, we propose a new evaluation metric, MultiColoScore, for comprehensively evaluating the segmentation performance of the model on five datasets with different domains. We evaluated the new method quantitatively and qualitatively on five datasets using four metrics. Experimental results show that MEFA-Net significantly improves the accuracy of polyp segmentation and outperforms current state-of-the-art algorithms. The code is available at https://github.com/847001315/MEFA-Net.
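A hedged sketch of the masking idea behind such a module: occlude the highest-energy image patches so the model cannot lean on specular highlights or bright environment cues. Interpreting "energy" as mean squared intensity, and the patch size and mask ratio, are illustrative assumptions rather than MEFA-Net's actual MEG design.

```python
import torch

def mask_high_energy_patches(img: torch.Tensor, patch: int = 32,
                             mask_ratio: float = 0.25) -> torch.Tensor:
    # img: (B, C, H, W) with H and W divisible by `patch`.
    b, c, h, w = img.shape
    gh, gw = h // patch, w // patch
    patches = img.unfold(2, patch, patch).unfold(3, patch, patch)
    energy = (patches ** 2).mean(dim=(1, 4, 5)).reshape(b, -1)  # (B, gh*gw)
    k = int(mask_ratio * energy.shape[1])
    top = energy.topk(k, dim=1).indices                         # brightest patches
    keep = torch.ones_like(energy)
    keep.scatter_(1, top, 0.0)                                  # zero them out
    keep = keep.reshape(b, 1, gh, gw)
    keep = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return img * keep

masked = mask_high_energy_patches(torch.rand(2, 3, 256, 256))
```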
Affiliation(s)
- Xiao Ke
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Guanhong Chen
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Hao Liu
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Wenzhong Guo
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
9
Mao X, Li H, Li X, Bai C, Ming W. C2E-Net: Cascade attention and context-aware cross-level fusion network via edge learning guidance for polyp segmentation. Comput Biol Med 2025; 185:108770. [PMID: 39653624] [DOI: 10.1016/j.compbiomed.2024.108770]
Abstract
Colorectal polyps are one of the most direct causes of colorectal cancer. Polypectomy can effectively block the progression to colorectal cancer, but accurate polyp segmentation methods are required as an auxiliary means. However, there are several challenges associated with achieving accurate polyp segmentation, such as the large semantic gap between the encoder and decoder, incomplete edges, and the potential confusion between folds in uncertain areas and target objects. To address the aforementioned challenges, an advanced polyp segmentation network (C2E-Net) is proposed, leveraging a cascaded attention mechanism and context-aware cross-level fusion guided by edge learning. Firstly, a cascade attention (CA) module is proposed to capture local feature details and increase the receptive field by setting different dilation rates in different convolutional layers, combined with a criss-cross attention mechanism to bridge the semantic gap between codecs. Subsequently, an edge learning guidance (ELG) module is designed that employs parallel axial attention operations to capture complementary edge information with sufficient detail to enrich feature details and edge features. Ultimately, to effectively integrate cross-level features and obtain rich global contextual information, a context-aware cross-level fusion (CCF) module is introduced through a multi-scale channel attention mechanism to minimize potential confusion between folds in uncertain areas and target objects. Extensive experimental results show that C2E-Net is superior to state-of-the-art methods, with average Dice coefficients on five polyp datasets of 94.54%, 92.23%, 82.24%, 79.53% and 89.84%.
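A minimal sketch of the receptive-field idea in the cascade: stacked convolutional stages with increasing dilation rates, each refining the previous output. The specific dilation rates, fusion by concatenation, and residual connection are assumptions, not the exact C2E-Net configuration.

```python
import torch
import torch.nn as nn

class DilatedCascade(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        outs, feat = [], x
        for stage in self.stages:      # cascade: each stage refines the last
            feat = stage(feat)
            outs.append(feat)
        return self.fuse(torch.cat(outs, dim=1)) + x

y = DilatedCascade(64)(torch.randn(1, 64, 44, 44))
```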
Affiliation(s)
- Xu Mao
- School of Information, Yunnan University, Kunming, 650504, China
- Haiyan Li
- School of Information, Yunnan University, Kunming, 650504, China
- Xiangxian Li
- School of Software, Shandong University, Jinan, 250101, China
- Chongbin Bai
- Otolaryngology Department, Honghe Prefecture Second People's Hospital, Jianshui, 654300, China
- Wenjun Ming
- The Primary School Affiliated to Yunnan University, Kunming, 650000, China
10
Chu J, Liu W, Tian Q, Lu W. PFPRNet: A Phase-Wise Feature Pyramid With Retention Network for Polyp Segmentation. IEEE J Biomed Health Inform 2025; 29:1137-1150. [PMID: 40030242] [DOI: 10.1109/jbhi.2024.3500026]
Abstract
Early detection of colonic polyps is crucial for the prevention and diagnosis of colorectal cancer. Currently, deep learning-based polyp segmentation methods have become mainstream and achieved remarkable results. However, acquiring a large amount of labeled data is time-consuming and labor-intensive, and the presence of numerous similar wrinkles in polyp images also hampers model prediction performance. In this paper, we propose a novel approach called Phase-wise Feature Pyramid with Retention Network (PFPRNet), which leverages a pre-trained Transformer-based Encoder to obtain multi-scale feature maps. A Phase-wise Feature Pyramid with Retention Decoder is designed to gradually integrate global features into local features and guide the model's attention towards key regions. Additionally, our custom Enhance Perception module enables capturing image information from a broader perspective. Finally, we introduce an innovative Low-layer Retention module as an alternative to Transformer for more efficient global attention modeling. Evaluation results on several widely-used polyp segmentation datasets demonstrate that our proposed method has strong learning ability and generalization capability, and outperforms the state-of-the-art approaches.
11
Du Y, Jiang Y, Tan S, Liu SQ, Li Z, Li G, Wan X. Highlighted Diffusion Model as Plug-In Priors for Polyp Segmentation. IEEE J Biomed Health Inform 2025; 29:1209-1220. [PMID: 39446534] [DOI: 10.1109/jbhi.2024.3485767]
Abstract
Automated polyp segmentation from colonoscopy images is crucial for colorectal cancer diagnosis. The accuracy of such segmentation, however, is challenged by two main factors. First, the variability in polyps' size, shape, and color, coupled with the scarcity of well-annotated data due to the need for specialized manual annotation, hampers the efficacy of existing deep learning methods. Second, concealed polyps often blend with adjacent intestinal tissues, leading to poor contrast that challenges segmentation models. Recently, diffusion models have been explored and adapted for polyp segmentation tasks. However, the significant domain gap between RGB-colonoscopy images and grayscale segmentation masks, along with the low efficiency of the diffusion generation process, hinders the practical implementation of these models. To mitigate these challenges, we introduce the Highlighted Diffusion Model Plus (HDM+), a two-stage polyp segmentation framework. This framework incorporates the Highlighted Diffusion Model (HDM) to provide explicit semantic guidance, thereby enhancing segmentation accuracy. In the initial stage, the HDM is trained using highlighted ground-truth data, which emphasizes polyp regions while suppressing the background in the images. This approach reduces the domain gap by focusing on the image itself rather than on the segmentation mask. In the subsequent second stage, we employ the highlighted features from the trained HDM's U-Net model as plug-in priors for polyp segmentation, rather than generating highlighted images, thereby increasing efficiency. Extensive experiments conducted on six polyp segmentation benchmarks demonstrate the effectiveness of our approach.
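A minimal sketch of the "highlighted" ground-truth construction described above: keep polyp pixels at full intensity and attenuate the background before training the diffusion model on the result. The attenuation factor is an assumption, not the paper's exact value.

```python
import torch

def highlight(image: torch.Tensor, mask: torch.Tensor,
              bg_scale: float = 0.2) -> torch.Tensor:
    # image: (B, 3, H, W) in [0, 1]; mask: (B, 1, H, W) binary polyp mask.
    # Polyp region stays unchanged; background is dimmed by bg_scale.
    return image * (mask + bg_scale * (1.0 - mask))

img = torch.rand(2, 3, 128, 128)
gt = (torch.rand(2, 1, 128, 128) > 0.5).float()
highlighted = highlight(img, gt)   # diffusion target emphasizing polyps
```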
12
Wang KN, Wang H, Zhou GQ, Wang Y, Yang L, Chen Y, Li S. TSdetector: Temporal-Spatial self-correction collaborative learning for colonoscopy video detection. Med Image Anal 2025; 100:103384. [PMID: 39579624] [DOI: 10.1016/j.media.2024.103384]
Abstract
CNN-based object detection models that strike a balance between performance and speed have been gradually adopted in polyp detection tasks. Nevertheless, accurately locating polyps within complex colonoscopy video scenes remains challenging, since existing methods ignore two key issues: intra-sequence distribution heterogeneity and precision-confidence discrepancy. To address these challenges, we propose a novel Temporal-Spatial self-correction detector (TSdetector), which first integrates temporal-level consistency learning and spatial-level reliability learning to detect objects continuously. Technically, we first propose a global temporal-aware convolution, assembling the preceding information to dynamically guide the current convolution kernel to focus on global features between sequences. In addition, we design a hierarchical queue integration mechanism to combine multi-temporal features through a progressive accumulation manner, fully leveraging contextual consistency information together with retaining long-sequence-dependency features. Meanwhile, at the spatial level, we advance a position-aware clustering to explore the spatial relationships among candidate boxes for recalibrating prediction confidence adaptively, thus eliminating redundant bounding boxes efficiently. The experimental results on three publicly available polyp video datasets show that TSdetector achieves the highest polyp detection rate and outperforms other state-of-the-art methods. The code is available at https://github.com/soleilssss/TSdetector.
Affiliation(s)
- Kai-Ni Wang
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Haolin Wang
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Guang-Quan Zhou
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Jiangsu Key Laboratory of Biomaterials and Devices, Southeast University, Nanjing, China
- Ling Yang
- Institute of Medical Technology, Peking University Health Science Center, China
- Yang Chen
- Laboratory of Image Science and Technology, Southeast University, Nanjing, China; Key Laboratory of Computer Network and Information Integration, Southeast University, Nanjing, China
- Shuo Li
- Department of Computer and Data Science and Department of Biomedical Engineering, Case Western Reserve University, USA
13
Oukdach Y, Garbaz A, Kerkaou Z, Ansari ME, Koutti L, Ouafdi AFE, Salihoun M. InCoLoTransNet: An Involution-Convolution and Locality Attention-Aware Transformer for Precise Colorectal Polyp Segmentation in GI Images. J Imaging Inform Med 2025. [PMID: 39825142] [DOI: 10.1007/s10278-025-01389-7]
Abstract
Gastrointestinal (GI) disease examination presents significant challenges to doctors due to the intricate structure of the human digestive system. Colonoscopy and wireless capsule endoscopy are the most commonly used tools for GI examination. However, the large amount of data generated by these technologies requires the expertise and intervention of doctors for disease identification, making manual analysis a very time-consuming task. Thus, the development of a computer-assisted system is highly desirable to assist clinical professionals in making decisions in a low-cost and effective way. In this paper, we introduce a novel framework called InCoLoTransNet, designed for polyp segmentation. The study is based on a transformer and convolution-involution neural network, following the encoder-decoder architecture. We employ the vision transformer in the encoder section to focus on the global context, while the decoder involves a convolution-involution collaboration for resampling the polyp features. Involution enhances the model's ability to adaptively capture spatial and contextual information, while convolution focuses on local information, leading to more accurate feature extraction. The essential features captured by the transformer encoder are passed to the decoder through two skip-connection pathways. The CBAM module refines the features and passes them to the convolution block, leveraging attention mechanisms to emphasize relevant information. Meanwhile, locality self-attention is employed to pass essential features to the involution block, reinforcing the model's ability to capture more global features in the polyp regions. Experiments were conducted on five public datasets: CVC-ClinicDB, CVC-ColonDB, Kvasir-SEG, Etis-LaribPolypDB, and CVC-300. The results obtained by InCoLoTransNet are optimal when compared with 15 state-of-the-art polyp segmentation methods, achieving the highest mean Dice score of 93% and a mean intersection over union of 90% on CVC-ColonDB. Additionally, InCoLoTransNet distinguishes itself in terms of polyp segmentation generalization performance, achieving high mean Dice coefficient and mean intersection over union scores on unseen datasets: 85% and 79% on CVC-ColonDB, 91% and 87% on CVC-300, and 79% and 70% on Etis-LaribPolypDB, respectively.
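A hedged sketch of the involution operator that such a decoder pairs with convolution: a kernel is generated per spatial position from the feature map itself and applied over that position's neighborhood, shared across channel groups. This follows the standard involution formulation; the reduction ratio and group count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3,
                 groups: int = 4, reduction: int = 4):
        super().__init__()
        self.k, self.g = kernel_size, groups
        self.kernel_gen = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, groups * kernel_size ** 2, 1),
        )
        self.unfold = nn.Unfold(kernel_size=kernel_size,
                                padding=kernel_size // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        # Per-pixel kernels: (B, G, 1, k*k, H, W), generated from the input.
        kernels = self.kernel_gen(x).view(b, self.g, 1, self.k ** 2, h, w)
        # Neighborhood patches: (B, G, C/G, k*k, H, W).
        patches = self.unfold(x).view(b, self.g, c // self.g,
                                      self.k ** 2, h, w)
        return (kernels * patches).sum(dim=3).reshape(b, c, h, w)

y = Involution2d(64)(torch.randn(1, 64, 32, 32))
```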
Affiliation(s)
- Yassine Oukdach
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Anass Garbaz
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Zakaria Kerkaou
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Mohamed El Ansari
- Informatics and Applications Laboratory, Department of Computer Sciences, Faculty of Science, Moulay Ismail University, B.P 11201, Meknès, 52000, Morocco
- Lahcen Koutti
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Ahmed Fouad El Ouafdi
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
- Mouna Salihoun
- Faculty of Medicine and Pharmacy of Rabat, Mohammed V University of Rabat, Rabat, 10000, Morocco
14
Du X, Xu X, Chen J, Zhang X, Li L, Liu H, Li S. UM-Net: Rethinking ICGNet for polyp segmentation with uncertainty modeling. Med Image Anal 2025; 99:103347. [PMID: 39316997] [DOI: 10.1016/j.media.2024.103347]
Abstract
Automatic segmentation of polyps from colonoscopy images plays a critical role in the early diagnosis and treatment of colorectal cancer. Nevertheless, some bottlenecks still exist. In our previous work, we mainly focused on polyps with intra-class inconsistency and low contrast, using ICGNet to solve them. Due to the different equipment, specific locations and properties of polyps, the color distribution of the collected images is inconsistent. ICGNet was designed primarily with reverse-contour guide information and local-global context information, ignoring this inconsistent color distribution, which leads to overfitting problems and makes it difficult to focus only on beneficial image content. In addition, a trustworthy segmentation model should not only produce high-precision results but also provide a measure of uncertainty to accompany its predictions so that physicians can make informed decisions. However, ICGNet only gives the segmentation result and lacks the uncertainty measure. To cope with these novel bottlenecks, we further extend the original ICGNet to a comprehensive and effective network (UM-Net) with two main contributions that have been proved by experiments to have substantial practical value. Firstly, we employ a color transfer operation to weaken the relationship between color and polyps, making the model more concerned with the shape of the polyps. Secondly, we provide the uncertainty to represent the reliability of the segmentation results and use variance to rectify uncertainty. Our improved method is evaluated on five polyp datasets, which shows competitive results compared to other advanced methods in both learning ability and generalization capability. The source code is available at https://github.com/dxqllp/UM-Net.
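A minimal sketch of the color-transfer idea used to decouple color from polyp appearance: Reinhard-style per-channel mean/std matching against a randomly chosen reference image. Operating directly in RGB (rather than a LAB color space) is a simplifying assumption, not UM-Net's exact operation.

```python
import torch

def color_transfer(src: torch.Tensor, ref: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
    # src, ref: (C, H, W) images; match src's channel statistics to ref's.
    s_mu = src.mean(dim=(1, 2), keepdim=True)
    s_sd = src.std(dim=(1, 2), keepdim=True)
    r_mu = ref.mean(dim=(1, 2), keepdim=True)
    r_sd = ref.std(dim=(1, 2), keepdim=True)
    return ((src - s_mu) / (s_sd + eps)) * r_sd + r_mu

# Augmentation: the polyp shape is preserved while the color palette changes.
augmented = color_transfer(torch.rand(3, 256, 256), torch.rand(3, 256, 256))
```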
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China
- Xuebin Xu
- School of Computer Science and Technology, Anhui University, Hefei, China
- Jiajia Chen
- School of Computer Science and Technology, Anhui University, Hefei, China
- Xuejun Zhang
- School of Computer Science and Technology, Anhui University, Hefei, China
- Lei Li
- Department of Neurology, Shuyang Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Suqian, China
- Heng Liu
- Department of Gastroenterology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Shuo Li
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, USA
15
Kusters CHJ, Jaspers TJM, Boers TGW, Jong MR, Jukema JB, Fockens KN, de Groof AJ, Bergman JJ, van der Sommen F, De With PHN. Will Transformers change gastrointestinal endoscopic image analysis? A comparative analysis between CNNs and Transformers, in terms of performance, robustness and generalization. Med Image Anal 2025; 99:103348. [PMID: 39298861] [DOI: 10.1016/j.media.2024.103348]
Abstract
Gastrointestinal endoscopic image analysis presents significant challenges, such as considerable variations in quality due to the challenging in-body imaging environment, the often-subtle nature of abnormalities with low interobserver agreement, and the need for real-time processing. These challenges pose strong requirements on the performance, generalization, robustness and complexity of deep learning-based techniques in such safety-critical applications. While Convolutional Neural Networks (CNNs) have been the go-to architecture for endoscopic image analysis, recent successes of the Transformer architecture in computer vision raise the possibility to update this conclusion. To this end, we evaluate and compare clinically relevant performance, generalization and robustness of state-of-the-art CNNs and Transformers for neoplasia detection in Barrett's esophagus. We have trained and validated several top-performing CNNs and Transformers on a total of 10,208 images (2,079 patients), and tested on a total of 7,118 images (998 patients) across multiple test sets, including a high-quality test set, two internal and two external generalization test sets, and a robustness test set. Furthermore, to expand the scope of the study, we have conducted the performance and robustness comparisons for colonic polyp segmentation (Kvasir-SEG) and angiodysplasia detection (Giana). The results obtained for featured models across a wide range of training set sizes demonstrate that Transformers achieve comparable performance as CNNs on various applications, show comparable or slightly improved generalization capabilities and offer equally strong resilience and robustness against common image corruptions and perturbations. These findings confirm the viability of the Transformer architecture, particularly suited to the dynamic nature of endoscopic video analysis, characterized by fluctuating image quality, appearance and equipment configurations in transition from hospital to hospital. The code is made publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.
Affiliation(s)
- Carolus H J Kusters
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Tim J M Jaspers
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Tim G W Boers
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Martijn R Jong
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Jelmer B Jukema
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Kiki N Fockens
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Albert J de Groof
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Jacques J Bergman
- Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Fons van der Sommen
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
- Peter H N De With
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
16
Nguyen DC, Nguyen HL. ColonNeXt: Fully Convolutional Attention for Polyp Segmentation. J Imaging Inform Med 2024. [PMID: 39658740] [DOI: 10.1007/s10278-024-01342-0]
Abstract
This study introduces ColonNeXt, a novel fully convolutional attention-based model for polyp segmentation from colonoscopy images, aimed at enhancing the early detection of colorectal cancer. Utilizing a purely convolutional neural network (CNN), ColonNeXt integrates an encoder-decoder structure with a hierarchical multi-scale context-aware network (MSCAN) in the encoder and a convolutional block attention module (CBAM) in the decoder. The decoder further includes a proposed CNN-based feature attention mechanism for selective feature enhancement, ensuring precise segmentation. A new refinement module effectively improves boundary accuracy, addressing challenges such as variable polyp size, complex textures, and inconsistent illumination. Evaluations on standard datasets show that ColonNeXt achieves high accuracy and efficiency, significantly outperforming competing methods. These results confirm its robustness and precision, establishing ColonNeXt as a state-of-the-art model for polyp segmentation. The code is available at: https://github.com/long-nguyen12/colonnext-pytorch.
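A hedged sketch of the CBAM block named above: a channel gate (shared MLP over average- and max-pooled descriptors) followed by a spatial gate (7x7 convolution over pooled channel maps). This follows the standard CBAM formulation rather than ColonNeXt's exact hyperparameters.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)      # channel gate
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))             # spatial gate

y = CBAM(64)(torch.randn(1, 64, 32, 32))
```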
Affiliation(s)
- Dinh Cong Nguyen
- Hong Duc University, 565 Quang Trung, Dong Ve Ward, Thanh Hoa, 40000, Thanh Hoa, Viet Nam
- Hoang Long Nguyen
- Hong Duc University, 565 Quang Trung, Dong Ve Ward, Thanh Hoa, 40000, Thanh Hoa, Viet Nam
17
Song Z, Kang X, Wei X, Li S. Pixel-Centric Context Perception Network for Camouflaged Object Detection. IEEE Trans Neural Netw Learn Syst 2024; 35:18576-18589. [PMID: 37819817] [DOI: 10.1109/tnnls.2023.3319323]
Abstract
Camouflaged object detection (COD) aims to identify object pixels visually embedded in the background environment. Existing deep learning methods fail to utilize the context information around different pixels adequately and efficiently. In order to solve this problem, a novel pixel-centric context perception network (PCPNet) is proposed, the core of which is to customize the personalized context of each pixel based on the automatic estimation of its surroundings. Specifically, PCPNet first employs an elegant encoder equipped with the designed vital component generation (VCG) module to obtain a set of compact features rich in low-level spatial and high-level semantic information across multiple subspaces. Then, we present a parameter-free pixel importance estimation (PIE) function based on multiwindow information fusion. Object pixels with complex backgrounds will be assigned with higher PIE values. Subsequently, PIE is utilized to regularize the optimization loss. In this way, the network can pay more attention to those pixels with higher PIE values in the decoding stage. Finally, a local continuity refinement module (LCRM) is used to refine the detection results. Extensive experiments on four COD benchmarks, five salient object detection (SOD) benchmarks, and five polyp segmentation benchmarks demonstrate the superiority of PCPNet with respect to other state-of-the-art methods.
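A hedged sketch of using a parameter-free pixel-importance map to regularize the loss, in the spirit of PIE: here importance is taken as local intensity variance pooled over several window sizes, which is an assumption standing in for PCPNet's actual multiwindow information fusion.

```python
import torch
import torch.nn.functional as F

def pixel_importance(img: torch.Tensor, windows=(3, 7, 15)) -> torch.Tensor:
    # img: (B, 1, H, W) grayscale; higher local variance => harder pixel.
    imp = 0.0
    for k in windows:
        mu = F.avg_pool2d(img, k, stride=1, padding=k // 2)
        var = (F.avg_pool2d(img ** 2, k, stride=1, padding=k // 2)
               - mu ** 2).clamp_min(0.0)
        imp = imp + var
    imp = imp / len(windows)
    # Scale to [1, 2] so every pixel keeps a baseline weight.
    return 1.0 + imp / (imp.amax(dim=(2, 3), keepdim=True) + 1e-6)

def pie_weighted_bce(logits, target, importance):
    loss = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    return (importance * loss).mean()

img = torch.rand(2, 1, 64, 64)
loss = pie_weighted_bce(torch.randn(2, 1, 64, 64),
                        (torch.rand(2, 1, 64, 64) > 0.5).float(),
                        pixel_importance(img))
```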
18
Erol T, Sarikaya D. PlutoNet: An efficient polyp segmentation network with modified partial decoder and decoder consistency training. Healthc Technol Lett 2024; 11:365-373. [PMID: 39720760] [PMCID: PMC11665777] [DOI: 10.1049/htl2.12105]
Abstract
Deep learning models are used to minimize the number of polyps that go unnoticed by experts and to accurately segment the detected polyps during interventions. Although state-of-the-art models have been proposed, it remains a challenge to define representations that generalize well and that mediate between capturing low-level features and higher-level semantic details without being redundant. Another challenge with these models is that they are computation- and memory-intensive, which can pose a problem in real-time applications. To address these problems, PlutoNet is proposed for polyp segmentation; it requires only 9 FLOPs and 2,626,537 parameters, less than 10% of the parameters required by its counterparts. With PlutoNet, a novel decoder consistency training approach is proposed that consists of a shared encoder, a modified partial decoder, which is a combination of the partial decoder and full-scale connections that capture salient features at different scales without redundancy, and an auxiliary decoder which focuses on higher-level semantic features. The modified partial decoder and the auxiliary decoder are trained with a combined loss to enforce consistency, which helps strengthen learned representations. Ablation studies and experiments show that PlutoNet performs significantly better than state-of-the-art models, particularly on unseen datasets.
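A minimal sketch of decoder consistency training as the abstract describes it: both decoders are supervised by the ground truth, with an extra consistency term pulling their predictions together. The loss composition and weight are assumptions, not PlutoNet's published values.

```python
import torch
import torch.nn.functional as F

def consistency_objective(main_logits, aux_logits, target,
                          lam: float = 0.5):
    # Supervised terms for the modified partial decoder and the auxiliary one.
    sup = (F.binary_cross_entropy_with_logits(main_logits, target) +
           F.binary_cross_entropy_with_logits(aux_logits, target))
    # Consistency: align the two decoders' probability maps.
    cons = F.mse_loss(torch.sigmoid(main_logits), torch.sigmoid(aux_logits))
    return sup + lam * cons

t = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = consistency_objective(torch.randn(2, 1, 64, 64),
                             torch.randn(2, 1, 64, 64), t)
```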
Affiliation(s)
- Tugberk Erol
- Computer Engineering, Graduate School of Natural and Applied Sciences, Gazi University, Ankara, Türkiye
- Duygu Sarikaya
- School of Computer Science, University of Leeds, Leeds, United Kingdom
19
Xu Z, Miao Y, Chen G, Liu S, Chen H. GLGFormer: Global Local Guidance Network for Mucosal Lesion Segmentation in Gastrointestinal Endoscopy Images. J Imaging Inform Med 2024; 37:2983-2995. [PMID: 38940891] [PMCID: PMC11612111] [DOI: 10.1007/s10278-024-01162-2]
Abstract
Automatic mucosal lesion segmentation is a critical component in computer-aided clinical support systems for endoscopic image analysis. Image segmentation networks currently rely mainly on convolutional neural networks (CNNs) and Transformers, which have demonstrated strong performance in various applications. However, they cannot cope with blurred lesion boundaries and lesions of different scales in gastrointestinal endoscopy images. To address these challenges, we propose a new Transformer-based network, named GLGFormer, for the task of mucosal lesion segmentation. Specifically, we design the global guidance module to guide single-scale features patch-wise, enabling them to incorporate global information from the global map without information loss. Furthermore, a partial decoder is employed to fuse these enhanced single-scale features, achieving single-scale to multi-scale enhancement. Additionally, the local guidance module is designed to refocus attention on the neighboring patch, thus enhancing local features and refining lesion boundary segmentation. We conduct experiments on a private atrophic gastritis segmentation dataset and four public gastrointestinal polyp segmentation datasets. Compared to the current lesion segmentation networks, our proposed GLGFormer demonstrates outstanding learning and generalization capabilities. On the public dataset ClinicDB, GLGFormer achieved a mean intersection over union (mIoU) of 91.0% and a mean dice coefficient (mDice) of 95.0%. On the private dataset Gastritis-Seg, GLGFormer achieved an mIoU of 90.6% and an mDice of 94.6%.
Affiliation(s)
- Zhiyang Xu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, School of Information and Control Engineering, Advanced Robotics Research Center, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, P. R. China
- Yanzi Miao
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, School of Information and Control Engineering, Advanced Robotics Research Center, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, P. R. China
- Guangxia Chen
- Department of Gastroenterology, Xuzhou Municipal Hospital Affiliated to Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
- Shiyu Liu
- Department of Gastroenterology, Xuzhou Municipal Hospital Affiliated to Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
- Hu Chen
- The First Clinical Medical School of Xuzhou Medical University, Xuzhou, Jiangsu, 221002, P. R. China
20
Peng C, Qian Z, Wang K, Zhang L, Luo Q, Bi Z, Zhang W. MugenNet: A Novel Combined Convolution Neural Network and Transformer Network with Application in Colonic Polyp Image Segmentation. Sensors (Basel) 2024; 24:7473. [PMID: 39686010] [DOI: 10.3390/s24237473]
Abstract
Accurate polyp image segmentation is of great significance because it can help in the detection of polyps. The convolutional neural network (CNN) is a common automatic segmentation method, but its main disadvantage is the long training time. The Transformer is another method that can be adapted to automatic segmentation by employing a self-attention mechanism, which essentially assigns different importance weights to each piece of information, thus achieving high computational efficiency during segmentation. However, a potential drawback of the Transformer is the risk of information loss. The study reported in this paper employed the well-known hybridization principle to propose a method that combines CNN and Transformer to retain the strengths of both. Specifically, this study applied this method to the early detection of colonic polyps and implemented a model called MugenNet for colonic polyp image segmentation. We conducted a comprehensive experiment to compare MugenNet with other CNN models on five publicly available datasets. An ablation experiment on MugenNet was conducted as well. The experimental results show that MugenNet achieves a mean Dice of 0.714 on the ETIS dataset, the best performance on this dataset among the compared models, with an inference speed of 56 FPS. The overall outcome of this study is a method to optimally combine two machine learning methods that are complementary to each other.
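A hedged sketch of the hybridization principle: run a CNN branch (local texture) and a self-attention branch (global context) in parallel on the same features and fuse their outputs. Both branches here are toy stand-ins, not MugenNet's actual backbones or fusion design.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.cnn(x)                            # convolutional branch
        tokens = x.flatten(2).transpose(1, 2)          # (B, HW, C)
        glob, _ = self.attn(tokens, tokens, tokens)    # self-attention branch
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))

y = HybridBlock()(torch.randn(1, 64, 32, 32))
```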
Affiliation(s)
- Chen Peng
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zhiqin Qian
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
- Kunyu Wang
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
- Lanzhu Zhang
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
- Qi Luo
- School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zhuming Bi
- Department of Engineering, Purdue University, West Lafayette, IN 47907, USA
- Wenjun Zhang
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
Collapse
|
21
|
Wang L, Wan J, Meng X, Chen B, Shao W. MCH-PAN: gastrointestinal polyp detection model integrating multi-scale feature information. Sci Rep 2024; 14:23382. [PMID: 39379452 PMCID: PMC11461898 DOI: 10.1038/s41598-024-74609-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2024] [Accepted: 09/27/2024] [Indexed: 10/10/2024] Open
Abstract
The rise of object detection models has brought new breakthroughs to the development of clinical decision support systems. However, in the field of gastrointestinal polyp detection, there are still challenges such as uncertainty in polyp identification and inadequate coping with polyp scale variations. To address these challenges, this paper proposes a novel gastrointestinal polyp object detection model. The model can automatically identify polyp regions in gastrointestinal images and accurately label them. In terms of design, the model integrates multi-channel information to enhance the ability and robustness of channel feature expression, thus better coping with the complexity of polyp structures. At the same time, a hierarchical structure is constructed in the model to enhance the model's adaptability to multi-scale targets, effectively addressing the problem of large-scale variations in polyps. Furthermore, a channel attention mechanism is designed in the model to improve the accuracy of target positioning and reduce uncertainty in diagnosis. By integrating these strategies, the proposed gastrointestinal polyp object detection model can achieve accurate polyp detection, providing clinicians with reliable and valuable references. Experimental results show that the model exhibits superior performance in gastrointestinal polyp detection, which helps improve the diagnostic level of digestive system diseases and provides useful references for related research fields.
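The channel attention mechanism this abstract refers to is typically realized as a squeeze-and-excitation style gate. A minimal sketch, assuming an SE-like design rather than the paper's exact module:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel gate: global pooling produces a
    per-channel descriptor, and a small MLP turns it into channel weights."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.mlp(x.mean(dim=(2, 3)))    # (B, C) channel weights in [0, 1]
        return x * w.view(b, c, 1, 1)       # reweight feature channels

x = torch.randn(2, 32, 56, 56)
print(ChannelAttention(32)(x).shape)        # torch.Size([2, 32, 56, 56])
```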
Collapse
Affiliation(s)
- Ling Wang
- Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, 223003, China.
| | - Jingjing Wan
- Department of Gastroenterology, The Second People's Hospital of Huai'an, The Affiliated Huai'an Hospital of Xuzhou Medical University, Huaian, 223002, China.
| | - Xianchun Meng
- Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, 223003, China
| | - Bolun Chen
- Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, 223003, China
| | - Wei Shao
- Nanjing University of Aeronautics and Astronautics Shenzhen Research Institute, Shenzhen, 518038, China.
| |
Collapse
|
22
|
Cai L, Chen L, Huang J, Wang Y, Zhang Y. Know your orientation: A viewpoint-aware framework for polyp segmentation. Med Image Anal 2024; 97:103288. [PMID: 39096844 DOI: 10.1016/j.media.2024.103288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2023] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 08/05/2024]
Abstract
Automatic polyp segmentation in endoscopic images is critical for the early diagnosis of colorectal cancer. Despite the availability of powerful segmentation models, two challenges still impede the accuracy of polyp segmentation algorithms. Firstly, during a colonoscopy, physicians frequently adjust the orientation of the colonoscope tip to capture underlying lesions, resulting in viewpoint changes in the colonoscopy images. These variations increase the diversity of polyp visual appearance, posing a challenge for learning robust polyp features. Secondly, polyps often exhibit properties similar to the surrounding tissues, leading to indistinct polyp boundaries. To address these problems, we propose a viewpoint-aware framework named VANet for precise polyp segmentation. In VANet, polyps are emphasized as a discriminative feature and thus can be localized by class activation maps in a viewpoint classification process. With these polyp locations, we design a viewpoint-aware Transformer (VAFormer) to alleviate the erosion of attention by the surrounding tissues, thereby inducing better polyp representations. Additionally, to enhance the polyp boundary perception of the network, we develop a boundary-aware Transformer (BAFormer) to encourage self-attention towards uncertain regions. As a consequence, the combination of the two modules is capable of calibrating predictions and significantly improving polyp segmentation performance. Extensive experiments on seven public datasets across six metrics demonstrate the state-of-the-art results of our method, and VANet can handle colonoscopy images in real-world scenarios effectively. The source code is available at https://github.com/1024803482/Viewpoint-Aware-Network.
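The classification-driven localization described above rests on class activation maps (CAM). A generic CAM sketch in PyTorch; the backbone and sizes are our assumptions, not VANet's:

```python
import torch
import torch.nn as nn

class CAMLocalizer(nn.Module):
    """Classic class activation mapping: a classifier with global average
    pooling whose final-layer weights re-weight the feature maps, yielding
    a coarse localization map (here, of polyp evidence) without mask labels."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        feat = self.backbone(x)                       # (B, 128, H, W)
        logits = self.fc(feat.mean(dim=(2, 3)))       # GAP + linear classifier
        # CAM: project the class weights back onto the spatial feature maps.
        cam = torch.einsum("kc,bchw->bkhw", self.fc.weight, feat)
        return logits, cam

logits, cam = CAMLocalizer()(torch.randn(1, 3, 64, 64))
print(logits.shape, cam.shape)   # torch.Size([1, 2]) torch.Size([1, 2, 64, 64])
```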
Collapse
Affiliation(s)
- Linghan Cai
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China; Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China.
| | - Lijiang Chen
- Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China
| | - Jianhao Huang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
| | - Yifeng Wang
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
| | - Yongbing Zhang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China.
| |
Collapse
|
23
|
Xu W, Xu R, Wang C, Li X, Xu S, Guo L. PSTNet: Enhanced Polyp Segmentation With Multi-Scale Alignment and Frequency Domain Integration. IEEE J Biomed Health Inform 2024; 28:6042-6053. [PMID: 38954569 DOI: 10.1109/jbhi.2024.3421550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/04/2024]
Abstract
Accurate segmentation of colorectal polyps in colonoscopy images is crucial for effective diagnosis and management of colorectal cancer (CRC). However, current deep learning-based methods primarily rely on fusing RGB information across multiple scales, leading to limitations in accurately identifying polyps due to restricted RGB domain information and challenges in feature misalignment during multi-scale aggregation. To address these limitations, we propose the Polyp Segmentation Network with Shunted Transformer (PSTNet), a novel approach that integrates both RGB and frequency domain cues present in the images. PSTNet comprises three key modules: the Frequency Characterization Attention Module (FCAM) for extracting frequency cues and capturing polyp characteristics, the Feature Supplementary Alignment Module (FSAM) for aligning semantic information and reducing misalignment noise, and the Cross Perception localization Module (CPM) for synergizing frequency cues with high-level semantics to achieve efficient polyp segmentation. Extensive experiments on challenging datasets demonstrate PSTNet's significant improvement in polyp segmentation accuracy across various metrics, consistently outperforming state-of-the-art methods. The integration of frequency domain cues and the novel architectural design of PSTNet contribute to advancing computer-assisted polyp segmentation, facilitating more accurate diagnosis and management of CRC.
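A simple way to expose the frequency-domain cues this abstract refers to is a radial low/high-frequency split via the 2D FFT. A minimal sketch of one plausible realization (our assumption, not PSTNet's FCAM):

```python
import torch

def frequency_split(x: torch.Tensor, cutoff: int = 8):
    """Split a feature map into low/high-frequency parts with a radial mask
    in the 2D Fourier domain; such cues complement plain RGB features."""
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(
        torch.arange(h) - h // 2, torch.arange(w) - w // 2, indexing="ij"
    )
    radius = torch.sqrt((yy ** 2 + xx ** 2).float())
    low_mask = (radius <= min(h, w) / cutoff).to(x.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(freq * low_mask, dim=(-2, -1))).real
    return low, x - low   # low-frequency layout vs. high-frequency edges/texture

img = torch.randn(1, 3, 64, 64)
low, high = frequency_split(img)
print(low.shape, high.shape)
```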
Collapse
|
24
|
Manan MA, Feng J, Yaqub M, Ahmed S, Imran SMA, Chuhan IS, Khan HA. Multi-scale and multi-path cascaded convolutional network for semantic segmentation of colorectal polyps. ALEXANDRIA ENGINEERING JOURNAL 2024; 105:341-359. [DOI: 10.1016/j.aej.2024.06.095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/22/2024]
|
25
|
Dai D, Dong C, Yan Q, Sun Y, Zhang C, Li Z, Xu S. I2U-Net: A dual-path U-Net with rich information interaction for medical image segmentation. Med Image Anal 2024; 97:103241. [PMID: 38897032 DOI: 10.1016/j.media.2024.103241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2023] [Revised: 04/27/2024] [Accepted: 06/10/2024] [Indexed: 06/21/2024]
Abstract
Although U-shaped networks have achieved remarkable performance in many medical image segmentation tasks, they rarely model the sequential relationship of hierarchical layers. This weakness makes it difficult for the current layer to effectively utilize the historical information of the previous layer, leading to unsatisfactory segmentation results for lesions with blurred boundaries and irregular shapes. To solve this problem, we propose a novel dual-path U-Net, dubbed I2U-Net. The newly proposed network encourages historical information re-usage and re-exploration through rich information interaction among the dual paths, allowing deep layers to learn more comprehensive features that contain both low-level detail description and high-level semantic abstraction. Specifically, we introduce a multi-functional information interaction module (MFII), which can model cross-path, cross-layer, and cross-path-and-layer information interactions via a unified design, making the proposed I2U-Net behave similarly to an unfolded RNN and enjoy its advantage of modeling time-sequence information. Besides, to further selectively and sensitively integrate the information extracted by the encoders of the dual paths, we propose a holistic information fusion and augmentation module (HIFA), which can efficiently bridge the encoder and the decoder. Extensive experiments on four challenging tasks, including skin lesion, polyp, brain tumor, and abdominal multi-organ segmentation, consistently show that the proposed I2U-Net has superior performance and generalization ability over other state-of-the-art methods. The code is available at https://github.com/duweidai/I2U-Net.
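A toy version of the cross-path interaction idea: each path's next feature is computed from both paths jointly. This sketch is our own simplification, not the paper's MFII module:

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """Two parallel paths that exchange information at every stage: each
    path's update is computed from the concatenation of both paths,
    loosely echoing the cross-path interaction idea."""
    def __init__(self, channels: int):
        super().__init__()
        self.path_a = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.path_b = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        shared = torch.cat([a, b], dim=1)   # cross-path information exchange
        return (self.act(self.path_a(shared)) + a,
                self.act(self.path_b(shared)) + b)

a = b = torch.randn(1, 32, 44, 44)
a2, b2 = DualPathBlock(32)(a, b)
print(a2.shape, b2.shape)
```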
Collapse
Affiliation(s)
- Duwei Dai
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China; Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
| | - Caixia Dong
- Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
| | - Qingsen Yan
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Yongheng Sun
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China
| | - Chunyan Zhang
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China
| | - Zongfang Li
- National-Local Joint Engineering Research Center of Biodiagnosis & Biotherapy, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China; Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
| | - Songhua Xu
- Institute of Medical Artificial Intelligence, the Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
| |
Collapse
|
26
|
Paderno A, Bedi N, Rau A, Holsinger CF. Computer Vision and Videomics in Otolaryngology-Head and Neck Surgery: Bridging the Gap Between Clinical Needs and the Promise of Artificial Intelligence. Otolaryngol Clin North Am 2024; 57:703-718. [PMID: 38981809 DOI: 10.1016/j.otc.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/11/2024]
Abstract
This article discusses the role of computer vision in otolaryngology, particularly through endoscopy and surgery. It covers recent applications of artificial intelligence (AI) in nonradiologic imaging within otolaryngology, noting the benefits and challenges, such as improving diagnostic accuracy and optimizing therapeutic outcomes, while also pointing out the necessity for enhanced data curation and standardized research methodologies to advance clinical applications. Technical aspects are also covered, providing a detailed view of the progression from manual feature extraction to more complex AI models, including convolutional neural networks and vision transformers and their potential application in clinical settings.
Collapse
Affiliation(s)
- Alberto Paderno
- IRCCS Humanitas Research Hospital, via Manzoni 56, Rozzano, Milan 20089, Italy; Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan 20072, Italy.
| | - Nikita Bedi
- Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University, Palo Alto, CA, USA
| | - Anita Rau
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
| | | |
Collapse
|
27
|
Oukdach Y, Garbaz A, Kerkaou Z, El Ansari M, Koutti L, El Ouafdi AF, Salihoun M. UViT-Seg: An Efficient ViT and U-Net-Based Framework for Accurate Colorectal Polyp Segmentation in Colonoscopy and WCE Images. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:2354-2374. [PMID: 38671336 PMCID: PMC11522253 DOI: 10.1007/s10278-024-01124-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 04/01/2024] [Accepted: 04/13/2024] [Indexed: 04/28/2024]
Abstract
Colorectal cancer (CRC) stands out as one of the most prevalent global cancers. The accurate localization of colorectal polyps in endoscopy images is pivotal for timely detection and removal, contributing significantly to CRC prevention. The manual analysis of images generated by gastrointestinal screening technologies poses a tedious task for doctors. Therefore, computer vision-assisted cancer detection could serve as an efficient tool for polyp segmentation. Numerous efforts have been dedicated to automating polyp localization, with the majority of studies relying on convolutional neural networks (CNNs) to learn features from polyp images. Despite their success in polyp segmentation tasks, CNNs exhibit significant limitations in precisely determining polyp location and shape due to their sole reliance on learning local features from images. While gastrointestinal images manifest significant variation in their features, encompassing both high- and low-level ones, a framework that combines the ability to learn both features of polyps is desired. This paper introduces UViT-Seg, a framework designed for polyp segmentation in gastrointestinal images. Operating on an encoder-decoder architecture, UViT-Seg employs two distinct feature extraction methods. A vision transformer in the encoder section captures long-range semantic information, while a CNN module, integrating squeeze-excitation and dual attention mechanisms, captures low-level features, focusing on critical image regions. Experimental evaluations conducted on five public datasets, including CVC clinic, ColonDB, Kvasir-SEG, ETIS LaribDB, and Kvasir Capsule-SEG, demonstrate UViT-Seg's effectiveness in polyp localization. To confirm its generalization performance, the model is tested on datasets not used in training. Benchmarking against common segmentation methods and state-of-the-art polyp segmentation approaches, the proposed model yields promising results. For instance, it achieves a mean Dice coefficient of 0.915 and a mean intersection over union of 0.902 on the CVC Colon dataset. Furthermore, UViT-Seg has the advantage of being efficient, requiring fewer computational resources for both training and testing. This feature positions it as an optimal choice for real-world deployment scenarios.
Collapse
Affiliation(s)
- Yassine Oukdach
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco.
| | - Anass Garbaz
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Zakaria Kerkaou
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Mohamed El Ansari
- Informatics and Applications Laboratory, Department of Computer Sciences, Faculty of Science, Moulay Ismail University, B.P 11201, Meknès, 52000, Morocco
| | - Lahcen Koutti
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Ahmed Fouad El Ouafdi
- LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, Agadir, 80000, Morocco
| | - Mouna Salihoun
- Faculty of Medicine and Pharmacy of Rabat, Mohammed V University of Rabat, Rabat, 10000, Morocco
| |
Collapse
|
28
|
Tudela Y, Majó M, de la Fuente N, Galdran A, Krenzer A, Puppe F, Yamlahi A, Tran TN, Matuszewski BJ, Fitzgerald K, Bian C, Pan J, Liu S, Fernández-Esparrach G, Histace A, Bernal J. A complete benchmark for polyp detection, segmentation and classification in colonoscopy images. Front Oncol 2024; 14:1417862. [PMID: 39381041 PMCID: PMC11458519 DOI: 10.3389/fonc.2024.1417862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2024] [Accepted: 07/11/2024] [Indexed: 10/10/2024] Open
Abstract
Introduction Colorectal cancer (CRC) is one of the main causes of death worldwide. Early detection and diagnosis of its precursor lesion, the polyp, is key to reducing its mortality and improving procedure efficiency. During the last two decades, several computational methods have been proposed to assist clinicians in detection, segmentation, and classification tasks, but the lack of a common public validation framework makes it difficult to determine which of them is ready to be deployed in the exploration room. Methods This study presents a complete validation framework, and we compare several methodologies for each of the polyp characterization tasks. Results Results show that the majority of the approaches are able to provide good performance for the detection and segmentation tasks, but that there is room for improvement regarding polyp classification. Discussion While studies show promising results in assisting polyp detection and segmentation tasks, further research should be done on the classification task to obtain results reliable enough to assist clinicians during the procedure. The presented framework provides a standardized method for evaluating and comparing different approaches, which could facilitate the identification of clinically ready assistance methods.
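For reference, the two overlap metrics that dominate such polyp benchmarks, Dice and IoU, can be computed as follows (a standard formulation, not code from the benchmark itself):

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Standard overlap metrics for binary segmentation masks, as used when
    benchmarking polyp segmentation methods."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

pred = np.zeros((256, 256)); pred[60:120, 60:120] = 1
gt = np.zeros((256, 256)); gt[50:110, 50:110] = 1
print(dice_and_iou(pred, gt))   # partially overlapping squares
```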
Collapse
Affiliation(s)
- Yael Tudela
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| | - Mireia Majó
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| | - Neil de la Fuente
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| | - Adrian Galdran
- Department of Information and Communication Technologies, SymBioSys Research Group, BCNMedTech, Barcelona, Spain
| | - Adrian Krenzer
- Artificial Intelligence and Knowledge Systems, Institute for Computer Science, Julius-Maximilians University of Würzburg, Würzburg, Germany
| | - Frank Puppe
- Artificial Intelligence and Knowledge Systems, Institute for Computer Science, Julius-Maximilians University of Würzburg, Würzburg, Germany
| | - Amine Yamlahi
- Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Thuy Nuong Tran
- Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Bogdan J. Matuszewski
- Computer Vision and Machine Learning (CVML) Research Group, University of Central Lancashire (UCLan), Preston, United Kingdom
| | - Kerr Fitzgerald
- Computer Vision and Machine Learning (CVML) Research Group, University of Central Lancashire (UCLan), Preston, United Kingdom
| | - Cheng Bian
- Hebei University of Technology, Baoding, China
| | | | - Shijle Liu
- Hebei University of Technology, Baoding, China
| | | | - Aymeric Histace
- ETIS UMR 8051, École Nationale Supérieure de l'Électronique et de ses Applications (ENSEA), Centre national de la recherche scientifique (CNRS), CY Paris Cergy University, Cergy, France
| | - Jorge Bernal
- Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
| |
Collapse
|
29
|
Meng L, Li Y, Duan W. Three-stage polyp segmentation network based on reverse attention feature purification with Pyramid Vision Transformer. Comput Biol Med 2024; 179:108930. [PMID: 39067285 DOI: 10.1016/j.compbiomed.2024.108930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2024] [Revised: 06/30/2024] [Accepted: 07/18/2024] [Indexed: 07/30/2024]
Abstract
Colorectal polyps serve as potential precursors of colorectal cancer and automating polyp segmentation aids physicians in accurately identifying potential polyp regions, thereby reducing misdiagnoses and missed diagnoses. However, existing models often fall short in accurately segmenting polyps due to the high degree of similarity between polyp regions and surrounding tissue in terms of color, texture, and shape. To address this challenge, this study proposes a novel three-stage polyp segmentation network, named Reverse Attention Feature Purification with Pyramid Vision Transformer (RAFPNet), which adopts an iterative feedback UNet architecture to refine polyp saliency maps for precise segmentation. Initially, a Multi-Scale Feature Aggregation (MSFA) module is introduced to generate preliminary polyp saliency maps. Subsequently, a Reverse Attention Feature Purification (RAFP) module is devised to effectively suppress low-level surrounding tissue features while enhancing high-level semantic polyp information based on the preliminary saliency maps. Finally, the UNet architecture is leveraged to further refine the feature maps in a coarse-to-fine approach. Extensive experiments conducted on five widely used polyp segmentation datasets and three video polyp segmentation datasets demonstrate the superior performance of RAFPNet over state-of-the-art models across multiple evaluation metrics.
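A minimal sketch of the reverse-attention idea, in which an inverted coarse saliency map steers refinement toward not-yet-confident regions; module sizes are assumed, and this is not the paper's RAFP module:

```python
import torch
import torch.nn as nn

class ReverseAttention(nn.Module):
    """Reverse attention: invert a coarse saliency map so the network
    attends to the not-yet-confident region (typically the polyp boundary)
    and refines the prediction residually."""
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, feat: torch.Tensor, coarse_logits: torch.Tensor):
        reverse = 1.0 - torch.sigmoid(coarse_logits)   # emphasize uncertain area
        residual = self.refine(feat * reverse)         # correction term
        return coarse_logits + residual                # refined saliency logits

feat, coarse = torch.randn(1, 64, 44, 44), torch.randn(1, 1, 44, 44)
print(ReverseAttention(64)(feat, coarse).shape)   # torch.Size([1, 1, 44, 44])
```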
Collapse
Affiliation(s)
- Lingbing Meng
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China
| | - Yuting Li
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China
| | - Weiwei Duan
- School of Computer and Software Engineering, Anhui Institute of Information Technology, China.
| |
Collapse
|
30
|
Arsa DMS, Ilyas T, Park SH, Chua L, Kim H. Efficient multi-stage feedback attention for diverse lesion in cancer image segmentation. Comput Med Imaging Graph 2024; 116:102417. [PMID: 39067303 DOI: 10.1016/j.compmedimag.2024.102417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Revised: 04/11/2024] [Accepted: 07/10/2024] [Indexed: 07/30/2024]
Abstract
In the domain of Computer-Aided Diagnosis (CAD) systems, the accurate identification of cancer lesions is paramount, given the life-threatening nature of cancer and the complexities inherent in its manifestation. This task is particularly arduous due to the often vague boundaries of cancerous regions, compounded by the presence of noise and the heterogeneity in the appearance of lesions, making precise segmentation a critical yet challenging endeavor. This study introduces an innovative iterative feedback mechanism tailored for the nuanced detection of cancer lesions in a variety of medical imaging modalities, offering a refining phase to adjust detection results. The core of our approach is the elimination of the need for an initial segmentation mask, a common limitation of iterative segmentation methods. Instead, we utilize a novel system where the feedback for refining segmentation is derived directly from the encoder-decoder architecture of our neural network model. This shift allows for more dynamic and accurate lesion identification. To further enhance the accuracy of our CAD system, we employ a multi-scale feedback attention mechanism to guide and refine the predicted mask over subsequent iterations. In parallel, we introduce a sophisticated weighted feedback loss function. This function synergistically combines global and iteration-specific loss considerations, thereby refining parameter estimation and improving the overall precision of the segmentation. We conducted comprehensive experiments across three distinct categories of medical imaging: colonoscopy, ultrasonography, and dermoscopic images. The experimental results demonstrate that our method not only competes favorably with but also surpasses current state-of-the-art methods in various scenarios, including both standard and challenging out-of-domain tasks. This evidences the robustness and versatility of our approach in accurately identifying cancer lesions across a spectrum of medical imaging contexts. Our source code can be found at https://github.com/dewamsa/EfficientFeedbackNetwork.
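The iterative feedback mechanism can be illustrated with a loop that feeds the previous mask prediction back into the network alongside the image. A deliberately small sketch under assumed layer sizes, not the paper's multi-scale feedback attention:

```python
import torch
import torch.nn as nn

class FeedbackRefiner(nn.Module):
    """Minimal iterative-feedback loop: the previous mask logits are fed
    back with the image so each pass can correct the last one."""
    def __init__(self, in_ch: int = 3, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch + 1, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 3, padding=1),
        )

    def forward(self, image: torch.Tensor, steps: int = 3) -> torch.Tensor:
        logits = torch.zeros_like(image[:, :1])       # no initial mask needed
        for _ in range(steps):                        # refine using feedback
            logits = self.net(torch.cat([image, torch.sigmoid(logits)], dim=1))
        return logits

print(FeedbackRefiner()(torch.randn(1, 3, 64, 64)).shape)
```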
Collapse
Affiliation(s)
- Dewa Made Sri Arsa
- Division of Electronics and Information Engineering, Jeonbuk National University, Republic of Korea; Department of Information Technology, Universitas Udayana, Indonesia; Core Research Institute of Intelligent Robots, Jeonbuk National University, Republic of Korea.
| | - Talha Ilyas
- Division of Electronics and Information Engineering, Jeonbuk National University, Republic of Korea; Core Research Institute of Intelligent Robots, Jeonbuk National University, Republic of Korea.
| | - Seok-Hwan Park
- Division of Electronic Engineering, Jeonbuk National University, Republic of Korea.
| | - Leon Chua
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA.
| | - Hyongsuk Kim
- Core Research Institute of Intelligent Robots, Jeonbuk National University, Republic of Korea.
| |
Collapse
|
31
|
Tang S, Ran H, Yang S, Wang Z, Li W, Li H, Meng Z. A frequency selection network for medical image segmentation. Heliyon 2024; 10:e35698. [PMID: 39220902 PMCID: PMC11365330 DOI: 10.1016/j.heliyon.2024.e35698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2024] [Revised: 07/18/2024] [Accepted: 08/01/2024] [Indexed: 09/04/2024] Open
Abstract
Existing medical image segmentation methods may consider feature extraction and information processing only in the spatial domain, lack the design of interactions between frequency information and spatial information, or ignore the semantic gaps between shallow and deep features, leading to inaccurate segmentation results. Therefore, in this paper, we propose a novel frequency selection segmentation network (FSSN), which achieves more accurate lesion segmentation by fusing local spatial features with global frequency information, better designing feature interactions, and suppressing low-correlation frequency components to mitigate semantic gaps. Firstly, we propose a global-local feature aggregation module (GLAM) that simultaneously captures multi-scale local features in the spatial domain and exploits global frequency information in the frequency domain, achieving a complementary fusion of local detail features and global frequency information. Secondly, we propose a feature filter module (FFM) to mitigate semantic gaps during cross-level feature fusion, enabling FSSN to discriminatively determine which frequency information should be preserved for accurate lesion segmentation. Finally, to make better use of local information, especially at the boundary of the lesion region, we employ deformable convolution (DC) to extract pertinent features over a local range, so that FSSN can better focus on relevant image content. Extensive experiments on two public benchmark datasets show that, compared with representative medical image segmentation methods, our FSSN obtains more accurate lesion segmentation results in terms of both objective evaluation metrics and subjective visual effects, with fewer parameters and lower computational complexity.
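Of the components above, deformable convolution is the most readily shown in code: a small convolution predicts per-pixel sampling offsets so the kernel can bend toward lesion boundaries. A minimal sketch using torchvision's DeformConv2d (our usage example, not FSSN's configuration):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableRefiner(nn.Module):
    """Minimal deformable-convolution usage: a plain conv predicts (dx, dy)
    offsets for each of the 3x3 kernel taps at every spatial position."""
    def __init__(self, channels: int):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2 * 3 * 3, 3, padding=1)
        self.dcn = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dcn(x, self.offset(x))   # kernel samples follow the offsets

x = torch.randn(1, 32, 40, 40)
print(DeformableRefiner(32)(x).shape)        # torch.Size([1, 32, 40, 40])
```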
Collapse
Affiliation(s)
- Shu Tang
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| | - Haiheng Ran
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| | - Shuli Yang
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| | - Zhaoxia Wang
- Chongqing Emergency Medical Center, Chongqing University Central Hospital, School of Medicine, Chongqing University, Chongqing, China
| | - Wei Li
- Children’s Hospital of Chongqing Medical University, China
| | - Haorong Li
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| | - Zihao Meng
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
| |
Collapse
|
32
|
Chang Q, Ahmad D, Toth J, Bascom R, Higgins WE. ESFPNet: Efficient Stage-Wise Feature Pyramid on Mix Transformer for Deep Learning-Based Cancer Analysis in Endoscopic Video. J Imaging 2024; 10:191. [PMID: 39194980 DOI: 10.3390/jimaging10080191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2024] [Revised: 07/19/2024] [Accepted: 08/01/2024] [Indexed: 08/29/2024] Open
Abstract
For patients at risk of developing either lung cancer or colorectal cancer, the identification of suspect lesions in endoscopic video is an important procedure. The physician performs an endoscopic exam by navigating an endoscope through the organ of interest, be it the lungs or intestinal tract, and performs a visual inspection of the endoscopic video stream to identify lesions. Unfortunately, this entails a tedious, error-prone search over a lengthy video sequence. We propose a deep learning architecture that enables the real-time detection and segmentation of lesion regions from endoscopic video, with our experiments focused on autofluorescence bronchoscopy (AFB) for the lungs and colonoscopy for the intestinal tract. Our architecture, dubbed ESFPNet, draws on a pretrained Mix Transformer (MiT) encoder and a decoder structure that incorporates a new Efficient Stage-Wise Feature Pyramid (ESFP) to promote accurate lesion segmentation. In comparison to existing deep learning models, the ESFPNet model gave superior lesion segmentation performance for an AFB dataset. It also produced superior segmentation results for three widely used public colonoscopy databases and nearly the best results for two other public colonoscopy databases. In addition, the lightweight ESFPNet architecture requires fewer model parameters and less computation than other competing models, enabling the real-time analysis of input video frames. Overall, these studies point to the combined superior analysis performance and architectural efficiency of the ESFPNet for endoscopic video analysis. Lastly, additional experiments with the public colonoscopy databases demonstrate the learning ability and generalizability of ESFPNet, implying that the model could be effective for region segmentation in other domains.
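Stage-wise pyramid decoding of the kind ESFP performs can be sketched generically: reduce each encoder stage to a common width, then fuse from deep to shallow with upsampling. Channel widths below are assumptions, and this is not the authors' exact decoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StageWisePyramidDecoder(nn.Module):
    """Generic stage-wise pyramid decoder: per-stage 1x1 reduction, then
    progressive deep-to-shallow fusion with bilinear upsampling."""
    def __init__(self, in_chs=(64, 128, 320, 512), width: int = 64):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_chs)
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * width, width, 3, padding=1) for _ in in_chs[:-1]
        )
        self.head = nn.Conv2d(width, 1, 1)

    def forward(self, feats):
        feats = [r(f) for r, f in zip(self.reduce, feats)]
        x = feats[-1]                                   # deepest stage first
        for f, fuse in zip(feats[-2::-1], list(self.fuse)[::-1]):
            x = F.interpolate(x, size=f.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = fuse(torch.cat([x, f], dim=1))
        return self.head(x)

feats = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 320, 512),
                                                 (88, 44, 22, 11))]
print(StageWisePyramidDecoder()(feats).shape)   # torch.Size([1, 1, 88, 88])
```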
Collapse
Affiliation(s)
- Qi Chang
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
| | - Danish Ahmad
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
| | - Jennifer Toth
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
| | - Rebecca Bascom
- Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA
| | - William E Higgins
- School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA
| |
Collapse
|
33
|
Jiang Y, Zhang Z, Hu Y, Li G, Wan X, Wu S, Cui S, Huang S, Li Z. ECC-PolypDet: Enhanced CenterNet With Contrastive Learning for Automatic Polyp Detection. IEEE J Biomed Health Inform 2024; 28:4785-4796. [PMID: 37983159 DOI: 10.1109/jbhi.2023.3334240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2023]
Abstract
Accurate polyp detection is critical for early colorectal cancer diagnosis. Although remarkable progress has been achieved in recent years, the complex colon environment and concealed polyps with unclear boundaries still pose severe challenges in this area. Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases. In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training & end-to-end inference framework that leverages images and bounding box annotations to train a general model and fine-tune it based on the inference score to obtain a final robust model. Specifically, we conduct Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps. Moreover, to enhance the recognition of small polyps, we design the Semantic Flow-guided Feature Pyramid Network (SFFPN) to aggregate multi-scale features and the Heatmap Propagation (HP) module to boost the model's attention on polyp targets. In the fine-tuning stage, we introduce the IoU-guided Sample Re-weighting (ISR) mechanism to prioritize hard samples by adaptively adjusting the loss weight for each sample during fine-tuning. Extensive experiments on six large-scale colonoscopy datasets demonstrate the superiority of our model compared with previous state-of-the-art detectors.
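The box-assisted contrastive idea, pulling pixels toward their own foreground/background prototype and apart from the other, can be sketched as follows; this is a simplified stand-in for, not a reproduction of, the paper's BCL loss:

```python
import torch
import torch.nn.functional as F

def fg_bg_contrastive_loss(embed: torch.Tensor, mask: torch.Tensor,
                           tau: float = 0.1) -> torch.Tensor:
    """Toy foreground/background contrastive objective: pool a prototype
    per class, then classify each pixel embedding against the two
    prototypes with a temperature-scaled cross-entropy."""
    b, c, h, w = embed.shape
    embed = F.normalize(embed, dim=1)
    m = mask.float()
    fg = (embed * m).sum(dim=(2, 3)) / m.sum(dim=(2, 3)).clamp(min=1.0)
    bg = (embed * (1 - m)).sum(dim=(2, 3)) / (1 - m).sum(dim=(2, 3)).clamp(min=1.0)
    fg, bg = F.normalize(fg, dim=1), F.normalize(bg, dim=1)
    sim_fg = (embed * fg.view(b, c, 1, 1)).sum(1) / tau   # similarity to fg proto
    sim_bg = (embed * bg.view(b, c, 1, 1)).sum(1) / tau
    logit = torch.stack([sim_fg, sim_bg], dim=1)          # (B, 2, H, W)
    target = (1 - m.squeeze(1)).long()                    # fg pixels -> class 0
    return F.cross_entropy(logit, target)

e, m = torch.randn(2, 16, 32, 32), (torch.rand(2, 1, 32, 32) > 0.7)
print(fg_bg_contrastive_loss(e, m).item())
```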
Collapse
|
34
|
Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. [PMID: 38878555 PMCID: PMC11638972 DOI: 10.1016/j.artmed.2024.102900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 08/09/2024]
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks, and it has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we also include articles that used the transformer architecture for generating surgical instructions and predicting adverse outcomes after surgery, under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Collapse
Affiliation(s)
- Subhash Nerella
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | | | - Jiaqing Zhang
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
| | - Miguel Contreras
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Scott Siegel
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Aysegul Bumin
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
| | - Brandon Silva
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
| | - Jessica Sena
- Department Of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Benjamin Shickel
- Department of Medicine, University of Florida, Gainesville, United States
| | - Azra Bihorac
- Department of Medicine, University of Florida, Gainesville, United States
| | - Kia Khezeli
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Parisa Rashidi
- Department of Biomedical Engineering, University of Florida, Gainesville, United States.
| |
Collapse
|
35
|
Liu S, Lin Y, Liu D. FreqSNet: a multiaxial integration of frequency and spatial domains for medical image segmentation. Phys Med Biol 2024; 69:145011. [PMID: 38959911 DOI: 10.1088/1361-6560/ad5ef3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2023] [Accepted: 07/03/2024] [Indexed: 07/05/2024]
Abstract
Objective. In recent years, convolutional neural networks, which typically focus on extracting spatial domain features, have shown limitations in learning global contextual information. However, the frequency domain can offer a global perspective that spatial domain methods often struggle to capture. To address this limitation, we propose FreqSNet, which leverages both frequency and spatial features for medical image segmentation. Approach. To begin, we propose a frequency-space representation aggregation block (FSRAB) to replace conventional convolutions. FSRAB contains three frequency domain branches to capture global frequency information along different axial combinations, while a convolutional branch is designed to exchange information across channels in local spatial features. Secondly, the multiplex expansion attention block extracts long-range dependency information using dilated convolutional blocks, while suppressing irrelevant information via attention mechanisms. Finally, the introduced Feature Integration Block enhances feature representation by integrating semantic features that fuse spatial and channel positional information. Main results. We validated our method on 5 public datasets, including BUSI, CVC-ClinicDB, CVC-ColonDB, ISIC-2018, and Luna16. On these datasets, our method achieved Intersection over Union (IoU) scores of 75.46%, 87.81%, 79.08%, 84.04%, and 96.99%, and Hausdorff distance values of 22.22 mm, 13.20 mm, 13.08 mm, 13.51 mm, and 5.22 mm, respectively. Compared to other state-of-the-art methods, our FreqSNet achieves better segmentation results. Significance. Our method can effectively combine frequency domain information with spatial domain features, enhancing the segmentation performance and generalization capability in medical image segmentation tasks.
Collapse
Affiliation(s)
- Shangwang Liu
- The School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, People's Republic of China
- Engineering Lab of Intelligence Business and Internet of Things, Henan Normal University, Xinxiang 453007, People's Republic of China
| | - Yinghai Lin
- The School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, People's Republic of China
- Engineering Lab of Intelligence Business and Internet of Things, Henan Normal University, Xinxiang 453007, People's Republic of China
| | - Danyang Liu
- The School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, People's Republic of China
- Engineering Lab of Intelligence Business and Internet of Things, Henan Normal University, Xinxiang 453007, People's Republic of China
| |
Collapse
|
36
|
Huang C, Shi Y, Zhang B, Lyu K. Uncertainty-aware prototypical learning for anomaly detection in medical images. Neural Netw 2024; 175:106284. [PMID: 38593560 DOI: 10.1016/j.neunet.2024.106284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 03/14/2024] [Accepted: 03/29/2024] [Indexed: 04/11/2024]
Abstract
Anomalous object detection (AOD) in medical images aims to recognize anomalous lesions and is crucial for the early clinical diagnosis of various cancers. However, it is a difficult task for two reasons: (1) the diversity of anomalous lesions and (2) the ambiguity of the boundary between anomalous lesions and their normal surroundings. Unlike existing single-modality AOD models based on deterministic mapping, we constructed a probabilistic and deterministic AOD model. Specifically, we designed an uncertainty-aware prototype learning framework, which considers the diversity and ambiguity of anomalous lesions. A prototypical learning transformer (Pformer) is established to extract and store the prototype features of different anomalous lesions. Moreover, a Bayesian neural uncertainty quantizer, a probabilistic model, is designed to model the distributions over the outputs of the model, measuring the uncertainty of the model's detection results for each pixel. Essentially, the uncertainty of the model's anomaly detection result for a pixel can reflect the anomalous ambiguity of that pixel. Furthermore, an uncertainty-guided reasoning transformer (Uformer) is devised to employ this anomalous ambiguity, encouraging the proposed model to focus on pixels with high uncertainty. Notably, prototypical representations stored in Pformer are also utilized in anomaly reasoning, enabling the model to perceive the diversity of anomalous objects. Extensive experiments on five benchmark datasets demonstrate the superiority of our proposed method. The source code will be available at github.com/umchaohuang/UPformer.
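A common concrete realization of per-pixel uncertainty of the kind described is Monte-Carlo dropout; we show it here as an assumed stand-in for the paper's Bayesian neural uncertainty quantizer, with an illustrative toy network:

```python
import torch
import torch.nn as nn

class MCDropoutSeg(nn.Module):
    """Monte-Carlo dropout uncertainty: keep dropout active at test time,
    sample several stochastic forward passes, and use the per-pixel
    variance of the predictions as an uncertainty map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Dropout2d(0.3),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    @torch.no_grad()
    def predict_with_uncertainty(self, x: torch.Tensor, samples: int = 10):
        self.train()   # keep dropout stochastic at inference time
        probs = torch.stack([torch.sigmoid(self.net(x)) for _ in range(samples)])
        return probs.mean(0), probs.var(0)   # prediction, per-pixel uncertainty

mean, var = MCDropoutSeg().predict_with_uncertainty(torch.randn(1, 3, 64, 64))
print(mean.shape, var.shape)
```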
Collapse
Affiliation(s)
- Chao Huang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China; Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
| | - Yushu Shi
- Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
| | - Bob Zhang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China.
| | - Ke Lyu
- School of Engineering Sciences, University of the Chinese Academy of Sciences, Beijing, 100049, China; Pengcheng Laboratory, Shenzhen, 518055, China
| |
Collapse
|
37
|
Wan L, Chen Z, Xiao Y, Zhao J, Feng W, Fu H. Iterative feedback-based models for image and video polyp segmentation. Comput Biol Med 2024; 177:108569. [PMID: 38781640 DOI: 10.1016/j.compbiomed.2024.108569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2023] [Revised: 03/27/2024] [Accepted: 05/05/2024] [Indexed: 05/25/2024]
Abstract
Accurate segmentation of polyps in colonoscopy images has gained significant attention in recent years, given its crucial role in automated colorectal cancer diagnosis. Many existing deep learning-based methods follow a one-stage processing pipeline, often involving feature fusion across different levels or utilizing boundary-related attention mechanisms. Drawing on the success of applying Iterative Feedback Units (IFU) in image polyp segmentation, this paper proposes FlowICBNet by extending the IFU to the domain of video polyp segmentation. By harnessing the unique capabilities of IFU to propagate and refine past segmentation results, our method proves effective in mitigating challenges linked to the inherent limitations of endoscopic imaging, notably the presence of frequent camera shake and frame defocusing. Furthermore, in FlowICBNet, we introduce two pivotal modules: Reference Frame Selection (RFS) and Flow Guided Warping (FGW). These modules play a crucial role in filtering and selecting the most suitable historical reference frames for the task at hand. The experimental results on a large video polyp segmentation dataset demonstrate that our method can significantly outperform state-of-the-art methods by notable margins achieving an average metrics improvement of 7.5% on SUN-SEG-Easy and 7.4% on SUN-SEG-Hard. Our code is available at https://github.com/eraserNut/ICBNet.
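The Flow Guided Warping step can be illustrated with the standard grid_sample-based warp of a previous frame's mask by a dense optical-flow field (a generic sketch, not the FGW module itself):

```python
import torch
import torch.nn.functional as F

def flow_warp(prev: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a previous frame's mask/features to the current frame with a
    dense optical-flow field given in pixels as (x, y) displacements."""
    b, _, h, w = prev.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xx, yy], dim=0).float().unsqueeze(0)   # (1, 2, H, W)
    coords = base + flow                                       # displaced positions
    # Normalize to [-1, 1] as grid_sample expects; grid shape (B, H, W, 2).
    coords[:, 0] = 2 * coords[:, 0] / (w - 1) - 1
    coords[:, 1] = 2 * coords[:, 1] / (h - 1) - 1
    grid = coords.permute(0, 2, 3, 1)
    return F.grid_sample(prev, grid, align_corners=True)

prev_mask = torch.rand(1, 1, 64, 64)
flow = torch.zeros(1, 2, 64, 64)   # zero flow: warped output == input
print(torch.allclose(flow_warp(prev_mask, flow), prev_mask, atol=1e-5))
```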
Collapse
Affiliation(s)
- Liang Wan
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Zhihao Chen
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Yefan Xiao
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Junting Zhao
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Wei Feng
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
| | - Huazhu Fu
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore, 138632, Republic of Singapore.
| |
Collapse
|
38
|
Cao J, Wang X, Qu Z, Zhuo L, Li X, Zhang H, Yang Y, Wei W. WDFF-Net: Weighted Dual-Branch Feature Fusion Network for Polyp Segmentation With Object-Aware Attention Mechanism. IEEE J Biomed Health Inform 2024; 28:4118-4131. [PMID: 38536686 DOI: 10.1109/jbhi.2024.3381891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/03/2024]
Abstract
Colon polyps in colonoscopy images exhibit significant differences in color, size, shape, appearance, and location, posing significant challenges to accurate polyp segmentation. In this paper, a Weighted Dual-branch Feature Fusion Network is proposed for polyp segmentation, named WDFF-Net, which adopts HarDNet68 as the backbone network. First, a dual-branch feature fusion network architecture is constructed, which includes a shared feature extractor and two feature fusion branches, i.e., a Progressive Feature Fusion (PFF) branch and a Scale-aware Feature Fusion (SFF) branch. The branches fuse the deep features of multiple layers for different purposes and in different ways. The PFF branch addresses the under-segmentation and over-segmentation problems of flat polyps with low edge contrast by iteratively fusing features from low, medium, and high layers. The SFF branch tackles the problem of drastic variations in polyp size and shape, especially the missed segmentation of small polyps. These two branches are complementary and play different roles in improving segmentation accuracy. Second, an Object-aware Attention Mechanism (OAM) is proposed to enhance the features of target regions and suppress those of background regions that interfere with segmentation performance. Third, a weighted dual-branch segmentation loss function is specifically designed, which dynamically assigns the weight factors of the two branches' loss functions to optimize their collaborative training. Experimental results on five public colon polyp datasets demonstrate that the proposed WDFF-Net achieves superior segmentation performance with lower model complexity and faster inference speed, while maintaining good generalization ability.
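A weighted two-branch objective of the kind described can be sketched as a per-branch BCE+Dice term with branch weights; the paper assigns these weights dynamically, whereas the sketch below fixes them for brevity:

```python
import torch
import torch.nn.functional as F

def weighted_dual_branch_loss(pff_logits, sff_logits, target,
                              w_pff: float = 0.6, w_sff: float = 0.4):
    """Sketch of a weighted two-branch segmentation objective: each branch
    gets its own BCE+Dice term, and the weights set their relative pull."""
    def bce_dice(logits, gt, eps=1e-7):
        bce = F.binary_cross_entropy_with_logits(logits, gt)
        prob = torch.sigmoid(logits)
        inter = (prob * gt).sum()
        dice = 1 - (2 * inter + eps) / (prob.sum() + gt.sum() + eps)
        return bce + dice
    return (w_pff * bce_dice(pff_logits, target)
            + w_sff * bce_dice(sff_logits, target))

p1, p2 = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
gt = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(weighted_dual_branch_loss(p1, p2, gt).item())
```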
Collapse
|
39
|
Ji Z, Li X, Liu J, Chen R, Liao Q, Lyu T, Zhao L. LightCF-Net: A Lightweight Long-Range Context Fusion Network for Real-Time Polyp Segmentation. Bioengineering (Basel) 2024; 11:545. [PMID: 38927781 PMCID: PMC11201063 DOI: 10.3390/bioengineering11060545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/28/2024] Open
Abstract
Automatically segmenting polyps from colonoscopy videos is crucial for developing computer-assisted diagnostic systems for colorectal cancer. Existing automatic polyp segmentation methods often struggle to fulfill the real-time demands of clinical applications due to their substantial parameter count and computational load, especially those based on Transformer architectures. To tackle these challenges, a novel lightweight long-range context fusion network, named LightCF-Net, is proposed in this paper. This network attempts to model long-range spatial dependencies while maintaining real-time performance, to better distinguish polyps from background noise and thus improve segmentation accuracy. A novel Fusion Attention Encoder (FAEncoder) is designed in the proposed network, which integrates Large Kernel Attention (LKA) and channel attention mechanisms to extract deep representational features of polyps and unearth long-range dependencies. Furthermore, a newly designed Visual Attention Mamba module (VAM) is added to the skip connections, modeling long-range context dependencies in the encoder-extracted features and reducing background noise interference through the attention mechanism. Finally, a Pyramid Split Attention module (PSA) is used in the bottleneck layer to extract richer multi-scale contextual features. The proposed method was thoroughly evaluated on four renowned polyp segmentation datasets: Kvasir-SEG, CVC-ClinicDB, BKAI-IGH, and ETIS. Experimental findings demonstrate that the proposed method delivers higher segmentation accuracy in less time, consistently outperforming the most advanced lightweight polyp segmentation networks.
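The Large Kernel Attention (LKA) that the FAEncoder integrates follows the VAN-style factorization of a large kernel into depth-wise, depth-wise dilated, and pointwise convolutions. A minimal sketch (dimensions assumed):

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """VAN-style LKA: a large receptive field is factorized into a
    depth-wise conv, a depth-wise dilated conv, and a 1x1 conv, whose
    output gates the input features attention-style."""
    def __init__(self, dim: int):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3,
                                    groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw(self.dw_dilated(self.dw(x)))   # large effective kernel
        return x * attn                               # attention-style gating

x = torch.randn(1, 48, 56, 56)
print(LargeKernelAttention(48)(x).shape)   # torch.Size([1, 48, 56, 56])
```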
Collapse
Affiliation(s)
- Zhanlin Ji
- Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China; (Z.J.); (X.L.); (J.L.)
- College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
| | - Xiaoyu Li
- Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China; (Z.J.); (X.L.); (J.L.)
| | - Jianuo Liu
- Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China; (Z.J.); (X.L.); (J.L.)
| | - Rui Chen
- Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China; (R.C.); (Q.L.)
| | - Qinping Liao
- Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China; (R.C.); (Q.L.)
| | - Tao Lyu
- Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China; (R.C.); (Q.L.)
| | - Li Zhao
- Beijing National Research Center for Information Science and Technology, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China
| |
Collapse
|
40
|
Biffi C, Antonelli G, Bernhofer S, Hassan C, Hirata D, Iwatate M, Maieron A, Salvagnini P, Cherubini A. REAL-Colon: A dataset for developing real-world AI applications in colonoscopy. Sci Data 2024; 11:539. [PMID: 38796533 PMCID: PMC11127922 DOI: 10.1038/s41597-024-03359-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Accepted: 05/10/2024] [Indexed: 05/28/2024] Open
Abstract
Detection and diagnosis of colon polyps are key to preventing colorectal cancer. Recent evidence suggests that AI-based computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems can enhance endoscopists' performance and boost colonoscopy effectiveness. However, most available public datasets primarily consist of still images or video clips, often at a down-sampled resolution, and do not accurately represent real-world colonoscopy procedures. We introduce the REAL-Colon (Real-world multi-center Endoscopy Annotated video Library) dataset: a compilation of 2.7 M native video frames from sixty full-resolution, real-world colonoscopy recordings across multiple centers. The dataset contains 350k bounding-box annotations, each created under the supervision of expert gastroenterologists. Comprehensive patient clinical data, colonoscopy acquisition information, and polyp histopathological information are also included in each video. With its unprecedented size, quality, and heterogeneity, the REAL-Colon dataset is a unique resource for researchers and developers aiming to advance AI research in colonoscopy. Its openness and transparency facilitate rigorous and reproducible research, fostering the development and benchmarking of more accurate and reliable colonoscopy-related algorithms and models.
Collapse
Affiliation(s)
- Carlo Biffi
- Cosmo Intelligent Medical Devices, Dublin, Ireland.
| | - Giulio Antonelli
- Gastroenterology and Digestive Endoscopy Unit, Ospedale dei Castelli (N.O.C.), Rome, Italy
| | - Sebastian Bernhofer
- Karl Landsteiner University of Health Sciences, Krems, Austria
- Department of Internal Medicine 2, University Hospital St. Pölten, St. Pölten, Austria
| | - Cesare Hassan
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Italy
- Endoscopy Unit, Humanitas Clinical and Research Center IRCCS, Rozzano, Italy
| | - Daizen Hirata
- Gastrointestinal Center, Sano Hospital, Hyogo, Japan
| | - Mineo Iwatate
- Gastrointestinal Center, Sano Hospital, Hyogo, Japan
| | - Andreas Maieron
- Karl Landsteiner University of Health Sciences, Krems, Austria
- Department of Internal Medicine 2, University Hospital St. Pölten, St. Pölten, Austria
| | | | - Andrea Cherubini
- Cosmo Intelligent Medical Devices, Dublin, Ireland.
- Milan Center for Neuroscience, University of Milano-Bicocca, Milano, Italy.
| |
Collapse
|
41
|
Han G, Guo W, Zhang H, Jin J, Gan X, Zhao X. Sample self-selection using dual teacher networks for pathological image classification with noisy labels. Comput Biol Med 2024; 174:108489. [PMID: 38640633 DOI: 10.1016/j.compbiomed.2024.108489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 04/02/2024] [Accepted: 04/15/2024] [Indexed: 04/21/2024]
Abstract
Deep neural networks (DNNs) enable advanced image processing but depend on large quantities of high-quality labeled data. The presence of noisy data significantly degrades DNN model performance. In the medical field, where model accuracy is crucial and labels for pathological images are scarce and expensive to obtain, the need to handle noisy data is even more urgent. Deep networks exhibit a memorization effect: they tend to prioritize remembering clean labels initially. Therefore, early stopping is highly effective in managing learning with noisy labels. Previous research has often concentrated on developing robust loss functions or implementing training constraints to mitigate the impact of noisy labels; however, such approaches have frequently resulted in underfitting. We propose using knowledge distillation to slow the learning process of the target network rather than preventing late-stage training from being affected by noisy labels. In this paper, we introduce a data sample self-selection strategy based on early stopping to filter out most of the noisy data. Additionally, we employ a distillation training method with dual teacher networks to ensure the steady learning of the student network. The experimental results show that our method outperforms current state-of-the-art methods for handling noisy labels on both synthetic and real-world noisy datasets. In particular, on the real-world pathological image dataset Chaoyang, the highest classification accuracy increased by 2.39%. Our method leverages the model's predictions based on training history to select cleaner datasets and retrains the model using these cleaner datasets, significantly mitigating the impact of noisy labels on model performance.
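Dual-teacher distillation can be sketched as a student loss combining the (presumably clean) selected labels with the softened average of two teachers' predictions; temperatures and weights below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def dual_teacher_distill_loss(student_logits, t1_logits, t2_logits,
                              labels, T: float = 2.0, alpha: float = 0.5):
    """Sketch of dual-teacher knowledge distillation: the student fits the
    selected labels plus the softened average of two teachers' outputs,
    which slows and stabilizes its learning under label noise."""
    ce = F.cross_entropy(student_logits, labels)
    teacher_prob = (F.softmax(t1_logits / T, dim=1)
                    + F.softmax(t2_logits / T, dim=1)) / 2
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  teacher_prob, reduction="batchmean") * T * T
    return alpha * ce + (1 - alpha) * kd

s, t1, t2 = torch.randn(8, 4), torch.randn(8, 4), torch.randn(8, 4)
print(dual_teacher_distill_loss(s, t1, t2, torch.randint(0, 4, (8,))).item())
```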
Affiliation(s)
- Gang Han: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China; School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China.
- Wenping Guo: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China.
- Haibo Zhang: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China.
- Jie Jin: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China.
- Xingli Gan: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China.
- Xiaoming Zhao: School of Electronic and Information Engineering, Taizhou University, Taizhou 318000, China.
42
Daneshpajooh V, Ahmad D, Toth J, Bascom R, Higgins WE. Automatic lesion detection for narrow-band imaging bronchoscopy. J Med Imaging (Bellingham) 2024; 11:036002. [PMID: 38827776] [PMCID: PMC11138083] [DOI: 10.1117/1.jmi.11.3.036002]
Abstract
Purpose: Early detection of cancer is crucial for lung cancer patients, as it determines disease prognosis. Lung cancer typically starts as bronchial lesions along the airway walls. Recent research has indicated that narrow-band imaging (NBI) bronchoscopy enables more effective bronchial lesion detection than other bronchoscopic modalities. Unfortunately, NBI video can be hard to interpret because physicians currently are forced to perform a time-consuming subjective visual search to detect bronchial lesions in a long airway-exam video. As a result, NBI bronchoscopy is not regularly used in practice. To alleviate this problem, we propose an automatic two-stage real-time method for bronchial lesion detection in NBI video and perform a first-of-its-kind pilot study of the method using NBI airway exam video collected at our institution.
Approach: Given a patient's NBI video, the first method stage entails a deep-learning-based object detection network coupled with a multiframe abnormality measure to locate candidate lesions on each video frame. The second method stage then draws upon a Siamese network and a Kalman filter to track candidate lesions over multiple frames to arrive at final lesion decisions.
Results: Tests drawing on 23 patient NBI airway exam videos indicate that the method can process an incoming video stream at a real-time frame rate, thereby making the method viable for real-time inspection during a live bronchoscopic airway exam. Furthermore, our studies showed a 93% sensitivity and 86% specificity for lesion detection; this compares favorably to a sensitivity and specificity of 80% and 84% achieved over a series of recent pooled clinical studies using the current time-consuming subjective clinical approach.
Conclusion: The method shows potential for robust lesion detection in NBI video at a real-time frame rate. Therefore, it could help enable more common use of NBI bronchoscopy for bronchial lesion detection.
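A core piece of the second stage, tracking candidate lesions across frames, can be illustrated with a textbook constant-velocity Kalman filter over a box centre. This is a generic sketch, not the authors' implementation; the noise covariances below are placeholders.

```python
import numpy as np

class BoxKalman:
    """Constant-velocity Kalman filter over a lesion box centre (cx, cy).

    An illustrative stand-in for the tracking stage described above;
    the process/measurement noise settings are guesses, not tuned values.
    """
    def __init__(self, cx: float, cy: float, dt: float = 1.0):
        self.x = np.array([cx, cy, 0.0, 0.0])                 # position + velocity
        self.P = np.eye(4) * 10.0                             # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt  # motion model
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * 0.01                             # process noise
        self.R = np.eye(2) * 1.0                              # measurement noise

    def predict(self) -> np.ndarray:
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                                     # predicted centre

    def update(self, z) -> None:
        y = np.asarray(z, dtype=float) - self.H @ self.x      # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)              # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

A tracker of this kind lets a per-frame detector miss a few frames while the filter carries the candidate forward, which is what makes multi-frame lesion decisions possible.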
Affiliation(s)
- Vahid Daneshpajooh: The Pennsylvania State University, School of Electrical Engineering and Computer Science, University Park, Pennsylvania, United States.
- Danish Ahmad: The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States.
- Jennifer Toth: The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States.
- Rebecca Bascom: The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States.
- William E. Higgins: The Pennsylvania State University, School of Electrical Engineering and Computer Science, University Park, Pennsylvania, United States.
43
Su D, Luo J, Fei C. An Efficient and Rapid Medical Image Segmentation Network. IEEE J Biomed Health Inform 2024; 28:2979-2990. [PMID: 38457317] [DOI: 10.1109/jbhi.2024.3374780]
Abstract
Accurate medical image segmentation is an essential part of the medical image analysis process that provides detailed quantitative metrics. In recent years, extensions of classical networks such as UNet have achieved state-of-the-art performance on medical image segmentation tasks. However, the high model complexity of these networks limits their applicability to devices with constrained computational resources. To alleviate this problem, we propose a shallow hierarchical Transformer for medical image segmentation, called SHFormer. By decreasing the number of transformer blocks utilized, the model complexity of SHFormer can be reduced to an acceptable level. To improve the learned attention while keeping the structure lightweight, we propose a spatial-channel connection module. This module learns attention separately in the spatial and channel dimensions of the feature map while interconnecting them to produce more focused attention. To keep the decoder lightweight, the MLP-D module is proposed to progressively fuse multi-scale features, in which channels are aligned using a Multi-Layer Perceptron (MLP) and spatial information is fused by convolutional blocks. We first validated the performance of SHFormer on the ISIC-2018 dataset. Compared to the most recent state-of-the-art network, SHFormer exhibits comparable performance with 15 times fewer parameters, 30 times lower computational complexity, and 5 times higher inference efficiency. To test the generalizability of SHFormer, we introduced a polyp dataset for additional testing, on which SHFormer achieves comparable segmentation accuracy to the latest network with lower computational overhead.
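The spatial-channel connection idea can be sketched as a squeeze-and-excite style channel gate feeding a single-channel spatial gate, so the two attentions are interconnected rather than computed independently. The module name, layer sizes, and reduction ratio below are illustrative guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SpatialChannelConnection(nn.Module):
    """Channel attention followed by a channel-informed spatial gate.

    A minimal sketch of the interconnection idea; `ch` is assumed to be
    divisible by 4 (the reduction ratio chosen here for illustration).
    """
    def __init__(self, ch: int):
        super().__init__()
        self.channel = nn.Sequential(             # squeeze-and-excite style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(             # single-channel spatial gate
            nn.Conv2d(ch, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ca = self.channel(x)                      # B x C x 1 x 1 channel weights
        sa = self.spatial(x * ca)                 # spatial map computed on gated input
        return x * ca * sa                        # jointly focused attention
```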
44
Jaspers TJM, Boers TGW, Kusters CHJ, Jong MR, Jukema JB, de Groof AJ, Bergman JJ, de With PHN, van der Sommen F. Robustness evaluation of deep neural networks for endoscopic image analysis: Insights and strategies. Med Image Anal 2024; 94:103157. [PMID: 38574544] [DOI: 10.1016/j.media.2024.103157]
Abstract
Computer-aided detection and diagnosis systems (CADe/CADx) in endoscopy are commonly trained on high-quality imagery, which is not representative of the heterogeneous input typically encountered in clinical practice. In endoscopy, image quality heavily relies on both the skills and experience of the endoscopist and the specifications of the system used for screening. Factors such as poor illumination, motion blur, and specific post-processing settings can significantly alter the quality and general appearance of these images. This so-called domain gap between the data used to develop a system and the data it encounters after deployment, and its impact on the performance of the deep neural networks (DNNs) underpinning endoscopic CAD systems, remains largely unexplored. As many such systems, e.g. for polyp detection, are already being rolled out in clinical practice, this poses severe patient risks, particularly in community hospitals, where both the imaging equipment and experience are subject to considerable variation. Therefore, this study aims to evaluate the impact of this domain gap on the clinical performance of CADe/CADx for various endoscopic applications. For this, we leverage two publicly available datasets (KVASIR-SEG and GIANA) and two in-house datasets. We investigate the performance of commonly used DNN architectures under synthetic, clinically calibrated image degradations and on a prospectively collected dataset of 342 endoscopic images of lower subjective quality. Additionally, we assess the influence of DNN architecture and complexity, data augmentation, and pretraining techniques on robustness. The results reveal a considerable decline in performance of 11.6% (±1.5), relative to the reference, within the clinically calibrated boundaries of image degradations. Nevertheless, employing more advanced DNN architectures and self-supervised in-domain pre-training effectively mitigates this drop to 7.7% (±2.03). Additionally, these enhancements yield the highest performance on the manually collected test set of images with lower subjective quality. By comprehensively assessing the robustness of popular DNN architectures and training strategies across multiple datasets, this study provides valuable insights into their performance and limitations for endoscopic applications. The findings highlight the importance of including robustness evaluation when developing DNNs for endoscopy applications and propose strategies to mitigate performance loss.
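The evaluation protocol, degrading test images within calibrated bounds and measuring the resulting performance drop, can be mimicked in a few lines of PyTorch. The box-filter blur and illumination gain below are crude stand-ins for the clinically calibrated degradations used in the study.

```python
import torch
import torch.nn.functional as F

def degrade(img: torch.Tensor, blur_k: int = 5, gain: float = 0.7) -> torch.Tensor:
    """Apply a box-filter blur (a rough motion-blur proxy) and an
    illumination drop to a batch of images shaped (B, C, H, W) in [0, 1].
    Parameters are illustrative, not clinically calibrated values."""
    c = img.shape[1]
    kernel = torch.ones(c, 1, blur_k, blur_k, device=img.device) / (blur_k ** 2)
    blurred = F.conv2d(img, kernel, padding=blur_k // 2, groups=c)  # depthwise blur
    return (blurred * gain).clamp(0.0, 1.0)

def dice(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice between a binary prediction and ground-truth mask."""
    inter = (pred * gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```

Comparing dice(model(img), gt) against dice(model(degrade(img)), gt) over a test set then yields the kind of robustness gap the study quantifies.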
Affiliation(s)
- Tim J M Jaspers: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Tim G W Boers: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Carolus H J Kusters: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Martijn R Jong: Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands.
- Jelmer B Jukema: Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands.
- Albert J de Groof: Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands.
- Jacques J Bergman: Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands.
- Peter H N de With: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Fons van der Sommen: Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
45
Zhang K, Hu D, Li X, Wang X, Hu X, Wang C, Yang J, Rao N. BFE-Net: bilateral fusion enhanced network for gastrointestinal polyp segmentation. Biomed Opt Express 2024; 15:2977-2999. [PMID: 38855696] [PMCID: PMC11161362] [DOI: 10.1364/boe.522441]
Abstract
Accurate segmentation of polyp regions in gastrointestinal endoscopic images is pivotal for diagnosis and treatment. Despite advancements, challenges persist, such as accurately segmenting small polyps and maintaining accuracy when polyps resemble surrounding tissue. Recent studies show the effectiveness of the pyramid vision transformer (PVT) in capturing global context, yet it may lack fine detail; conversely, U-Net excels at semantic extraction. Hence, we propose the bilateral fusion enhanced network (BFE-Net) to address these challenges. Our model integrates U-Net and PVT features via a deep feature enhancement fusion module (FEF) and an attention decoder module (AD). Experimental results demonstrate significant improvements, validating our model's effectiveness across various datasets and modalities and promising advancements in gastrointestinal polyp diagnosis and treatment.
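One plausible reading of the bilateral fusion step, merging a U-Net feature map with a resized PVT feature map and re-weighting the result, is sketched below. The module name and the gating design are our assumptions for illustration, not the paper's FEF implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureEnhanceFusion(nn.Module):
    """Concatenate a CNN feature map with a (resized) transformer feature
    map, merge with a 3x3 convolution, and re-weight the merged features
    with a learned sigmoid gate. A minimal sketch of bilateral fusion."""
    def __init__(self, cnn_ch: int, vit_ch: int, out_ch: int):
        super().__init__()
        self.merge = nn.Conv2d(cnn_ch + vit_ch, out_ch, 3, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(out_ch, out_ch, 1), nn.Sigmoid())

    def forward(self, f_cnn: torch.Tensor, f_vit: torch.Tensor) -> torch.Tensor:
        # bring the transformer features to the CNN branch's resolution
        f_vit = F.interpolate(f_vit, size=f_cnn.shape[-2:],
                              mode="bilinear", align_corners=False)
        fused = self.merge(torch.cat([f_cnn, f_vit], dim=1))
        return fused * self.gate(fused)   # suppress weakly supported channels
```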
Affiliation(s)
- Kaixuan Zhang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Dingcan Hu: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Xiang Li: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Xiaotong Wang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Xiaoming Hu: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Chunyang Wang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Jinlin Yang: Digestive Endoscopic Center of West China Hospital, Sichuan University, Chengdu 610017, China.
- Nini Rao: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
46
Li H, Liu D, Zeng Y, Liu S, Gan T, Rao N, Yang J, Zeng B. Single-Image-Based Deep Learning for Segmentation of Early Esophageal Cancer Lesions. IEEE Trans Image Process 2024; 33:2676-2688. [PMID: 38530733] [DOI: 10.1109/tip.2024.3379902]
Abstract
Accurate segmentation of lesions is crucial for the diagnosis and treatment of early esophageal cancer (EEC). However, neither traditional nor deep-learning-based methods to date can meet clinical requirements, with the mean Dice score (the most important metric in medical image analysis) hardly exceeding 0.75. In this paper, we present a novel deep learning approach for segmenting EEC lesions. Our method is unique in that it relies solely on a single input image from a patient, forming the so-called "You-Only-Have-One" (YOHO) framework. On one hand, this "one-image-one-network" learning ensures complete patient privacy, as it does not use any images from other patients as training data. On the other hand, it avoids nearly all generalization-related problems, since each trained network is applied only to the same input image itself. In particular, we can push the training toward "over-fitting" as much as possible to increase the segmentation accuracy. Our technical details include an interaction with clinical doctors to utilize their expertise, a geometry-based data augmentation over a single lesion image to generate the training dataset (the biggest novelty), and an edge-enhanced UNet. We have evaluated YOHO on an EEC dataset collected by ourselves and achieved a mean Dice score of 0.888, much higher than existing deep-learning methods, thus representing a significant advance toward clinical applications. The code and dataset are available at: https://github.com/lhaippp/YOHO.
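The heart of the YOHO recipe, building a training set from a single annotated image via geometric augmentation, might look like the following, assuming torchvision is available. The transform ranges and sample count are illustrative, not the paper's settings.

```python
import torch
import torchvision.transforms as T

def one_image_dataset(image: torch.Tensor, mask: torch.Tensor, n: int = 200):
    """Build (image, mask) training pairs from a single lesion image by
    sampling random geometric transforms applied identically to both.

    `image` is (3, H, W) and `mask` is (1, H, W); stacking them along the
    channel dimension guarantees the same transform hits image and mask.
    """
    aug = T.Compose([
        T.RandomAffine(degrees=30, translate=(0.1, 0.1), scale=(0.8, 1.2)),
        T.RandomHorizontalFlip(),
    ])
    pairs = []
    for _ in range(n):
        stacked = torch.cat([image, mask], dim=0)   # (4, H, W)
        out = aug(stacked)
        pairs.append((out[:3], out[3:]))            # split back into image / mask
    return pairs
```

The default nearest-neighbour interpolation of RandomAffine keeps the transformed mask binary, which is convenient for this single-image setting.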
47
Li B, Xu Y, Wang Y, Zhang B. DECTNet: Dual Encoder Network combined convolution and Transformer architecture for medical image segmentation. PLoS One 2024; 19:e0301019. [PMID: 38573957] [PMCID: PMC10994332] [DOI: 10.1371/journal.pone.0301019]
Abstract
Automatic and accurate segmentation of medical images plays an essential role in disease diagnosis and treatment planning. Convolutional neural networks have achieved remarkable results in medical image segmentation in the past decade, and deep learning models based on the Transformer architecture have also achieved tremendous success in this domain. However, due to the ambiguity of medical image boundaries and the high complexity of anatomical structures, effective structure extraction and accurate segmentation remain open problems. In this paper, we propose a novel Dual Encoder Network named DECTNet to alleviate this problem. Specifically, DECTNet comprises four components: a convolution-based encoder, a Transformer-based encoder, a feature fusion decoder, and a deep supervision module. The convolutional encoder extracts fine spatial contextual details, while the Transformer encoder, designed using a hierarchical Swin Transformer architecture, models global contextual information. The novel feature fusion decoder integrates the multi-scale representations from the two encoders and, via a channel attention mechanism, selects the features most relevant to the segmentation task. Further, a deep supervision module is used to accelerate the convergence of the proposed method. Extensive experiments demonstrate that, compared to seven other models, the proposed method achieves state-of-the-art results on four segmentation tasks: skin lesion segmentation, polyp segmentation, Covid-19 lesion segmentation, and MRI cardiac segmentation.
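The deep supervision module can be illustrated generically: intermediate decoder outputs are upsampled to the target resolution and each contributes an auxiliary loss, which speeds convergence. The weighting below follows a common convention and is not claimed to match DECTNet's.

```python
import torch
import torch.nn.functional as F

def deep_supervision_loss(side_outputs, target, weights=(0.5, 0.75, 1.0)):
    """Sum of weighted BCE losses over intermediate decoder outputs.

    `side_outputs` is a list of logits at increasing resolution and
    `target` is a float mask shaped (B, 1, H, W); each side output is
    upsampled to the target size before computing its auxiliary loss.
    """
    total = 0.0
    for w, logits in zip(weights, side_outputs):
        logits = F.interpolate(logits, size=target.shape[-2:],
                               mode="bilinear", align_corners=False)
        total = total + w * F.binary_cross_entropy_with_logits(logits, target)
    return total
```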
Affiliation(s)
- Boliang Li: Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China.
- Yaming Xu: Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China.
- Yan Wang: Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China.
- Bo Zhang: Sergeant Schools of Army Academy of Armored Forces, Changchun, Jilin, China.
48
Goceri E. Polyp Segmentation Using a Hybrid Vision Transformer and a Hybrid Loss Function. J Imaging Inform Med 2024; 37:851-863. [PMID: 38343250] [PMCID: PMC11031515] [DOI: 10.1007/s10278-023-00954-2]
Abstract
Accurate and early detection of precursor adenomatous polyps and their removal at an early stage can significantly decrease mortality and disease occurrence, since most colorectal cancers evolve from adenomatous polyps. However, accurate detection and segmentation of polyps by doctors are difficult, mainly due to these factors: (i) the quality of polyp screening with colonoscopy depends on the imaging quality and the experience of the doctors; (ii) visual inspection by doctors is time-consuming, burdensome, and tiring; (iii) prolonged visual inspections can lead to polyps being missed even when the physician is experienced. To overcome these problems, computer-aided methods have been proposed; however, they have disadvantages or limitations. Therefore, in this work, a new architecture based on residual transformer layers has been designed and used for polyp segmentation. The proposed segmentation utilizes both high-level semantic features and low-level spatial features. Also, a novel hybrid loss function has been proposed. The loss function, designed from focal Tversky loss, binary cross-entropy, and the Jaccard index, reduces image-wise and pixel-wise differences and improves regional consistency. Experiments have indicated the effectiveness of the proposed approach in terms of Dice similarity (0.9048), recall (0.9041), precision (0.9057), and F2 score (0.8993). Comparisons with state-of-the-art methods have shown its better performance.
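The hybrid loss is straightforward to sketch in PyTorch from its named ingredients: focal Tversky, binary cross-entropy, and a Jaccard term. The hyper-parameters below (alpha, beta, gamma, equal term weights) are common defaults from the focal-Tversky literature, not necessarily the values used in the paper.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits: torch.Tensor, target: torch.Tensor,
                alpha: float = 0.7, beta: float = 0.3,
                gamma: float = 0.75, eps: float = 1e-6) -> torch.Tensor:
    """Focal Tversky + BCE + Jaccard, equally weighted.

    `logits` are raw network outputs and `target` is a float mask in
    {0, 1}; alpha > beta penalizes false negatives more, as is typical.
    """
    p = torch.sigmoid(logits)
    tp = (p * target).sum()
    fp = (p * (1.0 - target)).sum()
    fn = ((1.0 - p) * target).sum()
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    focal_tversky = (1.0 - tversky) ** gamma          # region-level term
    bce = F.binary_cross_entropy_with_logits(logits, target)  # pixel-level term
    jaccard = 1.0 - (tp + eps) / (tp + fp + fn + eps)          # soft IoU complement
    return focal_tversky + bce + jaccard
```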
49
Li F, Huang Z, Zhou L, Chen Y, Tang S, Ding P, Peng H, Chu Y. Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network. Biomed Opt Express 2024; 15:2590-2621. [PMID: 38633077] [PMCID: PMC11019695] [DOI: 10.1364/boe.510908]
Abstract
Automatic and precise polyp segmentation in colonoscopy images is highly valuable for early-stage diagnosis and surgery of colorectal cancer. Nevertheless, it still poses a major challenge due to variations in polyp size, intricate morphological characteristics, and the indistinct demarcation between polyps and mucosa. To alleviate these challenges, we propose an improved dual-aggregation polyp segmentation network, dubbed Dua-PSNet, for automatic and accurate full-size polyp prediction by combining a transformer branch and a fully convolutional network (FCN) branch in parallel. Concretely, in the transformer branch, we adopt the B3 variant of pyramid vision transformer v2 (PVTv2-B3) as an image encoder for capturing multi-scale global features and modeling long-distance interdependencies between them, while designing an innovative multi-stage feature aggregation decoder (MFAD) to highlight critical local feature details and effectively integrate them into the global features. In the decoder, the adaptive feature aggregation (AFA) block fuses high-level feature representations of different scales generated by the PVTv2-B3 encoder in a stepwise adaptive manner to refine global semantic information, while the ResidualBlock module mines detailed boundary cues hidden in low-level features. With the assistance of the selective global-to-local fusion head (SGLFH) module, the resulting boundary details are selectively aggregated with the global semantic features, strengthening these hierarchical features to cope with scale variations of polyps. The FCN branch, built on the designed ResidualBlock module, encourages the extraction of highly merged fine features to match the outputs of the transformer branch into full-size segmentation maps. In this way, both branches reciprocally influence and complement each other, enhancing the discrimination of polyp features and enabling more accurate prediction of full-size segmentation maps. Extensive experiments on five challenging polyp segmentation benchmarks demonstrate that the proposed Dua-PSNet has powerful learning and generalization ability and advances the state-of-the-art segmentation performance among existing cutting-edge methods. These results show that Dua-PSNet has great potential as a practical solution for polyp segmentation tasks in which wide variations of data typically occur.
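Of the many components, the global-to-local fusion step is perhaps the easiest to sketch: a learned per-pixel gate that softly chooses between a global semantic map and a boundary-detail map. This is one plausible interpretation of selective fusion, not the actual SGLFH design, and the sizes are chosen purely for illustration.

```python
import torch
import torch.nn as nn

class SelectiveFusionHead(nn.Module):
    """Gated blend of a global (transformer) feature map and a local
    (boundary) feature map of the same shape; a minimal sketch of
    per-pixel soft selection between the two sources."""
    def __init__(self, ch: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, f_global: torch.Tensor, f_local: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([f_global, f_local], dim=1))  # per-pixel weights
        return g * f_global + (1.0 - g) * f_local             # soft selection
```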
Affiliation(s)
- Feng Li: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Zetao Huang: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Lu Zhou: Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China.
- Yuyang Chen: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Shiqing Tang: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Pengchao Ding: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Haixia Peng: Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China.
- Yimin Chu: Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China.
50
Du H, Wang J, Liu M, Wang Y, Meijering E. SwinPA-Net: Swin Transformer-Based Multiscale Feature Pyramid Aggregation Network for Medical Image Segmentation. IEEE Trans Neural Netw Learn Syst 2024; 35:5355-5366. [PMID: 36121961] [DOI: 10.1109/tnnls.2022.3204090]
Abstract
The precise segmentation of medical images is one of the key challenges in pathology research and clinical practice. However, many medical image segmentation tasks face problems such as large variation across lesion types and similarity in shape and color between lesions and surrounding tissue, which seriously limits gains in segmentation accuracy. In this article, a novel method called the Swin Pyramid Aggregation network (SwinPA-Net) is proposed by combining two designed modules with the Swin Transformer to learn more powerful and robust features. The two modules, the dense multiplicative connection (DMC) module and the local pyramid attention (LPA) module, aggregate the multiscale context information of medical images. The DMC module cascades multiscale semantic feature information through dense multiplicative feature fusion, which minimizes the interference of shallow background noise, improves feature expression, and addresses excessive variation in lesion size and type (see the sketch below). Moreover, the LPA module guides the network to focus on the region of interest by merging global and local attention, which helps address the similarity between lesions and surrounding tissue. The proposed network is evaluated on two public benchmark datasets for the polyp segmentation task and the skin lesion segmentation task, as well as a private clinical dataset for the laparoscopic image segmentation task. Compared with existing state-of-the-art (SOTA) methods, SwinPA-Net achieves the most advanced performance and outperforms the second-best method on the mean Dice score by 1.68%, 0.8%, and 1.2% on the three tasks, respectively.
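The dense multiplicative connection idea, cascading element-wise products so that shallow background noise is suppressed unless multiple scales agree, can be sketched as follows. Equal channel counts across scales and the sigmoid gating are our simplifications, not the paper's exact DMC design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMultiplicativeConnection(nn.Module):
    """Cascade multiscale features (ordered coarse to fine) through
    multiplicative gating: each finer map is modulated by the upsampled
    coarser context, damping activations that only one scale supports."""
    def __init__(self, ch: int, n_scales: int = 3):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(ch, ch, 1) for _ in range(n_scales))

    def forward(self, feats) -> torch.Tensor:
        # feats: list of (B, ch, h_i, w_i) tensors, coarse to fine
        out = self.proj[0](feats[0])
        for f, proj in zip(feats[1:], self.proj[1:]):
            up = F.interpolate(out, size=f.shape[-2:],
                               mode="bilinear", align_corners=False)
            out = proj(f) * torch.sigmoid(up)   # multiplicative fusion step
        return out
```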