51. Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. PMID: 38878555; PMCID: PMC11638972; DOI: 10.1016/j.artmed.2024.102900
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks that has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. We also include articles that used the transformer architecture to generate surgical instructions and to predict adverse outcomes after surgery, under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Affiliation(s)
- Subhash Nerella
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Jiaqing Zhang
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
- Miguel Contreras
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Scott Siegel
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Aysegul Bumin
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Brandon Silva
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Jessica Sena
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Benjamin Shickel
- Department of Medicine, University of Florida, Gainesville, United States
- Azra Bihorac
- Department of Medicine, University of Florida, Gainesville, United States
- Kia Khezeli
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Parisa Rashidi
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
52. ELKarazle K, Raman V, Chua C, Then P. A Hessian-Based Technique for Specular Reflection Detection and Inpainting in Colonoscopy Images. IEEE J Biomed Health Inform 2024; 28:4724-4736. PMID: 38787660; DOI: 10.1109/jbhi.2024.3404955
Abstract
In the field of Computer-Aided Detection (CADx), the use of AI-based algorithms for disease detection in endoscopy images, especially colonoscopy images, is on the rise. However, these algorithms often encounter performance issues due to obstructions like specular reflection, resulting in false positives. This paper presents a novel algorithm specifically designed to tackle the challenges posed by high specular reflection regions in colonoscopy images. The proposed algorithm identifies these regions and applies precise inpainting for restoration. The process entails converting the input image from RGB to HSV color space and focusing on the Saturation (S) component in convex regions detected using a Hessian-based method. This step creates a binary mask that pinpoints areas of specular reflection. The inpainting function then uses this mask to guide the restoration of the identified regions and their borders. To ensure a seamless blend of the restored regions with the background and adjacent pixels, a feathering process is applied to the repaired regions, enhancing both the accuracy and aesthetic coherence of the inpainted images. The performance of our algorithm was rigorously tested on five unique colonoscopy datasets and various endoscopy images from the Kvasir dataset using an extensive set of evaluation metrics; a comparative analysis with existing methods consistently highlighted the superior performance of our algorithm.
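The core intuition of flagging specular highlights as bright, nearly colorless pixels in HSV space can be sketched in a few lines of NumPy. This is only an illustrative simplification: the paper's actual method detects convex regions with a Hessian-based operator on the Saturation channel, and the thresholds below (`s_thresh`, `v_thresh`) and the toy image are assumptions for demonstration, not values from the paper.

```python
import numpy as np

def rgb_to_sv(img):
    """Compute HSV Saturation and Value channels from an RGB image in [0, 1]."""
    mx = img.max(axis=-1)
    mn = img.min(axis=-1)
    v = mx
    s = np.where(mx > 0, (mx - mn) / np.where(mx > 0, mx, 1.0), 0.0)
    return s, v

def specular_mask(img, s_thresh=0.15, v_thresh=0.85):
    """Flag pixels that are bright but nearly colorless, a common
    signature of specular reflection (thresholds are illustrative)."""
    s, v = rgb_to_sv(img)
    return (s < s_thresh) & (v > v_thresh)

# Toy example: a reddish "tissue" background with a near-white highlight patch.
img = np.zeros((32, 32, 3))
img[:] = [0.7, 0.3, 0.3]
img[10:14, 10:14] = [0.98, 0.97, 0.96]  # specular spot
mask = specular_mask(img)
```

The resulting binary mask would then drive an inpainting and feathering step, as the abstract describes.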
53. Rajasekar D, Theja G, Prusty MR, Chinara S. Efficient colorectal polyp segmentation using wavelet transformation and AdaptUNet: A hybrid U-Net. Heliyon 2024; 10:e33655. PMID: 39040380; PMCID: PMC11261057; DOI: 10.1016/j.heliyon.2024.e33655
Abstract
The prevalence of colorectal cancer, primarily emerging from polyps, underscores the importance of their early detection in colonoscopy images. Due to the inherent complexity and variability of polyp appearances, the task remains difficult despite recent advances in medical technology. To tackle these challenges, a deep learning model featuring a customized U-Net architecture, AdaptUNet, is proposed. Attention mechanisms and skip connections facilitate the effective combination of low-level details and high-level contextual information for accurate polyp segmentation. Further, wavelet transformations are used to extract useful features overlooked in conventional image processing. The model achieves benchmark results with a Dice coefficient of 0.9104, an Intersection over Union (IoU) coefficient of 0.8368, and a Balanced Accuracy of 0.9880 on the CVC-300 dataset. Additionally, it shows exceptional performance on other datasets, including Kvasir-SEG and Etis-LaribDB. Training was performed using the Hyper Kvasir segmented images dataset, further evidencing the model's ability to handle diverse data inputs. The proposed method offers a comprehensive and efficient implementation for polyp detection without compromising performance, thus promising improved precision and a reduction in manual labour for colorectal polyp detection.
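The Dice and IoU figures reported above follow the standard definitions for binary masks; as a quick reference, a minimal NumPy sketch (the toy masks and their overlap are arbitrary illustrative values):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    """IoU = |A∩B| / |A∪B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

# Two 16-pixel squares overlapping in a 2x2 (4-pixel) region.
pred = np.zeros((10, 10), dtype=bool); pred[2:6, 2:6] = True
gt = np.zeros((10, 10), dtype=bool);   gt[4:8, 4:8] = True
```

Dice counts the intersection twice relative to the summed mask sizes, so for any pair of masks it is never smaller than IoU.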
Affiliation(s)
- Devika Rajasekar
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
- Girish Theja
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
- Manas Ranjan Prusty
- Centre for Cyber Physical Systems, Vellore Institute of Technology, Chennai, India
- Suchismita Chinara
- Department of Computer Science and Engineering, National Institute of Technology, Rourkela, India
54. Huang C, Shi Y, Zhang B, Lyu K. Uncertainty-aware prototypical learning for anomaly detection in medical images. Neural Netw 2024; 175:106284. PMID: 38593560; DOI: 10.1016/j.neunet.2024.106284
Abstract
Anomalous object detection (AOD) in medical images aims to recognize anomalous lesions and is crucial for early clinical diagnosis of various cancers. It is a difficult task for two reasons: (1) the diversity of anomalous lesions and (2) the ambiguity of the boundary between anomalous lesions and their normal surroundings. Unlike existing single-modality AOD models based on deterministic mapping, we constructed a combined probabilistic and deterministic AOD model. Specifically, we designed an uncertainty-aware prototype learning framework, which considers the diversity and ambiguity of anomalous lesions. A prototypical learning transformer (Pformer) is established to extract and store the prototype features of different anomalous lesions. Moreover, a Bayesian neural uncertainty quantizer, a probabilistic model, is designed to model distributions over the model's outputs and thereby measure the uncertainty of its detection result for each pixel. Essentially, the uncertainty of the model's anomaly detection result for a pixel reflects the anomalous ambiguity of that pixel. Furthermore, an uncertainty-guided reasoning transformer (Uformer) is devised to exploit this ambiguity, encouraging the proposed model to focus on pixels with high uncertainty. Notably, the prototypical representations stored in Pformer are also utilized in anomaly reasoning, which enables the model to perceive the diversity of anomalous objects. Extensive experiments on five benchmark datasets demonstrate the superiority of our proposed method. The source code will be available at github.com/umchaohuang/UPformer.
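Pixel-wise uncertainty of the kind the abstract describes is often approximated by a Monte Carlo scheme: run several stochastic forward passes and take the per-pixel entropy of the mean foreground probability as the uncertainty. The sketch below simulates those passes with random perturbations rather than a real network, so the sample count and noise levels are purely illustrative assumptions, not the paper's actual Bayesian quantizer.

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_entropy(prob_samples):
    """Per-pixel binary entropy of the mean foreground probability across
    T stochastic forward passes; high entropy = high uncertainty."""
    p = prob_samples.mean(axis=0)            # (H, W) mean foreground prob
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Simulate T=20 noisy sigmoid outputs: one confident pixel, one ambiguous.
T = 20
confident = np.clip(0.95 + 0.01 * rng.standard_normal((T, 1, 1)), 0, 1)
ambiguous = np.clip(0.50 + 0.10 * rng.standard_normal((T, 1, 1)), 0, 1)
h_conf = predictive_entropy(confident)[0, 0]
h_amb = predictive_entropy(ambiguous)[0, 0]
```

Pixels with high entropy would be the ones an uncertainty-guided reasoning stage is encouraged to revisit.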
Affiliation(s)
- Chao Huang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China; Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
- Yushu Shi
- Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
- Bob Zhang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China
- Ke Lyu
- School of Engineering Sciences, University of the Chinese Academy of Sciences, Beijing, 100049, China; Pengcheng Laboratory, Shenzhen, 518055, China
55. Dorjsembe Z, Pao HK, Xiao F. Polyp-DDPM: Diffusion-Based Semantic Polyp Synthesis for Enhanced Segmentation. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-7. PMID: 40039822; DOI: 10.1109/embc53108.2024.10782077
Abstract
This study introduces Polyp-DDPM, a diffusion-based method for generating realistic images of polyps conditioned on masks, aimed at enhancing the segmentation of gastrointestinal (GI) tract polyps. Our approach addresses the challenges of data limitations, high annotation costs, and privacy concerns associated with medical images. By conditioning the diffusion model on segmentation masks (binary masks that represent abnormal areas), Polyp-DDPM outperforms state-of-the-art methods in image quality (achieving a Fréchet Inception Distance (FID) score of 78.47, compared with scores above 95.82) and in segmentation performance (achieving an Intersection over Union (IoU) of 0.7156, versus less than 0.6828 for synthetic images from baseline models and 0.7067 for real data). Our method generates a high-quality, diverse synthetic dataset for training, enabling polyp segmentation models trained on it to perform comparably to those trained on real images, and offering greater data augmentation capability to improve segmentation models. The source code and pretrained weights for Polyp-DDPM are publicly available at https://github.com/mobaidoctor/polyp-ddpm.
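The FID scores quoted above compare Gaussians fitted to deep feature embeddings of real and synthetic images. A minimal NumPy version of the Fréchet distance itself, operating on toy 8-dimensional features rather than Inception embeddings (an assumption for illustration), can be written as:

```python
import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians fitted to feature sets:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    s2_half = _sqrtm_psd(sigma2)
    # (S2^1/2 S1 S2^1/2) is symmetric PSD and has the same eigenvalues
    # as S1 S2, so its square root gives the same trace.
    covmean = _sqrtm_psd(s2_half @ sigma1 @ s2_half)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

rng = np.random.default_rng(0)
feats_a = rng.standard_normal((500, 8))
feats_b = feats_a + 2.0                      # shifted copy -> nonzero FID
mu_a, cov_a = feats_a.mean(0), np.cov(feats_a, rowvar=False)
mu_b, cov_b = feats_b.mean(0), np.cov(feats_b, rowvar=False)
```

Identical feature sets give an FID of zero; shifting every feature by 2 in 8 dimensions moves only the means, so the distance reduces to the squared mean difference.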
56. Wan L, Chen Z, Xiao Y, Zhao J, Feng W, Fu H. Iterative feedback-based models for image and video polyp segmentation. Comput Biol Med 2024; 177:108569. PMID: 38781640; DOI: 10.1016/j.compbiomed.2024.108569
Abstract
Accurate segmentation of polyps in colonoscopy images has gained significant attention in recent years, given its crucial role in automated colorectal cancer diagnosis. Many existing deep learning-based methods follow a one-stage processing pipeline, often involving feature fusion across different levels or boundary-related attention mechanisms. Drawing on the success of applying Iterative Feedback Units (IFU) to image polyp segmentation, this paper proposes FlowICBNet, which extends the IFU to video polyp segmentation. By harnessing the unique capability of IFU to propagate and refine past segmentation results, our method proves effective in mitigating challenges linked to the inherent limitations of endoscopic imaging, notably frequent camera shake and frame defocusing. Furthermore, in FlowICBNet, we introduce two pivotal modules: Reference Frame Selection (RFS) and Flow Guided Warping (FGW). These modules play a crucial role in filtering and selecting the most suitable historical reference frames for the task at hand. Experimental results on a large video polyp segmentation dataset demonstrate that our method significantly outperforms state-of-the-art methods by notable margins, achieving an average metric improvement of 7.5% on SUN-SEG-Easy and 7.4% on SUN-SEG-Hard. Our code is available at https://github.com/eraserNut/ICBNet.
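Flow-guided warping, as named in the FGW module, generally means resampling a previous frame's prediction along a dense optical-flow field. A minimal nearest-neighbor sketch in NumPy (the uniform toy flow field is an assumption for illustration; the paper's actual module is learned and operates on features, not binary masks):

```python
import numpy as np

def warp_mask(mask, flow):
    """Warp a binary mask with a dense flow field (dy, dx) using
    nearest-neighbor backward sampling: out[y, x] = mask[y - dy, x - dx]."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys - flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs - flow[..., 1]).astype(int), 0, w - 1)
    return mask[src_y, src_x]

# A 2x2 polyp mask shifted right by a uniform flow of 3 pixels in x.
mask = np.zeros((8, 8), dtype=bool)
mask[1:3, 1:3] = True
flow = np.zeros((8, 8, 2))
flow[..., 1] = 3.0
warped = warp_mask(mask, flow)
```

Warping a past prediction into the current frame's coordinates is what lets feedback from historical reference frames stay spatially aligned despite camera motion.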
Affiliation(s)
- Liang Wan
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Zhihao Chen
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Yefan Xiao
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Junting Zhao
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Wei Feng
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Huazhu Fu
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore, 138632, Republic of Singapore
57. Cao J, Wang X, Qu Z, Zhuo L, Li X, Zhang H, Yang Y, Wei W. WDFF-Net: Weighted Dual-Branch Feature Fusion Network for Polyp Segmentation With Object-Aware Attention Mechanism. IEEE J Biomed Health Inform 2024; 28:4118-4131. PMID: 38536686; DOI: 10.1109/jbhi.2024.3381891
Abstract
Colon polyps in colonoscopy images exhibit significant differences in color, size, shape, appearance, and location, posing significant challenges to accurate polyp segmentation. In this paper, a Weighted Dual-branch Feature Fusion Network for polyp segmentation, named WDFF-Net, is proposed, which adopts HarDNet68 as the backbone network. First, a dual-branch feature fusion network architecture is constructed, which includes a shared feature extractor and two feature fusion branches, i.e., a Progressive Feature Fusion (PFF) branch and a Scale-aware Feature Fusion (SFF) branch. The branches fuse the deep features of multiple layers for different purposes and in different ways. The PFF branch addresses the under-segmentation and over-segmentation problems of flat polyps with low edge contrast by iteratively fusing the features from low, medium, and high layers. The SFF branch tackles the problem of drastic variations in polyp size and shape, especially the missed segmentation of small polyps. These two branches are complementary and play different roles in improving segmentation accuracy. Second, an Object-aware Attention Mechanism (OAM) is proposed to enhance the features of the target regions and suppress those of the background regions, which would otherwise interfere with segmentation performance. Third, a weighted dual-branch segmentation loss function is specifically designed, which dynamically assigns the weight factors of the loss functions of the two branches to optimize their collaborative training. Experimental results on five public colon polyp datasets demonstrate that the proposed WDFF-Net achieves superior segmentation performance with lower model complexity and faster inference speed, while maintaining good generalization ability.
58. Ruano J, Gómez M, Romero E, Manzanera A. Leveraging a realistic synthetic database to learn Shape-from-Shading for estimating the colon depth in colonoscopy images. Comput Med Imaging Graph 2024; 115:102390. PMID: 38714018; DOI: 10.1016/j.compmedimag.2024.102390
Abstract
Colonoscopy is the procedure of choice to diagnose, screen for, and treat cancer of the colon and rectum, from early detection of small precancerous lesions (polyps) to confirmation of malignant masses. However, the high variability of the organ appearance and the complex shape of both the colon wall and the structures of interest make this exploration difficult. In clinical practice, learned visuospatial and perceptual abilities mitigate technical limitations through proper estimation of intestinal depth. This work introduces a novel methodology to estimate colon depth maps in single frames from monocular colonoscopy videos. The generated depth map is inferred from the shading variation of the colon wall with respect to the light source, as learned from a realistic synthetic database. Briefly, a classic convolutional neural network architecture is trained from scratch to estimate the depth map, improving sharp depth estimations in haustral folds and polyps via a custom loss function that minimizes the estimation error at edges and curvatures. The network was trained on a custom synthetic colonoscopy database constructed and released herein, composed of 248400 frames (47 videos) with pixel-level depth annotations. This collection comprises 5 subsets of videos with progressively higher levels of visual complexity. Evaluation of the depth estimation on the synthetic database reached a threshold accuracy of 95.65% and a mean RMSE of 0.451 cm, while a qualitative assessment on a real database showed consistent depth estimations, visually evaluated by the expert gastroenterologist coauthoring this paper. Finally, the method achieved competitive performance with respect to another state-of-the-art method on a public synthetic database, and comparable results to five other state-of-the-art methods on a set of images. Additionally, three-dimensional reconstructions demonstrated useful approximations of the gastrointestinal tract geometry. Code for reproducing the reported results and the dataset are available at https://github.com/Cimalab-unal/ColonDepthEstimation.
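The two synthetic-database figures quoted above (threshold accuracy and mean RMSE) correspond to standard monocular-depth evaluation metrics; a minimal NumPy sketch, assuming the common delta = 1.25 threshold definition (the toy depth values are illustrative):

```python
import numpy as np

def rmse(pred, gt):
    """Root-mean-square error between predicted and ground-truth depth maps."""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

def threshold_accuracy(pred, gt, delta=1.25):
    """Fraction of pixels where max(pred/gt, gt/pred) < delta
    (the usual delta-accuracy in monocular depth evaluation)."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < delta))

gt = np.full((4, 4), 10.0)    # ground-truth depth, e.g. in cm
pred = gt.copy()
pred[0, 0] = 20.0             # one badly overestimated pixel
```

Here 15 of 16 pixels fall within the 1.25 ratio band, and the single 10 cm error dominates the RMSE.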
Affiliation(s)
- Josué Ruano
- Computer Imaging and Medical Applications Laboratory (CIM@LAB), Universidad Nacional de Colombia, 111321, Bogotá, Colombia
- Martín Gómez
- Unidad de Gastroenterología, Hospital Universitario Nacional, 111321, Bogotá, Colombia
- Eduardo Romero
- Computer Imaging and Medical Applications Laboratory (CIM@LAB), Universidad Nacional de Colombia, 111321, Bogotá, Colombia
- Antoine Manzanera
- Unité d'Informatique et d'Ingénierie des Systèmes (U2IS), ENSTA Paris, Institut Polytechnique de Paris, Palaiseau, 91762, Île de France, France
59. Wang Z, Liu Z, Yu J, Gao Y, Liu M. Multi-scale nested UNet with transformer for colorectal polyp segmentation. J Appl Clin Med Phys 2024; 25:e14351. PMID: 38551396; PMCID: PMC11163511; DOI: 10.1002/acm2.14351
Abstract
BACKGROUND Polyp detection and localization are essential tasks for colonoscopy. U-shape convolutional neural networks have achieved remarkable segmentation performance for biomedical images, but their limited receptive fields constrain the modeling of long-range dependencies. PURPOSE Our goal was to develop and test a novel architecture for polyp segmentation that learns local information while modeling long-range dependencies. METHODS A novel architecture combining a multi-scale nested UNet structure with an integrated transformer was developed for polyp segmentation. The proposed network takes advantage of both CNNs and transformers to extract distinct feature information. The transformer layer is embedded between the encoder and decoder of a U-shape network to learn explicit global context and long-range semantic information. To address the challenge of varying polyp sizes, a multi-scale feature fusion (MSFF) unit was proposed to fuse features at multiple resolutions. RESULTS Four public datasets and one in-house dataset were used to train and test the model performance. An ablation study was also conducted to verify each component of the model. On the Kvasir-SEG and CVC-ClinicDB datasets, the proposed model achieved mean Dice scores of 0.942 and 0.950 respectively, more accurate than the other methods. To assess the generalization of different methods, we performed two cross-dataset validations, in which the proposed model achieved the highest mean Dice score. The results demonstrate that the proposed network has powerful learning and generalization capability, significantly improving segmentation accuracy and outperforming state-of-the-art methods. CONCLUSIONS The proposed model produced more accurate polyp segmentation than current methods on four public and one in-house datasets. Its capability to segment polyps of different sizes shows its potential for clinical application.
Affiliation(s)
- Zenan Wang
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China
- Zhen Liu
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China
- Jianfeng Yu
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China
- Yingxin Gao
- Department of Gastroenterology, Beijing Chaoyang Hospital, the Third Clinical Medical College of Capital Medical University, Beijing, China
- Ming Liu
- Hunan Key Laboratory of Nonferrous Resources and Geological Hazard Exploration, Changsha, China
60. Ji Z, Li X, Liu J, Chen R, Liao Q, Lyu T, Zhao L. LightCF-Net: A Lightweight Long-Range Context Fusion Network for Real-Time Polyp Segmentation. Bioengineering (Basel) 2024; 11:545. PMID: 38927781; PMCID: PMC11201063; DOI: 10.3390/bioengineering11060545
Abstract
Automatically segmenting polyps from colonoscopy videos is crucial for developing computer-assisted diagnostic systems for colorectal cancer. Existing automatic polyp segmentation methods often struggle to fulfill the real-time demands of clinical applications due to their substantial parameter count and computational load, especially those based on Transformer architectures. To tackle these challenges, a novel lightweight long-range context fusion network, named LightCF-Net, is proposed in this paper. This network attempts to model long-range spatial dependencies while maintaining real-time performance, to better distinguish polyps from background noise and thus improve segmentation accuracy. A novel Fusion Attention Encoder (FAEncoder) is designed in the proposed network, which integrates Large Kernel Attention (LKA) and channel attention mechanisms to extract deep representational features of polyps and unearth long-range dependencies. Furthermore, a newly designed Visual Attention Mamba module (VAM) is added to the skip connections, modeling long-range context dependencies in the encoder-extracted features and reducing background noise interference through the attention mechanism. Finally, a Pyramid Split Attention module (PSA) is used in the bottleneck layer to extract richer multi-scale contextual features. The proposed method was thoroughly evaluated on four renowned polyp segmentation datasets: Kvasir-SEG, CVC-ClinicDB, BKAI-IGH, and ETIS. Experimental findings demonstrate that the proposed method delivers higher segmentation accuracy in less time, consistently outperforming the most advanced lightweight polyp segmentation networks.
Affiliation(s)
- Zhanlin Ji
- Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China
- College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
- Xiaoyu Li
- Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China
- Jianuo Liu
- Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China
- Rui Chen
- Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China
- Qinping Liao
- Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China
- Tao Lyu
- Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China
- Li Zhao
- Beijing National Research Center for Information Science and Technology, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China
61. Biffi C, Antonelli G, Bernhofer S, Hassan C, Hirata D, Iwatate M, Maieron A, Salvagnini P, Cherubini A. REAL-Colon: A dataset for developing real-world AI applications in colonoscopy. Sci Data 2024; 11:539. PMID: 38796533; PMCID: PMC11127922; DOI: 10.1038/s41597-024-03359-0
Abstract
Detection and diagnosis of colon polyps are key to preventing colorectal cancer. Recent evidence suggests that AI-based computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems can enhance endoscopists' performance and boost colonoscopy effectiveness. However, most available public datasets primarily consist of still images or video clips, often at a down-sampled resolution, and do not accurately represent real-world colonoscopy procedures. We introduce the REAL-Colon (Real-world multi-center Endoscopy Annotated video Library) dataset: a compilation of 2.7 M native video frames from sixty full-resolution, real-world colonoscopy recordings across multiple centers. The dataset contains 350k bounding-box annotations, each created under the supervision of expert gastroenterologists. Comprehensive patient clinical data, colonoscopy acquisition information, and polyp histopathological information are also included in each video. With its unprecedented size, quality, and heterogeneity, the REAL-Colon dataset is a unique resource for researchers and developers aiming to advance AI research in colonoscopy. Its openness and transparency facilitate rigorous and reproducible research, fostering the development and benchmarking of more accurate and reliable colonoscopy-related algorithms and models.
Affiliation(s)
- Carlo Biffi
- Cosmo Intelligent Medical Devices, Dublin, Ireland
- Giulio Antonelli
- Gastroenterology and Digestive Endoscopy Unit, Ospedale dei Castelli (N.O.C.), Rome, Italy
- Sebastian Bernhofer
- Karl Landsteiner University of Health Sciences, Krems, Austria
- Department of Internal Medicine 2, University Hospital St. Pölten, St. Pölten, Austria
- Cesare Hassan
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Italy
- Endoscopy Unit, Humanitas Clinical and Research Center IRCCS, Rozzano, Italy
- Daizen Hirata
- Gastrointestinal Center, Sano Hospital, Hyogo, Japan
- Mineo Iwatate
- Gastrointestinal Center, Sano Hospital, Hyogo, Japan
- Andreas Maieron
- Karl Landsteiner University of Health Sciences, Krems, Austria
- Department of Internal Medicine 2, University Hospital St. Pölten, St. Pölten, Austria
- Andrea Cherubini
- Cosmo Intelligent Medical Devices, Dublin, Ireland
- Milan Center for Neuroscience, University of Milano-Bicocca, Milano, Italy
62. Liu J, Zhang W, Liu Y, Zhang Q. Polyp segmentation based on implicit edge-guided cross-layer fusion networks. Sci Rep 2024; 14:11678. PMID: 38778219; PMCID: PMC11111678; DOI: 10.1038/s41598-024-62331-5
Abstract
Polyps are abnormal tissue clumps growing primarily on the inner linings of the gastrointestinal tract. While such clumps are generally harmless, they can potentially evolve into pathological tumors, and thus require long-term observation and monitoring. Polyp segmentation in gastrointestinal endoscopy images is an important stage for polyp monitoring and subsequent treatment. However, this segmentation task faces multiple challenges: the low contrast of polyp boundaries, the varied polyp appearance, and the co-occurrence of multiple polyps. In this paper, an implicit edge-guided cross-layer fusion network (IECFNet) is therefore proposed for polyp segmentation. An encoder-decoder pair is used to generate an initial saliency map, the implicit edge-enhanced context attention module aggregates the feature maps output by the encoder and decoder to generate a rough prediction, and the multi-scale feature reasoning module is used to generate the final predictions. Polyp segmentation experiments have been conducted on five popular polyp image datasets (Kvasir, CVC-ClinicDB, ETIS, CVC-ColonDB, and CVC-300), and the results show that the proposed method significantly outperforms a conventional method, especially with an accuracy margin of 7.9% on the ETIS dataset.
Affiliation(s)
- Junqing Liu
- Hubei Engineering and Technology Research Center for Construction Quality Inspection Equipment, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- Weiwei Zhang
- Hubei Engineering and Technology Research Center for Construction Quality Inspection Equipment, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- Yong Liu
- Hubei Engineering and Technology Research Center for Construction Quality Inspection Equipment, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- Qinghe Zhang
- Hubei Engineering and Technology Research Center for Construction Quality Inspection Equipment, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
Collapse
|
63
|
Daneshpajooh V, Ahmad D, Toth J, Bascom R, Higgins WE. Automatic lesion detection for narrow-band imaging bronchoscopy. J Med Imaging (Bellingham) 2024; 11:036002. [PMID: 38827776 PMCID: PMC11138083 DOI: 10.1117/1.jmi.11.3.036002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Revised: 04/04/2024] [Accepted: 05/14/2024] [Indexed: 06/05/2024] Open
Abstract
Purpose: Early detection of cancer is crucial for lung cancer patients, as it determines disease prognosis. Lung cancer typically starts as bronchial lesions along the airway walls. Recent research has indicated that narrow-band imaging (NBI) bronchoscopy enables more effective bronchial lesion detection than other bronchoscopic modalities. Unfortunately, NBI video can be hard to interpret because physicians currently are forced to perform a time-consuming subjective visual search to detect bronchial lesions in a long airway-exam video. As a result, NBI bronchoscopy is not regularly used in practice. To alleviate this problem, we propose an automatic two-stage real-time method for bronchial lesion detection in NBI video and perform a first-of-its-kind pilot study of the method using NBI airway exam video collected at our institution.
Approach: Given a patient's NBI video, the first method stage entails a deep-learning-based object detection network coupled with a multiframe abnormality measure to locate candidate lesions on each video frame. The second method stage then draws upon a Siamese network and a Kalman filter to track candidate lesions over multiple frames to arrive at final lesion decisions.
Results: Tests drawing on 23 patient NBI airway exam videos indicate that the method can process an incoming video stream at a real-time frame rate, thereby making the method viable for real-time inspection during a live bronchoscopic airway exam. Furthermore, our studies showed a 93% sensitivity and 86% specificity for lesion detection; this compares favorably to a sensitivity and specificity of 80% and 84% achieved over a series of recent pooled clinical studies using the current time-consuming subjective clinical approach.
Conclusion: The method shows potential for robust lesion detection in NBI video at a real-time frame rate. Therefore, it could help enable more common use of NBI bronchoscopy for bronchial lesion detection.
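The second-stage tracker pairs a Siamese appearance model with a Kalman filter. A minimal sketch of the Kalman half, assuming a constant-velocity model over a single lesion coordinate (all matrices and noise parameters here are illustrative, not the paper's):

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter over 1D position measurements.

    State x = [position, velocity]; returns the filtered positions.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])     # state transition
    H = np.array([[1.0, 0.0]])                # we observe position only
    Q = q * np.eye(2)                         # process noise covariance
    R = np.array([[r]])                       # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])  # initial state
    P = np.eye(2)                             # initial state covariance
    out = []
    for z in measurements:
        # Predict step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step with the new measurement z.
        y = np.array([[z]]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        out.append(float(x[0, 0]))
    return out

# Noisy observations of a lesion drifting rightward at 1 px/frame.
rng = np.random.default_rng(0)
true_pos = np.arange(20, dtype=float)
observed = true_pos + rng.normal(0, 0.5, size=20)
filtered = kalman_track(observed)
```

The filter smooths frame-to-frame jitter in the candidate location; the appearance (Siamese) score would then decide whether the tracked candidate is accepted as a lesion.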
Affiliation(s)
- Vahid Daneshpajooh
- The Pennsylvania State University, School of Electrical Engineering and Computer Science, University Park, Pennsylvania, United States
- Danish Ahmad
- The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States
- Jennifer Toth
- The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States
- Rebecca Bascom
- The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania, United States
- William E. Higgins
- The Pennsylvania State University, School of Electrical Engineering and Computer Science, University Park, Pennsylvania, United States

64
Su D, Luo J, Fei C. An Efficient and Rapid Medical Image Segmentation Network. IEEE J Biomed Health Inform 2024; 28:2979-2990. [PMID: 38457317 DOI: 10.1109/jbhi.2024.3374780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/10/2024]
Abstract
Accurate medical image segmentation is an essential part of the medical image analysis process that provides detailed quantitative metrics. In recent years, extensions of classical networks such as UNet have achieved state-of-the-art performance on medical image segmentation tasks. However, the high model complexity of these networks limits their applicability to devices with constrained computational resources. To alleviate this problem, we propose a shallow hierarchical Transformer for medical image segmentation, called SHFormer. By decreasing the number of transformer blocks utilized, the model complexity of SHFormer can be reduced to an acceptable level. To improve the learned attention while keeping the structure lightweight, we propose a spatial-channel connection module. This module separately learns attention in the spatial and channel dimensions of the feature while interconnecting them to produce more focused attention. To keep the decoder lightweight, the MLP-D module is proposed to progressively fuse multi-scale features in which channels are aligned using Multi-Layer Perceptron (MLP) and spatial information is fused by convolutional blocks. We first validated the performance of SHFormer on the ISIC-2018 dataset. Compared to the latest network, SHFormer exhibits comparable performance with 15 times fewer parameters, 30 times lower computational complexity and 5 times higher inference efficiency. To test the generalizability of SHFormer, we introduced the polyp dataset for additional testing. SHFormer achieves comparable segmentation accuracy to the latest network while having lower computational overhead.
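The spatial-channel connection idea (attention learned separately along channels and spatial positions, then combined) can be sketched in a few lines. This is a toy illustration with sigmoid-gated pooled statistics, not the SHFormer module itself:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_channel_attention(feat: np.ndarray) -> np.ndarray:
    """Toy spatial-channel attention over a (C, H, W) feature map.

    Channel weights come from global average pooling; spatial weights come
    from the channel-mean map; both attention maps multiply into the input.
    """
    c_att = sigmoid(feat.mean(axis=(1, 2)))[:, None, None]  # (C, 1, 1)
    s_att = sigmoid(feat.mean(axis=0))[None, :, :]          # (1, H, W)
    return feat * c_att * s_att

feat = np.random.default_rng(1).normal(size=(4, 8, 8))
out = spatial_channel_attention(feat)
```

Because both gates lie in (0, 1), the output never amplifies a feature, it only re-weights it, which keeps such modules cheap relative to full self-attention.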

65
Zhang K, Hu D, Li X, Wang X, Hu X, Wang C, Yang J, Rao N. BFE-Net: bilateral fusion enhanced network for gastrointestinal polyp segmentation. BIOMEDICAL OPTICS EXPRESS 2024; 15:2977-2999. [PMID: 38855696 PMCID: PMC11161362 DOI: 10.1364/boe.522441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Revised: 03/17/2024] [Accepted: 03/17/2024] [Indexed: 06/11/2024]
Abstract
Accurate segmentation of polyp regions in gastrointestinal endoscopic images is pivotal for diagnosis and treatment. Despite advancements, challenges persist, like accurately segmenting small polyps and maintaining accuracy when polyps resemble surrounding tissues. Recent studies show the effectiveness of the pyramid vision transformer (PVT) in capturing global context, yet it may lack detailed information. Conversely, U-Net excels in semantic extraction. Hence, we propose the bilateral fusion enhanced network (BFE-Net) to address these challenges. Our model integrates U-Net and PVT features via a deep feature enhancement fusion module (FEF) and attention decoder module (AD). Experimental results demonstrate significant improvements, validating our model's effectiveness across various datasets and modalities, promising advancements in gastrointestinal polyp diagnosis and treatment.
Affiliation(s)
- Kaixuan Zhang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dingcan Hu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiang Li
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiaotong Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiaoming Hu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Chunyang Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jinlin Yang
- Digestive Endoscopic Center of West China Hospital, Sichuan University, Chengdu 610017, China
- Nini Rao
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China

66
Hossain T, Shamrat FMJM, Zhou X, Mahmud I, Mazumder MSA, Sharmin S, Gururajan R. Development of a multi-fusion convolutional neural network (MF-CNN) for enhanced gastrointestinal disease diagnosis in endoscopy image analysis. PeerJ Comput Sci 2024; 10:e1950. [PMID: 38660192 PMCID: PMC11041948 DOI: 10.7717/peerj-cs.1950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2023] [Accepted: 02/29/2024] [Indexed: 04/26/2024]
Abstract
Gastrointestinal (GI) diseases are prevalent medical conditions that require accurate and timely diagnosis for effective treatment. To address this, we developed the Multi-Fusion Convolutional Neural Network (MF-CNN), a deep learning framework that strategically integrates and adapts elements from six deep learning models, enhancing feature extraction and classification of GI diseases from endoscopic images. The MF-CNN architecture leverages truncated and partially frozen layers from existing models, augmented with novel components such as Auxiliary Fusing Layers (AuxFL), a Fusion Residual Block (FuRB), and Alpha Dropouts (αDO) to improve precision and robustness. This design facilitates the precise identification of conditions such as ulcerative colitis, polyps, esophagitis, and healthy colons. Our methodology involved preprocessing endoscopic images sourced from open databases, including KVASIR and ETIS-Larib Polyp DB, using adaptive histogram equalization (AHE) to enhance their quality. The MF-CNN framework supports detailed feature mapping for improved interpretability of the model's internal workings. An ablation study was conducted to validate the contribution of each component, demonstrating that the integration of AuxFL, αDO, and FuRB played a crucial part in reducing overfitting and efficiency saturation and enhancing overall model performance. The MF-CNN demonstrated outstanding performance, achieving an accuracy rate of 99.25%. It also excelled in other key performance metrics, with a precision of 99.27%, a recall of 99.25%, and an F1-score of 99.25%. These metrics confirmed the model's proficiency in accurate classification and its capability to minimize false positives and negatives across all tested GI disease categories. Furthermore, the AUC values were exceptional, averaging 1.00 for both test and validation sets, indicating perfect discriminative ability. The findings of the P-R curve analysis and confusion matrix further confirmed the robust classification performance of the MF-CNN. This research introduces a technique for medical imaging that can potentially transform diagnostics in gastrointestinal healthcare facilities worldwide.
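The AHE preprocessing step builds on ordinary histogram equalization (AHE applies the same CDF remapping per local tile rather than globally). A simplified global version, as a sketch only:

```python
import numpy as np

def equalize_hist(img: np.ndarray, levels: int = 256) -> np.ndarray:
    """Global histogram equalization for a uint8 grayscale image.

    Each gray level is remapped through the normalized cumulative histogram,
    stretching a low-contrast frame across the full intensity range.
    """
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF at the lowest occupied level
    lut = np.clip(
        np.round((cdf - cdf_min) / (img.size - cdf_min) * (levels - 1)),
        0, levels - 1,
    )
    return lut.astype(np.uint8)[img]

# Low-contrast endoscopy-like frame: values squeezed into [100, 120].
rng = np.random.default_rng(2)
frame = rng.integers(100, 121, size=(32, 32)).astype(np.uint8)
enhanced = equalize_hist(frame)
```

After remapping, the occupied intensity range spans the full 0 to 255 scale, which is the contrast boost that AHE delivers locally.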
Affiliation(s)
- Tanzim Hossain
- Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh
- Xujuan Zhou
- School of Business, University of Southern Queensland, Springfield, Australia
- Imran Mahmud
- Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh
- Md. Sakib Ali Mazumder
- Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh
- Sharmin Sharmin
- Department of Computer System and Technology, University of Malaya, Kuala Lumpur, Malaysia
- Raj Gururajan
- School of Business, University of Southern Queensland, Springfield, Australia

67
Li B, Xu Y, Wang Y, Zhang B. DECTNet: Dual Encoder Network combined convolution and Transformer architecture for medical image segmentation. PLoS One 2024; 19:e0301019. [PMID: 38573957 PMCID: PMC10994332 DOI: 10.1371/journal.pone.0301019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Accepted: 03/09/2024] [Indexed: 04/06/2024] Open
Abstract
Automatic and accurate segmentation of medical images plays an essential role in disease diagnosis and treatment planning. Convolution neural networks have achieved remarkable results in medical image segmentation in the past decade. Meanwhile, deep learning models based on Transformer architecture also succeeded tremendously in this domain. However, due to the ambiguity of the medical image boundary and the high complexity of physical organization structures, implementing effective structure extraction and accurate segmentation remains a problem requiring a solution. In this paper, we propose a novel Dual Encoder Network named DECTNet to alleviate this problem. Specifically, the DECTNet embraces four components, which are a convolution-based encoder, a Transformer-based encoder, a feature fusion decoder, and a deep supervision module. The convolutional structure encoder can extract fine spatial contextual details in images. Meanwhile, the Transformer structure encoder is designed using a hierarchical Swin Transformer architecture to model global contextual information. The novel feature fusion decoder integrates the multi-scale representation from two encoders and selects features that focus on segmentation tasks by channel attention mechanism. Further, a deep supervision module is used to accelerate the convergence of the proposed method. Extensive experiments demonstrate that, compared to the other seven models, the proposed method achieves state-of-the-art results on four segmentation tasks: skin lesion segmentation, polyp segmentation, Covid-19 lesion segmentation, and MRI cardiac segmentation.
Affiliation(s)
- Boliang Li
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yaming Xu
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yan Wang
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Bo Zhang
- Sergeant Schools of Army Academy of Armored Forces, Changchun, Jilin, China

68
Waheed Z, Gui J. An optimized ensemble model based on cuckoo search with Levy Flight for automated gastrointestinal disease detection. MULTIMEDIA TOOLS AND APPLICATIONS 2024; 83:89695-89722. [DOI: 10.1007/s11042-024-18937-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Revised: 01/04/2024] [Accepted: 03/13/2024] [Indexed: 01/15/2025]

69
Goceri E. Polyp Segmentation Using a Hybrid Vision Transformer and a Hybrid Loss Function. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:851-863. [PMID: 38343250 PMCID: PMC11031515 DOI: 10.1007/s10278-023-00954-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 09/16/2023] [Accepted: 10/02/2023] [Indexed: 04/20/2024]
Abstract
Accurate and early detection of precursor adenomatous polyps and their removal at an early stage can significantly decrease the mortality rate and the occurrence of the disease, since most colorectal cancers evolve from adenomatous polyps. However, accurate detection and segmentation of polyps by doctors are difficult, mainly due to these factors: (i) the quality of polyp screening with colonoscopy depends on the imaging quality and the experience of the doctors; (ii) visual inspection by doctors is time-consuming, burdensome, and tiring; (iii) prolonged visual inspections can lead to polyps being missed even when the physician is experienced. To overcome these problems, computer-aided methods have been proposed. However, they have some disadvantages or limitations. Therefore, in this work, a new architecture based on residual transformer layers has been designed and used for polyp segmentation. The proposed segmentation utilizes both high-level semantic features and low-level spatial features. Also, a novel hybrid loss function has been proposed. The loss function, designed with focal Tversky loss, binary cross-entropy, and the Jaccard index, reduces image-wise and pixel-wise differences and improves regional consistency. Experimental work has indicated the effectiveness of the proposed approach in terms of Dice similarity (0.9048), recall (0.9041), precision (0.9057), and F2 score (0.8993). Comparisons with state-of-the-art methods have shown its better performance.
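A hedged sketch of such a hybrid loss, combining a focal Tversky term, binary cross-entropy, and a Jaccard term. The weights, Tversky parameters, and focal exponent below are illustrative defaults, not the paper's values:

```python
import numpy as np

def hybrid_loss(p, g, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Toy hybrid segmentation loss: focal Tversky + BCE + Jaccard.

    p: predicted foreground probabilities in (0, 1); g: binary ground truth.
    """
    p = np.clip(p.astype(float), eps, 1 - eps)
    g = g.astype(float)
    tp = (p * g).sum()
    fp = (p * (1 - g)).sum()
    fn = ((1 - p) * g).sum()
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    focal_tversky = (1 - tversky) ** gamma
    bce = -(g * np.log(p) + (1 - g) * np.log(1 - p)).mean()
    jaccard = 1 - (tp + eps) / (tp + fp + fn + eps)
    return focal_tversky + bce + jaccard

g = np.zeros((16, 16)); g[4:12, 4:12] = 1
good = np.where(g == 1, 0.9, 0.1)  # confident, mostly correct prediction
bad = np.where(g == 1, 0.1, 0.9)   # inverted prediction
```

The region terms (Tversky, Jaccard) penalize image-wise overlap errors while BCE penalizes per-pixel errors, which is the image-wise/pixel-wise split the abstract describes.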

70
Li F, Huang Z, Zhou L, Chen Y, Tang S, Ding P, Peng H, Chu Y. Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network. BIOMEDICAL OPTICS EXPRESS 2024; 15:2590-2621. [PMID: 38633077 PMCID: PMC11019695 DOI: 10.1364/boe.510908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 02/26/2024] [Accepted: 03/08/2024] [Indexed: 04/19/2024]
Abstract
Automatic and precise polyp segmentation in colonoscopy images is highly valuable for early-stage diagnosis and surgery of colorectal cancer. Nevertheless, it still poses a major challenge due to variations in the size and intricate morphological characteristics of polyps, coupled with the indistinct demarcation between polyps and mucosas. To alleviate these challenges, we proposed an improved dual-aggregation polyp segmentation network, dubbed Dua-PSNet, for automatic and accurate full-size polyp prediction by combining a transformer branch and a fully convolutional network (FCN) branch in a parallel style. Concretely, in the transformer branch, we adopted the B3 variant of pyramid vision transformer v2 (PVTv2-B3) as an image encoder for capturing multi-scale global features and modeling long-distance interdependencies between them, whilst designing an innovative multi-stage feature aggregation decoder (MFAD) to highlight critical local feature details and effectively integrate them into the global features. In the decoder, the adaptive feature aggregation (AFA) block was constructed to fuse high-level feature representations of different scales generated by the PVTv2-B3 encoder in a stepwise adaptive manner to refine global semantic information, while the ResidualBlock module was devised to mine detailed boundary cues disguised in low-level features. With the assistance of the selective global-to-local fusion head (SGLFH) module, the resulting boundary details were selectively aggregated with these global semantic features, strengthening the hierarchical features to cope with scale variations of polyps. The FCN branch, with the designed ResidualBlock module embedded, was used to encourage extraction of highly merged fine features to match the outputs of the transformer branch into full-size segmentation maps. In this way, both branches reciprocally influenced and complemented each other to enhance the discrimination capability of polyp features and enable more accurate prediction of a full-size segmentation map. Extensive experiments on five challenging polyp segmentation benchmarks demonstrated that the proposed Dua-PSNet has powerful learning and generalization ability and advances the state-of-the-art segmentation performance among existing cutting-edge methods. These results show that Dua-PSNet has great potential as a solution for practical polyp segmentation tasks in which wide variations of data typically occur.
Affiliation(s)
- Feng Li
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Zetao Huang
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Lu Zhou
- Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China
- Yuyang Chen
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Shiqing Tang
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Pengchao Ding
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Haixia Peng
- Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China
- Yimin Chu
- Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China

71
Du H, Wang J, Liu M, Wang Y, Meijering E. SwinPA-Net: Swin Transformer-Based Multiscale Feature Pyramid Aggregation Network for Medical Image Segmentation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:5355-5366. [PMID: 36121961 DOI: 10.1109/tnnls.2022.3204090] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
The precise segmentation of medical images is one of the key challenges in pathology research and clinical practice. However, many medical image segmentation tasks face problems such as large differences between different types of lesions and the similarity in shape and color between lesions and surrounding tissues, which seriously limit segmentation accuracy. In this article, a novel method called the Swin Pyramid Aggregation network (SwinPA-Net) is proposed by combining two designed modules with the Swin Transformer to learn more powerful and robust features. The two modules, named the dense multiplicative connection (DMC) module and the local pyramid attention (LPA) module, are proposed to aggregate the multiscale context information of medical images. The DMC module cascades the multiscale semantic feature information through dense multiplicative feature fusion, which minimizes the interference of shallow background noise to improve the feature expression and addresses the problem of excessive variation in lesion size and type. Moreover, the LPA module guides the network to focus on the region of interest by merging global attention and local attention, which helps to address the similarity between lesions and surrounding tissue. The proposed network is evaluated on two public benchmark datasets for polyp segmentation and skin lesion segmentation, as well as a private clinical dataset for laparoscopic image segmentation. Compared with existing state-of-the-art (SOTA) methods, SwinPA-Net achieves the most advanced performance and outperforms the second-best method on the mean Dice score by 1.68%, 0.8%, and 1.2% on the three tasks, respectively.
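The dense multiplicative idea can be sketched as follows, assuming nearest-neighbour upsampling to the finest resolution and an elementwise product across scales (a deliberate simplification of the DMC module):

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def dense_multiplicative_fusion(feats):
    """Toy multiplicative fusion of a fine-to-coarse feature pyramid.

    Each coarser map is upsampled to the finest resolution, then all maps
    are combined by elementwise product, so background noise (values near
    zero) at any scale suppresses the fused response there.
    """
    target = feats[0].shape
    fused = np.ones(target)
    for f in feats:
        while f.shape != target:
            f = upsample2x(f)
        fused = fused * f
    return fused

fine = np.ones((8, 8)); fine[0, 0] = 0.0       # one inactive fine pixel
mid = np.ones((4, 4))
coarse = np.zeros((2, 2)); coarse[1, 1] = 1.0  # only bottom-right quadrant active
fused = dense_multiplicative_fusion([fine, mid, coarse])
```

Multiplication (rather than addition) means a region survives only if every scale agrees it is foreground, which is why this style of fusion damps shallow background noise.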

72
Yin X, Zeng J, Hou T, Tang C, Gan C, Jain DK, García S. RSAFormer: A method of polyp segmentation with region self-attention transformer. Comput Biol Med 2024; 172:108268. [PMID: 38493598 DOI: 10.1016/j.compbiomed.2024.108268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 03/07/2024] [Accepted: 03/07/2024] [Indexed: 03/19/2024]
Abstract
Colonoscopy is of great importance for the early screening and clinical diagnosis of colon cancer, yet fine segmentation of polyps remains a challenging task. Existing state-of-the-art models still have limited segmentation ability because the boundaries between normal tissue and polyps are unclear and the two are highly similar in appearance. To deal with this problem, we propose a region self-attention enhancement network (RSAFormer) with a transformer encoder to capture more robust features. Different from other methods, RSAFormer uniquely employs a dual decoder structure to generate various feature maps. Contrasting with traditional methods that typically employ a single decoder, this offers more flexibility and detail in feature extraction. RSAFormer also introduces a region self-attention enhancement (RSA) module to acquire more accurate feature information and foster a stronger interplay between low-level and high-level features. This module enhances uncertain areas, identified from regional context, to extract more precise boundary information. Extensive experiments were conducted on five prevalent polyp datasets to demonstrate RSAFormer's proficiency. It achieves 92.2% and 83.5% mean Dice on Kvasir and ETIS, respectively, outperforming most state-of-the-art models.
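One simple way to flag the "uncertain areas" that such a module then refines is to threshold how close each predicted foreground probability lies to 0.5. This is an illustrative stand-in for the idea, not the RSA module itself:

```python
import numpy as np

def uncertain_region_mask(prob: np.ndarray, band: float = 0.2) -> np.ndarray:
    """Flag pixels whose foreground probability is within `band` of 0.5.

    A refinement stage can then spend its capacity on exactly these
    boundary-like, ambiguous regions of a coarse prediction.
    """
    return (np.abs(prob - 0.5) <= band).astype(np.uint8)

prob = np.zeros((8, 8))
prob[2:6, 2:6] = 0.9   # confident polyp interior
prob[2:6, 6] = 0.5     # fuzzy right-hand boundary column
mask = uncertain_region_mask(prob)
```

Confident interior and background pixels are excluded, leaving only the ambiguous boundary band for enhancement.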
Affiliation(s)
- Xuehui Yin
- School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Jun Zeng
- School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Tianxiao Hou
- School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Chao Tang
- School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Chenquan Gan
- School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Deepak Kumar Jain
- Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; Symbiosis Institute of Technology, Symbiosis International University, Pune 412115, India
- Salvador García
- Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, University of Granada, Granada 18071, Spain

73
Shu X, Wang J, Zhang A, Shi J, Wu XJ. CSCA U-Net: A channel and space compound attention CNN for medical image segmentation. Artif Intell Med 2024; 150:102800. [PMID: 38553146 DOI: 10.1016/j.artmed.2024.102800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2023] [Revised: 12/10/2023] [Accepted: 02/03/2024] [Indexed: 04/02/2024]
Abstract
Image segmentation is one of the vital steps in medical image analysis. A large number of methods based on convolutional neural networks have emerged, which can extract abstract features from multiple-modality medical images, learn valuable information that is difficult to recognize by humans, and obtain more reliable results than traditional image segmentation approaches. U-Net, due to its simple structure and excellent performance, is widely used in medical image segmentation. In this paper, to further improve the performance of U-Net, we propose a channel and space compound attention (CSCA) convolutional neural network, CSCA U-Net in abbreviation, which increases the network depth and employs a double squeeze-and-excitation (DSE) block in the bottleneck layer to enhance feature extraction and obtain more high-level semantic features. Moreover, the characteristics of the proposed method are three-fold: (1) channel and space compound attention (CSCA) block, (2) cross-layer feature fusion (CLFF), and (3) deep supervision (DS). Extensive experiments on several available medical image datasets, including Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS, CVC-T, 2018 Data Science Bowl (2018 DSB), ISIC 2018, and JSUAH-Cerebellum, show that CSCA U-Net achieves competitive results and significantly improves generalization performance. The codes and trained models are available at https://github.com/xiaolanshu/CSCA-U-Net.
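A minimal sketch of a double squeeze-and-excitation block of the kind described: two SE gates applied in sequence, each squeezing by global average pooling and exciting through a tiny two-layer bottleneck. The weights and sizes below are random toy values, not the CSCA U-Net parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, w1, w2):
    """Squeeze-and-excitation: GAP -> bottleneck MLP -> channel rescaling."""
    z = feat.mean(axis=(1, 2))                # squeeze to (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))   # excitation gates in (0, 1)
    return feat * s[:, None, None]

def double_se(feat, w1, w2, w3, w4):
    """Toy double squeeze-and-excitation: two SE blocks back to back."""
    return se_block(se_block(feat, w1, w2), w3, w4)

rng = np.random.default_rng(3)
c, r = 8, 2                                   # channels, reduced width
feat = rng.normal(size=(c, 6, 6))
w1, w3 = rng.normal(size=(r, c)), rng.normal(size=(r, c))
w2, w4 = rng.normal(size=(c, r)), rng.normal(size=(c, r))
out = double_se(feat, w1, w2, w3, w4)
```

Stacking two gates sharpens the channel selection relative to a single SE block while adding almost no parameters, which fits the bottleneck-layer placement described in the abstract.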
Affiliation(s)
- Xin Shu
- School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212100, Jiangsu, China; Development and Related Diseases of Women and Children Key Laboratory of Sichuan Province, Chengdu, 610041, Sichuan, China
- Jiashu Wang
- School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212100, Jiangsu, China
- Aoping Zhang
- School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212100, Jiangsu, China
- Jinlong Shi
- School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212100, Jiangsu, China
- Xiao-Jun Wu
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, Jiangsu, China

74
Yang C, Zhang Z. PFD-Net: Pyramid Fourier Deformable Network for medical image segmentation. Comput Biol Med 2024; 172:108302. [PMID: 38503092 DOI: 10.1016/j.compbiomed.2024.108302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 02/26/2024] [Accepted: 03/12/2024] [Indexed: 03/21/2024]
Abstract
Medical image segmentation is crucial for accurately locating lesion regions and assisting doctors in diagnosis. However, most existing methods fail to effectively utilize both local details and global semantic information, and so cannot effectively capture fine-grained content such as small targets and irregular boundaries. To address this issue, we propose a novel Pyramid Fourier Deformable Network (PFD-Net) for medical image segmentation, which leverages the strengths of CNNs and Transformers. PFD-Net first utilizes a PVTv2-based Transformer as the primary encoder to capture global information and further enhances both local and global feature representations with the Fast Fourier Convolution Residual (FFCR) module. Moreover, PFD-Net proposes the Dilated Deformable Refinement (DDR) module to enhance the model's capacity to comprehend the global semantic structures of shape-diverse targets and their irregular boundaries. Lastly, a Cross-Level Fusion Block with deformable convolution (CLFB) is proposed to combine the decoded feature maps from the final Residual Decoder Block with local features from the CNN auxiliary encoder branch, improving the network's ability to perceive targets resembling the surrounding structures. Extensive experiments were conducted on nine public medical image datasets covering five types of segmentation tasks: polyp, abdominal, cardiac, gland cell, and nucleus segmentation. The qualitative and quantitative results demonstrate that PFD-Net outperforms existing state-of-the-art methods on various evaluation metrics and achieves the highest mDice of 0.826 on the most challenging dataset (ETIS), a 1.8% improvement over the previous best-performing HSNet and a 3.6% improvement over the next-best PVT-CASCADE. Codes are available at https://github.com/ChaorongYang/PFD-Net.
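Fast Fourier Convolution layers rest on the convolution theorem: an elementwise product of spectra equals a circular convolution in image space, giving every output pixel a global receptive field in one step. A self-contained numerical check of that identity (the kernel and image here are arbitrary toy data):

```python
import numpy as np

def fft_circular_conv2d(x, k):
    """Circular 2D convolution computed in the Fourier domain."""
    kernel = np.zeros_like(x)
    kernel[:k.shape[0], :k.shape[1]] = k     # zero-pad kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(kernel)))

def direct_circular_conv2d(x, k):
    """Reference circular convolution by explicit summation."""
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            for u in range(k.shape[0]):
                for v in range(k.shape[1]):
                    out[i, j] += k[u, v] * x[(i - u) % h, (j - v) % w]
    return out

rng = np.random.default_rng(4)
x = rng.normal(size=(8, 8))
k = rng.normal(size=(3, 3))
```

The two routes agree to floating-point precision; the FFT route costs O(N log N) regardless of kernel support, which is what makes spectral mixing attractive for global context.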
Collapse
Affiliation(s)
- Chaorong Yang
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China; Hebei Provincial Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang 050024, China.
- Zhaohui Zhang
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China; Hebei Provincial Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang 050024, China.
75
Ahmad B, Floor PA, Farup I, Andersen CF. Single-Image-Based 3D Reconstruction of Endoscopic Images. J Imaging 2024; 10:82. [PMID: 38667980 PMCID: PMC11051210 DOI: 10.3390/jimaging10040082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2024] [Revised: 03/22/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024] Open
Abstract
A wireless capsule endoscope (WCE) is a medical device designed for the examination of the human gastrointestinal (GI) tract. Three-dimensional models based on WCE images can assist in diagnostics by effectively detecting pathology, providing gastroenterologists with improved visualization, particularly in areas of specific interest. However, the constraints of WCE, such as its lack of controllability and its reliance on expensive, often unavailable operating equipment, pose significant challenges for comprehensive experiments aimed at evaluating the quality of 3D reconstruction from WCE images. In this paper, we apply a single-image-based 3D reconstruction method to an artificial colon captured with an endoscope that behaves like a WCE. Because the shape from shading (SFS) algorithm can reconstruct a 3D shape from a single image, it has been employed to reconstruct the 3D shapes of the colon images. The camera of the endoscope has also been subjected to comprehensive geometric and radiometric calibration. Experiments are first conducted on well-defined primitive objects to assess the method's robustness and accuracy, comparing the reconstructed 3D shapes of the primitives with ground-truth data through measurements of root-mean-square error and maximum error. Afterward, the same methodology is applied to recover the geometry of the colon. The results demonstrate that our approach is capable of reconstructing the geometry of a colon captured with a camera with an unknown imaging pipeline and significant noise in the images. The same procedure is then applied to WCE images, and preliminary results illustrate the applicability of our method for reconstructing 3D models from WCE images.
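The quantitative evaluation above rests on root-mean-square error and maximum error between the reconstructed and ground-truth surfaces. These are straightforward to compute; a small sketch (the array shapes and the uniform 0.1 offset are toy values for illustration):

```python
import numpy as np

def reconstruction_errors(recon, truth):
    """RMSE and maximum absolute error between a reconstructed surface and
    its ground truth, as used for the primitive-object experiments."""
    diff = recon - truth
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    max_err = float(np.max(np.abs(diff)))
    return rmse, max_err

truth = np.zeros((4, 4))          # toy ground-truth height map
recon = truth + 0.1               # reconstruction with a uniform 0.1 offset
rmse, max_err = reconstruction_errors(recon, truth)
```

For a uniform offset the two metrics coincide; on real reconstructions the gap between them indicates whether errors are spread out or concentrated in outliers.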
Affiliation(s)
- Bilal Ahmad
- Department of Computer Science, Norwegian University of Science & Technology, 2815 Gjøvik, Norway; (P.A.F.); (I.F.); (C.F.A.)
76
Sikkandar MY, Sundaram SG, Alassaf A, AlMohimeed I, Alhussaini K, Aleid A, Alolayan SA, Ramkumar P, Almutairi MK, Begum SS. Utilizing adaptive deformable convolution and position embedding for colon polyp segmentation with a visual transformer. Sci Rep 2024; 14:7318. [PMID: 38538774 PMCID: PMC11377543 DOI: 10.1038/s41598-024-57993-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Accepted: 03/24/2024] [Indexed: 09/07/2024] Open
Abstract
Polyp detection is a challenging task in the diagnosis of Colorectal Cancer (CRC), and it demands clinical expertise due to the diverse nature of polyps. Recent years have witnessed the development of automated polyp detection systems that assist experts in early diagnosis, considerably reducing time consumption and diagnostic errors. In automated CRC diagnosis, polyp segmentation is an important step which is carried out with deep learning segmentation models. Recently, Vision Transformers (ViTs) have been slowly replacing these models due to their ability to capture long-range dependencies among image patches. However, the existing ViTs for polyp segmentation do not fully harness the inherent self-attention abilities and instead incorporate complex attention mechanisms. This paper presents the Polyp-Vision Transformer (Polyp-ViT), a novel Transformer model based on the conventional Transformer architecture, which is enhanced with adaptive mechanisms for feature extraction and positional embedding. Polyp-ViT is tested on the Kvasir-SEG and CVC-ClinicDB datasets, achieving segmentation accuracies of 0.9891 ± 0.01 and 0.9875 ± 0.71 respectively, outperforming state-of-the-art models. Polyp-ViT is a prospective tool for polyp segmentation that can also be adapted to other medical image segmentation tasks due to its ability to generalize well.
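Polyp-ViT's adaptive positional embedding is not specified in the abstract. For orientation, the fixed sinusoidal embedding that conventional Transformer architectures start from, and which adaptive schemes replace, can be sketched as follows (the patch count and dimension are illustrative):

```python
import numpy as np

def sinusoidal_position_embedding(num_patches, dim):
    """Classic fixed sinusoidal positional embedding: each position gets a
    unique pattern of sines and cosines at geometrically spaced frequencies."""
    pos = np.arange(num_patches)[:, None]            # (N, 1)
    i = np.arange(dim // 2)[None, :]                 # (1, D/2)
    angles = pos / (10000 ** (2 * i / dim))          # (N, D/2)
    emb = np.empty((num_patches, dim))
    emb[:, 0::2] = np.sin(angles)                    # even channels: sine
    emb[:, 1::2] = np.cos(angles)                    # odd channels: cosine
    return emb

pe = sinusoidal_position_embedding(16, 8)            # 16 patches, 8-dim tokens
```

A learned or adaptive embedding simply swaps this deterministic table for parameters optimized during training.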
Affiliation(s)
- Mohamed Yacin Sikkandar
- Department of Medical Equipment Technology, College of Applied Medical Sciences, Majmaah University, Al Majmaah, 11952, Saudi Arabia.
- Sankar Ganesh Sundaram
- Department of Artificial Intelligence and Data Science, KPR Institute of Engineering and Technology, Coimbatore, 641407, India
- Ahmad Alassaf
- Department of Medical Equipment Technology, College of Applied Medical Sciences, Majmaah University, Al Majmaah, 11952, Saudi Arabia
- Ibrahim AlMohimeed
- Department of Medical Equipment Technology, College of Applied Medical Sciences, Majmaah University, Al Majmaah, 11952, Saudi Arabia
- Khalid Alhussaini
- Department of Biomedical Technology, College of Applied Medical Sciences, King Saud University, Riyadh, 12372, Saudi Arabia
- Adham Aleid
- Department of Biomedical Technology, College of Applied Medical Sciences, King Saud University, Riyadh, 12372, Saudi Arabia
- Salem Ali Alolayan
- Department of Medical Equipment Technology, College of Applied Medical Sciences, Majmaah University, Al Majmaah, 11952, Saudi Arabia
- P Ramkumar
- Department of Computer Science and Engineering, Sri Sairam College of Engineering, Anekal, Bengaluru, 562106, Karnataka, India
- Meshal Khalaf Almutairi
- Department of Medical Equipment Technology, College of Applied Medical Sciences, Majmaah University, Al Majmaah, 11952, Saudi Arabia
- S Sabarunisha Begum
- Department of Biotechnology, P.S.R. Engineering College, Sivakasi, 626140, India
77
Zhang Y, Yang G, Gong C, Zhang J, Wang S, Wang Y. Polyp segmentation with interference filtering and dynamic uncertainty mining. Phys Med Biol 2024; 69:075016. [PMID: 38382099 DOI: 10.1088/1361-6560/ad2b94] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Accepted: 02/21/2024] [Indexed: 02/23/2024]
Abstract
Objective. Accurate polyp segmentation from colonoscopy images plays a crucial role in the early diagnosis and treatment of colorectal cancer. However, existing polyp segmentation methods are inevitably affected by various image noises, such as reflections, motion blur, and feces, which significantly affect the performance and generalization of the model. In addition, coupled with ambiguous boundaries between polyps and surrounding tissue, i.e. small inter-class differences, accurate polyp segmentation remains a challenging problem. Approach. To address these issues, we propose a novel two-stage polyp segmentation method that leverages a preprocessing sub-network (Pre-Net) and a dynamic uncertainty mining network (DUMNet) to improve the accuracy of polyp segmentation. Pre-Net identifies and filters out interference regions before feeding the colonoscopy images to the polyp segmentation network DUMNet. Considering the confusing polyp boundaries, DUMNet employs the uncertainty mining module (UMM) to dynamically focus on foreground, background, and uncertain regions based on different pixel confidences. The UMM helps to mine and enhance more detailed context, leading to coarse-to-fine polyp segmentation and precise localization of polyp regions. Main results. We conduct experiments on five popular polyp segmentation benchmarks: ETIS, CVC-ClinicDB, CVC-ColonDB, EndoScene, and Kvasir. Our method achieves state-of-the-art performance. Furthermore, the proposed Pre-Net has strong portability and can improve the accuracy of existing polyp segmentation models. Significance. The proposed method improves polyp segmentation performance by eliminating interference and mining uncertain regions. This aids doctors in making precise diagnoses and reduces the risk of colorectal cancer. Our code will be released at https://github.com/zyh5119232/DUMNet.
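The UMM is described as splitting pixels into foreground, background, and uncertain regions by confidence. A toy sketch of such a confidence-based partition follows; the thresholds 0.3 and 0.7 are illustrative assumptions, not values from the paper:

```python
import numpy as np

def partition_by_confidence(prob, lo=0.3, hi=0.7):
    """Split a predicted probability map into three disjoint regions:
    confident background (< lo), confident foreground (> hi), and the
    uncertain band in between, which gets extra attention."""
    background = prob < lo
    foreground = prob > hi
    uncertain = ~(background | foreground)
    return foreground, background, uncertain

prob = np.array([[0.10, 0.50],
                 [0.90, 0.65]])
fg, bg, unc = partition_by_confidence(prob)
```

In a real network the uncertain mask would then gate which features are refined in the coarse-to-fine stage, rather than being used directly as a prediction.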
Affiliation(s)
- Yunhua Zhang
- Northeastern University, Shenyang 110819, People's Republic of China
- DUT Artificial Intelligence Institute, Dalian 116024, People's Republic of China
- Gang Yang
- Northeastern University, Shenyang 110819, People's Republic of China
- Congjin Gong
- Northeastern University, Shenyang 110819, People's Republic of China
- Jianhao Zhang
- Northeastern University, Shenyang 110819, People's Republic of China
- Shuo Wang
- Northeastern University, Shenyang 110819, People's Republic of China
- Yutao Wang
- Northeastern University, Shenyang 110819, People's Republic of China
78
Xu C, Fan K, Mo W, Cao X, Jiao K. Dual ensemble system for polyp segmentation with submodels adaptive selection ensemble. Sci Rep 2024; 14:6152. [PMID: 38485963 PMCID: PMC10940608 DOI: 10.1038/s41598-024-56264-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Accepted: 03/04/2024] [Indexed: 03/18/2024] Open
Abstract
Colonoscopy is one of the main methods for detecting colon polyps, and polyp detection is widely used in the prevention and diagnosis of colon cancer. With the rapid development of computer vision, deep learning-based semantic segmentation methods for colon polyps have been widely researched. However, the accuracy and stability of some methods in colon polyp segmentation tasks leave room for further improvement. In addition, the issue of selecting appropriate sub-models in ensemble learning for the colon polyp segmentation task still needs to be explored. To solve these problems, we first exploit multiple complementary high-level semantic features through the Multi-Head Control Ensemble. Then, to solve the sub-model selection problem during training, we propose the SDBH-PSO Ensemble for sub-model selection and for optimizing ensemble weights on different datasets. The experiments were conducted on the public datasets CVC-ClinicDB, Kvasir, CVC-ColonDB, ETIS-LaribPolypDB and PolypGen. The results show that the DET-Former, constructed from the Multi-Head Control Ensemble and the SDBH-PSO Ensemble, consistently provides improved accuracy across different datasets. In the experiments, the Multi-Head Control Ensemble demonstrated superior feature fusion capability, and the SDBH-PSO Ensemble demonstrated excellent sub-model selection capability. The sub-model selection capabilities of the SDBH-PSO Ensemble will retain significant reference value and practical utility as deep learning networks evolve.
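At its core, fusing sub-model outputs reduces to a weighted average of probability maps; in the paper, both the weights and the sub-model subset come from the SDBH-PSO procedure, which is not reproduced here. A minimal sketch of the fusion step itself (names and toy maps are illustrative):

```python
import numpy as np

def weighted_ensemble(prob_maps, weights):
    """Fuse per-model segmentation probability maps with normalized weights.
    prob_maps: list of (H, W) arrays; weights: one non-negative weight each."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize to sum to 1
    return np.tensordot(w, np.asarray(prob_maps), axes=1)

maps = [np.full((2, 2), 0.2),                         # pessimistic sub-model
        np.full((2, 2), 0.8)]                         # confident sub-model
fused = weighted_ensemble(maps, [1.0, 3.0])           # 0.25*0.2 + 0.75*0.8
```

A particle-swarm optimizer like the paper's would search over the `weights` vector (and a binary inclusion mask) to maximize a validation metric such as Dice.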
Affiliation(s)
- Cun Xu
- Guilin University of Electronic Technology, Guilin, 541000, China
- Kefeng Fan
- China Electronics Standardization Institute, Beijing, 100007, China.
- Wei Mo
- Guilin University of Electronic Technology, Guilin, 541000, China
- Xuguang Cao
- Guilin University of Electronic Technology, Guilin, 541000, China
- Kaijie Jiao
- Guilin University of Electronic Technology, Guilin, 541000, China
79
Al-Otaibi S, Rehman A, Mujahid M, Alotaibi S, Saba T. Efficient-gastro: optimized EfficientNet model for the detection of gastrointestinal disorders using transfer learning and wireless capsule endoscopy images. PeerJ Comput Sci 2024; 10:e1902. [PMID: 38660212 PMCID: PMC11041956 DOI: 10.7717/peerj-cs.1902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2023] [Accepted: 01/31/2024] [Indexed: 04/26/2024]
Abstract
Gastrointestinal diseases cause around two million deaths globally. Wireless capsule endoscopy is a recent advancement in medical imaging, but manual diagnosis is challenging due to the large number of images generated. This has led to research into computer-assisted methodologies for diagnosing these images. Endoscopy produces thousands of frames for each patient, making manual examination difficult, laborious, and error-prone. An automated approach is essential to speed up the diagnosis process, reduce costs, and potentially save lives. This study proposes transfer learning-based efficient deep learning methods for detecting gastrointestinal disorders from multiple modalities, aiming to detect gastrointestinal diseases with superior accuracy and reduce the efforts and costs of medical experts. The Kvasir eight-class dataset was used for the experiment, where endoscopic images were preprocessed and enriched with augmentation techniques. An EfficientNet model was optimized via transfer learning and fine tuning, and the model was compared to the most widely used pre-trained deep learning models. The model's efficacy was tested on another independent endoscopic dataset to prove its robustness and reliability.
Affiliation(s)
- Shaha Al-Otaibi
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Amjad Rehman
- Artificial Intelligence & Data Analytics Lab CCIS, Prince Sultan University, Riyadh, Saudi Arabia
- Muhammad Mujahid
- Artificial Intelligence & Data Analytics Lab CCIS, Prince Sultan University, Riyadh, Saudi Arabia
- Sarah Alotaibi
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Tanzila Saba
- Artificial Intelligence & Data Analytics Lab CCIS, Prince Sultan University, Riyadh, Saudi Arabia
80
Zhang Y, Shen Z, Jiao R. Segment anything model for medical image segmentation: Current applications and future directions. Comput Biol Med 2024; 171:108238. [PMID: 38422961 DOI: 10.1016/j.compbiomed.2024.108238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2023] [Revised: 02/06/2024] [Accepted: 02/25/2024] [Indexed: 03/02/2024]
Abstract
Due to the inherent flexibility of prompting, foundation models have emerged as the predominant force in the fields of natural language processing and computer vision. The recent introduction of the Segment Anything Model (SAM) signifies a noteworthy expansion of the prompt-driven paradigm into the domain of image segmentation, thereby introducing a plethora of previously unexplored capabilities. However, the viability of its application to medical image segmentation remains uncertain, given the substantial distinctions between natural and medical images. In this work, we provide a comprehensive overview of recent endeavors aimed at extending the efficacy of SAM to medical image segmentation tasks, encompassing both empirical benchmarking and methodological adaptations. Additionally, we explore potential avenues for future research directions in SAM's role within medical image segmentation. While direct application of SAM to medical image segmentation does not yield satisfactory performance on multi-modal and multi-target medical datasets so far, numerous insights gleaned from these efforts serve as valuable guidance for shaping the trajectory of foundational models in the realm of medical image analysis. To support ongoing research endeavors, we maintain an active repository that contains an up-to-date paper list and a succinct summary of open-source projects at https://github.com/YichiZhang98/SAM4MIS.
Affiliation(s)
- Yichi Zhang
- School of Data Science, Fudan University, Shanghai, China.
- Zhenrong Shen
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Rushi Jiao
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
81
Mozaffari J, Amirkhani A, Shokouhi SB. ColonGen: an efficient polyp segmentation system for generalization improvement using a new comprehensive dataset. Phys Eng Sci Med 2024; 47:309-325. [PMID: 38224384 DOI: 10.1007/s13246-023-01368-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 12/06/2023] [Indexed: 01/16/2024]
Abstract
Colorectal cancer (CRC) is one of the most common causes of cancer-related deaths. While polyp detection is important for diagnosing CRC, high miss rates for polyps have been reported during colonoscopy. Most deep learning methods extract features from images using convolutional neural networks (CNNs). In recent years, vision transformer (ViT) models have been employed for image processing and have been successful in image segmentation. Image processing can be improved by combining transformer models, which can extract spatial location information, with CNNs, which are capable of aggregating local information. Despite this, recent research shows limited effectiveness in increasing data diversity and generalization accuracy. This paper investigates the generalization proficiency of polyp image segmentation based on transformer architecture and proposes a novel approach using two different ViT architectures. This allows the model to learn representations from different perspectives, which can then be combined to create a richer feature representation. Additionally, a more universal and comprehensive dataset has been derived from the datasets presented in the related research, which can be used for improving generalization. We first evaluated the generalization of our proposed model using three distinct training-testing scenarios. Our experimental results demonstrate that our ColonGen-V1 outperforms other state-of-the-art methods in all scenarios. As a next step, we used the comprehensive dataset for improving the performance of the model against in- and out-of-domain data. The results show that our ColonGen-V2 outperforms state-of-the-art studies by 5.1%, 1.3%, and 1.1% on the ETIS-Larib, Kvasir-Seg, and CVC-ColonDB datasets, respectively. The inclusive dataset and the model introduced in this paper are available to the public through this link: https://github.com/javadmozaffari/Polyp_segmentation.
Affiliation(s)
- Javad Mozaffari
- School of Electrical Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
- Abdollah Amirkhani
- School of Automotive Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran.
- Shahriar B Shokouhi
- School of Electrical Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
82
Kumari S, Singh P. Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives. Comput Biol Med 2024; 170:107912. [PMID: 38219643 DOI: 10.1016/j.compbiomed.2023.107912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2023] [Revised: 11/02/2023] [Accepted: 12/24/2023] [Indexed: 01/16/2024]
Abstract
Deep learning has demonstrated remarkable performance across various tasks in medical imaging. However, these approaches primarily focus on supervised learning, assuming that the training and testing data are drawn from the same distribution. Unfortunately, this assumption may not always hold true in practice. To address these issues, unsupervised domain adaptation (UDA) techniques have been developed to transfer knowledge from a labeled domain to a related but unlabeled domain. In recent years, significant advancements have been made in UDA, resulting in a wide range of methodologies, including feature alignment, image translation, self-supervision, and disentangled representation methods, among others. In this paper, we provide a comprehensive literature review of recent deep UDA approaches in medical imaging from a technical perspective. Specifically, we categorize current UDA research in medical imaging into six groups and further divide them into finer subcategories based on the different tasks they perform. We also discuss the respective datasets used in the studies to assess the divergence between the different domains. Finally, we discuss emerging areas and provide insights and discussions on future research directions to conclude this survey.
Affiliation(s)
- Suruchi Kumari
- Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, India.
- Pravendra Singh
- Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, India.
83
Wang M, An X, Pei Z, Li N, Zhang L, Liu G, Ming D. An Efficient Multi-Task Synergetic Network for Polyp Segmentation and Classification. IEEE J Biomed Health Inform 2024; 28:1228-1239. [PMID: 37155397 DOI: 10.1109/jbhi.2023.3273728] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
Colonoscopy is considered the best diagnostic tool for early detection and resection of polyps, which can effectively prevent consequential colorectal cancer. In clinical practice, segmenting and classifying polyps from colonoscopic images have a great significance since they provide precious information for diagnosis and treatment. In this study, we propose an efficient multi-task synergetic network (EMTS-Net) for concurrent polyp segmentation and classification, and we introduce a polyp classification benchmark for exploring the potential correlations of the above-mentioned two tasks. This framework is composed of an enhanced multi-scale network (EMS-Net) for coarse-grained polyp segmentation, an EMTS-Net (Class) for accurate polyp classification, and an EMTS-Net (Seg) for fine-grained polyp segmentation. Specifically, we first obtain coarse segmentation masks by using EMS-Net. Then, we concatenate these rough masks with colonoscopic images to assist EMTS-Net (Class) in locating and classifying polyps precisely. To further enhance the segmentation performance of polyps, we propose a random multi-scale (RMS) training strategy to eliminate the interference caused by redundant information. In addition, we design an offline dynamic class activation mapping (OFLD CAM) generated by the combined effect of EMTS-Net (Class) and the RMS strategy, which optimizes bottlenecks between multi-task networks efficiently and elegantly and helps EMTS-Net (Seg) to perform more accurate polyp segmentation. We evaluate the proposed EMTS-Net on the polyp segmentation and classification benchmarks, and it achieves an average mDice of 0.864 in polyp segmentation and an average AUC of 0.913 with an average accuracy of 0.924 in polyp classification. Quantitative and qualitative evaluations on the polyp segmentation and classification benchmarks demonstrate that our EMTS-Net achieves the best performance and outperforms previous state-of-the-art methods in terms of both efficiency and generalization.
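The mDice figure above is an average of the Dice similarity coefficient over test images. A reference implementation for binary masks follows; the epsilon smoothing term is a common convention for empty masks, not necessarily the paper's exact formulation:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity between two binary masks:
    2 * |A ∩ B| / (|A| + |B|), smoothed by eps to handle empty masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1],
              [0, 0]])
b = np.array([[1, 0],
              [0, 0]])
# dice_coefficient(a, b) == 2*1 / (2 + 1) ≈ 0.667
```

mDice is then simply the mean of this value across all images (and, for some benchmarks, across classes).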
84
Jia X, Shen Y, Yang J, Song R, Zhang W, Meng MQH, Liao JC, Xing L. PolypMixNet: Enhancing semi-supervised polyp segmentation with polyp-aware augmentation. Comput Biol Med 2024; 170:108006. [PMID: 38325216 DOI: 10.1016/j.compbiomed.2024.108006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 12/29/2023] [Accepted: 01/13/2024] [Indexed: 02/09/2024]
Abstract
BACKGROUND AI-assisted polyp segmentation in colonoscopy plays a crucial role in enabling prompt diagnosis and treatment of colorectal cancer. However, the lack of sufficient annotated data poses a significant challenge for supervised learning approaches. Existing semi-supervised learning methods also suffer from performance degradation, mainly due to task-specific characteristics, such as class imbalance in polyp segmentation. PURPOSE The purpose of this work is to develop an effective semi-supervised learning framework for accurate polyp segmentation in colonoscopy, addressing limited annotated data and class imbalance challenges. METHODS We proposed PolypMixNet, a semi-supervised framework, for colorectal polyp segmentation, utilizing novel augmentation techniques and a Mean Teacher architecture to improve model performance. PolypMixNet introduces the polyp-aware mixup (PolypMix) algorithm and incorporates dual-level consistency regularization. PolypMix addresses the class imbalance in colonoscopy datasets and enhances the diversity of training data. By performing a polyp-aware mixup on unlabeled samples, it generates mixed images with polyp context along with their artificial labels. A polyp-directed soft pseudo-labeling (PDSPL) mechanism was proposed to generate high-quality pseudo labels and eliminate the dilution of lesion features caused by mixup operations. To ensure consistency in the training phase, we introduce the PolypMix prediction consistency (PMPC) loss and PolypMix attention consistency (PMAC) loss, enforcing consistency at both image and feature levels. Code is available at https://github.com/YChienHung/PolypMix. RESULTS PolypMixNet was evaluated on four public colonoscopy datasets, achieving 88.97% Dice and 88.85% mIoU on the benchmark dataset of Kvasir-SEG. In scenarios where the labeled training data is limited to 15%, PolypMixNet outperforms the state-of-the-art semi-supervised approaches with a 2.88-point improvement in Dice. It also shows the ability to reach performance comparable to the fully supervised counterpart. Additionally, we conducted extensive ablation studies to validate the effectiveness of each module and highlight the superiority of our proposed approach. CONCLUSION PolypMixNet effectively addresses the challenges posed by limited annotated data and unbalanced class distributions in polyp segmentation. By leveraging unlabeled data and incorporating novel augmentation and consistency regularization techniques, our method achieves state-of-the-art performance. We believe that the insights and contributions presented in this work will pave the way for further advancements in semi-supervised polyp segmentation and inspire future research in the medical imaging domain.
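PolypMix builds on plain mixup; the polyp-aware weighting and soft pseudo-labeling are the paper's contributions and are not reproduced here. The underlying mixup of an image/label pair, which produces the "artificial labels" the abstract mentions, can be sketched as (the lam value is illustrative):

```python
import numpy as np

def mixup(img_a, mask_a, img_b, mask_b, lam=0.7):
    """Plain mixup: convex combination of two images and of their labels.
    The mixed mask is soft, serving as an artificial training target."""
    img = lam * img_a + (1 - lam) * img_b
    soft_mask = lam * mask_a + (1 - lam) * mask_b
    return img, soft_mask

ia, ma = np.ones((2, 2)), np.ones((2, 2))     # toy polyp image/mask
ib, mb = np.zeros((2, 2)), np.zeros((2, 2))   # toy background image/mask
img, sm = mixup(ia, ma, ib, mb, lam=0.7)      # both become all-0.7 arrays
```

A polyp-aware variant would, for example, bias the sampling of the second image (or of lam) toward preserving polyp pixels, counteracting the class imbalance the abstract describes.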
Affiliation(s)
- Xiao Jia
- School of Control Science and Engineering, Shandong University, Jinan, China.
- Yutian Shen
- Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China.
- Jianhong Yang
- School of Control Science and Engineering, Shandong University, Jinan, China.
- Ran Song
- School of Control Science and Engineering, Shandong University, Jinan, China.
- Wei Zhang
- School of Control Science and Engineering, Shandong University, Jinan, China.
- Max Q-H Meng
- Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China.
- Joseph C Liao
- Department of Urology, Stanford University, Stanford, 94305, CA, USA; VA Palo Alto Health Care System, Palo Alto, 94304, CA, USA.
- Lei Xing
- Department of Radiation Oncology, Stanford University, Stanford, 94305, CA, USA.
85
Wang Z, Yu L, Tian S, Huo X. CRMEFNet: A coupled refinement, multiscale exploration and fusion network for medical image segmentation. Comput Biol Med 2024; 171:108202. [PMID: 38402839 DOI: 10.1016/j.compbiomed.2024.108202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2023] [Revised: 12/22/2023] [Accepted: 02/18/2024] [Indexed: 02/27/2024]
Abstract
Accurate segmentation of target areas in medical images, such as lesions, is essential for disease diagnosis and clinical analysis. In recent years, deep learning methods have been intensively researched and have generated significant progress in medical image segmentation tasks. However, most existing methods have limitations in modeling multilevel feature representations and in identifying complex textured pixels at contrasting boundaries. This paper proposes a novel coupled refinement and multiscale exploration and fusion network (CRMEFNet) for medical image segmentation, which explores the optimization and fusion of multiscale features to address the abovementioned limitations. The CRMEFNet consists of three main innovations: a coupled refinement module (CRM), a multiscale exploration and fusion module (MEFM), and a cascaded progressive decoder (CPD). The CRM decouples features into low-frequency body features and high-frequency edge features, and performs targeted optimization of both to enhance intraclass uniformity and interclass differentiation of features. The MEFM performs a two-stage exploration and fusion of multiscale features using our proposed multiscale aggregation attention mechanism, which explores the differentiated information within the cross-level features and enhances the contextual connections between the features to achieve adaptive feature fusion. Compared to existing complex decoders, the CPD decoder (consisting of the CRM and MEFM) can perform fine-grained pixel recognition while retaining complete semantic location information. It also has a simple design and excellent performance. The experimental results from five medical image segmentation tasks, ten datasets and twelve comparison models demonstrate the state-of-the-art performance, interpretability, flexibility and versatility of our CRMEFNet.
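The CRM's decoupling into a low-frequency "body" and a high-frequency "edge" component can be realized in several ways; as an illustrative stand-in (not the paper's construction), an FFT low-pass split guarantees the two parts sum exactly back to the input:

```python
import numpy as np

def decouple_body_edge(x, keep=2):
    """Split a feature map into a smooth low-frequency 'body' and the
    residual high-frequency 'edge', with body + edge == x by construction."""
    X = np.fft.fft2(x)
    mask = np.zeros_like(X)
    # keep only the lowest `keep` frequencies in each corner of the spectrum
    mask[:keep, :keep] = mask[:keep, -keep:] = 1
    mask[-keep:, :keep] = mask[-keep:, -keep:] = 1
    body = np.fft.ifft2(X * mask).real
    edge = x - body
    return body, edge

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8))
body, edge = decouple_body_edge(x)
```

Once separated, the two components can be refined by different branches (e.g. smoothing losses on the body, boundary losses on the edge), which is the spirit of the targeted optimization the abstract describes.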
Affiliation(s)
- Zhi Wang
- College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
- Long Yu
- College of Network Center, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China.
- Shengwei Tian
- College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
- Xiangzuo Huo
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China
86
Yue G, Zhuo G, Yan W, Zhou T, Tang C, Yang P, Wang T. Boundary uncertainty aware network for automated polyp segmentation. Neural Netw 2024; 170:390-404. [PMID: 38029720 DOI: 10.1016/j.neunet.2023.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2023] [Revised: 07/15/2023] [Accepted: 11/22/2023] [Indexed: 12/01/2023]
Abstract
Recently, leveraging deep neural networks for automated colorectal polyp segmentation has emerged as a hot topic owing to its advantages in avoiding the limitations of visual inspection, e.g., overwork and subjectivity. However, most existing methods do not pay enough attention to the uncertain areas of colonoscopy images and often provide unsatisfactory segmentation performance. In this paper, we propose a novel boundary uncertainty aware network (BUNet) for precise and robust colorectal polyp segmentation. Specifically, considering that polyps vary greatly in size and shape, we first adopt a pyramid vision transformer encoder to learn multi-scale feature representations. Then, a simple yet effective boundary exploration module (BEM) is proposed to explore boundary cues from the low-level features. To make the network focus on the ambiguous areas where the prediction score is biased to neither the foreground nor the background, we further introduce a boundary uncertainty aware module (BUM) that explores error-prone regions from the high-level features with the assistance of the boundary cues provided by the BEM. Through top-down hybrid deep supervision, our BUNet implements coarse-to-fine polyp segmentation and finally localizes polyp regions precisely. Extensive experiments on five public datasets show that BUNet is superior to thirteen competing methods in terms of both effectiveness and generalization ability.
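The notion of an "ambiguous area where the prediction score is biased to neither the foreground nor the background" admits a simple per-pixel formulation. The following is one common way to turn a probability map into an uncertainty map (an illustrative sketch, not necessarily the BUM's exact weighting):

```python
def boundary_uncertainty(prob):
    # Uncertainty is maximal where the foreground probability is near 0.5,
    # i.e., biased to neither the foreground nor the background,
    # and vanishes at confident predictions (0.0 or 1.0).
    return [1.0 - abs(2.0 * p - 1.0) for p in prob]
```

Such a map can be used to reweight the loss or attention toward error-prone boundary pixels.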
Affiliation(s)
- Guanghui Yue
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
- Guibin Zhuo
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
- Weiqing Yan
- School of Computer and Control Engineering, Yantai University, Yantai 264005, China
- Tianwei Zhou
- College of Management, Shenzhen University, Shenzhen 518060, China.
- Chang Tang
- School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Peng Yang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
- Tianfu Wang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Marshall Laboratory of Biomedical Engineering, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen 518060, China
87
Zhang Y, Zhou T, Tao Y, Wang S, Wu Y, Liu B, Gu P, Chen Q, Chen DZ. TestFit: A plug-and-play one-pass test time method for medical image segmentation. Med Image Anal 2024; 92:103069. [PMID: 38154382 DOI: 10.1016/j.media.2023.103069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2023] [Revised: 10/16/2023] [Accepted: 12/19/2023] [Indexed: 12/30/2023]
Abstract
Deep learning (DL) based methods have been extensively studied for medical image segmentation, mostly emphasizing the design and training of DL networks. Only a few attempts have been made to develop methods for applying DL models at test time. In this paper, we study whether a given off-the-shelf segmentation network can be stably improved on-the-fly during test time in an online processing-and-learning fashion. We propose a new online test-time method, called TestFit, to improve the results of a given off-the-shelf DL segmentation model at test time by actively fitting the test data distribution. TestFit first creates a supplementary network (SuppNet) from the given trained off-the-shelf segmentation network (this original network is referred to as OGNet) and applies SuppNet together with OGNet for test-time inference. OGNet keeps the hypothesis derived from the original training set to prevent the model from collapsing, while SuppNet seeks to fit the test data distribution. Segmentation results and supervision signals (for updating SuppNet) are generated by combining the outputs of OGNet and SuppNet on the fly. TestFit needs only one pass on each test sample, the same as the traditional test pipeline, and requires no training-time preparation. Since each model update must rely on a single test sample with no manual annotation, we develop a series of technical treatments to improve the stability and effectiveness of our proposed online test-time training method. TestFit works in a plug-and-play fashion, requires minimal hyper-parameter tuning, and is easy to use in practice. Experiments on a large collection of 2D and 3D datasets demonstrate the capability of our TestFit method.
Affiliation(s)
- Yizhe Zhang
- Nanjing University of Science and Technology, Jiangsu 210094, China.
- Tao Zhou
- Nanjing University of Science and Technology, Jiangsu 210094, China
- Yuhui Tao
- Nanjing University of Science and Technology, Jiangsu 210094, China
- Shuo Wang
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
- Ye Wu
- Nanjing University of Science and Technology, Jiangsu 210094, China
- Benyuan Liu
- University of Massachusetts Lowell, MA 01854, USA
- Qiang Chen
- Nanjing University of Science and Technology, Jiangsu 210094, China
88
Li W, Huang Z, Li F, Zhao Y, Zhang H. CIFG-Net: Cross-level information fusion and guidance network for Polyp Segmentation. Comput Biol Med 2024; 169:107931. [PMID: 38181608 DOI: 10.1016/j.compbiomed.2024.107931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2023] [Revised: 12/03/2023] [Accepted: 01/01/2024] [Indexed: 01/07/2024]
Abstract
Colorectal cancer is a common malignant tumor of the digestive tract. Most colorectal cancer is caused by colorectal polyp lesions. Timely detection and removal of colorectal polyps can substantially reduce the incidence of colorectal cancer. Accurate polyp segmentation can provide important polyp information that can aid in the early diagnosis and treatment of colorectal cancer. However, polyps of the same type can vary in texture, color, and even size. Furthermore, some polyps are similar in color to the surrounding healthy tissue, which makes the boundary between the polyp and the surrounding area unclear. In order to overcome the issues of inaccurate polyp localization and unclear boundary segmentation, we propose a polyp segmentation network based on cross-level information fusion and guidance. We use a Transformer encoder to extract a more robust feature representation. In addition, to refine the processing of feature information from the encoder, we propose an edge feature processing module (EFPM) and a cross-level information processing module (CIPM). The EFPM focuses on the boundary information in polyp features; after processing each feature, it obtains clear and accurate polyp boundary features, which mitigates unclear boundary segmentation. The CIPM aggregates and processes the multi-scale features transmitted by the various encoder layers and addresses inaccurate polyp localization by using multi-level features to obtain the location information of polyps. To better use the processed features to optimize segmentation, we also propose an information guidance module (IGM) that integrates the processed features of the EFPM and CIPM to obtain accurate localization and segmentation of polyps. Experiments on five public polyp datasets using six metrics demonstrate that the proposed network has better robustness and more accurate segmentation. Compared with other advanced algorithms, CIFG-Net has superior performance. Code is available at: https://github.com/zspnb/CIFG-Net.
Affiliation(s)
- Weisheng Li
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China.
- Zhaopeng Huang
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
- Feiyan Li
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
- Yinghui Zhao
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
- Hongchuan Zhang
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
89
Wu H, Zhang B, Chen C, Qin J. Federated Semi-Supervised Medical Image Segmentation via Prototype-Based Pseudo-Labeling and Contrastive Learning. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:649-661. [PMID: 37703140 DOI: 10.1109/tmi.2023.3314430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/15/2023]
Abstract
Existing federated learning works mainly focus on the fully supervised training setting. In realistic scenarios, however, most clinical sites can only provide data without annotations due to the lack of resources or expertise. In this work, we are concerned with the practical yet challenging federated semi-supervised segmentation (FSSS) setting, where labeled data are available at only a few clients and the other clients can provide only unlabeled data. We make an early attempt to tackle this problem and propose a novel FSSS method with prototype-based pseudo-labeling and contrastive learning. First, we transmit a labeled-aggregated model, which is obtained based on prototype similarity, to each unlabeled client, to work together with the global model for debiased pseudo-label generation via a consistency- and entropy-aware selection strategy. Second, we transfer image-level prototypes from labeled datasets to unlabeled clients and conduct prototypical contrastive learning on unlabeled models to enhance their discriminative power. Finally, we perform dynamic model aggregation with a designed consistency-aware aggregation strategy to dynamically adjust the aggregation weights of each local model. We evaluate our method on COVID-19 X-ray infected region segmentation, COVID-19 CT infected region segmentation and colorectal polyp segmentation, and experimental results consistently demonstrate the effectiveness of our proposed method. Code is available at https://github.com/zhangbaiming/FedSemiSeg.
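The "labeled-aggregated model obtained based on prototype similarity" suggests weighting each labeled client's contribution by how close its feature prototype is to the unlabeled client's. A minimal sketch of that weighting, with the softmax normalization being an assumption on our part:

```python
import math

def prototype_weights(unlabeled_proto, labeled_protos):
    # Weight each labeled client's model by the cosine similarity between
    # its prototype and the unlabeled client's prototype (softmax-normalized),
    # so the aggregated model leans toward the most similar labeled sites.
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    sims = [cosine(unlabeled_proto, p) for p in labeled_protos]
    exps = [math.exp(s) for s in sims]
    return [e / sum(exps) for e in exps]
```

The resulting weights would then blend the labeled clients' parameters into the per-client teacher used for pseudo-labeling.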
90
Zhang N, Yu L, Zhang D, Wu W, Tian S, Kang X, Li M. CT-Net: Asymmetric compound branch Transformer for medical image segmentation. Neural Netw 2024; 170:298-311. [PMID: 38006733 DOI: 10.1016/j.neunet.2023.11.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Revised: 09/08/2023] [Accepted: 11/13/2023] [Indexed: 11/27/2023]
Abstract
The Transformer architecture has been widely applied in the field of image segmentation due to its powerful ability to capture long-range dependencies. However, its ability to capture local features is relatively weak, and it requires a large amount of data for training. Medical image segmentation tasks, on the other hand, impose high requirements on local features and are often applied to small datasets. Therefore, existing Transformer networks show a significant decrease in performance when applied directly to this task. To address these issues, we have designed a new medical image segmentation architecture called CT-Net. It effectively extracts local and global representations using an asymmetric asynchronous branch parallel structure, while reducing unnecessary computational costs. In addition, we propose a high-density information fusion strategy that efficiently fuses the features of the two branches using a fusion module with only 0.05M parameters. This strategy ensures high portability and provides conditions for directly applying transfer learning to solve dataset dependency issues. Finally, we have designed a parameter-adjustable multi-perceptive loss function for this architecture to optimize the training process from both pixel-level and global perspectives. We have tested this network on 5 different tasks with 9 datasets; compared to SwinUNet, CT-Net improves the IoU by 7.3% and 1.8% on the Glas and MoNuSeg datasets respectively. Moreover, compared to SwinUNet, the average DSC on the Synapse dataset is improved by 3.5%.
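The abstract does not spell out the terms of the multi-perceptive loss. One common way to combine a pixel-level view with a global one, shown purely as an illustrative assumption rather than CT-Net's actual formulation, is to blend binary cross-entropy with a soft Dice term under an adjustable weight:

```python
import math

def multi_perceptive_loss(pred, target, alpha=0.5, eps=1e-6):
    # Pixel-level perspective: per-pixel binary cross-entropy.
    bce = -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
               for p, t in zip(pred, target)) / len(pred)
    # Global perspective: soft Dice over the whole prediction.
    inter = sum(p * t for p, t in zip(pred, target))
    dice = 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)
    # alpha is the adjustable parameter trading off the two views.
    return alpha * bce + (1.0 - alpha) * dice
```

Here `alpha` plays the role of the "parameter-adjustable" knob the abstract mentions.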
Affiliation(s)
- Ning Zhang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Long Yu
- College of Information Science and Engineering, Xinjiang University, Urumqi 830000, China; College of Network Center, Xinjiang University, Urumqi 830000, China.
- Dezhi Zhang
- People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang University, China
- Weidong Wu
- People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang University, China
- Shengwei Tian
- College of Software, Xinjiang University, Urumqi 830000, China
- Xiaojing Kang
- People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang University, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
91
Li C, Liu J, Tang J. Simultaneous segmentation and classification of colon cancer polyp images using a dual branch multi-task learning network. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:2024-2049. [PMID: 38454673 DOI: 10.3934/mbe.2024090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/09/2024]
Abstract
Accurate classification and segmentation of polyps are two important tasks in the diagnosis and treatment of colorectal cancers. Existing models perform segmentation and classification separately and do not fully make use of the correlation between the two tasks. Furthermore, polyps exhibit random regions and varying shapes and sizes, and they often share similar boundaries and backgrounds. However, existing models fail to consider these factors and thus are not robust because of their inherent limitations. To address these issues, we developed a multi-task network that performs both segmentation and classification simultaneously and can cope with the aforementioned factors effectively. Our proposed network possesses a dual-branch structure, comprising a transformer branch and a convolutional neural network (CNN) branch. This approach enhances local details within the global representation, improving both local feature awareness and global contextual understanding, thus contributing to the improved preservation of polyp-related information. Additionally, we have designed a feature interaction module (FIM) aimed at bridging the semantic gap between the two branches and facilitating the integration of diverse semantic information from both branches. This integration enables the full capture of global context information and local details related to polyps. To prevent the loss of edge detail information crucial for polyp identification, we have introduced a reverse attention boundary enhancement (RABE) module to gradually enhance edge structures and detailed information within polyp regions. Finally, we conducted extensive experiments on five publicly available datasets to evaluate the performance of our method in both polyp segmentation and classification tasks. The experimental results confirm that our proposed method outperforms other state-of-the-art methods.
Affiliation(s)
- Chenqian Li
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
- Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan 430065, China
- Jun Liu
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
- Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan 430065, China
- Jinshan Tang
- Department of Health Administration and Policy, College of Public Health, George Mason University, Fairfax, VA 22030, USA
92
Fan K, Xu C, Cao X, Jiao K, Mo W. Tri-branch feature pyramid network based on federated particle swarm optimization for polyp segmentation. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:1610-1624. [PMID: 38303480 DOI: 10.3934/mbe.2024070] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/03/2024]
Abstract
Deep learning technology has shown considerable potential in various domains. However, due to privacy issues associated with medical data, legal and ethical constraints often result in smaller datasets. The limitations of smaller datasets hinder the applicability of deep learning technology in the field of medical image processing. To address this challenge, we proposed the Federated Particle Swarm Optimization algorithm, which is designed to increase the efficiency of decentralized data utilization in federated learning and to protect privacy in model training. To stabilize the federated learning process, we introduced the Tri-branch Feature Pyramid Network (TFPNet), a multi-branch model. TFPNet mitigates instability during aggregated-model deployment and ensures fast convergence through its multi-branch structure. We conducted experiments on four different public datasets: CVC-ClinicDB, Kvasir, CVC-ColonDB and ETIS-LaribPolypDB. The experimental results show that the Federated Particle Swarm Optimization algorithm outperforms single-dataset training and the Federated Averaging algorithm when using independent scattered data, and that TFPNet converges faster and achieves superior segmentation accuracy compared to other models.
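For context on the Federated Averaging baseline this entry compares against: FedAvg aggregates client models as a dataset-size-weighted mean of their parameters. A minimal sketch with parameters flattened to plain vectors (the proposed particle-swarm variant replaces this averaging rule):

```python
def fed_avg(client_params, client_sizes):
    # Aggregate per-client parameter vectors, weighting each client by
    # the size of its local dataset (the FedAvg baseline rule).
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[i] * s for p, s in zip(client_params, client_sizes)) / total
            for i in range(dim)]
```

No raw images leave the clients under this scheme, which is why it suits the privacy constraints on medical data described above.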
Affiliation(s)
- Kefeng Fan
- China Electronics Standardization Institute, Beijing 100007, China
- Cun Xu
- School of Electronic and Automation, Guilin University of Electronic Technology, Guilin 541004, China
- Xuguang Cao
- School of Electronic and Automation, Guilin University of Electronic Technology, Guilin 541004, China
- Kaijie Jiao
- School of Electronic and Automation, Guilin University of Electronic Technology, Guilin 541004, China
- Wei Mo
- School of Electronic and Automation, Guilin University of Electronic Technology, Guilin 541004, China
93
Huang Z, Xie F, Qing W, Wang M, Liu M, Sun D. MGF-net: Multi-channel group fusion enhancing boundary attention for polyp segmentation. Med Phys 2024; 51:407-418. [PMID: 37403578 DOI: 10.1002/mp.16584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Revised: 05/11/2023] [Accepted: 06/02/2023] [Indexed: 07/06/2023] Open
Abstract
BACKGROUND: Colonic polyps are the most prevalent neoplastic lesions detected during colorectal cancer screening, and timely detection and excision of these precursor lesions is crucial for preventing multiple malignancies and reducing mortality rates. PURPOSE: The pressing need for intelligent polyp detection has led to the development of a high-precision intelligent polyp segmentation network designed to improve polyp screening rates during colonoscopies. METHODS: In this study, we employed ResNet50 as the backbone network and embedded a multi-channel grouping fusion encoding module in the third to fifth stages to extract high-level semantic features of polyps. Receptive field modules were utilized to capture multi-scale features, and grouping fusion modules were employed to capture salient features in different group channels, guiding the decoder to generate an initial global mapping with improved accuracy. To refine the segmentation of the initial global mapping, we introduced an enhanced boundary weight attention module that adaptively thresholds the initial global mapping using learnable parameters. A self-attention mechanism was then utilized to calculate the long-distance dependency relationship of the polyp boundary area, resulting in an output feature map with enhanced boundaries that effectively refines the boundary of the target area. RESULTS: We carried out comparison experiments of MGF-Net with mainstream polyp segmentation networks on five public datasets: ColonDB, CVC-ColonDB, CVC-612, Kvasir, and ETIS. The results demonstrate that the segmentation accuracy of MGF-Net is significantly improved on these datasets. Furthermore, a hypothesis test was conducted to assess the statistical significance of the computed results. CONCLUSIONS: Our proposed MGF-Net outperforms existing mainstream baseline networks and presents a promising solution to the pressing need for intelligent polyp detection. The proposed model is available at https://github.com/xiefanghhh/MGF-NET.
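Adaptive thresholding "using learnable parameters" must stay differentiable to be trained end-to-end. A standard trick, shown here as an assumption about the general idea rather than the module's actual design, is a sigmoid centered at a learnable threshold:

```python
import math

def soft_threshold(mapping, tau, k=10.0):
    # Differentiable "thresholding" of a probability map: a sigmoid centered
    # at a learnable threshold tau; k controls how sharply values are pushed
    # toward 0 or 1 while keeping gradients for training.
    return [1.0 / (1.0 + math.exp(-k * (m - tau))) for m in mapping]
```

Values well above `tau` saturate toward 1, values well below toward 0, approximating a hard threshold as `k` grows.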
Affiliation(s)
- Zhiyong Huang
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
- Fang Xie
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
- Wencheng Qing
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
- Mengyao Wang
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
- Man Liu
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
- Daming Sun
- Chongqing Engineering Research Center of Medical Electronics and Information Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
94
Azad R, Kazerouni A, Heidari M, Aghdam EK, Molaei A, Jia Y, Jose A, Roy R, Merhof D. Advances in medical image analysis with vision Transformers: A comprehensive review. Med Image Anal 2024; 91:103000. [PMID: 37883822 DOI: 10.1016/j.media.2023.103000] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2023] [Revised: 09/30/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
The remarkable performance of the Transformer architecture in natural language processing has recently also triggered broad interest in Computer Vision. Among other merits, Transformers have been shown to be capable of learning long-range dependencies and spatial correlations, a clear advantage over the convolutional neural networks (CNNs) that have so far been the de facto standard in Computer Vision. Thus, Transformers have become an integral part of modern medical image analysis. In this review, we provide an encyclopedic overview of the applications of Transformers in medical imaging. Specifically, we present a systematic and thorough review of relevant recent Transformer literature for different medical image analysis tasks, including classification, segmentation, detection, registration, synthesis, and clinical report generation. For each of these applications, we investigate the novelty, strengths and weaknesses of the different proposed strategies and develop taxonomies highlighting key properties and contributions. Further, if applicable, we outline current benchmarks on different datasets. Finally, we summarize key challenges and discuss different future research directions. In addition, we have provided the cited papers with their corresponding implementations at https://github.com/mindflow-institue/Awesome-Transformer.
Affiliation(s)
- Reza Azad
- Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Amirhossein Kazerouni
- School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
- Moein Heidari
- School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
- Amirali Molaei
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
- Yiwei Jia
- Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Abin Jose
- Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Rijo Roy
- Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Dorit Merhof
- Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany; Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany.
95
Zhang W, Lu F, Su H, Hu Y. Dual-branch multi-information aggregation network with transformer and convolution for polyp segmentation. Comput Biol Med 2024; 168:107760. [PMID: 38064849 DOI: 10.1016/j.compbiomed.2023.107760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2023] [Revised: 10/21/2023] [Accepted: 11/21/2023] [Indexed: 01/10/2024]
Abstract
Computer-Aided Diagnosis (CAD) for polyp detection offers one of the most notable showcases of deep learning: with these technologies, the accuracy of polyp segmentation is surpassing that of human experts. In such a CAD process, a critical step is segmenting colorectal polyps from colonoscopy images. Despite the remarkable successes attained by recent deep learning works, much improvement is still anticipated for challenging cases. For instance, the effects of motion blur and light reflection can introduce significant noise into the image, and polyps of the same type exhibit a diversity of size, color and texture. To address such challenges, this paper proposes a novel dual-branch multi-information aggregation network (DBMIA-Net) for polyp segmentation, which is able to accurately and reliably segment a variety of colorectal polyps with efficiency. Specifically, a dual-branch encoder with transformer and convolutional neural network (CNN) branches is employed to extract polyp features, and two multi-information aggregation modules are applied in the decoder to fuse multi-scale features adaptively: a global information aggregation (GIA) module and an edge information aggregation (EIA) module. In addition, to enhance the representation learning capability of the potential channel feature association, this paper also proposes a novel adaptive channel graph convolution (ACGC). To validate the effectiveness and advantages of the proposed network, we compare it with several state-of-the-art (SOTA) methods on five public datasets. Experimental results consistently demonstrate that the proposed DBMIA-Net obtains significantly superior segmentation performance across six popularly used evaluation metrics. In particular, we achieve 94.12% mean Dice on the CVC-ClinicDB dataset, a 4.22% improvement over the previous state-of-the-art method PraNet. Compared with SOTA algorithms, DBMIA-Net has a better fitting ability and stronger generalization ability.
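A channel graph convolution treats each channel as a graph node and propagates information along learned channel affinities. The sketch below shows one propagation step over such a graph; the learnable feature transform and the adaptivity of the affinities are omitted, so this is only a schematic of the general operation, not the ACGC itself:

```python
def channel_graph_conv(x, adj):
    # x: C channel descriptors of dimension D (C x D matrix as nested lists);
    # adj: C x C channel-affinity matrix. Each channel descriptor is replaced
    # by an affinity-weighted mixture of all channel descriptors (A @ X),
    # letting related channels share information.
    C, D = len(x), len(x[0])
    return [[sum(adj[i][k] * x[k][d] for k in range(C)) for d in range(D)]
            for i in range(C)]
```

With an identity affinity matrix the operation is a no-op; learned off-diagonal weights are what encode the channel associations.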
Affiliation(s)
- Wenyu Zhang
- School of Information Science and Engineering, Lanzhou University, China
- Fuxiang Lu
- School of Information Science and Engineering, Lanzhou University, China.
- Hongjing Su
- School of Information Science and Engineering, Lanzhou University, China
- Yawen Hu
- School of Information Science and Engineering, Lanzhou University, China
96
Wang J, Jin Y, Stoyanov D, Wang L. FedDP: Dual Personalization in Federated Medical Image Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:297-308. [PMID: 37494156 DOI: 10.1109/tmi.2023.3299206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/28/2023]
Abstract
Personalized federated learning (PFL) addresses the data heterogeneity challenge faced by general federated learning (GFL). Rather than learning a single global model, with PFL a collection of models are adapted to the unique feature distribution of each site. However, current PFL methods rarely consider self-attention networks which can handle data heterogeneity by long-range dependency modeling and they do not utilize prediction inconsistencies in local models as an indicator of site uniqueness. In this paper, we propose FedDP, a novel federated learning scheme with dual personalization, which improves model personalization from both feature and prediction aspects to boost image segmentation results. We leverage long-range dependencies by designing a local query (LQ) that decouples the query embedding layer out of each local model, whose parameters are trained privately to better adapt to the respective feature distribution of the site. We then propose inconsistency-guided calibration (IGC), which exploits the inter-site prediction inconsistencies to accommodate the model learning concentration. By encouraging a model to penalize pixels with larger inconsistencies, we better tailor prediction-level patterns to each local site. Experimentally, we compare FedDP with the state-of-the-art PFL methods on two popular medical image segmentation tasks with different modalities, where our results consistently outperform others on both tasks. Our code and models are available at https://github.com/jcwang123/PFL-Seg-Trans.
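The local query idea, keeping the query-embedding parameters private to each site while sharing everything else, amounts to partitioning a model's state dict before aggregation. A minimal sketch, where the key-matching rule is our own illustrative assumption:

```python
def split_personalized(state, private_keys=("query",)):
    # Partition a model state dict: parameters whose names match a private
    # marker (here, query embeddings) stay at the site and are trained
    # privately; all remaining parameters are shared for federated averaging.
    shared = {k: v for k, v in state.items()
              if not any(p in k for p in private_keys)}
    private = {k: v for k, v in state.items()
               if any(p in k for p in private_keys)}
    return shared, private
```

Only the `shared` part would be sent to the server each round; each site's `private` queries keep adapting to its own feature distribution.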
97
Wu S, Zhang R, Yan J, Li C, Liu Q, Wang L, Wang H. High-Speed and Accurate Diagnosis of Gastrointestinal Disease: Learning on Endoscopy Images Using Lightweight Transformer with Local Feature Attention. Bioengineering (Basel) 2023; 10:1416. [PMID: 38136007 PMCID: PMC10741161 DOI: 10.3390/bioengineering10121416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2023] [Revised: 12/04/2023] [Accepted: 12/10/2023] [Indexed: 12/24/2023] Open
Abstract
In response to the pressing need for robust disease diagnosis from gastrointestinal tract (GIT) endoscopic images, we proposed FLATer, a fast, lightweight, and highly accurate transformer-based model. FLATer consists of a residual block, a vision transformer module, and a spatial attention block, which concurrently focuses on local features and global attention. It can leverage the capabilities of both convolutional neural networks (CNNs) and vision transformers (ViT). We decomposed the classification of endoscopic images into two subtasks: a binary classification to discern between normal and pathological images and a further multi-class classification to categorize images into specific diseases, namely ulcerative colitis, polyps, and esophagitis. FLATer has exhibited exceptional prowess in these tasks, achieving 96.4% accuracy in binary classification and 99.7% accuracy in ternary classification, surpassing most existing models. Notably, FLATer could maintain impressive performance when trained from scratch, underscoring its robustness. In addition to the high precision, FLATer boasted remarkable efficiency, reaching a notable throughput of 16.4k images per second, which positions FLATer as a compelling candidate for rapid disease identification in clinical practice.
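The two-subtask decomposition described above is a classification cascade: a binary normal-vs-pathological decision gates a three-way disease decision. A minimal sketch (the 0.5 gate threshold is an assumption, not a reported setting):

```python
def cascade_classify(p_pathological, disease_probs, tau=0.5):
    # Stage 1: normal vs. pathological on the whole endoscopic image.
    if p_pathological < tau:
        return "normal"
    # Stage 2: specific disease, reached only for pathological images.
    diseases = ["ulcerative colitis", "polyps", "esophagitis"]
    return diseases[max(range(len(diseases)), key=lambda i: disease_probs[i])]
```

Gating the multi-class head this way lets the cheap binary decision screen out normal images before the finer-grained classification runs.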
Affiliation(s)
- Shibin Wu
- Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; (S.W.); (R.Z.); (J.Y.)
- Ruxin Zhang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; (S.W.); (R.Z.); (J.Y.)
- Jiayi Yan
- Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; (S.W.); (R.Z.); (J.Y.)
- Chengquan Li
- School of Clinical Medicine, Tsinghua University, Beijing 100084, China
- Qicai Liu
- Vanke School of Public Health, Tsinghua University, Beijing 100084, China
- Liyang Wang
- School of Clinical Medicine, Tsinghua University, Beijing 100084, China
- Haoqian Wang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; (S.W.); (R.Z.); (J.Y.)
98
Jain S, Atale R, Gupta A, Mishra U, Seal A, Ojha A, Jaworek-Korjakowska J, Krejcar O. CoInNet: A Convolution-Involution Network With a Novel Statistical Attention for Automatic Polyp Segmentation. IEEE Trans Med Imaging 2023; 42:3987-4000. [PMID: 37768798 DOI: 10.1109/tmi.2023.3320151] [Indexed: 09/30/2023]
Abstract
Polyps are very common abnormalities in human gastrointestinal regions. Their early diagnosis may help in reducing the risk of colorectal cancer. Vision-based computer-aided diagnostic systems automatically identify polyp regions to assist surgeons in their removal. Due to their varying shape, color, size, texture, and unclear boundaries, polyp segmentation in images is a challenging problem. Existing deep learning segmentation models mostly rely on convolutional neural networks, which have certain limitations in learning the diversity of visual patterns at different spatial locations and fail to capture inter-feature dependencies. Vision transformer models have also been deployed for polyp segmentation due to their powerful global feature extraction capabilities, but they too must be supplemented by convolution layers to learn contextual local information. In this paper, a polyp segmentation model, CoInNet, is proposed with a novel feature extraction mechanism that leverages the strengths of convolution and involution operations and learns to highlight polyp regions in images by considering the relationships between different feature maps through a statistical feature attention unit. To further aid the network in learning polyp boundaries, an anomaly boundary approximation module is introduced that uses recursively fed feature fusion to refine segmentation results. Remarkably, CoInNet can precisely segment even tiny polyps occupying only 0.01% of an image area. This is crucial for clinical applications, as small polyps can be easily overlooked even in manual examination due to the voluminous size of wireless capsule endoscopy videos. CoInNet outperforms thirteen state-of-the-art methods on five benchmark polyp segmentation datasets.
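The general idea of relating feature maps through their statistics can be sketched as a gating mechanism driven by per-map summary statistics, in the spirit of squeeze-and-excitation. CoInNet's actual statistical feature attention unit is learned and may differ substantially; this fixed-form version is only a hedged illustration.

```python
import numpy as np

def statistical_feature_attention(feature_maps):
    """Reweight each feature map by a gate computed from per-map
    statistics (mean and standard deviation). A hedged sketch of the
    statistical-attention idea, not CoInNet's learned formulation.
    feature_maps: array of shape (C, H, W)."""
    mu = feature_maps.mean(axis=(1, 2))          # (C,) per-map mean
    sd = feature_maps.std(axis=(1, 2))           # (C,) per-map spread
    stat = mu + sd                               # simple summary statistic
    gate = 1.0 / (1.0 + np.exp(-stat))           # squash to (0, 1)
    return feature_maps * gate[:, None, None]    # channel-wise reweighting
```

Maps with stronger or more variable responses receive larger gates, so the unit emphasizes channels whose statistics suggest they carry discriminative signal.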
99
Shi Y, Wang H, Ji H, Liu H, Li Y, He N, Wei D, Huang Y, Dai Q, Wu J, Chen X, Zheng Y, Yu H. A deep weakly semi-supervised framework for endoscopic lesion segmentation. Med Image Anal 2023; 90:102973. [PMID: 37757643 DOI: 10.1016/j.media.2023.102973] [Received: 09/05/2022] [Revised: 07/19/2023] [Accepted: 09/14/2023] [Indexed: 09/29/2023]
Abstract
In the field of medical image analysis, accurate lesion segmentation is beneficial for subsequent clinical diagnosis and treatment planning. Currently, various deep learning-based methods have been proposed to deal with the segmentation task. Albeit achieving some promising performances, fully-supervised learning approaches require pixel-level annotations for model training, which are tedious and time-consuming for experienced radiologists to collect. In this paper, we propose a weakly semi-supervised segmentation framework, called Point Segmentation Transformer (Point SEGTR). Particularly, the framework utilizes a small amount of fully-supervised data with pixel-level segmentation masks and a large amount of weakly-supervised data with point-level annotations (i.e., annotating a point inside each object) for network training, which significantly reduces the demand for pixel-level annotations. To fully exploit the pixel-level and point-level annotations, we propose two regularization terms, i.e., multi-point consistency and symmetric consistency, to boost the quality of pseudo labels, which are then adopted to train a student model for inference. Extensive experiments are conducted on three endoscopy datasets with different lesion structures at several body sites (e.g., colorectal and nasopharynx). Comprehensive experimental results substantiate the effectiveness and generality of our proposed method, as well as its potential to loosen the requirements of pixel-level annotations, which is valuable for clinical applications.
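The intuition behind a multi-point consistency term can be sketched simply: if several annotated points lie inside the same object, the model's foreground probabilities at those points should agree, so their variance makes a natural penalty. The exact loss in Point SEGTR may differ; this is a hedged illustration of the idea.

```python
import numpy as np

def multi_point_consistency(prob_map, points):
    """Penalize disagreement among predictions at annotated points
    inside the same object, using the variance of the sampled
    foreground probabilities. A hedged sketch of the regularizer's
    intuition, not the paper's exact formulation.
    prob_map: (H, W) foreground probabilities; points: list of (row, col)."""
    vals = np.array([prob_map[r, c] for r, c in points])
    return float(np.var(vals))   # zero when all points agree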
Affiliation(s)
- Yuxuan Shi
- ENT Institute and Department of Otolaryngology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Hong Wang
- Tencent Jarvis Lab, Shenzhen 518000, China
- Haoqin Ji
- Tencent Jarvis Lab, Shenzhen 518000, China
- Haozhe Liu
- Tencent Jarvis Lab, Shenzhen 518000, China
- Nanjun He
- Tencent Jarvis Lab, Shenzhen 518000, China
- Dong Wei
- Tencent Jarvis Lab, Shenzhen 518000, China
- Qi Dai
- ENT Institute and Department of Otolaryngology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Jianrong Wu
- Tencent Healthcare (Shenzhen) Co. LTD., Shenzhen 518063, China
- Xinrong Chen
- Academy for Engineering and Technology, Fudan University, 220 Handan Road, Shanghai 200033, China
- Hongmeng Yu
- ENT Institute and Department of Otolaryngology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China; Research Units of New Technologies of Endoscopic Surgery in Skull Base Tumor, Chinese Academy of Medical Sciences, 2018RU003, China
100
Song P, Li J, Fan H, Fan L. TGDAUNet: Transformer and GCNN based dual-branch attention UNet for medical image segmentation. Comput Biol Med 2023; 167:107583. [PMID: 37890420 DOI: 10.1016/j.compbiomed.2023.107583] [Received: 04/24/2023] [Revised: 09/28/2023] [Accepted: 10/15/2023] [Indexed: 10/29/2023]
Abstract
Accurate and automatic segmentation of medical images is a key step in clinical diagnosis and analysis. Following the successful application of Transformer models in computer vision, researchers have begun to explore the use of Transformers in medical image segmentation, especially in combination with convolutional neural networks (CNNs) with encoder-decoder structures, which have achieved remarkable results in the field. However, most studies have combined Transformers with CNNs at a single scale or processed only the highest-level semantic feature information, ignoring the rich location information in the lower-level semantic features. At the same time, for problems such as blurred structural boundaries and heterogeneous textures in images, most existing methods usually simply connect contour information to capture the boundaries of the target. However, these methods cannot capture the precise outline of the target and ignore the potential relationship between the boundary and the region. In this paper, we propose TGDAUNet, which consists of a dual-branch backbone network of CNNs and Transformers and a parallel attention mechanism, to achieve accurate segmentation of lesions in medical images. Firstly, high-level semantic feature information of the CNN backbone branch is fused at multiple scales, and the high-level and low-level feature information complement each other's location and spatial information. We further use the polarized self-attention (PSA) module to reduce the impact of redundant information caused by multiple scales, to better couple with the feature information extracted from the Transformer backbone branch, and to establish global contextual long-range dependencies at multiple scales. In addition, we design the Reverse Graph-reasoned Fusion (RGF) module and the Feature Aggregation (FA) module to jointly guide the global context. The FA module aggregates high-level semantic feature information to generate an original global predictive segmentation map. The RGF module captures non-significant boundary features in the original or secondary global prediction segmentation map through a reverse attention mechanism, establishing a graph reasoning module to explore the potential semantic relationships between boundaries and regions and further refine the target boundaries. Finally, to validate the effectiveness of our proposed method, we compare it with current popular methods on the CVC-ClinicDB, Kvasir-SEG, ETIS, CVC-ColonDB, and CVC-300 datasets, as well as the skin cancer segmentation datasets ISIC-2016 and ISIC-2017. Extensive experimental results show that our method outperforms the currently popular methods. Source code is released at https://github.com/sd-spf/TGDAUNet.
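The reverse attention mechanism underlying the RGF module has a simple generic form common in polyp segmentation: weight features by one minus the sigmoid of the coarse prediction, so refinement concentrates on regions the current prediction misses, such as boundaries. The graph-reasoning step layered on top of this in TGDAUNet is not sketched here.

```python
import numpy as np

def reverse_attention(features, coarse_logits):
    """Generic reverse attention: weight features by
    (1 - sigmoid(coarse prediction)) so refinement focuses on regions
    the coarse prediction misses, e.g. boundaries. A hedged sketch of
    the common mechanism, not TGDAUNet's full RGF module.
    features: (C, H, W); coarse_logits: (H, W)."""
    att = 1.0 - 1.0 / (1.0 + np.exp(-coarse_logits))  # high where confidence is low
    return features * att[None, :, :]                 # broadcast over channels
```

Confidently predicted foreground pixels are suppressed (attention near zero), while uncertain boundary pixels pass through almost unchanged for the next refinement stage.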
Affiliation(s)
- Pengfei Song
- Co-Innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, School of Computer Science and Technology, Shandong Technology and Business University, Laishan District, Yantai, 264005, China
- Jinjiang Li
- Co-Innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, School of Computer Science and Technology, Shandong Technology and Business University, Laishan District, Yantai, 264005, China
- Hui Fan
- Co-Innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, School of Computer Science and Technology, Shandong Technology and Business University, Laishan District, Yantai, 264005, China
- Linwei Fan
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, Shandong, 250014, China