1
He A, Wu Y, Wang Z, Li T, Fu H. DVPT: Dynamic Visual Prompt Tuning of large pre-trained models for medical image analysis. Neural Netw 2025; 185:107168. PMID: 39827840. DOI: 10.1016/j.neunet.2025.107168. Received 05/01/2024; revised 11/07/2024; accepted 01/12/2025.
Abstract
Pre-training and fine-tuning have become popular due to the rich representations embedded in large pre-trained models, which can be leveraged for downstream medical tasks. However, existing methods typically fine-tune either all parameters or only task-specific layers of pre-trained models, overlooking the variability in input medical images. As a result, these approaches may lack efficiency or effectiveness. In this study, our goal is to explore parameter-efficient fine-tuning (PEFT) for medical image analysis. To address this challenge, we introduce a novel method called Dynamic Visual Prompt Tuning (DVPT), which can extract knowledge beneficial to downstream tasks from large models with only a few trainable parameters. First, the frozen features are transformed by a lightweight bottleneck layer to learn the domain-specific distribution of downstream medical tasks. Then, a few learnable visual prompts are employed as dynamic queries that conduct cross-attention with the transformed features to acquire sample-specific features. This DVPT module can be shared across different Transformer layers, further reducing the number of trainable parameters. We conduct extensive experiments with various pre-trained models on medical classification and segmentation tasks and find that this PEFT method not only efficiently adapts pre-trained models to the medical domain but also improves data efficiency with limited labeled data. For example, with only 0.5% additional trainable parameters, our method not only outperforms state-of-the-art PEFT methods but also surpasses full fine-tuning by more than 2.20% in Kappa score on the medical classification task, while saving up to 60% of the labeled data and 99% of the storage cost of ViT-B/16.
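The mechanism this abstract describes — learnable prompts acting as queries in cross-attention over bottleneck-transformed frozen features — can be sketched in NumPy. All shapes, weight names, and the single-head formulation below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dvpt_block(frozen_feats, prompts, W_down, W_up, W_q, W_k, W_v):
    """Illustrative DVPT-style module (shapes are assumptions).

    frozen_feats: (N, d) tokens from a frozen pre-trained layer
    prompts:      (p, d) learnable visual prompts used as dynamic queries
    W_down, W_up: bottleneck adapter weights, (d, r) and (r, d)
    W_q, W_k, W_v: single-head cross-attention projections, each (d, d)
    """
    # 1. Lightweight bottleneck (down-project, ReLU, up-project, residual)
    #    adapts the frozen features toward the downstream distribution.
    h = np.maximum(frozen_feats @ W_down, 0.0) @ W_up + frozen_feats
    # 2. Prompts query the transformed tokens via cross-attention.
    q, k, v = prompts @ W_q, h @ W_k, h @ W_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (p, N)
    return attn @ v                                 # (p, d) sample-specific features

rng = np.random.default_rng(0)
d, r, n, p = 16, 4, 10, 3
out = dvpt_block(rng.normal(size=(n, d)), rng.normal(size=(p, d)),
                 rng.normal(size=(d, r)), rng.normal(size=(r, d)),
                 *[rng.normal(size=(d, d)) for _ in range(3)])
print(out.shape)  # (3, 16)
```

Only the adapter and attention weights would be trained; the backbone stays frozen, which is what keeps the trainable-parameter count small.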
Affiliation(s)
- Along He
- College of Computer Science, Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, Tianjin, 300350, China
- Yanlin Wu
- College of Computer Science, Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, Tianjin, 300350, China
- Zhihong Wang
- College of Computer Science, Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, Tianjin, 300350, China
- Tao Li
- College of Computer Science, Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, Tianjin, 300350, China.
- Huazhu Fu
- Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 138632, Singapore
2
Kuang H, Wang Y, Tan X, Yang J, Sun J, Liu J, Qiu W, Zhang J, Zhang J, Yang C, Wang J, Chen Y. LW-CTrans: A lightweight hybrid network of CNN and Transformer for 3D medical image segmentation. Med Image Anal 2025; 102:103545. PMID: 40107117. DOI: 10.1016/j.media.2025.103545. Received 07/27/2024; revised 02/22/2025; accepted 03/07/2025.
Abstract
Recent models based on convolutional neural networks (CNNs) and Transformers have achieved promising performance for 3D medical image segmentation. However, these methods cannot segment small targets well, even when equipped with large numbers of parameters. We therefore design LW-CTrans, a novel lightweight hybrid network that combines the strengths of CNNs and Transformers and boosts global and local representation capability at different stages. Specifically, we first design a dynamic stem that can accommodate images of various resolutions. In the first stage of the hybrid encoder, to capture local features with fewer parameters, we propose a multi-path convolution (MPConv) block. In the middle stages of the hybrid encoder, to learn global and local features simultaneously, we propose a multi-view pooling based Transformer (MVPFormer), which projects the 3D feature map onto three 2D subspaces to deal with small objects, and use the MPConv block to enhance local representation learning. In the final stage, to capture mostly global features, only the proposed MVPFormer is used. Finally, to reduce the parameters of the decoder, we propose a multi-stage feature fusion module. Extensive experiments on three public datasets covering three tasks (stroke lesion segmentation, pancreatic cancer segmentation, and brain tumor segmentation) show that the proposed LW-CTrans achieves Dice scores of 62.35±19.51%, 64.69±20.58%, and 83.75±15.77%, respectively, outperforming 16 state-of-the-art methods, while its parameter counts (2.08M, 2.14M, and 2.21M, respectively) are smaller than those of non-lightweight 3D methods and close to those of lightweight ones. Moreover, LW-CTrans also achieves the best performance for small-lesion segmentation.
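The multi-view pooling idea in MVPFormer — projecting a 3D feature map onto three orthogonal 2D subspaces — can be illustrated with a minimal NumPy sketch. The choice of average pooling and a single-channel volume here is an assumption for illustration; the paper's actual operator may differ.

```python
import numpy as np

def multi_view_pool(feat3d):
    """Project a 3D feature map onto three orthogonal 2D views by
    average-pooling along each spatial axis.
    feat3d: (D, H, W) single-channel volume (illustrative layout).
    """
    return (feat3d.mean(axis=0),   # (H, W) view, depth collapsed
            feat3d.mean(axis=1),   # (D, W) view, height collapsed
            feat3d.mean(axis=2))   # (D, H) view, width collapsed

vol = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
v0, v1, v2 = multi_view_pool(vol)
print(v0.shape, v1.shape, v2.shape)  # (3, 4) (2, 4) (2, 3)
```

Attention computed on each 2D view costs far less than full 3D attention, which is one way a hybrid network can stay lightweight.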
Affiliation(s)
- Hulin Kuang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410000, China
- Yahui Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410000, China
- Xianzhen Tan
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410000, China
- Jialin Yang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410000, China
- Jiarui Sun
- School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, Nanjing 210096, China
- Jin Liu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410000, China
- Wu Qiu
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430000, China
- Jingyang Zhang
- School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, Nanjing 210096, China
- Jiulou Zhang
- Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210096, China; Lab for Artificial Intelligence in Medical Imaging (LAIMI), School of Medical Imaging, Nanjing Medical University, Nanjing, 210096, China
- Chunfeng Yang
- School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, Nanjing 210096, China.
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410000, China; Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi, 830091, Xinjiang, China.
- Yang Chen
- School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, Nanjing 210096, China
3
Bao R, Weiss RJ, Bates SV, Song Y, He S, Li J, Bjornerud A, Hirschtick RL, Grant PE, Ou Y. PARADISE: Personalized and regional adaptation for HIE disease identification and segmentation. Med Image Anal 2025; 102:103419. PMID: 40147073. DOI: 10.1016/j.media.2024.103419. Received 01/31/2024; revised 09/16/2024; accepted 11/28/2024.
Abstract
Hypoxic ischemic encephalopathy (HIE) is a brain dysfunction occurring in approximately 1-5 of every 1000 term-born neonates. Accurate segmentation of HIE lesions in brain MRI is crucial for prognosis and diagnosis but presents a unique challenge due to the diffuse and small nature of these abnormalities, which has resulted in a substantial gap between the performance of machine learning-based segmentation methods and clinical expert annotations for HIE. To address this challenge, we introduce ParadiseNet, an algorithm specifically designed for HIE lesion segmentation. ParadiseNet incorporates global-local learning, progressive uncertainty learning, and self-evolution learning modules, all inspired by clinical interpretation of neonatal brain MRIs. These modules target unbalanced data distribution, boundary uncertainty, and imprecise lesion detection, respectively. Extensive experiments demonstrate that ParadiseNet significantly enhances small lesion detection (<1%) accuracy in HIE, achieving an improvement of over 4% in Dice and 6% in NSD compared with U-Net and other general medical image segmentation algorithms.
Affiliation(s)
- Rina Bao
- Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
- Sheng He
- Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Jingpeng Li
- Boston Children's Hospital, Boston, MA, USA; Oslo University Hospital; University of Oslo, Norway
- Randy L Hirschtick
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA; Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- P Ellen Grant
- Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Yangming Ou
- Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
4
Yang Z, Woodward MA, Niziol LM, Pawar M, Prajna NV, Krishnamoorthy A, Wang Y, Lu MC, Selvaraj S, Farsiu S. Self-knowledge distillation-empowered directional connectivity transformer for microbial keratitis biomarkers segmentation on slit-lamp photography. Med Image Anal 2025; 102:103533. PMID: 40117989. PMCID: PMC12004389. DOI: 10.1016/j.media.2025.103533. Received 07/15/2024; revised 12/28/2024; accepted 02/25/2025.
Abstract
The lack of standardized, objective tools for measuring biomarker morphology poses a significant obstacle to managing Microbial Keratitis (MK). Previous studies have demonstrated that robust segmentation benefits MK diagnosis, management, and estimation of visual outcomes. However, despite exciting advances, current methods cannot accurately detect biomarker boundaries or differentiate overlapping regions in challenging cases. In this work, we propose a novel self-knowledge distillation-empowered directional connectivity transformer, called SDCTrans. We utilize the directional connectivity modeling framework to improve biomarker boundary detection; the transformer backbone and the hierarchical self-knowledge distillation scheme in this framework enhance directional representation learning. We also propose an efficient segmentation head design to effectively segment overlapping regions. This is the first work to successfully incorporate directional connectivity modeling into a transformer. SDCTrans, trained and tested on a new large-scale MK dataset, accurately and robustly segments crucial biomarkers in three types of slit-lamp biomicroscopy images. Through comprehensive experiments, we demonstrate the superiority of the proposed SDCTrans over current state-of-the-art models. We also show that SDCTrans matches, if not outperforms, expert human graders in MK biomarker identification and visual acuity outcome estimation. Experiments on skin lesion images are also included as an illustrative example of SDCTrans' utility in other segmentation tasks. The new MK dataset and code are available at https://github.com/Zyun-Y/SDCTrans.
Affiliation(s)
- Ziyun Yang
- Duke University, Department of Biomedical Engineering, Durham, 27705, NC, USA.
- Maria A Woodward
- University of Michigan, Department of Ophthalmology and Visual Sciences, Ann Arbor, 48105, MI, USA
- Leslie M Niziol
- University of Michigan, Department of Ophthalmology and Visual Sciences, Ann Arbor, 48105, MI, USA
- Mercy Pawar
- University of Michigan, Department of Ophthalmology and Visual Sciences, Ann Arbor, 48105, MI, USA
- Yiqing Wang
- Duke University, Department of Biomedical Engineering, Durham, 27705, NC, USA
- Ming-Chen Lu
- University of Michigan, Department of Ophthalmology and Visual Sciences, Ann Arbor, 48105, MI, USA
- Sina Farsiu
- Duke University, Department of Biomedical Engineering, Durham, 27705, NC, USA.
5
Zhang X, Zhu Q, Hu T, Guo S, Bian G, Dong W, Hong R, Lin XL, Wu P, Zhou M, Yan Q, Mohi-Ud-Din G, Ai C, Li Z. Joint high-resolution feature learning and vessel-shape aware convolutions for efficient vessel segmentation. Comput Biol Med 2025; 191:109982. PMID: 40253922. DOI: 10.1016/j.compbiomed.2025.109982. Received 03/15/2024; revised 02/28/2025; accepted 03/03/2025.
Abstract
Clear imaging of retinal vessels, with their sophisticated hierarchical topology and dense capillaries, provides critical evidence for the diagnosis and evaluation of specific diseases. In this work, we propose a new topology- and shape-aware model, the Multi-branch Vessel-shaped Convolution Network (MVCN), to adaptively learn high-resolution representations from retinal vessel imagery and thereby capture high-quality topology and shape information. Our pipeline involves two steps. The first is a Multiple High-resolution Ensemble Module (MHEM) that enhances the high-resolution characteristics of retinal vessel imagery by fusing its scale-invariant hierarchical topology. The second is a novel vessel-shaped convolution that makes the retinal vessel topology emerge from unrelated fundus structures. Moreover, MVCN separates this topology from the fundus by dynamically generating multiple sub-labels using epistemic uncertainty, instead of manually splitting raw labels to distinguish definitive and uncertain vessels. Compared with existing methods, our method achieves the most advanced AUC values of 98.31%, 98.80%, 98.83%, and 98.65%, and the most advanced ACC of 95.83%, 96.82%, 97.09%, and 96.66% on the DRIVE, CHASE_DB1, STARE, and HRF datasets, respectively. We also employ correctness, completeness, and quality metrics to evaluate skeletal similarity; on these metrics our method roughly doubles the scores of previous methods, demonstrating its effectiveness.
Affiliation(s)
- Xiang Zhang
- College of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, China
- Qiang Zhu
- College of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, China
- Tao Hu
- Northwestern Polytechnical University, China
- Song Guo
- College of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, China
- Genqing Bian
- College of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, China
- Wei Dong
- College of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, China.
- Rao Hong
- School of Software, Nanchang University, Nanchang, China
- Xia Ling Lin
- School of Software, Nanchang University, Nanchang, China
- Peng Wu
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, China
- Meili Zhou
- Shaanxi Provincial Key Lab of Bigdata of Energy and Intelligence Processing, School of Physics and Electronic Information, Yanan University, Yanan, China.
- Qingsen Yan
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, China.
- Chen Ai
- School of Software, Nanchang University, Nanchang, China
- Zhou Li
- Department of Basic Education and Research, Jiangxi Police College, Nanchang, China
6
Zhang R, Jiang G. Exploring a multi-path U-net with probability distribution attention and cascade dilated convolution for precise retinal vessel segmentation in fundus images. Sci Rep 2025; 15:13428. PMID: 40251298. PMCID: PMC12008375. DOI: 10.1038/s41598-025-98021-z. Received 10/25/2024; accepted 04/08/2025.
Abstract
Retinal blood vessel segmentation presents several challenges, including limited labeled image data, complex multi-scale vessel structures, and susceptibility to interference from lesion areas. To confront these challenges, this work offers a novel technique that integrates attention mechanisms and a cascaded dilated convolution module (CDCM) within a multi-path U-Net architecture. First, a dual-path U-Net is developed to extract both coarse and fine-grained vessel structures through separate texture and structural branches, and a CDCM is integrated to gather multi-scale vessel features, enhancing the model's ability to extract deep semantic features. Second, a boosting algorithm that incorporates probability distribution attention (PDA) within the upscaling blocks is employed. This approach adjusts the probability distribution to increase the contribution of shallow information, thereby enhancing segmentation performance in complex backgrounds and reducing the risk of overfitting. Finally, the output of the dual-path U-Net is processed through a feature refinement module, which further refines the vessel segmentation by integrating and extracting relevant features. Results of experiments on three benchmark datasets (CHASE_DB1, DRIVE, and STARE) demonstrate that the proposed method delivers improved segmentation accuracy compared with existing techniques.
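A dilated convolution enlarges the receptive field without adding parameters by spacing the kernel taps; cascading several dilation rates gathers multi-scale context. The sketch below is a generic 1D illustration of this principle, not the paper's CDCM: the kernel, the rates, and the summation of truncated-to-aligned responses are all illustrative assumptions.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid-mode 1D dilated convolution (correlation form).
    Effective receptive field: (len(kernel) - 1) * dilation + 1.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out

def cascade(x, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at growing dilation rates and sum the
    responses (truncated to a common length) — a CDCM-like multi-scale
    aggregation, with details chosen for illustration."""
    outs = [dilated_conv1d(x, kernel, r) for r in rates]
    m = min(map(len, outs))
    return sum(o[:m] for o in outs)

x = np.arange(16, dtype=float)
k = np.array([1.0, 0.0, -1.0])        # simple edge-like kernel
y = dilated_conv1d(x, k, dilation=2)  # each output = x[i] - x[i+4]
print(y[:3])
```

On the linear ramp `x`, the rate-2 response is a constant -4 at every position, showing how the dilation rate controls the comparison distance.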
Affiliation(s)
- Ruihong Zhang
- School of Computer, Huanggang Normal University, Huanggang, Hubei, 438000, China
- Guosong Jiang
- School of Computer, Huanggang Normal University, Huanggang, Hubei, 438000, China.
7
Kong L, Wei Q, Xu C, Ye X, Liu W, Wang M, Fu Y, Chen H. EFCNet enhances the efficiency of segmenting clinically significant small medical objects. Sci Rep 2025; 15:12813. PMID: 40229279. PMCID: PMC11997218. DOI: 10.1038/s41598-025-93171-6. Received 08/26/2024; accepted 03/05/2025.
Abstract
Efficient segmentation of small hyperreflective dots, key biomarkers for diseases such as macular edema, is critical for diagnosis and treatment monitoring. However, existing models, including convolutional neural networks (CNNs) and Transformers, struggle with these minute structures due to information loss. To address this, we introduce EFCNet, which integrates a Cross-Stage Axial Attention (CSAA) module for enhanced feature fusion and a Multi-Precision Supervision (MPS) module for improved hierarchical guidance. We evaluated EFCNet on two datasets: S-HRD, comprising 313 retinal OCT scans from patients with macular edema, and S-Polyp, a 229-image subset of the publicly available CVC-ClinicDB colonoscopy dataset. EFCNet outperformed state-of-the-art models, achieving average Dice similarity coefficient (DSC) gains of 4.88% on S-HRD and 3.49% on S-Polyp, alongside intersection-over-union (IoU) improvements of 3.77% and 3.25%, respectively. Notably, smaller objects benefit most, highlighting EFCNet's effectiveness where conventional models underperform. Unlike U-Net-Large, which offers only marginal gains with increased scale, EFCNet's superior performance is driven by its novel design. These findings demonstrate its effectiveness and potential utility in clinical practice.
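The DSC and IoU figures quoted above follow the standard overlap definitions, which can be stated compactly. This is a reference implementation of the standard metrics (masks represented as sets of foreground pixel indices), not code from the paper:

```python
def dice_iou(pred, target):
    """Dice similarity coefficient and intersection-over-union for
    binary masks given as iterables of foreground pixel indices.
    Both metrics equal 1.0 by convention when both masks are empty."""
    p, t = set(pred), set(target)
    inter = len(p & t)
    dice = 2 * inter / (len(p) + len(t)) if (p or t) else 1.0
    iou = inter / len(p | t) if (p | t) else 1.0
    return dice, iou

# Two 4-pixel masks overlapping in 2 pixels:
d, i = dice_iou({1, 2, 3, 4}, {3, 4, 5, 6})
print(d, i)  # Dice = 0.5, IoU = 1/3
```

For very small objects (few foreground pixels), a handful of misclassified pixels moves both scores sharply, which is why small-object gains show up so clearly in these metrics.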
Affiliation(s)
- Lingjie Kong
- School of Data Science, Fudan University, Shanghai, 200433, China
- Qiaoling Wei
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Chengming Xu
- School of Data Science, Fudan University, Shanghai, 200433, China
- Xiaofeng Ye
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Diagnosis and Treatment Center of Macular Disease, Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Wei Liu
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Diagnosis and Treatment Center of Macular Disease, Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Min Wang
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Diagnosis and Treatment Center of Macular Disease, Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Yanwei Fu
- School of Data Science, Fudan University, Shanghai, 200433, China.
- Han Chen
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China.
8
Fakhfakh M, Sarry L, Clarysse P. HALSR-Net: Improving CNN Segmentation of Cardiac Left Ventricle MRI with Hybrid Attention and Latent Space Reconstruction. Comput Med Imaging Graph 2025; 123:102546. PMID: 40245744. DOI: 10.1016/j.compmedimag.2025.102546. Received 12/02/2024; revised 02/17/2025; accepted 03/30/2025.
Abstract
Accurate cardiac MRI segmentation is vital for detailed cardiac analysis, yet the manual process is labor-intensive and prone to variability. Despite advancements in MRI technology, there remains a significant need for automated methods that can reliably and efficiently segment cardiac structures. This paper introduces HALSR-Net, a novel multi-level segmentation architecture designed to improve the accuracy and reproducibility of cardiac segmentation from Cine-MRI acquisitions, focusing on the left ventricle (LV). The methodology consists of two main phases: first, extraction of the region of interest (ROI) using a regression model that accurately predicts the location of a bounding box around the LV; second, a semantic segmentation step based on the HALSR-Net architecture. This architecture incorporates a Hybrid Attention Pooling Module (HAPM) that merges attention and pooling mechanisms to enhance feature extraction and capture contextual information. Additionally, a reconstruction module leverages latent space features to further improve segmentation accuracy. Experiments conducted on an in-house clinical dataset and two public datasets (ACDC and LVQuan19) demonstrate that HALSR-Net outperforms state-of-the-art architectures, achieving up to 98% accuracy and F1-score for the segmentation of the LV cavity and myocardium. The proposed approach effectively addresses the limitations of existing methods, offering a more accurate and robust solution for cardiac MRI segmentation that is likely to improve cardiac function analysis and patient care.
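The first phase described above — cropping the image to a regressed bounding box before running segmentation — can be sketched generically. The (x0, y0, x1, y1) box convention and the padding parameter are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def crop_roi(image, bbox, pad=0):
    """Crop a predicted ROI from a 2D image with optional padding,
    clipped to the image bounds.
    bbox: (x0, y0, x1, y1) half-open pixel coordinates (an assumed
    convention for this sketch)."""
    x0, y0, x1, y1 = bbox
    h, w = image.shape
    y0, y1 = max(0, y0 - pad), min(h, y1 + pad)
    x0, x1 = max(0, x0 - pad), min(w, x1 + pad)
    return image[y0:y1, x0:x1]

img = np.arange(100).reshape(10, 10)
roi = crop_roi(img, (2, 3, 7, 8), pad=1)
print(roi.shape)  # (7, 7)
```

Restricting the segmentation network to the ROI concentrates its capacity on the ventricle and discards irrelevant background, a common design in two-stage pipelines.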
Affiliation(s)
- Mohamed Fakhfakh
- Université Clermont Auvergne, CHU Clermont-Ferrand, Clermont Auvergne INP, CNRS, Institut Pascal, F-63000, Clermont-Ferrand, France.
- Laurent Sarry
- Université Clermont Auvergne, CHU Clermont-Ferrand, Clermont Auvergne INP, CNRS, Institut Pascal, F-63000, Clermont-Ferrand, France.
- Patrick Clarysse
- INSA-Lyon, Université Claude Bernard Lyon 1, CNRS, Inserm, CREATIS UMR 5220, U1294, F-69621, Lyon, France.
9
Li F, Wei H, Sheng X, Chen Y, Zou H, Huang S. Global-Local Transformer Network for Automatic Retinal Pathological Fluid Segmentation in Optical Coherence Tomography Images. Comput Methods Programs Biomed 2025; 266:108772. PMID: 40228373. DOI: 10.1016/j.cmpb.2025.108772. Received 12/30/2022; revised 03/20/2025; accepted 04/09/2025.
Abstract
BACKGROUND AND OBJECTIVE: Accurate segmentation of retinal pathological fluid, a pivotal biomarker comprising intraretinal fluid (IRF), subretinal fluid (SRF), and pigment epithelial detachment (PED), is a critical task for diagnosis and treatment management in various retinopathies. However, segmenting pathological fluids from optical coherence tomography (OCT) images still faces several challenges: large variations in location, size, and shape; low intensity contrast between fluids and peripheral tissues; speckle noise interference; and high similarity between fluid and background. Further, owing to the intrinsically local nature of convolution operations, most automatic retinal fluid segmentation approaches built upon deep convolutional neural networks have limited capacity to capture pathological features with global dependencies and are prone to deviations. Accordingly, it is of great significance to develop automatic methods for accurate segmentation and quantitative analysis of multi-type retinal fluids in OCT images.
METHODS: In this paper, we develop a novel global-local Transformer network (GLTNet) based on a U-shaped architecture for simultaneously segmenting multiple types of pathological fluids from retinal OCT images. In GLTNet, we design a global-local attention module (GLAM) and aggregate it into the VGG-19 backbone to learn more discriminative, pathological-fluid-related feature representations and suppress irrelevant noise information in OCT images. At the same time, we construct a multi-scale Transformer module (MSTM) on top of the encoder pathway to explore various scales of non-local characteristics with long-term dependency information from multiple encoder layers. Integrating both blocks into a strong U-Net encoder improves the network's ability to capture finer details, enabling precise segmentation of multi-type retinal fluids within OCT images.
RESULTS: We evaluated the segmentation performance of GLTNet on the Kermany, DUKE, and UMN datasets. Comprehensive experimental results on the Kermany dataset show that our model achieved overall scores of 0.8395, 0.7657, 0.8631, and 0.8202 on Dice coefficient, IoU, sensitivity, and precision, respectively, remarkably outperforming other state-of-the-art retinal fluid segmentation approaches. The experimental results on the DUKE and UMN datasets suggest that our model has satisfactory generalizability.
CONCLUSIONS: Compared with current cutting-edge methods, the developed GLTNet gains a significant boost in retinal fluid segmentation performance and manifests good generalization and robustness, showing great potential for assisting ophthalmologists in diagnosing a diversity of eye disorders and developing as-needed therapy regimens.
Affiliation(s)
- Feng Li
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Hao Wei
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Xinyu Sheng
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Yuyang Chen
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Haidong Zou
- Shanghai Eye Disease Prevention & Treatment Center, Shanghai 200040, China; Ophthalmology Center, Shanghai General Hospital, Shanghai 200062, China
- Song Huang
- Department of Radiology, Seventh People's Hospital of Shanghai University of TCM, Shanghai 200137, China
10
Li G, Li K, Zhang G, Pan K, Ding Y, Wang Z, Fu C, Zhu Z. A landslide area segmentation method based on an improved UNet. Sci Rep 2025; 15:11852. PMID: 40195381. PMCID: PMC11976986. DOI: 10.1038/s41598-025-94039-5. Received 07/06/2024; accepted 03/11/2025.
Abstract
As remote sensing technology matures, landslide target segmentation has become increasingly important in disaster prevention, control, and urban construction, playing a crucial role in disaster loss assessment and post-disaster rescue. This paper therefore proposes an improved UNet-based landslide segmentation algorithm. First, the feature extraction structure of the model is redesigned by integrating dilated convolution and the EMA attention mechanism to enhance the model's ability to extract image features. Additionally, this study introduces the Pag module to replace the original skip connections, thereby enhancing information fusion between feature maps, reducing pixel information loss, and further improving the model's overall performance. Experimental results show that, compared with the original model, our model improves mIoU, precision, recall, and F1-score by approximately 2.4%, 2.4%, 3.2%, and 2.8%, respectively. This study not only provides an effective method for landslide segmentation tasks but also offers new perspectives for further research in related fields.
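The reported mIoU, precision, recall, and F1 all derive from per-class pixel confusion counts (true positives, false positives, false negatives). The following is a minimal reference implementation of these standard definitions, not code from the paper:

```python
def seg_scores(tp, fp, fn):
    """Per-class precision, recall, F1, and IoU from pixel confusion
    counts; mIoU is the mean of the per-class IoU values."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou

# Example: 80 correctly labeled pixels, 20 false alarms, 20 misses.
p, r, f1, iou = seg_scores(tp=80, fp=20, fn=20)
print(p, r, f1, iou)
```

Note that IoU is always the strictest of the four: with the counts above, precision, recall, and F1 are all 0.8 while IoU is 80/120, which is why IoU gains of a few percent are meaningful.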
Affiliation(s)
- Guangchen Li
- Shandong Jiaotong University, Haitang Road 5001, Jinan, 250357, China
- Kefeng Li
- Shandong Jiaotong University, Haitang Road 5001, Jinan, 250357, China
- Guangyuan Zhang
- Shandong Jiaotong University, Haitang Road 5001, Jinan, 250357, China.
- Ke Pan
- Shandong Jiaotong University, Haitang Road 5001, Jinan, 250357, China
- Yuxuan Ding
- Shandong Jiaotong University, Haitang Road 5001, Jinan, 250357, China
- Zhenfei Wang
- Shandong Zhengyuan Yeda Environmental Technology Co., Ltd, Jinan, 250101, China
- Chen Fu
- Shandong Jiaotong University, Haitang Road 5001, Jinan, 250357, China
- Zhenfang Zhu
- Shandong Jiaotong University, Haitang Road 5001, Jinan, 250357, China
11
Zhang H, Yang B, Li S, Zhang X, Li X, Liu T, Higashita R, Liu J. Retinal OCT image segmentation with deep learning: A review of advances, datasets, and evaluation metrics. Comput Med Imaging Graph 2025; 123:102539. [PMID: 40203494 DOI: 10.1016/j.compmedimag.2025.102539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2024] [Revised: 03/07/2025] [Accepted: 03/22/2025] [Indexed: 04/11/2025]
Abstract
Optical coherence tomography (OCT) is a widely used imaging technology in ophthalmic clinical practice, providing non-invasive access to high-resolution retinal images. Segmentation of anatomical structures and pathological lesions in retinal OCT images directly impacts clinical decisions. While commercial OCT devices segment multiple retinal layers in healthy eyes, their performance degrades severely under pathological conditions. In recent years, rapid advancements in deep learning have significantly driven research in OCT image segmentation. This review provides a comprehensive overview of the latest developments in deep learning-based segmentation methods for retinal OCT images. Additionally, it summarizes the medical significance, publicly available datasets, and commonly used evaluation metrics in this field. The review also discusses the current challenges faced by the research community and highlights potential future directions.
Affiliation(s)
- Huihong Zhang: Harbin Institute of Technology, No. 92 West Dazhi Street, Nangang District, Harbin, 150001, Heilongjiang, China; Department of Computer Science and Engineering, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, 518055, Guangdong, China
- Bing Yang: Department of Computer Science and Engineering, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, 518055, Guangdong, China
- Sanqian Li: Department of Computer Science and Engineering, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, 518055, Guangdong, China
- Xiaoqing Zhang: Department of Computer Science and Engineering, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, 518055, Guangdong, China
- Xiaoling Li: Department of Computer Science and Engineering, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, 518055, Guangdong, China
- Tianhang Liu: Department of Computer Science and Engineering, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, 518055, Guangdong, China
- Risa Higashita: Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, 518055, Guangdong, China
- Jiang Liu: Department of Computer Science and Engineering, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, 518055, Guangdong, China; Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, 518055, Guangdong, China; University of Nottingham Ningbo China, 199 Taikang East Road, 315100, Ningbo, China
12
|
Zhang H, Lian J, Ma Y. FET-UNet: Merging CNN and transformer architectures for superior breast ultrasound image segmentation. Phys Med 2025; 133:104969. [PMID: 40184647 DOI: 10.1016/j.ejmp.2025.104969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/23/2024] [Revised: 03/14/2025] [Accepted: 03/25/2025] [Indexed: 04/07/2025] Open
Abstract
PURPOSE Breast cancer remains a significant cause of mortality among women globally, highlighting the critical need for accurate diagnosis. Although Convolutional Neural Networks (CNNs) have shown effectiveness in segmenting breast ultrasound images, they often face challenges in capturing long-range dependencies, particularly for lesions with similar intensity distributions, irregular shapes, and blurred boundaries. To overcome these limitations, we introduce FET-UNet, a novel hybrid framework that integrates CNNs and Swin Transformers within a UNet-like architecture. METHODS FET-UNet features parallel branches for feature extraction: one utilizes ResNet34 blocks, and the other employs Swin Transformer blocks. These branches are fused using an advanced feature aggregation module (AFAM), enabling the network to effectively combine local details and global context. Additionally, we include a multi-scale upsampling mechanism in the decoder to ensure precise segmentation outputs. This design enhances the capture of both local details and long-range dependencies. RESULTS Extensive evaluations on the BUSI, UDIAT, and BLUI datasets demonstrate the superior performance of FET-UNet compared to state-of-the-art methods. The model achieves Dice coefficients of 82.9% on BUSI, 88.9% on UDIAT, and 90.1% on BLUI. CONCLUSION FET-UNet shows great potential to advance breast ultrasound image segmentation and support more precise clinical diagnoses. Further research could explore the application of this framework to other medical imaging modalities and its integration into clinical workflows.
Affiliation(s)
- Huaikun Zhang: School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China
- Jing Lian: School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, Gansu, China
- Yide Ma: School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China
13
|
Zhong J, Tian W, Xie Y, Liu Z, Ou J, Tian T, Zhang L. PMFSNet: Polarized multi-scale feature self-attention network for lightweight medical image segmentation. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 261:108611. [PMID: 39892086 DOI: 10.1016/j.cmpb.2025.108611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/14/2024] [Revised: 01/05/2025] [Accepted: 01/19/2025] [Indexed: 02/03/2025]
Abstract
BACKGROUND AND OBJECTIVES Current state-of-the-art medical image segmentation methods prioritize precision, but often at the expense of increased computational demands and larger model sizes. Applying these large-scale models to the relatively limited scale of medical image datasets tends to induce redundant computation, complicating the process without the necessary benefits. These approaches increase complexity and pose challenges for integrating and deploying lightweight models on edge devices. For instance, recent transformer-based models have excelled in 2D and 3D medical image segmentation due to their extensive receptive fields and high parameter counts. However, their effectiveness comes with the risk of overfitting on small datasets, and these models often neglect the vital inductive biases of Convolutional Neural Networks (CNNs), which are essential for local feature representation. METHODS In this work, we propose PMFSNet, a novel medical image segmentation model that effectively balances global and local feature processing while avoiding the computational redundancy typical of larger models. PMFSNet streamlines the UNet-based hierarchical structure and simplifies the self-attention mechanism's computational complexity, making it suitable for lightweight applications. It incorporates a plug-and-play PMFS block, a multi-scale feature enhancement module based on attention mechanisms, to capture long-term dependencies. RESULTS Comprehensive results demonstrate that our method achieves superior performance in various segmentation tasks on different data scales, even with fewer than a million parameters. Our PMFSNet achieves IoU of 84.68%, 82.02%, 78.82%, and 76.48% on public datasets of 3D CBCT Tooth, ovarian tumor ultrasound (MMOTU), skin lesion dermoscopy (ISIC 2018), and gastrointestinal polyp (Kvasir SEG), and yields DSC of 78.29%, 77.45%, and 78.04% on three retinal vessel segmentation datasets, DRIVE, STARE, and CHASE-DB1, respectively. CONCLUSION Our proposed model exhibits competitive performance across various datasets, accomplishing this with significantly fewer model parameters and lower inference time, demonstrating its value in model integration and deployment. It strikes an optimal compromise between efficiency and performance and can be a highly efficient solution for medical image analysis in resource-constrained clinical environments. The source code is available at https://github.com/yykzjh/PMFSNet.
Affiliation(s)
- Jiahui Zhong: School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Wenhong Tian: School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Yuanlun Xie: School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu 610106, China
- Zhijia Liu: School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Jie Ou: School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Taoran Tian: State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, PR China
- Lei Zhang: School of Computer Science, University of Lincoln, LN6 7TS, UK
14
|
Ji Z, Chen Z, Ma X. Grouped multi-scale vision transformer for medical image segmentation. Sci Rep 2025; 15:11122. [PMID: 40169823 PMCID: PMC11961587 DOI: 10.1038/s41598-025-95361-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2025] [Accepted: 03/20/2025] [Indexed: 04/03/2025] Open
Abstract
Medical image segmentation plays a pivotal role in clinical diagnosis and pathological research by delineating regions of interest within medical images. While early approaches based on Convolutional Neural Networks (CNNs) have achieved significant success, their limited receptive field constrains their ability to capture long-range dependencies. Recent advances in Vision Transformers (ViTs) have demonstrated remarkable improvements by leveraging self-attention mechanisms. However, existing ViT-based segmentation models often struggle to effectively capture multi-scale variations within a single attention layer, limiting their capacity to model complex anatomical structures. To address this limitation, we propose Grouped Multi-Scale Attention (GMSA), which enhances multi-scale feature representation by grouping channels and performing self-attention at different scales within a single layer. Additionally, we introduce Inter-Scale Attention (ISA) to facilitate cross-scale feature fusion, further improving segmentation performance. Extensive experiments on the Synapse, ACDC, and ISIC2018 datasets demonstrate the effectiveness of our model, achieving state-of-the-art results in medical image segmentation. Our code is available at: https://github.com/Chen2zheng/ScaleFormer .
Affiliation(s)
- Zexuan Ji: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
- Zheng Chen: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
- Xiao Ma: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
15
|
Wang Y, Liu W, Yu P, Huang X, Pan J. RRM-TransUNet: Deep-Learning Driven Interactive Model for Precise Pancreas Segmentation in CT Images. Int J Med Robot 2025; 21:e70065. [PMID: 40209153 DOI: 10.1002/rcs.70065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2024] [Revised: 03/19/2025] [Accepted: 03/31/2025] [Indexed: 04/12/2025]
Abstract
BACKGROUND Pancreatic diseases such as cancer and pancreatitis pose significant health risks, and early detection requires precise segmentation results. Fully automatic segmentation algorithms cannot integrate clinical expertise or correct output errors, while interactive methods offer a better chance for higher accuracy and reliability. METHODS We propose a new network, RRM-TransUNet, for interactive pancreas segmentation in CT images, aiming to provide more reliable and precise results. The network incorporates Rotary Position Embedding, Root Mean Square Normalisation, and a Mixture of Experts mechanism. An intuitive interface is constructed for user-aided pancreas segmentation. RESULTS RRM-TransUNet achieves outstanding performance on multiple datasets, with a Dice Similarity Coefficient (DSC) of 93.82% and an Average Symmetric Surface Distance error (ASSD) of 1.12 mm on MSD, 93.79%/1.15 mm on AMOS, and 93.68%/1.18 mm on AbdomenCT-1K. CONCLUSION Our method outperforms previous methods and provides doctors with an efficient and user-friendly interactive pancreas segmentation experience through the intuitive interface.
Affiliation(s)
- Yulan Wang: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Weimin Liu: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Peng Yu: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Xin Huang: Pain Medicine Center, Peking University Third Hospital, Beijing, China
- Junjun Pan: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
16
|
Wang Y, Meng C, Tang Z, Bai X, Ji P, Bai X. Unsupervised Domain Adaptation for Cross-Modality Cerebrovascular Segmentation. IEEE J Biomed Health Inform 2025; 29:2871-2884. [PMID: 40030830 DOI: 10.1109/jbhi.2024.3523103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Cerebrovascular segmentation from time-of-flight magnetic resonance angiography (TOF-MRA) and computed tomography angiography (CTA) is essential in providing supportive information for diagnosis and treatment planning of multiple intracranial vascular diseases. Different imaging modalities utilize distinct principles to visualize the cerebral vasculature, which leads to the limitations of expensive annotations and performance degradation when training and deploying deep learning models. In this paper, we propose an unsupervised domain adaptation framework, CereTS, to perform translation and segmentation of cross-modality unpaired cerebral angiography. Treating the commonality of vascular structures and stylistic textures as domain-invariant and domain-specific features, respectively, CereTS adopts a multi-level domain alignment pattern that includes an image-level cyclic geometric consistency constraint, a patch-level masked contrastive constraint, and a feature-level semantic perception constraint to shrink the domain discrepancy while preserving the consistency of vascular structures. Experiments conducted on a publicly available TOF-MRA dataset and a private CTA dataset show that CereTS outperforms current state-of-the-art methods by a large margin.
17
|
Zhang X, Xiao Z, Wu X, Chen Y, Zhao J, Hu Y, Liu J. Pyramid Pixel Context Adaption Network for Medical Image Classification With Supervised Contrastive Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:6802-6815. [PMID: 38829749 DOI: 10.1109/tnnls.2024.3399164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2024]
Abstract
The spatial attention (SA) mechanism has been widely incorporated into deep neural networks (DNNs), significantly improving performance in computer vision tasks via long-range dependency modeling. However, it may perform poorly in medical image analysis, and existing efforts are often unaware that long-range dependency modeling has limitations in highlighting subtle lesion regions. To overcome this limitation, we propose a practical yet lightweight architectural unit, the pyramid pixel context adaption (PPCA) module, which exploits multiscale pixel context information to dynamically recalibrate pixel positions in a pixel-independent manner. PPCA first applies a well-designed cross-channel pyramid pooling (CCPP) to aggregate multiscale pixel context information, then eliminates the inconsistency among them via pixel normalization (PN), and finally estimates a per-pixel attention weight via pixel context integration. By embedding PPCA into a DNN with negligible overhead, the PPCA network (PPCANet) is developed for medical image classification. In addition, we introduce supervised contrastive learning to enhance feature representation by exploiting the potential of label information via a supervised contrastive loss (CL). Extensive experiments on six medical image datasets show that PPCANet outperforms state-of-the-art (SOTA) attention-based networks and recent DNNs. We also provide visual analysis and an ablation study to explain the behavior of PPCANet in the decision-making process.
18
|
Cao Y, Liang F, Zhao T, Han J, Wang Y, Wu H, Zhang K, Qiu H, Ding Y, Zhu H. Brain tumor intelligent diagnosis based on Auto-Encoder and U-Net feature extraction. PLoS One 2025; 20:e0315631. [PMID: 40127071 PMCID: PMC11932485 DOI: 10.1371/journal.pone.0315631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2024] [Accepted: 11/27/2024] [Indexed: 03/26/2025] Open
Abstract
Preoperative classification of brain tumors is critical to developing personalized treatment plans; however, existing classification methods rely on manual intervention and often suffer from problems with efficiency and accuracy, which may lead to misdiagnosis or delayed diagnosis in clinical practice and affect therapeutic outcomes. We propose a fully automated approach to brain tumor magnetic resonance imaging (MRI) classification, consisting of a feature extractor based on an improved U-Net and a classifier based on a convolutional recurrent neural network (CRNN). The encoder of the feature extractor, based on dense blocks, is used to enhance feature propagation and reduce the number of parameters. The decoder uses residual blocks to reduce the weight of some features, improving the reconstruction of MRI spatial sequences and avoiding gradient vanishing. Skip connections between the encoder and the decoder effectively merge low-level and high-level features. The extracted feature sequence is input into the CRNN-based classifier for final classification. We assessed the performance of our method on glioma grading, glioma isocitrate dehydrogenase 1 (IDH1) mutation status classification, and pituitary tumor texture classification using two datasets: glioma and pituitary tumor cases collected at a local affiliated hospital, and glioma imaging data from TCIA. Compared with commonly used and recent models, our model achieves higher accuracy: it grades glioma with an accuracy of 90.72%, classifies glioma IDH1 mutation status with an accuracy of 94.35%, and classifies pituitary tumor texture with an accuracy of 94.64%.
Affiliation(s)
- Yaru Cao: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Fengning Liang: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Teng Zhao: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Jinting Han: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Yingchao Wang: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Haowen Wu: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Kexing Zhang: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Huiwen Qiu: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Yizhe Ding: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Hong Zhu: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China; Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York, United States of America
19
|
Zhang Z, Liu T, Fan G, Li N, Li B, Pu Y, Feng Q, Zhou S. SpineMamba: Enhancing 3D spinal segmentation in clinical imaging through residual visual Mamba layers and shape priors. Comput Med Imaging Graph 2025; 123:102531. [PMID: 40154009 DOI: 10.1016/j.compmedimag.2025.102531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2024] [Revised: 03/03/2025] [Accepted: 03/13/2025] [Indexed: 04/01/2025]
Abstract
Accurate segmentation of three-dimensional (3D) clinical medical images is critical for the diagnosis and treatment of spinal diseases. However, the complexity of spinal anatomy and the inherent uncertainties of current imaging technologies pose significant challenges for the semantic segmentation of spinal images. Although convolutional neural networks (CNNs) and Transformer-based models have achieved remarkable progress in spinal segmentation, their limitations in modeling long-range dependencies hinder further improvements in segmentation accuracy. To address these challenges, we propose a novel framework, SpineMamba, which incorporates a residual visual Mamba layer capable of effectively capturing and modeling the deep semantic features and long-range spatial dependencies in 3D spinal data. To further enhance the structural semantic understanding of the vertebrae, we also propose a novel spinal shape prior module that captures specific anatomical information about the spine from medical images, significantly enhancing the model's ability to extract structural semantic information of the vertebrae. Extensive comparative and ablation experiments across three datasets demonstrate that SpineMamba outperforms existing state-of-the-art models. On two computed tomography (CT) datasets, the average Dice similarity coefficients achieved are 94.40±4% and 88.28±3%, respectively, while on a magnetic resonance (MR) dataset, the model achieves a Dice score of 86.95±10%. Notably, SpineMamba surpasses the widely recognized nnU-Net in segmentation accuracy, with a maximum improvement of 3.63 percentage points. These results highlight the precision, robustness, and exceptional generalization capability of SpineMamba.
Affiliation(s)
- Zhiqing Zhang: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Tianyong Liu: Institute of Unconventional Oil & Gas Research, Northeast Petroleum University, Street 15, Daqing, 163318, China
- Guojia Fan: College of Information Science and Engineering, Northeastern University, Liaoning, 110819, China
- Na Li: Department of Biomedical Engineering, Guangdong Medical University, Dongguan, 523808, China
- Bin Li: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Yao Pu: Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Qianjin Feng: School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
- Shoujun Zhou: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
20
|
Qin J, Xu D, Zhang H, Xiong Z, Yuan Y, He K. BTSegDiff: Brain tumor segmentation based on multimodal MRI Dynamically guided diffusion probability model. Comput Biol Med 2025; 186:109694. [PMID: 39842237 DOI: 10.1016/j.compbiomed.2025.109694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2023] [Revised: 01/11/2025] [Accepted: 01/13/2025] [Indexed: 01/24/2025]
Abstract
In the treatment of brain tumors, accurate diagnosis and treatment rely heavily on reliable brain tumor segmentation, where multimodal Magnetic Resonance Imaging (MRI) plays a pivotal role by providing valuable complementary information. This integration significantly enhances the performance of brain tumor segmentation. However, due to the uneven grayscale distribution, irregular shapes, and significant size variations in brain tumor images, this task remains highly challenging. To overcome these obstacles, we introduce a novel framework for automated segmentation of brain tumors that leverages the diverse information from multimodal MRI scans. Our proposed method, named BTSegDiff, is based on a Diffusion Probability Model (DPM). First, we designed a dynamic conditional guidance module consisting of an encoder, which extracts information from multimodal MRI images and guides the DPM in generating accurate and realistic segmentation masks. During the guidance process, the diffused generated features must be fused with the extracted multimodal features; however, the diffusion process itself introduces a significant amount of Gaussian noise, which can affect the fusion results. We therefore designed a Fourier domain feature fusion module to transfer this fusion process to Euclidean space and reduce the impact of high-frequency noise on fusion. Lastly, we took into account that the DPM, as a generative model, produces non-unique results with each sampling, which is highly detrimental in the meticulous field of medicine. We therefore designed a Stepwise Uncertainty Sampling module based on Monte Carlo uncertainty calculation to generate unique outcomes while simultaneously enhancing segmentation accuracy. To validate the effectiveness of our approach, we perform evaluations on the popular BraTS2020 and BraTS2021 benchmarks. The experimental results show that our method outperforms many existing brain tumor segmentation methods. Our code is available at https://github.com/jaceqin/BTSegDiff.
Affiliation(s)
- Jiacheng Qin: School of Information Science and Engineering, Yunnan University, 650500, Kunming, China
- Dan Xu: School of Information Science and Engineering, Yunnan University, 650500, Kunming, China
- Hao Zhang: School of Information Science and Engineering, Yunnan University, 650500, Kunming, China
- Yejing Yuan: School of Information Science and Engineering, Yunnan University, 650500, Kunming, China
- Kangjian He: School of Information Science and Engineering, Yunnan University, 650500, Kunming, China
21
|
Yu Q, Ning H, Yang J, Li C, Qi Y, Qu M, Li H, Sun S, Cao P, Feng C. CMR-BENet: A confidence map refinement boundary enhancement network for left ventricular myocardium segmentation. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 260:108544. [PMID: 39709745 DOI: 10.1016/j.cmpb.2024.108544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/25/2024] [Revised: 11/06/2024] [Accepted: 12/02/2024] [Indexed: 12/24/2024]
Abstract
BACKGROUND AND OBJECTIVE Left ventricular myocardium segmentation is of great significance for clinical diagnosis, treatment, and prognosis. However, myocardium segmentation is challenging, as medical image quality is disturbed by various factors such as motion, artifacts, and noise, and its accuracy largely depends on the accurate identification of edges and structures. Most existing encoder-decoder based segmentation methods capture limited contextual information and ignore the awareness of myocardial shape and structure, often producing unsatisfactory boundary segmentation results in noisy scenes. Moreover, these methods fail to assess the reliability of the predictions, which is crucial for clinical decisions and applications in medical tasks. Therefore, this study explores how to effectively combine contextual information with myocardial edge structure and confidence maps to improve segmentation performance in an end-to-end network. METHODS In this paper, we propose an end-to-end confidence map refinement boundary enhancement network (CMR-BENet) for left ventricular myocardium segmentation. CMR-BENet has three components: a layer semantic-aware module (LSA), an edge information enhancement module (EIE), and a confidence map-based refinement module (CMR). Specifically, LSA first adaptively fuses high- and low-level semantic information across hierarchical layers to mitigate the bias of single-layer features affected by noise. EIE then improves edge and structure recognition through an edge and mask guidance module (EMG) and an edge structure-aware module (ESA). Finally, CMR provides a simple and efficient way to estimate confidence maps and effectively combines the encoder features to refine the segmentation results. RESULTS Experiments on two echocardiography datasets and one cardiac MRI dataset show that the proposed CMR-BENet outperforms its rivals in the left ventricular myocardium segmentation task with Dice (DI) of 87.71%, 79.33%, and 89.11%, respectively. CONCLUSION This paper utilizes edge information to characterize the shape and structure of the myocardium and introduces learnable confidence maps to evaluate and refine the segmentation results. Our findings provide strong support and reference for physicians in diagnosis and treatment.
Affiliation(s)
- Qi Yu: Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
- Hongxia Ning: Department of Cardiovascular Ultrasound, The First Hospital of China Medical University, Shenyang, China; Clinical Medical Research Center of Imaging in Liaoning Province, Shenyang, China
- Jinzhu Yang: Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
- Chen Li: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
- Yiqiu Qi: Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
- Mingjun Qu: Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
- Honghe Li: Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
- Song Sun: Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
- Peng Cao: Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
- Chaolu Feng: Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
22
Zhu W, Liu D, Zhuang X, Gong T, Shi F, Xiang D, Peng T, Zhang X, Chen X. Strip and boundary detection multi-task learning network for segmentation of meibomian glands. Med Phys 2025; 52:1615-1628. [PMID: 39589258 DOI: 10.1002/mp.17542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2024] [Revised: 10/28/2024] [Accepted: 11/11/2024] [Indexed: 11/27/2024] Open
Abstract
BACKGROUND Automatic segmentation of meibomian glands in near-infrared meibography images is the basis of morphological parameter analysis, which plays a crucial role in facilitating the diagnosis of meibomian gland dysfunction (MGD). The glands' distinctive strip shape and the adhesion between adjacent glands make automatic segmentation very challenging. PURPOSE A strip and boundary detection multi-task learning network (SBD-MTLNet) based on an encoder-decoder structure is proposed to realize the automatic segmentation of meibomian glands. METHODS A strip mixed attention module (SMAM) is proposed to enhance the network's ability to recognize the strip shape of glands. To alleviate the problem of adhesion between glands, a boundary detection auxiliary network (BDA-Net) is proposed, which introduces boundary features to assist gland segmentation. A self-adaptive interactive information fusion module (SIIFM) based on a reverse attention mechanism is proposed to realize information complementation between the meibomian gland segmentation and boundary detection tasks. The proposed SBD-MTLNet has been evaluated on an in-house dataset (453 images) and the public dataset MGD-1K (1000 images). Due to the limited number of images, a five-fold cross-validation strategy is adopted. RESULTS The average Dice coefficient of the proposed SBD-MTLNet reaches 81.08% and 84.32% on the in-house and public datasets, respectively. Comprehensive experimental results demonstrate the effectiveness of the proposed SBD-MTLNet, which outperforms other state-of-the-art methods. CONCLUSIONS The proposed SBD-MTLNet can focus more on the shape characteristics of the meibomian glands and the boundary contour information between adjacent glands via a multi-task learning strategy. The segmentation results of the proposed method can be used for quantitative analysis of the morphological characteristics of meibomian glands, which has potential for the auxiliary diagnosis of MGD in the clinic.
Affiliation(s)
- Weifang Zhu
- MIPAV Lab, School of Electronic and Information Engineering, Soochow University, Suzhou, China
- Dengfeng Liu
- MIPAV Lab, School of Electronic and Information Engineering, Soochow University, Suzhou, China
- Xinyu Zhuang
- Department of Ophthalmology, The Fourth Affiliated Hospital of Soochow University, Suzhou, China
- Tian Gong
- MIPAV Lab, School of Electronic and Information Engineering, Soochow University, Suzhou, China
- Fei Shi
- MIPAV Lab, School of Electronic and Information Engineering, Soochow University, Suzhou, China
- Dehui Xiang
- MIPAV Lab, School of Electronic and Information Engineering, Soochow University, Suzhou, China
- Tao Peng
- School of Future Science and Engineering, Soochow University, Suzhou, China
- Xiaofeng Zhang
- Department of Ophthalmology, The Fourth Affiliated Hospital of Soochow University, Suzhou, China
- Xinjian Chen
- MIPAV Lab, School of Electronic and Information Engineering, Soochow University, Suzhou, China
- State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
23
Ke X, Chen G, Liu H, Guo W. MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation. Comput Biol Med 2025; 186:109601. [PMID: 39740513 DOI: 10.1016/j.compbiomed.2024.109601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2024] [Revised: 11/30/2024] [Accepted: 12/18/2024] [Indexed: 01/02/2025]
Abstract
Accurate polyp segmentation is crucial for the early diagnosis and treatment of colorectal cancer. This is a challenging task for three main reasons: (i) model overfitting and weak generalization due to the multi-center distribution of data; (ii) inter-class ambiguity caused by motion blur and overexposure to endoscopic light; and (iii) intra-class inconsistency caused by the variety of morphologies and sizes of the same type of polyps. To address these challenges, we propose a new high-precision polyp segmentation framework, MEFA-Net, which consists of three modules: the plug-and-play Mask Enhancement Module (MEG), the Separable Path Attention Enhancement Module (SPAE), and the Dynamic Global Attention Pool Module (DGAP). Specifically, the MEG module regionally masks the high-energy regions of the environment and the polyps, guiding the model to rely on only a small amount of information to distinguish polyps from background features. This prevents the model from overfitting to environmental information and improves its robustness. This module also effectively counteracts the "dark corner phenomenon" in the dataset, further improving the generalization performance of the model. Next, the SPAE module effectively alleviates the inter-class ambiguity problem by strengthening feature expression. Then, the DGAP module addresses the intra-class inconsistency problem by extracting invariance to scale, shape, and position. Finally, we propose a new evaluation metric, MultiColoScore, for comprehensively evaluating the segmentation performance of the model on five datasets from different domains. We evaluated the new method quantitatively and qualitatively on five datasets using four metrics. Experimental results show that MEFA-Net significantly improves the accuracy of polyp segmentation and outperforms current state-of-the-art algorithms.
Code is available at https://github.com/847001315/MEFA-Net.
Affiliation(s)
- Xiao Ke
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Guanhong Chen
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Hao Liu
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China
- Wenzhong Guo
- Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China.
24
Wang C, Jiang M, Li Y, Wei B, Li Y, Wang P, Yang G. MP-FocalUNet: Multiscale parallel focal self-attention U-Net for medical image segmentation. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 260:108562. [PMID: 39675195 DOI: 10.1016/j.cmpb.2024.108562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/17/2023] [Revised: 12/05/2024] [Accepted: 12/08/2024] [Indexed: 12/17/2024]
Abstract
BACKGROUND AND OBJECTIVE Medical image segmentation has improved significantly in recent years with the progress of Convolutional Neural Networks (CNNs). Due to the inherent limitations of convolutional operations, CNNs perform poorly at learning correlations between global and long-range features. Existing solutions to this problem rely on building deep encoders and down-sampling operations, but such methods are prone to producing redundant network structures and losing local details. Medical image segmentation tasks therefore require better solutions that improve the modeling of global context while maintaining a strong grasp of low-level details. METHODS We propose a novel multiscale parallel branch architecture (MP-FocalUNet). On the encoder side of MP-FocalUNet, dual-scale sub-networks are used to extract information at different scales. A cross-scale "Feature Fusion" (FF) module is proposed to explore the potential of dual-branch networks and fully utilize feature representations at different scales. On the decoder side, focal self-attention, combined in parallel with a traditional CNN, is used for long-distance modeling, effectively capturing global dependencies and underlying spatial details in a shallower way. RESULTS Our proposed method is evaluated on both abdominal organ segmentation datasets and automatic cardiac diagnosis challenge datasets. Our method consistently outperforms several state-of-the-art segmentation methods, with average Dice scores of 82.45% (2.68% higher than HC-Net) and 91.44% (0.35% higher than HC-Net) on the abdominal organ datasets and the automatic cardiac diagnosis challenge datasets, respectively. CONCLUSIONS Our MP-FocalUNet is a novel encoder-decoder based multiscale parallel branch Transformer network, which addresses the insufficient long-distance modeling of CNNs and fuses image information at different scales.
Extensive experiments on abdominal and cardiac medical image segmentation tasks show that our MP-FocalUNet outperforms other state-of-the-art methods. In the future, our work will focus on designing more lightweight Transformer-based models and better learning pixel-level intrinsic structural features generated by patch division in visual Transformers.
Affiliation(s)
- Chuan Wang
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Mingfeng Jiang
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yang Li
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China.
- Bo Wei
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yongming Li
- College of Communication Engineering, Chongqing University, Chongqing, China
- Pin Wang
- College of Communication Engineering, Chongqing University, Chongqing, China
- Guang Yang
- Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, United Kingdom; National Heart and Lung Institute, Imperial College London, London SW7 2AZ, United Kingdom
25
Russon D, Guennec A, Naredo-Turrado J, Xu B, Boussuge C, Battaglia V, Hiron B, Lagarde E. Evaluating pedestrian crossing safety: Implementing and evaluating a convolutional neural network model trained on paired aerial and subjective perspective images. Heliyon 2025; 11:e42428. [PMID: 40028551 PMCID: PMC11872108 DOI: 10.1016/j.heliyon.2025.e42428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2024] [Revised: 01/30/2025] [Accepted: 01/31/2025] [Indexed: 03/05/2025] Open
Abstract
With pedestrian crossings implicated in a significant proportion of vehicle-pedestrian accidents and the French government's initiatives to improve pedestrian safety, there is a pressing need for efficient, large-scale evaluation of pedestrian crossings. This study proposes the deployment of advanced deep learning neural networks to automate the assessment of pedestrian crossings and roundabouts, leveraging aerial and street-level imagery sourced from Google Maps and Google Street View. Utilizing ConvNextV2, ResNet50, and ResNext50 models, we conducted a comprehensive analysis of pedestrian crossings across various urban and rural settings in France, focusing on nine identified risk factors. Our methodology incorporates Mask R-CNN for precise segmentation and detection of zebra crossings and roundabouts, overcoming traditional data annotation challenges and extending coverage to underrepresented areas. The analysis reveals that the ConvNextV2 model, in particular, demonstrates superior performance across most tasks, despite challenges such as data imbalance and the complex nature of variables like visibility and parking proximity. The findings highlight the potential of convolutional neural networks in improving pedestrian safety by enabling scalable and objective evaluations of crossings. The study underscores the necessity for continued dataset augmentation and methodological advancements to tackle identified challenges. Our research contributes to the broader field of road safety by demonstrating the feasibility and effectiveness of automated, image-based pedestrian crossing audits, paving the way for more informed and effective safety interventions.
Affiliation(s)
- Dylan Russon
- University of Bordeaux, INSERM BPH U1219, Bordeaux, F-33000, France
- Antoine Guennec
- University of Bordeaux, INSERM BPH U1219, Bordeaux, F-33000, France
- Binbin Xu
- EuroMov Digital Health in Motion, Univ Montpellier, IMT Mines Ales, Ales, France
- Emmanuel Lagarde
- University of Bordeaux, INSERM BPH U1219, Bordeaux, F-33000, France
26
Arikan M, Willoughby J, Ongun S, Sallo F, Montesel A, Ahmed H, Hagag A, Book M, Faatz H, Cicinelli MV, Fawzi AA, Podkowinski D, Cilkova M, De Almeida DM, Zouache M, Ramsamy G, Lilaonitkul W, Dubis AM. OCT5k: A dataset of multi-disease and multi-graded annotations for retinal layers. Sci Data 2025; 12:267. [PMID: 39952954 PMCID: PMC11829038 DOI: 10.1038/s41597-024-04259-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2024] [Accepted: 12/09/2024] [Indexed: 02/17/2025] Open
Abstract
Publicly available open-access OCT datasets for retinal layer segmentation have been limited in scope, often small in size, specific to a single disease, or containing only one grading. This dataset addresses those limitations with multi-grader and multi-disease labels for training machine learning-based algorithms. The proposed dataset covers three subsets of scans (Age-related Macular Degeneration, Diabetic Macular Edema, and healthy) and annotations for two types of tasks (semantic segmentation and object detection). The dataset compiles 5016 pixel-wise manual labels for 1672 OCT scans, featuring 5 layer boundaries across the three disease classes, to support the development of automatic techniques. A subset of the data (566 scans across 9 classes of disease biomarkers) was subsequently labeled for disease features, yielding 4698 bounding box annotations. To minimize bias, images were shuffled and distributed among graders. Retinal layers were corrected, and outliers were identified using the interquartile range (IQR). This step was iterated three times, improving the quality of the layer annotations with each pass and ensuring a reliable dataset for automated retinal image analysis.
Affiliation(s)
- Sevim Ongun
- UCL, Institute of Ophthalmology, London, EC1V 9EL, UK
- Ferenc Sallo
- Jules Gonin Eye Hospital, Department of Ophthalmology, University of Lausanne, Lausanne, Switzerland
- Andrea Montesel
- Jules Gonin Eye Hospital, Department of Ophthalmology, University of Lausanne, Lausanne, Switzerland
- Hend Ahmed
- University College London Hospitals NHS Foundation Trust, London, UK
- Ahmed Hagag
- UCL, Institute of Ophthalmology, London, EC1V 9EL, UK
- Moorfields Eye Hospital NHS Foundation, NIHR Moorfields Biomedical Research Centre, London, EC1V 2PD, UK
- Marius Book
- Rare Retinal Disease Center, AugenZentrum Siegburg, Siegburg, Germany
- Henrik Faatz
- Eye Center at St. Franziskus Hospital Münster, Münster, Germany
- Maria Vittoria Cicinelli
- Department of Ophthalmology, IRCCS San Raffaele Scientific Institute, Milan, Italy
- School of Medicine, Vita-Salute San Raffaele University, Milan, Italy
- Dominika Podkowinski
- Department of Ophthalmology, Kepler University Clinic, Linz, Austria and Vienna Institute for Research in Ocular Surgery (VIROS), Hanusch Hospital, Vienna, Austria
- Marketa Cilkova
- Moorfields Eye Hospital NHS Foundation, NIHR Moorfields Biomedical Research Centre, London, EC1V 2PD, UK
- Diana Morais De Almeida
- Jules Gonin Eye Hospital, Department of Ophthalmology, University of Lausanne, Lausanne, Switzerland
- Moussa Zouache
- Department of Ophthalmology & Visual Sciences, University of Utah, Salt Lake City, USA
- Watjana Lilaonitkul
- UCL, Global Business School for Health, London, WC1E 6BT, UK
- Health Data Research UK (HDR UK), London, NW1 2BE, UK
- UCL, Institute of Health Informatics, London, NW1 2DA, UK
- Adam M Dubis
- UCL, Institute of Ophthalmology, London, EC1V 9EL, UK.
- Department of Ophthalmology & Visual Sciences, University of Utah, Salt Lake City, USA.
27
Ye H, Zhang X, Hu Y, Fu H, Liu J. VSR-Net: Vessel-Like Structure Rehabilitation Network With Graph Clustering. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2025; 34:1090-1105. [PMID: 40031729 DOI: 10.1109/tip.2025.3526061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
The morphologies of vessel-like structures, such as blood vessels and nerve fibres, play significant roles in disease diagnosis, e.g., Parkinson's disease. Although deep network-based refinement segmentation and topology-preserving segmentation methods have recently achieved promising results in segmenting vessel-like structures, they still face two challenges: 1) existing methods often have limitations in rehabilitating subsection ruptures in segmented vessel-like structures; 2) they are typically overconfident in their predicted segmentation results. To tackle these two challenges, this paper leverages the spatial interconnection relationships among subsection ruptures from a structure rehabilitation perspective. Based on this perspective, we propose a novel Vessel-like Structure Rehabilitation Network (VSR-Net) that both rehabilitates subsection ruptures and improves model calibration, starting from coarse vessel-like structure segmentation results. VSR-Net first constructs subsection rupture clusters via a Curvilinear Clustering Module (CCM). Then, the well-designed Curvilinear Merging Module (CMM) is applied to rehabilitate the subsection ruptures and obtain the refined vessel-like structures. Extensive experiments on six 2D/3D medical image datasets show that VSR-Net significantly outperforms state-of-the-art (SOTA) refinement segmentation methods with lower calibration errors. Additionally, we provide quantitative analysis showing that the morphological differences between VSR-Net's rehabilitation results and the ground truth (GT) are smaller than those between SOTA methods and the GT, demonstrating that our method more effectively rehabilitates vessel-like structures.
28
Wen L, Sun H, Liang G, Yu Y. A deep ensemble learning framework for glioma segmentation and grading prediction. Sci Rep 2025; 15:4448. [PMID: 39910114 PMCID: PMC11799385 DOI: 10.1038/s41598-025-87127-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2024] [Accepted: 01/16/2025] [Indexed: 02/07/2025] Open
Abstract
The segmentation and risk grade prediction of gliomas based on preoperative multimodal magnetic resonance imaging (MRI) are crucial tasks in computer-aided diagnosis. Due to the significant heterogeneity between and within tumors, existing methods mainly rely on single-task approaches, overlooking the inherent correlation between segmentation and grading tasks. Furthermore, the limited availability of glioma grading data presents further challenges. To address these issues, we propose a deep-ensemble learning framework based on multimodal MRI and the U-Net model, which simultaneously performs glioma segmentation and risk grade prediction. We introduce asymmetric convolution and dual-domain attention in the encoder, fully integrating effective information from different modalities, enhancing the extraction of features from critical regions, and constructing a dual-branch decoder that combines spatial features and global semantic information for both segmentation and grading. In addition, we propose a weighted composite adaptive loss function to balance the optimization objectives of the two tasks. Our experimental results on the BraTS dataset demonstrate that our method outperforms state-of-the-art methods, yielding superior segmentation accuracy and precise risk grade prediction.
Affiliation(s)
- Liang Wen
- General Hospital of Northern Theater Command, Shenyang, 110122, China.
- China Medical University, Shenyang, 110122, China.
- Hui Sun
- College of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
- Guobiao Liang
- General Hospital of Northern Theater Command, Shenyang, 110122, China
- China Medical University, Shenyang, 110122, China
- Yue Yu
- College of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
29
Wang J, Feng H, Houssou Hounye A, Tang M, Shu Y, Hou M, Chen S. A Boundary-Enhanced Decouple Fusion Segmentation Network for Diagnosis of Adenomatous Polyps. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025; 38:229-244. [PMID: 39037669 PMCID: PMC11811332 DOI: 10.1007/s10278-024-01195-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/06/2024] [Revised: 06/14/2024] [Accepted: 07/02/2024] [Indexed: 07/23/2024]
Abstract
Adenomatous polyps, a common premalignant lesion, are often classified into villous adenoma (VA) and tubular adenoma (TA). VA has a higher risk of malignancy, whereas TA typically grows slowly and has a lower likelihood of cancerous transformation. Accurate classification is essential for tailored treatment. In this study, we develop a deep learning-based approach for the localization and classification of adenomatous polyps using endoscopic images. Specifically, a pre-trained EGE-UNet is first adopted to extract regions of interest from the original images. Multi-level feature maps are then extracted by the feature extraction pipeline (FEP). The deep-level features are fed into the Pyramid Pooling Module (PPM) to capture global contextual information, and the squeeze body edge (SBE) module is then used to decouple the body and edge parts of the features, enabling separate analysis of their distinct characteristics. The Group Aggregation Bridge (GAB) and Boundary Enhancement Module (BEM) are then applied to enhance the body features and edge features, respectively, emphasizing their structural and morphological characteristics. By combining the features of the body and edge parts, the final output is obtained. Experiments show that the proposed method achieved promising results on two private datasets. For adenoma vs. non-adenoma classification, it achieved a mIoU of 91.41%, mPA of 96.33%, mHD of 11.63, and mASD of 2.33. For adenoma subclassification (non-adenomas vs. villous adenomas vs. tubular adenomas), it achieved a mIoU of 91.21%, mPA of 94.83%, mHD of 13.75, and mASD of 2.56. These results demonstrate the potential of our approach for precise adenomatous polyp classification.
Affiliation(s)
- Jiaoju Wang
- School of Mathematics and Statistics, Central South University, Changsha, 410083, Hunan, China
- School of Mathematics and Statistics, Nanyang Normal University, Nanyang, 473061, Henan, China
- Haoran Feng
- School of Mathematics and Statistics, Central South University, Changsha, 410083, Hunan, China
- Alphonse Houssou Hounye
- School of Mathematics and Statistics, Central South University, Changsha, 410083, Hunan, China
- Meiling Tang
- School of Mathematics and Statistics, Central South University, Changsha, 410083, Hunan, China
- Yiming Shu
- School of Mathematics and Statistics, Central South University, Changsha, 410083, Hunan, China
- Muzhou Hou
- School of Mathematics and Statistics, Central South University, Changsha, 410083, Hunan, China.
- Shuijiao Chen
- Department of Gastroenterology, Xiangya Hospital of Central South University, Changsha, 410008, Hunan, China.
30
Chi J, Chen JH, Wu B, Zhao J, Wang K, Yu X, Zhang W, Huang Y. A Dual-Branch Cross-Modality-Attention Network for Thyroid Nodule Diagnosis Based on Ultrasound Images and Contrast-Enhanced Ultrasound Videos. IEEE J Biomed Health Inform 2025; 29:1269-1282. [PMID: 39356606 DOI: 10.1109/jbhi.2024.3472609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/04/2024]
Abstract
Contrast-enhanced ultrasound (CEUS) has been extensively employed as an imaging modality in thyroid nodule diagnosis due to its capacity to visualise the distribution and circulation of micro-vessels in organs and lesions in a non-invasive manner. However, current CEUS-based thyroid nodule diagnosis methods suffer from: 1) blurred spatial boundaries between nodules and other anatomies in CEUS videos, and 2) insufficient representation of the local structural information of nodule tissues by features extracted only from CEUS videos. In this paper, we propose a novel dual-branch network with a cross-modality-attention mechanism for thyroid nodule diagnosis that integrates information from two related modalities, i.e., CEUS videos and ultrasound (US) images. The mechanism has two parts: the US-attention-from-CEUS transformer (UAC-T) and the CEUS-attention-from-US transformer (CAU-T). As such, the network imitates the manner of human radiologists by decomposing the diagnosis into two correlated tasks: 1) the spatio-temporal features extracted from CEUS are hierarchically embedded into the spatial features extracted from US with UAC-T for nodule segmentation; 2) the US spatial features are used to guide the extraction of the CEUS spatio-temporal features with CAU-T for nodule classification. The two tasks are intertwined in the dual-branch end-to-end network and optimized with a multi-task learning (MTL) strategy. The proposed method is evaluated on our collected thyroid US-CEUS dataset. Experimental results show that our method achieves a classification accuracy of 86.92%, specificity of 66.41%, and sensitivity of 97.01%, outperforming the state-of-the-art methods.
As a general contribution in the field of multi-modality diagnosis of diseases, the proposed method has provided an effective way to combine static information with its related dynamic information, improving the quality of deep learning based diagnosis with an additional benefit of explainability.
31
Wang L, Xu Q, Chen C, Yang H, Deng G. Adaptive cascade decoders for segmenting challenging regions in medical images. Comput Biol Med 2025; 185:109572. [PMID: 39708501 DOI: 10.1016/j.compbiomed.2024.109572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2024] [Revised: 10/14/2024] [Accepted: 12/11/2024] [Indexed: 12/23/2024]
Abstract
CNN-based techniques have achieved impressive outcomes in medical image segmentation but struggle to capture long-term dependencies between pixels. The Transformer, with its strong feature extraction and representation learning abilities, performs exceptionally well in medical image segmentation. However, shortcomings remain in bridging local and global connections, resulting in occasional loss of positional information. To address this, we introduce a decoder based on dynamic convolution, called the Adaptive Cascade Decoder (ACD). It can adaptively adjust the receptive field size for each medical image, adapting a set of parameters to each image individually. The ACD consists of an Adaptive Attention module (ADA) and a Multi-Scale Convolution module (MSC). By enhancing feature extraction from local to global scales, it addresses the diminished contrast and fuzzy boundaries common in medical image segmentation. While increasing contextual connections, it also reduces certain parameters, thereby lowering memory consumption. Our model, T-ACD, uses the encoder backbone of TransUNet, which chunks feature maps from convolutional neural networks and feeds them as one-dimensional sequences into the Transformer. This leverages the Transformer's prowess in handling sequences, further refining the extracted features. In experiments involving heart and multi-organ segmentation, T-ACD excels at segmenting challenging areas. On the ACDC dataset, we achieve a Dice coefficient of 92.02%. For the most challenging structure, the right ventricle, the score improves to 90.68%, an increase of 5.17%. In the realm of medical segmentation, the core design of ACD can be generalized to other challenging organ segmentation tasks.
Affiliation(s)
- Lili Wang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang, 150080, China
- Qian Xu
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang, 150080, China
- Chen Chen
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang, 150080, China
- Hailu Yang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang, 150080, China
- Ge Deng
- Aerospace Hi-tech Holding Group Co., LTD, Harbin, Heilongjiang, 150060, China
32
Wang X, Yu J, Zhang B, Huang X, Shen X, Xia M. LightAWNet: Lightweight adaptive weighting network based on dynamic convolutions for medical image segmentation. J Appl Clin Med Phys 2025; 26:e14584. [PMID: 39616626 PMCID: PMC11799907 DOI: 10.1002/acm2.14584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2024] [Revised: 11/05/2024] [Accepted: 11/12/2024] [Indexed: 02/07/2025] Open
Abstract
PURPOSE: The complexity of convolutional neural networks (CNNs) can lead to improved segmentation accuracy in medical image analysis but also results in increased network complexity and training challenges, especially under resource limitations. Conversely, lightweight models offer efficiency but often sacrifice accuracy. This paper addresses the challenge of balancing efficiency and accuracy by proposing LightAWNet, a lightweight adaptive weighting neural network for medical image segmentation. METHODS: We designed LightAWNet with an efficient inverted bottleneck encoder block optimized by spatial attention. A two-branch strategy is employed to separately extract detailed and spatial features for fusion, enhancing the reusability of model feature maps. Additionally, a lightweight optimized up-sampling operation replaces traditional transposed convolution, and channel attention is utilized in the decoder to produce more accurate outputs efficiently. RESULTS: Experimental results on the LiTS2017, MM-WHS, ISIC2018, and Kvasir-SEG datasets demonstrate that LightAWNet achieves state-of-the-art performance with only 2.83 million parameters. Our model significantly outperforms existing methods in terms of segmentation accuracy, highlighting its effectiveness in maintaining high performance with reduced complexity. CONCLUSIONS: LightAWNet successfully balances efficiency and accuracy in medical image segmentation. The innovative use of spatial attention, dual-branch feature extraction, and optimized up-sampling operations contribute to its superior performance. These findings offer valuable insights for the development of resource-efficient yet highly accurate segmentation models in medical imaging. The code will be made available at https://github.com/zjmiaprojects/lightawnet upon acceptance for publication.
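The channel attention used in decoders like LightAWNet's follows the general squeeze-excite-scale recipe; a minimal sketch, with a fixed sigmoid standing in for the learned excitation network, looks like this (illustrative only, not the paper's code):

```python
import math

def channel_attention(feature_maps):
    """Channel attention in the squeeze-and-excitation spirit:
    global-average-pool each channel, gate it through a sigmoid, and
    rescale the channel. Each feature map is a 2-D list of floats."""
    # Squeeze: one descriptor per channel via global average pooling.
    descriptors = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
                   for fm in feature_maps]
    # Excitation: map descriptors to (0, 1) gates (learned in practice).
    gates = [1.0 / (1.0 + math.exp(-d)) for d in descriptors]
    # Scale: reweight every value of a channel by its gate.
    return [[[v * g for v in row] for row in fm]
            for fm, g in zip(feature_maps, gates)]
```

Channels with stronger average activation receive gates closer to 1, so the decoder emphasizes informative channels at negligible parameter cost.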
Affiliation(s)
- Xiaoyan Wang
- School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, China
- Jianhao Yu
- School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, China
- Bangze Zhang
- School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, China
- Xiaojie Huang
- The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Xiaoting Shen
- Stomatology Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Ming Xia
- School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, China
33
Huang W, Yan Q, Mou L, Zhao Y, Chen W. A novel multi-scale and fine-grained network for large choroidal vessels segmentation in OCT. Front Cell Dev Biol 2025; 13:1508358. [PMID: 39958890 PMCID: PMC11827571 DOI: 10.3389/fcell.2025.1508358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2024] [Accepted: 01/06/2025] [Indexed: 02/18/2025] Open
Abstract
Accurate segmentation of large choroidal vessels in optical coherence tomography (OCT) images enables unprecedented quantitative analysis for understanding choroidal diseases. In this paper, we propose a novel multi-scale and fine-grained network called MFGNet. Because choroidal vessels are small targets, long-range dependencies must be considered; we therefore developed a two-branch fine-grained feature extraction module that mixes, in parallel, the long-range information extracted by a Transformer with the local information extracted by convolution, introducing information exchange between the two branches. To address the low contrast and blurred boundaries of choroidal vessels in OCT images, we developed a large-kernel, multi-scale attention module, which enhances features in the target area through multi-scale convolution kernels, channel mixing, and feature refinement. We quantitatively evaluated MFGNet on 800 OCT images with manually annotated large choroidal vessels. The experimental results show that the proposed method outperforms the most advanced segmentation networks currently available. Notably, the large choroidal vessels were reconstructed in three dimensions (3D) from the segmentation results, and several 3D morphological parameters were calculated. Statistical analysis of these parameters revealed significant differences between the healthy control group and the high-myopia group, confirming the value of the proposed work for subsequent understanding of the disease and clinical decision-making.
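The parallel local/global mixing such a two-branch module performs can be caricatured in plain Python; here a small averaging filter plays the convolutional branch and the sequence mean plays the Transformer branch (both are crude stand-ins for learned modules):

```python
def two_branch_mix(signal, k=3):
    """Toy parallel local/global branches with exchange: the local branch
    is a small averaging filter, the global branch broadcasts the sequence
    mean to every position, and each position blends the two halves."""
    n = len(signal)
    global_feat = sum(signal) / n          # 'attends' to every position
    pad = k // 2
    padded = [signal[0]] * pad + list(signal) + [signal[-1]] * pad
    local = [sum(padded[i + j] for j in range(k)) / k for i in range(n)]
    # Information exchange: both branches contribute at every position.
    return [0.5 * l + 0.5 * global_feat for l in local]
```

The point of the sketch is structural: every output position sees both a neighbourhood summary and a whole-sequence summary, which is what mixing convolutional and Transformer features buys.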
Affiliation(s)
- Wei Huang
- School of Biomedical Engineering, Hainan University, Haikou, China
- Qifeng Yan
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Lei Mou
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Yitian Zhao
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Wei Chen
- School of Biomedical Engineering, Hainan University, Haikou, China
- Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
34
Sahragard E, Farsi H, Mohamadzadeh S. Advancing semantic segmentation: Enhanced UNet algorithm with attention mechanism and deformable convolution. PLoS One 2025; 20:e0305561. [PMID: 39820812 PMCID: PMC11737789 DOI: 10.1371/journal.pone.0305561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Accepted: 05/31/2024] [Indexed: 01/19/2025] Open
Abstract
This paper presents a novel method for improving semantic segmentation performance in computer vision tasks. Our approach utilizes an enhanced UNet architecture built on an improved ResNet50 backbone. We replace the last layer of ResNet50 with deformable convolution to enhance feature representation. Additionally, we incorporate an attention mechanism, specifically ECA-ASPP (Efficient Channel Attention combined with Atrous Spatial Pyramid Pooling), in the encoding path of UNet to capture multi-scale contextual information effectively. In the decoding path, we explore the use of attention mechanisms after concatenating low-level features with high-level features, investigating two types: ECA (Efficient Channel Attention) and LKA (Large Kernel Attention). Our experiments demonstrate that incorporating attention after concatenation improves segmentation accuracy. Furthermore, comparing the two modules in the decoder path shows that LKA outperforms ECA, highlighting the importance of exploring different attention mechanisms and their impact on segmentation performance. To evaluate the effectiveness of the proposed method, we conduct experiments on benchmark datasets, including Stanford and Cityscapes, as well as the newly introduced WildPASS and DensPASS datasets. The proposed method achieved state-of-the-art results, with mIoU scores of 85.79 on the Stanford dataset and 82.25 on the Cityscapes dataset.
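ECA's defining trick — gating channels via a small 1-D convolution over the pooled channel descriptors instead of a bottleneck MLP — can be sketched as follows (a toy with a fixed uniform kernel in place of ECA's single learned 1-D filter):

```python
import math

def eca_gates(channel_means, k=3):
    """ECA-style gating: slide a size-k 1-D filter over the per-channel
    global-average descriptors, then squash with a sigmoid. Edge padding
    replicates the boundary descriptors."""
    pad = k // 2
    padded = [channel_means[0]] * pad + list(channel_means) + [channel_means[-1]] * pad
    kernel = [1.0 / k] * k                      # learned in real ECA
    conv = [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(channel_means))]
    return [1.0 / (1.0 + math.exp(-c)) for c in conv]
```

Because the filter only mixes each channel with its k-1 neighbours, the parameter count is k regardless of channel count — the "efficient" in ECA.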
Affiliation(s)
- Effat Sahragard
- Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Hassan Farsi
- Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Sajad Mohamadzadeh
- Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
35
Mokhtari A, Maris BM, Fiorini P. A Survey on Optical Coherence Tomography-Technology and Application. Bioengineering (Basel) 2025; 12:65. [PMID: 39851339 PMCID: PMC11761895 DOI: 10.3390/bioengineering12010065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2024] [Revised: 01/06/2025] [Accepted: 01/09/2025] [Indexed: 01/26/2025] Open
Abstract
This paper reviews the main research on Optical Coherence Tomography (OCT), focusing on the progress and advancements made by researchers over the past three decades in its methods and medical imaging applications. By analyzing existing studies and developments, this review aims to provide a foundation for future research in the field.
Affiliation(s)
- Ali Mokhtari
- Department of Computer Science, University of Verona, 37134 Verona, Italy
- Bogdan Mihai Maris
- Department of Engineering for Innovation Medicine, University of Verona, 37134 Verona, Italy
- Paolo Fiorini
- Department of Engineering for Innovation Medicine, University of Verona, 37134 Verona, Italy
36
Qi H, Wang W, Dang H, Chen Y, Jia M, Wang X. An Efficient Retinal Fluid Segmentation Network Based on Large Receptive Field Context Capture for Optical Coherence Tomography Images. ENTROPY (BASEL, SWITZERLAND) 2025; 27:60. [PMID: 39851680 PMCID: PMC11764744 DOI: 10.3390/e27010060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/28/2024] [Revised: 01/08/2025] [Accepted: 01/09/2025] [Indexed: 01/26/2025]
Abstract
Optical Coherence Tomography (OCT) is a crucial imaging modality for diagnosing and monitoring retinal diseases. However, the accurate segmentation of fluid regions and lesions remains challenging due to noise, low contrast, and blurred edges in OCT images. Although feature modeling with wide or global receptive fields offers a feasible solution, it typically leads to significant computational overhead. To address these challenges, we propose LKMU-Lite, a lightweight U-shaped segmentation method tailored for retinal fluid segmentation. LKMU-Lite integrates a Decoupled Large Kernel Attention (DLKA) module that captures both local patterns and long-range dependencies, thereby enhancing feature representation. Additionally, it incorporates a Multi-scale Group Perception (MSGP) module that employs Dilated Convolutions with varying receptive field scales to effectively predict lesions of different shapes and sizes. Furthermore, a novel Aggregating-Shift decoder is proposed, reducing model complexity while preserving feature integrity. With only 1.02 million parameters and a computational complexity of 3.82 G FLOPs, LKMU-Lite achieves state-of-the-art performance across multiple metrics on the ICF and RETOUCH datasets, demonstrating both its efficiency and generalizability compared to existing methods.
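The receptive-field arithmetic behind dilated multi-scale designs like MSGP is simple: a kernel of size k with dilation d covers (k-1)·d+1 input positions. A minimal sketch (illustrative; not the LKMU-Lite code):

```python
def receptive_field(kernel_size, dilation):
    """Receptive field of one dilated convolution layer: (k-1)*d + 1."""
    return (kernel_size - 1) * dilation + 1

def dilated_conv1d(signal, kernel, dilation):
    """'Same'-padded 1-D dilated convolution over a list of floats:
    taps are spaced `dilation` apart, widening coverage at no extra cost."""
    k = len(kernel)
    span = (k - 1) * dilation
    pad = span // 2
    padded = [0.0] * pad + list(signal) + [0.0] * (span - pad)
    return [sum(kernel[j] * padded[i + j * dilation] for j in range(k))
            for i in range(len(signal))]

def multi_scale(signal, kernel, dilations=(1, 2, 4)):
    """Sum of parallel dilated branches, echoing a multi-scale group:
    each branch sees the same input at a different receptive-field scale."""
    outs = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) for vals in zip(*outs)]
```

With dilations (1, 2, 4) and a 3-tap kernel the branches cover 3, 5, and 9 positions respectively, which is how varied lesion sizes can be matched without large dense kernels.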
Affiliation(s)
- Hang Qi
- School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
- Weijiang Wang
- School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
- BIT Chongqing Institute of Microelectronics and Microsystems, Chongqing 401332, China
- Hua Dang
- School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
- Yueyang Chen
- School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
- Minli Jia
- School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
- Xiaohua Wang
- School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
- BIT Chongqing Institute of Microelectronics and Microsystems, Chongqing 401332, China
37
Xie Q, Li X, Li Y, Lu J, Ma S, Zhao Y, Zhang J. A multi-modal multi-branch framework for retinal vessel segmentation using ultra-widefield fundus photographs. Front Cell Dev Biol 2025; 12:1532228. [PMID: 39845080 PMCID: PMC11751237 DOI: 10.3389/fcell.2024.1532228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2024] [Accepted: 12/20/2024] [Indexed: 01/24/2025] Open
Abstract
Background: Vessel segmentation in fundus photography has become a cornerstone technique for disease analysis. Within this field, Ultra-WideField (UWF) fundus images offer distinct advantages, including an expansive imaging range, detailed lesion data, and minimal adverse effects. However, the high resolution and low contrast inherent to UWF fundus images present significant challenges for accurate segmentation using deep learning methods, thereby complicating disease analysis in this context. Methods: To address these issues, this study introduces M3B-Net, a novel multi-modal, multi-branch framework that leverages fundus fluorescence angiography (FFA) images to improve retinal vessel segmentation in UWF fundus images. Specifically, M3B-Net tackles the low segmentation accuracy caused by the inherently low contrast of UWF fundus images. Additionally, we propose an enhanced UWF-based segmentation network in M3B-Net, specifically designed to improve the segmentation of fine retinal vessels. The segmentation network includes the Selective Fusion Module (SFM), which enhances feature extraction within the segmentation network by integrating features generated during the FFA imaging process. To further address the challenges of high-resolution UWF fundus images, we introduce a Local Perception Fusion Module (LPFM) to mitigate context loss during the segmentation cut-patch process. Complementing this, the Attention-Guided Upsampling Module (AUM) enhances segmentation performance through convolution operations guided by attention mechanisms. Results: Extensive experimental evaluations demonstrate that our approach significantly outperforms existing state-of-the-art methods for UWF fundus image segmentation.
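The general idea of attention-guided upsampling — upsample, then gate each position by an attention weight — can be sketched on a 1-D feature row (the function name and the sigmoid gating are illustrative assumptions, not the AUM's actual design):

```python
import math

def attention_guided_upsample(features, attention_logits, factor=2):
    """Toy attention-guided upsampling: nearest-neighbour upsample a 1-D
    feature row, then gate each upsampled position with a sigmoid of the
    corresponding attention logit."""
    up = [v for v in features for _ in range(factor)]
    gates = [1.0 / (1.0 + math.exp(-a))
             for a in attention_logits for _ in range(factor)]
    return [v * g for v, g in zip(up, gates)]
```

The gating lets the decoder suppress positions the attention map marks as background while the resolution is being restored.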
Affiliation(s)
- Qihang Xie
- Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo, China
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Xuefei Li
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Yuanyuan Li
- Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo, China
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Jiayi Lu
- Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo, China
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Shaodong Ma
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Yitian Zhao
- Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo, China
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Jiong Zhang
- Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo, China
- Laboratory of Advanced Theranostic Materials and Technology, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
38
Li S, Ma F, Yan F, Dong X, Guo Y, Meng J, Liu H. SFNet: Spatial and Frequency Domain Networks for Wide-Field OCT Angiography Retinal Vessel Segmentation. JOURNAL OF BIOPHOTONICS 2025; 18:e202400420. [PMID: 39523861 DOI: 10.1002/jbio.202400420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/18/2024] [Revised: 10/20/2024] [Accepted: 10/30/2024] [Indexed: 11/16/2024]
Abstract
Automatic segmentation of blood vessels in fundus images is important to assist ophthalmologists in diagnosis. However, automatic segmentation of Optical Coherence Tomography Angiography (OCTA) blood vessels has not been fully investigated due to various difficulties, such as vessel complexity. In addition, only a few publicly available OCTA image datasets exist for training and validating segmentation algorithms. To address these issues, we first constructed a wide-field retinal OCTA segmentation dataset, the Retinal Vessels Images in OCTA (REVIO) dataset. Second, we propose a new retinal vessel segmentation network based on spatial and frequency domain networks (SFNet). The proposed model is tested on three benchmark datasets: REVIO, ROSE, and OCTA-500. The experimental results show superior segmentation performance compared to representative methods.
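A frequency-domain branch in such a dual-domain network starts from a Fourier transform of the input; a naive sketch of the two representations side by side (a toy O(N²) DFT on a 1-D signal, not the paper's implementation):

```python
import cmath

def dft(signal):
    """Naive 1-D discrete Fourier transform (O(N^2), for illustration)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def spatial_and_frequency_features(signal):
    """Toy dual-domain feature: raw values (spatial branch) concatenated
    with DFT magnitudes (frequency branch)."""
    return list(signal) + [abs(c) for c in dft(signal)]
```

The frequency branch sees periodic structure (such as repeated vessel texture) that is spread out in the spatial view, which is the intuition for combining the two domains.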
Affiliation(s)
- Sien Li
- School of Computer Science, Qufu Normal University, Rizhao, Shandong, China
- Fei Ma
- School of Computer Science, Qufu Normal University, Rizhao, Shandong, China
- Fen Yan
- Ultrasound Medicine Department, Qufu People's Hospital, Qufu, Shandong, China
- Xiwei Dong
- School of Computer and Big Data Science, Jiujiang University, Jiujiang, Jiangxi, China
- Yanfei Guo
- School of Computer Science, Qufu Normal University, Rizhao, Shandong, China
- Jing Meng
- School of Computer Science, Qufu Normal University, Rizhao, Shandong, China
- Hongjuan Liu
- School of Computer Science, Qufu Normal University, Rizhao, Shandong, China
39
Baniecki H, Sobieski B, Szatkowski P, Bombinski P, Biecek P. Interpretable machine learning for time-to-event prediction in medicine and healthcare. Artif Intell Med 2025; 159:103026. [PMID: 39579416 DOI: 10.1016/j.artmed.2024.103026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2024] [Revised: 08/03/2024] [Accepted: 11/15/2024] [Indexed: 11/25/2024]
Abstract
Time-to-event prediction, e.g. cancer survival analysis or hospital length of stay, is a highly prominent machine learning task in medical and healthcare applications. However, only a few interpretable machine learning methods comply with its challenges. To facilitate a comprehensive explanatory analysis of survival models, we formally introduce time-dependent feature effects and global feature importance explanations. We show how post-hoc interpretation methods allow for finding biases in AI systems predicting length of stay using a novel multi-modal dataset created from 1235 X-ray images with textual radiology reports annotated by human experts. Moreover, we evaluate cancer survival models beyond predictive performance to include the importance of multi-omics feature groups based on a large-scale benchmark comprising 11 datasets from The Cancer Genome Atlas (TCGA). Model developers can use the proposed methods to debug and improve machine learning algorithms, while physicians can discover disease biomarkers and assess their significance. We contribute open data and code resources to facilitate future work in the emerging research direction of explainable survival analysis.
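One standard post-hoc tool in this space is permutation feature importance; a minimal sketch of the generic method (not necessarily the paper's exact procedure — the score function and model interface here are illustrative):

```python
import random

def permutation_importance(model, X, y, score, feature, trials=10, seed=0):
    """Global importance of one feature, measured as the average drop in a
    score when that feature's column is shuffled across samples. `model`
    is any callable on a feature row; `score(model, X, y)` returns a
    higher-is-better number."""
    rng = random.Random(seed)
    base = score(model, X, y)
    drops = []
    for _ in range(trials):
        col = [row[feature] for row in X]
        rng.shuffle(col)  # break the feature/target association
        Xp = [row[:feature] + [v] + row[feature + 1:]
              for row, v in zip(X, col)]
        drops.append(base - score(model, Xp, y))
    return sum(drops) / trials
```

A feature the model ignores yields a drop near zero; a feature the model depends on yields a large positive drop — which is how such explanations can expose biases like the ones the authors report.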
Affiliation(s)
- Hubert Baniecki
- University of Warsaw, Warsaw, Poland; Warsaw University of Technology, Warsaw, Poland
- Bartlomiej Sobieski
- University of Warsaw, Warsaw, Poland; Warsaw University of Technology, Warsaw, Poland
- Patryk Szatkowski
- Warsaw University of Technology, Warsaw, Poland; Medical University of Warsaw, Warsaw, Poland
- Przemyslaw Bombinski
- Warsaw University of Technology, Warsaw, Poland; Medical University of Warsaw, Warsaw, Poland
- Przemyslaw Biecek
- University of Warsaw, Warsaw, Poland; Warsaw University of Technology, Warsaw, Poland
40
Hu T, Lan Y, Zhang Y, Xu J, Li S, Hung CC. A lung nodule segmentation model based on the transformer with multiple thresholds and coordinate attention. Sci Rep 2024; 14:31743. [PMID: 39738386 PMCID: PMC11686213 DOI: 10.1038/s41598-024-82877-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2024] [Accepted: 12/10/2024] [Indexed: 01/02/2025] Open
Abstract
Accurate lung nodule segmentation is fundamental for the early detection of lung cancer. With the rapid development of deep learning, lung nodule segmentation models based on the encoder-decoder structure have become the mainstream research approach. However, during the encoding process, most models have limitations in extracting edge and semantic information and in capturing long-range dependencies. To address these problems, we propose a new lung nodule segmentation model, abbreviated as MCAT-Net. In this model, we construct a multi-threshold feature separation module to capture edge and texture features from different levels and specified intensities of the input image. Secondly, we introduce the coordinate attention mechanism, which allows the model to better recognize and utilize spatial information when handling long-range dependencies, enabling the deep network to maintain its sensitivity to nodule positions. Thirdly, we use the transformer to fully capture the long-range dependencies, further enhancing the global information integration of the network. The proposed method was verified on the LIDC-IDRI and LNDb datasets. The Dice similarity coefficient (DSC) values achieved were 88.29% and 78.51%, and the sensitivities were 86.33% and 75.05%, respectively. The experimental results demonstrated its high practical value for the early diagnosis of lung cancer.
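The two headline metrics, DSC and sensitivity, have standard definitions that are straightforward to compute from flat binary masks:

```python
def dice_coefficient(pred, target):
    """DSC = 2|A intersect B| / (|A| + |B|) over flat binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0

def sensitivity(pred, target):
    """Recall over the positive (nodule) pixels: TP / (TP + FN)."""
    tp = sum(p * t for p, t in zip(pred, target))
    fn = sum(t * (1 - p) for p, t in zip(pred, target))
    return tp / (tp + fn) if tp + fn else 1.0
```

DSC rewards overlap relative to the combined mask sizes, while sensitivity only asks how much of the true nodule was found — a model can trade one against the other, which is why both are reported.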
Affiliation(s)
- Tianjiao Hu
- School of Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, 473061, China
- Yihua Lan
- School of Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, 473061, China
- Henan Engineering Research Center of Intelligent Processing for Big Data of Digital Image, Nanyang, 473061, China
- Yingqi Zhang
- School of Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, 473061, China
- Jiashu Xu
- School of Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, 473061, China
- Shuai Li
- School of Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, 473061, China
- Chih-Cheng Hung
- Laboratory for Machine Vision and Security Research, Kennesaw State University-Marietta Campus, Marietta, USA
41
Li Y, Zhang X. Lightweight deep learning model for underwater waste segmentation based on sonar images. WASTE MANAGEMENT (NEW YORK, N.Y.) 2024; 190:63-73. [PMID: 39277917 DOI: 10.1016/j.wasman.2024.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/20/2024] [Revised: 08/09/2024] [Accepted: 09/10/2024] [Indexed: 09/17/2024]
Abstract
In recent years, the rapid accumulation of marine waste not only endangers the ecological environment but also pollutes seawater. Traditional manual salvage methods often have low efficiency and pose safety risks to human operators, making automatic underwater waste recycling the mainstream approach. In this paper, we propose a lightweight multi-scale cross-level network for underwater waste segmentation based on sonar images that provides pixel-level location information and waste categories for autonomous underwater robots. In particular, we introduce hybrid perception and multi-scale attention modules to capture multi-scale contextual features and enhance high-level critical information, respectively. At the same time, we use sampling attention modules for feature down-sampling and cross-level interaction modules to fuse detailed and semantic features. Experimental results indicate that our method outperforms other semantic segmentation models, achieving 74.66% mIoU with only 0.68M parameters. In particular, compared with the representative CNN-based PIDNet Small model, our method improves mIoU by 1.15 percentage points while reducing model parameters by approximately 91%. Compared with the representative transformer-based SeaFormer T model, our approach improves mIoU by 2.07 percentage points while reducing model parameters by approximately 59%. Our approach strikes a satisfactory balance between model parameters and segmentation performance, offering new insights into intelligent underwater waste recycling and promoting sustainable marine development.
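mIoU, the metric reported above, averages per-class intersection-over-union; for flat label masks the standard computation is:

```python
def miou(pred, target, num_classes):
    """Mean IoU over all classes present in either mask: per class,
    intersection / union of the predicted and reference pixel sets."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Averaging per class (rather than per pixel) prevents large background regions from masking poor performance on small waste objects.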
Affiliation(s)
- Yangke Li
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, MOE Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Xinman Zhang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, MOE Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
42
Zhang Y, Deng X, Li T, Li Y, Wang X, Lu M, Yang L. A Neural Network for Segmenting Tumours in Ultrasound Rectal Images. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01358-6. [PMID: 39663316 DOI: 10.1007/s10278-024-01358-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/28/2024] [Revised: 11/06/2024] [Accepted: 11/26/2024] [Indexed: 12/13/2024]
Abstract
Ultrasound imaging is the most cost-effective approach for the early detection of rectal cancer, a high-risk cancer. Our goal was to design an effective method that accurately identifies and segments rectal tumours in ultrasound images, thereby facilitating rectal cancer diagnosis. This would allow physicians to devote more time to determining whether a tumour is benign or malignant and whether it has metastasized, rather than merely confirming its presence. Data originated from the Sichuan Province Cancer Hospital. The test, training, and validation sets comprised 53 patients with 173 images, 195 patients with 1247 images, and 20 patients with 87 images, respectively. We created a deep learning network architecture consisting of encoders and decoders. To enhance global information capture, we replaced traditional convolutional decoders with global attention decoders and incorporated effective channel information fusion for multiscale information integration. The Dice coefficient (DSC) of the proposed model was 75.49%, 4.03 percentage points higher than that of the benchmark model, and the 95th-percentile Hausdorff distance (HD95) was 24.75, which is 8.43 lower than that of the benchmark model. A paired t-test confirmed that the difference between our model and the benchmark model is statistically significant (p < 0.05). The proposed method effectively identifies and segments rectal tumours of diverse shapes and distinguishes between normal rectal images and those containing tumours. Therefore, after consultation with physicians, we believe our method can effectively assist physicians in diagnosing rectal tumours via ultrasound.
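HD95 replaces the maximum in the Hausdorff distance with the 95th percentile of boundary-to-boundary distances, making it robust to a few outlier pixels; a minimal sketch over 2-D point sets (one common formulation — implementations differ in their percentile convention):

```python
def hd95(set_a, set_b):
    """95th-percentile Hausdorff distance between two 2-D point sets
    (e.g. boundary pixels of the predicted and reference masks)."""
    def directed(src, dst):
        # For each source point, distance to its nearest destination point.
        dists = sorted(min(((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
                           for (x2, y2) in dst)
                       for (x1, y1) in src)
        idx = min(len(dists) - 1, int(round(0.95 * (len(dists) - 1))))
        return dists[idx]
    # Symmetrize over both directions, as in the classical definition.
    return max(directed(set_a, set_b), directed(set_b, set_a))
```

A lower HD95 means the worst (ignoring the top 5%) boundary disagreement is small, complementing DSC, which only measures overall overlap.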
Affiliation(s)
- Yuanxi Zhang
- School of Optoelectronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Xiwen Deng
- Department of Ultrasound, Sichuan Cancer Hospital Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Tingting Li
- Department of Ultrasound, Sichuan Cancer Hospital Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Yuan Li
- Department of Ultrasound, Sichuan Cancer Hospital Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Xiaohui Wang
- School of Optoelectronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Man Lu
- Department of Ultrasound, Sichuan Cancer Hospital Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Lifeng Yang
- School of Optoelectronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
43
Dumbrique JIS, Hernandez RB, Cruz JML, Pagdanganan RM, Naval PC. Pneumothorax detection and segmentation from chest X-ray radiographs using a patch-based fully convolutional encoder-decoder network. FRONTIERS IN RADIOLOGY 2024; 4:1424065. [PMID: 39722784 PMCID: PMC11668597 DOI: 10.3389/fradi.2024.1424065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/27/2024] [Accepted: 11/04/2024] [Indexed: 12/28/2024]
Abstract
Pneumothorax, a life-threatening condition characterized by air accumulation in the pleural cavity, requires early and accurate detection for optimal patient outcomes. Chest X-ray radiographs are a common diagnostic tool due to their speed and affordability. However, detecting pneumothorax can be challenging for radiologists because the sole visual indicator is often a thin displaced pleural line. This research explores deep learning techniques to automate and improve the detection and segmentation of pneumothorax from chest X-ray radiographs. We propose a novel architecture that combines the advantages of fully convolutional neural networks (FCNNs) and Vision Transformers (ViTs) while using only convolutional modules to avoid the quadratic complexity of ViT's self-attention mechanism. This architecture utilizes a patch-based encoder-decoder structure with skip connections to effectively combine high-level and low-level features. Compared to prior research and baseline FCNNs, our model demonstrates significantly higher accuracy in detection and segmentation while maintaining computational efficiency. This is evident on two datasets: (1) the SIIM-ACR Pneumothorax Segmentation dataset and (2) a novel dataset we curated from The Medical City, a private hospital in the Philippines. Ablation studies further reveal that using a mixed Tversky and Focal loss function significantly improves performance compared to using solely the Tversky loss. Our findings suggest our model has the potential to improve diagnostic accuracy and efficiency in pneumothorax detection, potentially aiding radiologists in clinical settings.
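The mixed objective from the ablation study combines the Tversky loss (which trades off false positives against false negatives) with the focal loss (which down-weights easy pixels); a minimal sketch on flat masks (standard formulas, but the weight `w` and parameter values here are illustrative assumptions, not the paper's settings):

```python
import math

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky loss on flat probability/binary masks; alpha penalises
    false positives, beta false negatives (beta > alpha favours recall,
    useful when the pleural line occupies few pixels)."""
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def focal_loss(pred, target, gamma=2.0, eps=1e-7):
    """Binary focal loss: the (1 - pt)^gamma factor shrinks the
    contribution of well-classified pixels."""
    total = 0.0
    for p, t in zip(pred, target):
        pt = p if t == 1 else 1.0 - p
        pt = min(max(pt, eps), 1.0 - eps)  # clip for a finite log
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(pred)

def mixed_loss(pred, target, w=0.5):
    """Weighted sum of the two terms, as in the mixed objective."""
    return w * tversky_loss(pred, target) + (1 - w) * focal_loss(pred, target)
```

The Tversky term shapes region overlap while the focal term keeps hard boundary pixels in play, which is consistent with the reported improvement over Tversky alone.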
Affiliation(s)
- Jakov Ivan S. Dumbrique
  - Computer Vision and Machine Intelligence Group, Department of Computer Science, University of the Philippines-Diliman, Quezon City, Philippines
  - Department of Mathematics, Ateneo de Manila University, Quezon City, Philippines
- Reynan B. Hernandez
  - Ateneo School of Medicine and Public Health, Pasig, Philippines
  - Department of Radiology, The Medical City, Pasig, Philippines
- Prospero C. Naval
  - Computer Vision and Machine Intelligence Group, Department of Computer Science, University of the Philippines-Diliman, Quezon City, Philippines
44
Krikid F, Rositi H, Vacavant A. State-of-the-Art Deep Learning Methods for Microscopic Image Segmentation: Applications to Cells, Nuclei, and Tissues. J Imaging 2024; 10:311. [PMID: 39728208 DOI: 10.3390/jimaging10120311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2024] [Revised: 11/20/2024] [Accepted: 12/02/2024] [Indexed: 12/28/2024] Open
Abstract
Microscopic image segmentation (MIS) is a fundamental task in medical imaging and biological research, essential for precise analysis of cellular structures and tissues. Despite its importance, the segmentation process encounters significant challenges, including variability in imaging conditions, complex biological structures, and artefacts (e.g., noise), which can compromise the accuracy of traditional methods. The emergence of deep learning (DL) has catalyzed substantial advancements in addressing these issues. This systematic literature review (SLR) provides a comprehensive overview of state-of-the-art DL methods developed over the past six years for the segmentation of microscopic images. We critically analyze key contributions, emphasizing how these methods specifically tackle challenges in cell, nucleus, and tissue segmentation. Additionally, we evaluate the datasets and performance metrics employed in these studies. By synthesizing current advancements and identifying gaps in existing approaches, this review not only highlights the transformative potential of DL in enhancing diagnostic accuracy and research efficiency but also suggests directions for future research. The findings of this study have significant implications for improving methodologies in medical and biological applications, ultimately fostering better patient outcomes and advancing scientific understanding.
Affiliation(s)
- Fatma Krikid
  - Institut Pascal, CNRS, Clermont Auvergne INP, Université Clermont Auvergne, F-63000 Clermont-Ferrand, France
- Hugo Rositi
  - LORIA, CNRS, Université de Lorraine, F-54000 Nancy, France
- Antoine Vacavant
  - Institut Pascal, CNRS, Clermont Auvergne INP, Université Clermont Auvergne, F-63000 Clermont-Ferrand, France
45
Gao Y, Chen X, Yang Q, Lasso A, Kolesov I, Pieper S, Kikinis R, Tannenbaum A, Zhu L. An effective and open source interactive 3D medical image segmentation solution. Sci Rep 2024; 14:29878. [PMID: 39622975 PMCID: PMC11612195 DOI: 10.1038/s41598-024-80206-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2024] [Accepted: 11/15/2024] [Indexed: 12/06/2024] Open
Abstract
3D medical image segmentation is a key step in numerous clinical applications. Although many automatic segmentation solutions have been proposed, it is arguable that medical image segmentation is more of a preference than a reference, as inter- and intra-observer variability is widely observed in the final segmentation output. Therefore, designing a user-oriented, open-source solution for interactive annotation is of great value to the community. In this paper, we present an effective interactive segmentation method that employs an adaptive dynamic programming approach to incorporate users' interactions efficiently. The method first initializes a segmentation through a feature-based geodesic computation. The segmentation is then refined by an efficient updating scheme that requires only local computations when new user inputs arrive, making the method applicable to high-resolution images and very complex structures. The proposed method is implemented as a user-oriented software module in 3D Slicer. Our approach offers several strengths and contributions. First, we propose an efficient and effective 3D interactive algorithm based on adaptive dynamic programming. Second, the work delivers not only an algorithm but also software with a well-designed GUI. Third, its open-source nature allows users to make customized modifications to suit their specific requirements.
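As a rough sketch of the geodesic initialization step, here is a Dijkstra-style seeded geodesic distance on a 2D grid, under the common formulation where the cost of moving between neighboring pixels combines a spatial step with an intensity-difference term. This is an assumption-level illustration, not the 3D Slicer module's actual code; the weight `lam` and the 4-connectivity are assumptions.

```python
import heapq
import numpy as np

def geodesic_distance(image, seeds, lam=1.0):
    """Seeded geodesic distance on a 4-connected pixel grid.
    Edge cost = unit spatial step + lam * |intensity difference|."""
    H, W = image.shape
    dist = np.full((H, W), np.inf)
    heap = []
    for (r, c) in seeds:
        dist[r, c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W:
                nd = d + 1.0 + lam * abs(float(image[nr, nc]) - float(image[r, c]))
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist
```

Labeling a pixel as foreground when its geodesic distance to foreground seeds is smaller than to background seeds yields a basic seeded segmentation; the adaptive dynamic programming described in the paper additionally restricts recomputation to locally affected regions when new user inputs arrive.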
Affiliation(s)
- Yi Gao
  - School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518060, China
  - Shenzhen Key Laboratory of Precision Medicine for Hematological Malignancies, Shenzhen, 518060, China
  - Marshall Laboratory of Biomedical Engineering, Shenzhen, 518060, China
  - Guangdong Provincial Key Laboratory of Mathematical and Neural Dynamical Systems, 523000, Dongguan, China
- Xiaohui Chen
  - School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518060, China
- Qinzhu Yang
  - School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518060, China
- Andras Lasso
  - Laboratory for Percutaneous Surgery, School of Computing, Queen's University, Kingston, Canada
- Ivan Kolesov
  - Departments of Computer Science/Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
- Ron Kikinis
  - Department of Radiology, Brigham and Women's Hospital, and Harvard Medical School, Boston, Massachusetts, USA
- Allen Tannenbaum
  - Departments of Computer Science/Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
- Liangjia Zhu
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30303, USA
46
Tong L, Li T, Zhang Q, Zhang Q, Zhu R, Du W, Hu P. LiViT-Net: A U-Net-like, lightweight Transformer network for retinal vessel segmentation. Comput Struct Biotechnol J 2024; 24:213-224. [PMID: 38572168 PMCID: PMC10987887 DOI: 10.1016/j.csbj.2024.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Revised: 02/22/2024] [Accepted: 03/04/2024] [Indexed: 04/05/2024] Open
Abstract
The intricate task of precisely segmenting retinal vessels from images, which is critical for diagnosing various eye diseases, presents significant challenges for models due to factors such as scale variation, complex anatomical patterns, low contrast, and limitations in training data. In response to these challenges, we offer novel contributions spanning model architecture, loss function design, robustness, and real-time efficacy. To address them comprehensively, a new U-Net-like, lightweight Transformer network for retinal vessel segmentation is presented. By integrating MobileViT+ and a novel local representation in the encoder, our design emphasizes lightweight processing while capturing intricate image structures, enhancing vessel edge precision. A novel joint loss is designed, leveraging the characteristics of weighted cross-entropy and Dice loss to effectively guide the model through the task's challenges, such as foreground-background imbalance and intricate vascular structures. Exhaustive experiments were performed on three prominent retinal image databases. The results underscore the robustness and generalizability of the proposed LiViT-Net, which outperforms other methods in complex scenarios, especially in intricate environments with fine vessels or vessel edges. Importantly, LiViT-Net is optimized for efficiency and excels on devices with constrained computational power, as evidenced by its fast performance. To demonstrate the model, a freely accessible and interactive website was established (https://hz-t3.matpool.com:28765?token=aQjYR4hqMI), offering real-time performance with no login requirements.
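The joint loss described above, combining weighted cross-entropy with Dice loss, can be sketched as follows. This numpy version is illustrative only; the class weights and mixing coefficient are assumed values, not the paper's settings.

```python
import numpy as np

def weighted_bce(y_true, y_pred, w_pos=3.0, w_neg=1.0, eps=1e-7):
    # Weighted binary cross-entropy: up-weight the sparse vessel (foreground)
    # pixels to counter foreground-background imbalance.
    p = np.clip(y_pred, eps, 1.0 - eps)
    w = np.where(y_true == 1.0, w_pos, w_neg)
    return float(np.mean(-w * (y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

def dice_loss(y_true, y_pred, eps=1e-7):
    # Dice loss: overlap-based term, robust to class imbalance.
    inter = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def joint_loss(y_true, y_pred, lam=0.5):
    # Combine the pixel-wise and overlap terms; lam balances the two.
    return lam * weighted_bce(y_true, y_pred) + (1.0 - lam) * dice_loss(y_true, y_pred)
```

The pixel-wise term supplies dense gradients everywhere, while the Dice term directly optimizes region overlap; combining them is a common way to get both stable training and good boundary behavior on thin vessels.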
Affiliation(s)
- Le Tong
  - The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Tianjiu Li
  - The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Qian Zhang
  - The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Qin Zhang
  - Ophthalmology Department, Jing'an District Central Hospital, No. 259, Xikang Road, Shanghai, 200040, China
- Renchaoli Zhu
  - The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Wei Du
  - Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, No. 130 Meilong Road, Shanghai, 200237, China
- Pengwei Hu
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, 40-1 South Beijing Road, Urumqi, 830011, China
47
Liu H, Zeng Y, Li H, Wang F, Chang J, Guo H, Zhang J. DDANet: A deep dilated attention network for intracerebral haemorrhage segmentation. IET Syst Biol 2024; 18:285-297. [PMID: 39582103 DOI: 10.1049/syb2.12103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2024] [Revised: 10/08/2024] [Accepted: 10/18/2024] [Indexed: 11/26/2024] Open
Abstract
Intracranial haemorrhage (ICH) is an urgent and potentially fatal medical condition caused by the rupture of brain blood vessels, leading to blood accumulation in the brain tissue. Due to the pressure and damage it causes to brain tissue, ICH results in severe neurological impairment or even death. Recently, deep neural networks have been widely applied to enhance the speed and precision of ICH detection, yet they are still challenged by small or subtle haemorrhages. The authors introduce DDANet, a novel haematoma segmentation model for brain CT images. Specifically, a dilated convolution pooling block is introduced in the intermediate layers of the encoder to enhance the feature extraction capabilities of the middle layers. Additionally, the authors incorporate a self-attention mechanism to capture the global semantic information of high-level features, which guides the extraction and processing of low-level features, thereby enhancing the model's understanding of the overall structure while preserving detail. DDANet also integrates residual networks, channel attention, and spatial attention mechanisms for joint optimisation, effectively mitigating the severe class imbalance problem and aiding the training process. Experiments show that DDANet outperforms current methods, achieving a Dice coefficient, Jaccard index, sensitivity, accuracy, and specificity of 0.712, 0.601, 0.730, 0.997, and 0.998, respectively. The code is available at https://github.com/hpguo1982/DDANet.
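The dilated convolution underlying the dilated convolution pooling block enlarges the receptive field without adding parameters. A minimal numpy sketch (valid padding, single channel; the dilation rate is an assumed example, not taken from the paper) is:

```python
import numpy as np

def dilated_conv2d(x, k, dilation=2):
    """Valid-mode 2D convolution with a dilated kernel (numpy sketch).
    Kernel taps are spaced `dilation` pixels apart, so a kh x kw kernel
    covers an effective ((kh-1)*d+1) x ((kw-1)*d+1) window."""
    kh, kw = k.shape
    eff_h = (kh - 1) * dilation + 1  # effective receptive field height
    eff_w = (kw - 1) * dilation + 1
    H, W = x.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input at dilated strides and correlate with the kernel.
            patch = x[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * k)
    return out
```

With a 3x3 kernel and dilation 2, the effective receptive field grows to 5x5 while the parameter count stays at 9, which is why stacked dilated blocks help the encoder see more context around small haemorrhages.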
Affiliation(s)
- Haiyan Liu
  - Department of Neurology, Xinyang Central Hospital, Xinyang, China
  - School of Medicine, Xinyang Normal University, Xinyang, China
- Yu Zeng
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Hao Li
  - Department of Neurology, Xinyang Central Hospital, Xinyang, China
  - School of Medicine, Xinyang Normal University, Xinyang, China
- Fuxin Wang
  - Department of Neurology, Xinyang Central Hospital, Xinyang, China
  - School of Medicine, Xinyang Normal University, Xinyang, China
- Jianjun Chang
  - Department of Neurology, Xinyang Central Hospital, Xinyang, China
- Huaping Guo
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
48
Liao F, Li D, Yang X, Cao W, Xiang D, Yuan G, Wang Y, Zheng J. Topology-preserving segmentation of abdominal muscle layers from ultrasound images. Med Phys 2024; 51:8900-8914. [PMID: 39241262 DOI: 10.1002/mp.17377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2024] [Revised: 07/16/2024] [Accepted: 07/16/2024] [Indexed: 09/08/2024] Open
Abstract
BACKGROUND In clinical anesthesia, precise segmentation of muscle layers from abdominal ultrasound images is crucial for identifying nerve block locations accurately. Despite deep learning advancements, challenges persist in segmenting muscle layers with accurate topology due to pseudo and weak edges caused by acoustic artifacts in ultrasound imagery. PURPOSE To assist anesthesiologists in locating nerve block areas, we have developed a novel deep learning algorithm that can accurately segment muscle layers in abdominal ultrasound images with interference. METHODS We propose a comprehensive approach emphasizing the preservation of the segmentation's low-rank property to ensure correct topology. Our methodology integrates a Semantic Feature Extraction (SFE) module for redundant encoding, a Low-rank Reconstruction (LR) module to compress this encoding, and an Edge Reconstruction (ER) module to refine segmentation boundaries. Our evaluation involved rigorous testing on clinical datasets, comparing our algorithm against seven established deep learning-based segmentation methods using metrics such as Mean Intersection-over-Union (MIoU) and Hausdorff distance (HD). Statistical rigor was ensured through effect size quantification with Cliff's Delta, Multivariate Analysis of Variance (MANOVA) for multivariate analysis, and application of the Holm-Bonferroni method for multiple comparisons correction. RESULTS We demonstrate that our method outperforms other industry-recognized deep learning approaches on both MIoU and HD metrics, achieving the best outcomes with 88.21%/4.98 ($p_{max}=0.1893$) on the standard test set and 85.48%/6.98 ($p_{max}=0.0448$) on the challenging test set. The best and worst results for the other models were (87.20%/5.72) and (83.69%/8.12) on the standard test set, and (81.25%/10.00) and (71.74%/16.82) on the challenging test set.
Ablation studies further validate the distinct contributions of the proposed modules, which synergistically achieve a balance between maintaining topological integrity and edge precision. CONCLUSIONS Our findings validate the effective segmentation of muscle layers with accurate topology in complex ultrasound images, leveraging low-rank constraints. The proposed method not only advances the field of medical imaging segmentation but also offers practical benefits for clinical anesthesia by improving the reliability of nerve block localization.
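The low-rank property exploited by the LR module can be illustrated with a truncated-SVD projection. This is a hedged stand-in for the learned module: the feature-matrix shape and the `rank` parameter are assumptions, and the point is only to show how a rank constraint compresses a redundant encoding.

```python
import numpy as np

def low_rank_project(features, rank=4):
    """Project a 2D feature matrix onto its best rank-`rank` approximation
    via truncated SVD (Eckart-Young theorem). A learned low-rank
    reconstruction module plays an analogous compressing role."""
    U, s, Vt = np.linalg.svd(features, full_matrices=False)
    s_trunc = np.copy(s)
    s_trunc[rank:] = 0.0          # zero out all but the top `rank` singular values
    return (U * s_trunc) @ Vt     # broadcast multiply = U @ diag(s_trunc)
```

Raising the rank monotonically reduces the reconstruction error, so the rank acts as a dial between compression (enforced topological regularity) and fidelity.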
Affiliation(s)
- Feiyang Liao
  - Division of Life Sciences and Medicine, School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Suzhou, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China
- Dongli Li
  - Department of Anesthesiology, Huashan Hospital, Fudan University, Shanghai, China
- Xiaoyu Yang
  - Department of Anesthesiology, Huashan Hospital, Fudan University, Shanghai, China
- Weiwei Cao
  - Division of Life Sciences and Medicine, School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Suzhou, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China
- Dehui Xiang
  - School of Electronic and Information Engineering, Soochow University, Jiangsu, China
- Gang Yuan
  - Division of Life Sciences and Medicine, School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Suzhou, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China
- Yingwei Wang
  - Department of Anesthesiology, Huashan Hospital, Fudan University, Shanghai, China
- Jian Zheng
  - Division of Life Sciences and Medicine, School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Suzhou, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China
  - Jinan Guoke Medical Technology Development Co., Ltd, Jinan, China
49
Sun J, Li Q, Liu Y, Liu Y, Coatrieux G, Coatrieux JL, Chen Y, Lu J. Pathological Asymmetry-Guided Progressive Learning for Acute Ischemic Stroke Infarct Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:4146-4160. [PMID: 38875085 DOI: 10.1109/tmi.2024.3414842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2024]
Abstract
Quantitative infarct estimation is crucial for diagnosis, treatment, and prognosis in acute ischemic stroke (AIS) patients. Because the early changes of ischemic tissue are subtle and easily confounded with normal brain tissue, it remains a very challenging task. Moreover, existing methods often ignore or confuse the contributions of different types of anatomical asymmetry, caused by intrinsic and pathological changes, to segmentation, and inefficient use of domain knowledge leads to mis-segmentation of AIS infarcts. Motivated by these observations, we propose a pathological asymmetry-guided progressive learning (PAPL) method for AIS infarct segmentation. PAPL mimics the step-by-step learning patterns observed in humans and comprises three progressive stages: a knowledge preparation stage, a formal learning stage, and an examination improvement stage. First, the knowledge preparation stage accumulates preparatory domain knowledge for the infarct segmentation task, learning domain-specific knowledge representations that enhance the discriminative ability for pathological asymmetries through a constructed contrastive learning task. Then, the formal learning stage efficiently performs end-to-end training guided by the learned knowledge representations, in which the designed feature compensation module (FCM) leverages the anatomical similarity between adjacent slices of the volumetric medical image to aggregate rich anatomical context information. Finally, the examination improvement stage refines the infarct prediction from the previous stage, where the proposed regional perception refinement strategy (RPRS) further exploits the bilateral difference comparison to correct mis-segmented infarct regions via adaptive regional shrinking and expansion. Extensive experiments on public and in-house NCCT datasets demonstrate the superiority of the proposed PAPL, which is promising for better stroke evaluation and treatment.
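The bilateral difference comparison at the heart of the refinement stage can be sketched as a left-right mirror difference on a midline-aligned axial slice. This is a hypothetical minimal version, not the RPRS itself; the threshold is an assumed value, and real pipelines must first register the slice so the anatomical midline coincides with the image midline.

```python
import numpy as np

def asymmetry_map(slice_2d, threshold=0.5):
    """Bilateral (left-right) difference map for a midline-aligned 2D slice.
    Returns the absolute difference to the mirrored slice and a candidate
    mask of pixels whose asymmetry exceeds `threshold`."""
    mirrored = slice_2d[:, ::-1]          # flip left-right about the midline
    diff = np.abs(slice_2d - mirrored)    # symmetric by construction
    return diff, diff > threshold
```

A unilateral intensity change (such as early hypodensity) lights up in `diff` at both its own location and its mirror position; downstream logic then has to decide which side carries the pathology, which is where learned representations come in.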
50
Huang W, Zhang L, Shu X, Wang Z, Yi Z. Adaptive Annotation Correlation Based Multi-Annotation Learning for Calibrated Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:7175-7183. [PMID: 39196744 DOI: 10.1109/jbhi.2024.3451210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/30/2024]
Abstract
Medical image segmentation is a fundamental task in many clinical applications, yet current automated segmentation methods rely heavily on manual annotations, which are inherently subjective and prone to annotation bias. Modeling annotator preference has recently garnered great interest, and several methods have been proposed in the past two years. However, existing methods completely ignore the potential correlations between annotations, such as complementary and discriminative information. In this work, the Adaptive annotation CorrelaTion based multI-annOtation LearNing (ACTION) method is proposed for calibrated medical image segmentation. ACTION employs consensus feature learning and dynamic adaptive weighting to leverage complementary information across annotations and to emphasize discriminative information within each annotation, based on their correlations. Meanwhile, a memory accumulation-replay mechanism is proposed to accumulate prior knowledge and integrate it into the model, enabling it to accommodate the multi-annotation setting. Two medical image benchmarks with different modalities are used to evaluate ACTION, and extensive experimental results demonstrate that it achieves superior performance compared to several state-of-the-art methods.
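As a simplified analogue of consensus learning with dynamic adaptive weighting, the sketch below fuses several binary annotations by iteratively reweighting each annotator according to agreement with the current consensus. It operates in mask space rather than feature space and is an assumption-level illustration, not ACTION itself; the iteration count and Dice-based weighting rule are choices made for this example.

```python
import numpy as np

def fuse_annotations(masks, iters=3, eps=1e-7):
    """Fuse multiple binary annotations (A, H, W) of one image.
    Start from a uniform-weight consensus, then repeatedly:
      1. binarize the weighted consensus,
      2. score each annotator by Dice overlap with it,
      3. renormalize the annotator weights by those scores."""
    masks = np.asarray(masks, dtype=float)
    weights = np.full(len(masks), 1.0 / len(masks))
    consensus = np.tensordot(weights, masks, axes=1)  # weighted mean mask
    for _ in range(iters):
        hard = (consensus >= 0.5).astype(float)
        dice = np.array([
            (2.0 * (m * hard).sum() + eps) / (m.sum() + hard.sum() + eps)
            for m in masks
        ])
        weights = dice / dice.sum()
        consensus = np.tensordot(weights, masks, axes=1)
    return consensus, weights
```

Outlier annotations that disagree with the emerging consensus receive shrinking weights, which is the mask-space intuition behind down-weighting annotation bias while preserving complementary information.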