1
Jasrotia H, Singh C, Kaur S. EfficientNet-Based Attention Residual U-Net With Guided Loss for Breast Tumor Segmentation in Ultrasound Images. Ultrasound Med Biol 2025:S0301-5629(25)00088-2. [PMID: 40263094] [DOI: 10.1016/j.ultrasmedbio.2025.03.009] [Received: 11/20/2024] [Revised: 03/12/2025] [Accepted: 03/18/2025] [Indexed: 04/24/2025]
Abstract
OBJECTIVE Breast cancer poses a major health concern for women globally. Effective segmentation of breast tumors in ultrasound images is crucial for early diagnosis and treatment. Conventional convolutional neural networks have shown promising results in this domain but struggle with image complexity and are computationally expensive, limiting their practical application in real-time clinical settings. METHODS We propose Eff-AResUNet-GL, a segmentation model using EfficientNet-B3 as the encoder. This design integrates attention gates in skip connections to focus on significant features and residual blocks in the decoder to retain details and reduce gradient loss. Additionally, guided loss functions are applied at each decoder layer to generate better features, subsequently improving segmentation accuracy. RESULTS Experimental results on BUSIS and Dataset B demonstrate that Eff-AResUNet-GL achieves superior performance and computational efficiency compared to state-of-the-art models, showing robustness in handling complex breast ultrasound images. CONCLUSION Eff-AResUNet-GL offers a practical, high-performing solution for breast tumor segmentation, demonstrating clinical potential through improved segmentation accuracy and efficiency.
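The attention-gated skip connection described here follows a well-known pattern (popularized by Attention U-Net): a gating signal from the decoder reweights encoder features before they are passed on. A minimal NumPy sketch of that generic mechanism — not the authors' implementation; all shapes and weight names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, W_x, W_g, psi):
    # x: (H, W, C) encoder skip features; g: (H, W, C) decoder gating signal.
    q = np.maximum(x @ W_x + g @ W_g, 0.0)   # additive fusion followed by ReLU
    alpha = sigmoid(q @ psi)                 # (H, W, 1) attention weights in (0, 1)
    return x * alpha                         # suppress irrelevant skip features

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 4))               # encoder skip connection (illustrative)
g = rng.normal(size=(8, 8, 4))               # upsampled decoder features (illustrative)
out = attention_gate(x, g,
                     rng.normal(size=(4, 6)),   # W_x: project skip features
                     rng.normal(size=(4, 6)),   # W_g: project gating signal
                     rng.normal(size=(6, 1)))   # psi: collapse to one weight per pixel
```

Because the per-pixel weight lies in (0, 1), the gate can only attenuate skip features, never amplify them, which is how it "focuses on significant features".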
Affiliation(s)
- Heena Jasrotia
- Department of Computer Science, Punjabi University, Patiala, India.
- Chandan Singh
- Department of Computer Science, Punjabi University, Patiala, India
- Sukhjeet Kaur
- Department of Computer Science, Punjabi University, Patiala, India
2
Kong L, Wei Q, Xu C, Ye X, Liu W, Wang M, Fu Y, Chen H. EFCNet enhances the efficiency of segmenting clinically significant small medical objects. Sci Rep 2025; 15:12813. [PMID: 40229279] [PMCID: PMC11997218] [DOI: 10.1038/s41598-025-93171-6] [Received: 08/26/2024] [Accepted: 03/05/2025] [Indexed: 04/16/2025]
Abstract
Efficient segmentation of small hyperreflective dots, key biomarkers for diseases like macular edema, is critical for diagnosis and treatment monitoring. However, existing models, including Convolutional Neural Networks (CNNs) and Transformers, struggle with these minute structures due to information loss. To address this, we introduce EFCNet, which integrates the Cross-Stage Axial Attention (CSAA) module for enhanced feature fusion and the Multi-Precision Supervision (MPS) module for improved hierarchical guidance. We evaluated EFCNet on two datasets: S-HRD, comprising 313 retinal OCT scans from patients with macular edema, and S-Polyp, a 229-image subset of the publicly available CVC-ClinicDB colonoscopy dataset. EFCNet outperformed state-of-the-art models, achieving average Dice Similarity Coefficient (DSC) gains of 4.88% on S-HRD and 3.49% on S-Polyp, alongside Intersection over Union (IoU) improvements of 3.77% and 3.25%, respectively. Notably, smaller objects benefit most, highlighting EFCNet's effectiveness where conventional models underperform. Unlike U-Net-Large, which offers marginal gains with increased scale, EFCNet's superior performance is driven by its novel design. These findings demonstrate its effectiveness and potential utility in clinical practice.
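The DSC and IoU gains quoted above are both overlap ratios between a predicted mask and a reference mask. A small, self-contained sketch of the two metrics for binary masks represented as sets of foreground pixel coordinates (the example coordinates are illustrative):

```python
def dice_and_iou(pred, target):
    """Dice similarity coefficient and intersection-over-union for binary masks,
    each given as a set of foreground pixel coordinates."""
    p, t = set(pred), set(target)
    inter = len(p & t)
    union = len(p | t)
    dice = 2 * inter / (len(p) + len(t)) if (p or t) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

# Two 3-pixel masks that overlap on 2 pixels.
pred = {(0, 0), (0, 1), (1, 0)}
target = {(0, 0), (0, 1), (1, 1)}
d, i = dice_and_iou(pred, target)
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so Dice always reports the larger number for the same overlap — worth keeping in mind when comparing papers that report one or the other.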
Affiliation(s)
- Lingjie Kong
- School of Data Science, Fudan University, Shanghai, 200433, China
- Qiaoling Wei
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Chengming Xu
- School of Data Science, Fudan University, Shanghai, 200433, China
- Xiaofeng Ye
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Diagnosis and Treatment Center of Macular Disease, Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Wei Liu
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Diagnosis and Treatment Center of Macular Disease, Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Min Wang
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Diagnosis and Treatment Center of Macular Disease, Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Yanwei Fu
- School of Data Science, Fudan University, Shanghai, 200433, China
- Han Chen
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
3
Du Y, Chen X, Fu Y. Multiscale transformers and multi-attention mechanism networks for pathological nuclei segmentation. Sci Rep 2025; 15:12549. [PMID: 40221423] [PMCID: PMC11993704] [DOI: 10.1038/s41598-025-90397-2] [Received: 07/05/2024] [Accepted: 02/12/2025] [Indexed: 04/14/2025]
Abstract
Pathological nuclei segmentation is crucial for computer-aided diagnosis in pathology. However, high cell density, complex backgrounds, and blurred cell boundaries make pathological cell segmentation a challenging problem. In this paper, we propose a network model for pathology image segmentation based on a multi-scale Transformer and a multi-attention mechanism. To address the difficulty of extracting features from densely packed nuclei against complex backgrounds, a dense attention module is embedded in the encoder, improving the learning of target cell information and minimizing the loss of target information. Additionally, to address the poor segmentation accuracy caused by blurred cell boundaries, a Multi-scale Transformer Attention module is embedded between the encoder and decoder, which improves the transfer of boundary feature information and makes the segmented cell boundaries more accurate. Experimental results on the MoNuSeg, GlaS, and CoNSeP datasets demonstrate the network's superior accuracy.
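The Transformer attention at the heart of modules like the one above is scaled dot-product self-attention, in which every spatial position attends to every other position — this is what lets boundary information propagate globally. A generic NumPy sketch with illustrative shapes (not the paper's module):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over N tokens of dimension d.
    Each output token is a convex combination of all value vectors,
    so information can flow between any pair of positions."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))      # (N, N) attention weights, rows sum to 1
    return A @ V, A

rng = np.random.default_rng(1)
N, d = 5, 8                                  # 5 tokens (e.g. flattened patches)
X = rng.normal(size=(N, d))
out, A = self_attention(X,
                        rng.normal(size=(d, d)),
                        rng.normal(size=(d, d)),
                        rng.normal(size=(d, d)))
```

In an image model, the N tokens would be flattened feature-map positions, which is why attention cost grows quadratically with resolution.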
Affiliation(s)
- Yongzhao Du
- College of Engineering, Huaqiao University, Fujian, 362021, China.
- College of Internet of Things Industry, Huaqiao University, Fujian, 362021, China.
- Xin Chen
- College of Engineering, Huaqiao University, Fujian, 362021, China
- Yuqing Fu
- College of Engineering, Huaqiao University, Fujian, 362021, China
- College of Internet of Things Industry, Huaqiao University, Fujian, 362021, China
4
Ghadimi DJ, Vahdani AM, Karimi H, Ebrahimi P, Fathi M, Moodi F, Habibzadeh A, Khodadadi Shoushtari F, Valizadeh G, Mobarak Salari H, Saligheh Rad H. Deep Learning-Based Techniques in Glioma Brain Tumor Segmentation Using Multi-Parametric MRI: A Review on Clinical Applications and Future Outlooks. J Magn Reson Imaging 2025; 61:1094-1109. [PMID: 39074952] [DOI: 10.1002/jmri.29543] [Received: 03/15/2024] [Revised: 07/07/2024] [Accepted: 07/08/2024] [Indexed: 07/31/2024]
Abstract
This comprehensive review explores the role of deep learning (DL) in glioma segmentation using multiparametric magnetic resonance imaging (MRI) data. The study surveys advanced techniques such as multiparametric MRI for capturing the complex nature of gliomas. It delves into the integration of DL with MRI, focusing on convolutional neural networks (CNNs) and their remarkable capabilities in tumor segmentation. Clinical applications of DL-based segmentation are highlighted, including treatment planning, monitoring treatment response, and distinguishing between tumor progression and pseudo-progression. Furthermore, the review examines the evolution of DL-based segmentation studies, from early CNN models to recent advancements such as attention mechanisms and transformer models. Challenges in data quality, gradient vanishing, and model interpretability are discussed. The review concludes with insights into future research directions, emphasizing the importance of addressing tumor heterogeneity, integrating genomic data, and ensuring responsible deployment of DL-driven healthcare technologies. EVIDENCE LEVEL: N/A. TECHNICAL EFFICACY: Stage 2.
Affiliation(s)
- Delaram J Ghadimi
- School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amir M Vahdani
- Image Guided Surgery Lab, Research Center for Biomedical Technologies and Robotics, Advanced Medical Technologies and Equipment Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Tehran, Iran
- Hanie Karimi
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Pouya Ebrahimi
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Mobina Fathi
- School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farzan Moodi
- School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Adrina Habibzadeh
- Student Research Committee, Fasa University of Medical Sciences, Fasa, Iran
- Gelareh Valizadeh
- Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Hanieh Mobarak Salari
- Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Hamidreza Saligheh Rad
- Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Department of Medical Physics and Biomedical Engineering, Tehran University of Medical Sciences, Tehran, Iran
5
Karunanayake N, Lu L, Yang H, Geng P, Akin O, Furberg H, Schwartz LH, Zhao B. Dual-Stage AI Model for Enhanced CT Imaging: Precision Segmentation of Kidney and Tumors. Tomography 2025; 11:3. [PMID: 39852683] [PMCID: PMC11769543] [DOI: 10.3390/tomography11010003] [Received: 11/06/2024] [Revised: 12/16/2024] [Accepted: 12/21/2024] [Indexed: 01/26/2025]
Abstract
OBJECTIVES Accurate kidney and tumor segmentation of computed tomography (CT) scans is vital for diagnosis and treatment, but manual methods are time-consuming and inconsistent, highlighting the value of AI automation. This study develops a fully automated AI model using vision transformers (ViTs) and convolutional neural networks (CNNs) to detect and segment kidneys and kidney tumors in contrast-enhanced CT (CECT) scans, with a focus on improving sensitivity for small, indistinct tumors. METHODS The segmentation framework employs a ViT-based model for the kidney organ, followed by a 3D UNet model with enhanced connections and attention mechanisms for tumor detection and segmentation. Two CECT datasets were used: a public dataset (KiTS23: 489 scans) and a private institutional dataset (Private: 592 scans). The AI model was trained on 389 public scans, with validation performed on the remaining 100 scans and external validation performed on all 592 private scans. Tumors were categorized by TNM staging as small (≤4 cm) (KiTS23: 54%, Private: 41%), medium (>4 cm to ≤7 cm) (KiTS23: 24%, Private: 35%), and large (>7 cm) (KiTS23: 22%, Private: 24%) for detailed evaluation. RESULTS Kidney and kidney tumor segmentations were evaluated against manual annotations as the reference standard. The model achieved a Dice score of 0.97 ± 0.02 for kidney organ segmentation. For tumor detection and segmentation on the KiTS23 dataset, the sensitivities and average false-positive rates per patient were as follows: 0.90 and 0.23 for small tumors, 1.0 and 0.08 for medium tumors, and 0.96 and 0.04 for large tumors. The corresponding Dice scores were 0.84 ± 0.11, 0.89 ± 0.07, and 0.91 ± 0.06, respectively. External validation on the private data confirmed the model's effectiveness, achieving the following sensitivities and average false-positive rates per patient: 0.89 and 0.15 for small tumors, 0.99 and 0.03 for medium tumors, and 1.0 and 0.01 for large tumors. The corresponding Dice scores were 0.84 ± 0.08, 0.89 ± 0.08, and 0.92 ± 0.06. CONCLUSIONS The proposed model demonstrates consistent and robust performance in segmenting kidneys and kidney tumors of various sizes, with effective generalization to unseen data. This underscores the model's significant potential for clinical integration, offering enhanced diagnostic precision and reliability in radiological assessments.
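The TNM-style size buckets used for the stratified evaluation (≤4 cm small, >4 to ≤7 cm medium, >7 cm large) are easy to encode; a small sketch with the thresholds taken directly from the abstract (the function name is illustrative):

```python
def tumor_size_category(diameter_cm):
    """Bucket a tumor by its longest diameter using the TNM-style thresholds
    quoted in the study: <=4 cm small, >4 to <=7 cm medium, >7 cm large."""
    if diameter_cm <= 4.0:
        return "small"
    if diameter_cm <= 7.0:
        return "medium"
    return "large"
```

Note that both thresholds are inclusive on the lower bucket, so a 4.0 cm tumor counts as small and a 7.0 cm tumor as medium.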
Affiliation(s)
- Nalan Karunanayake
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Lin Lu
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Hao Yang
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Pengfei Geng
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Oguz Akin
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Helena Furberg
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10017, USA
- Lawrence H. Schwartz
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Binsheng Zhao
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
6
Yang S, Zhang X, He Y, Chen Y, Zhou Y. TBE-Net: A Deep Network Based on Tree-Like Branch Encoder for Medical Image Segmentation. IEEE J Biomed Health Inform 2025; 29:521-534. [PMID: 39374271] [DOI: 10.1109/jbhi.2024.3468904] [Indexed: 10/09/2024]
Abstract
In recent years, encoder-decoder-based network structures have been widely used in designing medical image segmentation models. However, these methods still face some limitations: 1) The network's feature extraction capability is limited, primarily due to insufficient attention to the encoder, resulting in a failure to extract rich and effective features. 2) Unidirectional stepwise decoding of smaller-sized feature maps restricts segmentation performance. To address the above limitations, we propose an innovative Tree-like Branch Encoder Network (TBE-Net), which adopts a tree-like branch encoder to better perform feature extraction and preserve feature information. Additionally, we introduce the Depth and Width Expansion (DWE) module to expand the network depth and width at low parameter cost, thereby enhancing network performance. Furthermore, we design a Deep Aggregation Module (DAM) to better aggregate and process encoder features. Subsequently, we directly decode the aggregated features to generate the segmentation map. The experimental results show that, compared to other advanced algorithms, our method, with the lowest parameter cost, achieved improvements in the IoU metric on the TNBC, PH2, CHASE-DB1, STARE, and COVID-19-CT-Seg datasets by 1.6%, 0.46%, 0.81%, 1.96%, and 0.86%, respectively.
7
Dong K, Hu P, Zhu Y, Tian Y, Li X, Zhou T, Bai X, Liang T, Li J. Attention-enhanced multiscale feature fusion network for pancreas and tumor segmentation. Med Phys 2024; 51:8999-9016. [PMID: 39306864] [DOI: 10.1002/mp.17385] [Received: 01/23/2024] [Revised: 07/16/2024] [Accepted: 08/20/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND Accurate pancreas and pancreatic tumor segmentation from abdominal scans is crucial for diagnosing and treating pancreatic diseases. Automated and reliable segmentation algorithms are highly desirable in both clinical practice and research. PURPOSE Segmenting the pancreas and tumors is challenging due to their low contrast, irregular morphologies, and variable anatomical locations. Additionally, the substantial difference in size between the pancreas and small tumors makes this task difficult. This paper proposes an attention-enhanced multiscale feature fusion network (AMFF-Net) to address these issues via 3D attention and multiscale context fusion methods. METHODS First, to prevent missed segmentation of tumors, we design the residual depthwise attention modules (RDAMs) to extract global features by expanding receptive fields of shallow layers in the encoder. Second, hybrid transformer modules (HTMs) are proposed to model deep semantic features and suppress irrelevant regions while highlighting critical anatomical characteristics. Additionally, the multiscale feature fusion module (MFFM) fuses adjacent top and bottom scale semantic features to address the size imbalance issue. RESULTS The proposed AMFF-Net was evaluated on the public MSD dataset, achieving 82.12% DSC for pancreas and 57.00% for tumors. It also demonstrated effective segmentation performance on the NIH and private datasets, outperforming previous State-Of-The-Art (SOTA) methods. Ablation studies verify the effectiveness of RDAMs, HTMs, and MFFM. CONCLUSIONS We propose an effective deep learning network for pancreas and tumor segmentation from abdominal CT scans. The proposed modules can better leverage global dependencies and semantic information and achieve significantly higher accuracy than the previous SOTA methods.
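The multiscale fusion step attributed to the MFFM — combining adjacent top and bottom scales — generically amounts to upsampling the coarser map to match the finer one and merging along channels. An illustrative NumPy sketch of that generic pattern (the paper's actual module differs in detail; shapes are hypothetical):

```python
import numpy as np

def upsample_nearest(f, factor):
    """Nearest-neighbour upsampling of a (H, W, C) feature map by an integer factor."""
    return f.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_adjacent_scales(f_top, f_bottom):
    """Fuse two adjacent pyramid scales: upsample the coarser (bottom) map to the
    finer (top) resolution and concatenate along channels, so the next layer sees
    both fine detail and coarse semantic context."""
    factor = f_top.shape[0] // f_bottom.shape[0]
    return np.concatenate([f_top, upsample_nearest(f_bottom, factor)], axis=-1)

rng = np.random.default_rng(2)
f_top = rng.normal(size=(8, 8, 4))       # finer scale, fewer channels
f_bottom = rng.normal(size=(4, 4, 8))    # coarser scale, more channels
fused = fuse_adjacent_scales(f_top, f_bottom)
```

This kind of fusion is one standard answer to the pancreas-versus-small-tumor size imbalance the abstract mentions: small structures survive in the fine scale while the coarse scale anchors context.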
Affiliation(s)
- Kaiqi Dong
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Peijun Hu
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Research Center for Data Hub and Security, Zhejiang Laboratory, Hangzhou, China
- Yan Zhu
- Research Center for Data Hub and Security, Zhejiang Laboratory, Hangzhou, China
- Yu Tian
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Xiang Li
- Department of Hepatobiliary and Pancreatic Surgery, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Tianshu Zhou
- Research Center for Data Hub and Security, Zhejiang Laboratory, Hangzhou, China
- Xueli Bai
- Department of Hepatobiliary and Pancreatic Surgery, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Tingbo Liang
- Department of Hepatobiliary and Pancreatic Surgery, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Jingsong Li
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Research Center for Data Hub and Security, Zhejiang Laboratory, Hangzhou, China
8
Xu Z, Rittscher J, Ali S. SSL-CPCD: Self-Supervised Learning With Composite Pretext-Class Discrimination for Improved Generalisability in Endoscopic Image Analysis. IEEE Trans Med Imaging 2024; 43:4105-4119. [PMID: 38857149] [DOI: 10.1109/tmi.2024.3411933] [Indexed: 06/12/2024]
Abstract
Data-driven methods have shown tremendous progress in medical image analysis. In this context, deep learning-based supervised methods are widely popular. However, they require a large amount of training data and face issues in generalisability to unseen datasets that hinder clinical translation. Endoscopic imaging data is characterised by large inter- and intra-patient variability that makes these models more challenging to learn representative features for downstream tasks. Thus, despite the publicly available datasets and datasets that can be generated within hospitals, most supervised models still underperform. While self-supervised learning has addressed this problem to some extent in natural scene data, there is a considerable performance gap in the medical image domain. In this paper, we propose to explore patch-level instance-group discrimination and penalisation of inter-class variation using additive angular margin within the cosine similarity metrics. Our novel approach enables models to learn to cluster similar representations, thereby improving their ability to provide better separation between different classes. Our results demonstrate significant improvement on all metrics over the state-of-the-art (SOTA) methods on the test set from the same and diverse datasets. We evaluated our approach for classification, detection, and segmentation. SSL-CPCD attains notable Top 1 accuracy of 79.77% in ulcerative colitis classification, an 88.62% mean average precision (mAP) for detection, and an 82.32% dice similarity coefficient for polyp segmentation tasks. These represent improvements of over 4%, 2%, and 3%, respectively, compared to the baseline architectures. We demonstrate that our method generalises better than all SOTA methods to unseen datasets, reporting over 7% improvement.
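The "additive angular margin within the cosine similarity metrics" is the ArcFace-style device: the target-class logit is computed as cos(θ + m) rather than cos θ, which penalizes the target angle and forces tighter intra-class clusters with wider inter-class separation. A NumPy sketch of that generic formulation — the margin m and scale s values here are illustrative, not the paper's settings:

```python
import numpy as np

def margin_logits(embedding, class_centres, target, m=0.5, s=30.0):
    """Cosine-similarity logits with an additive angular margin on the target
    class (ArcFace-style). Non-target logits stay cos(theta); the target logit
    becomes cos(theta + m), then everything is scaled by s."""
    e = embedding / np.linalg.norm(embedding)
    W = class_centres / np.linalg.norm(class_centres, axis=1, keepdims=True)
    cos = np.clip(W @ e, -1.0, 1.0)             # cos(theta) for every class
    theta = np.arccos(cos)
    logits = cos.copy()
    logits[target] = np.cos(theta[target] + m)  # penalise the target angle
    return s * logits

# Toy example: the embedding already points exactly at class 0's centre.
lg = margin_logits(np.array([1.0, 0.0]),
                   np.array([[1.0, 0.0], [0.0, 1.0]]),
                   target=0)
```

Even with a perfectly aligned embedding, the margin pulls the target logit below the maximum s·cos(0) = s, so training keeps pushing embeddings toward their class centre rather than stopping once they merely win the argmax.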
9
Zhang X, Tian L, Guo S, Liu Y. STF-Net: sparsification transformer coding guided network for subcortical brain structure segmentation. Biomed Tech (Berl) 2024; 69:465-480. [PMID: 38712825] [DOI: 10.1515/bmt-2023-0121] [Received: 01/15/2023] [Accepted: 04/15/2024] [Indexed: 05/08/2024]
Abstract
Subcortical brain structure segmentation plays an important role in neuroimaging diagnosis and has become a basis of computer-aided diagnosis. Because of the blurred boundaries and complex shapes of subcortical brain structures, labeling them by hand is a time-consuming and subjective task, greatly limiting their potential for clinical applications. This paper therefore proposes the sparsification transformer (STF) module for accurate brain structure segmentation. The self-attention mechanism is used to establish global dependencies, efficiently extracting the global information of the feature map with low computational complexity, while a shallow network compensates with low-level detail through the locality of convolutional operations, promoting the representation capability of the network. In addition, a hybrid residual dilated convolution (HRDC) module is introduced at the bottom layer of the network to extend the receptive field and extract multi-scale contextual information. Meanwhile, an octave convolution edge feature extraction (OCT) module is applied at the skip connections of the network to pay more attention to the edge features of brain structures. The proposed network is trained with a hybrid loss function. Experimental evaluation on two public datasets, IBSR and MALC, shows outstanding performance in terms of objective and subjective quality.
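The dilated convolutions in modules like HRDC enlarge the receptive field without adding parameters: a kernel of size k with dilation d spans d·(k − 1) + 1 input samples, and stacked layers add their spans. A 1-D sketch of the operation and the receptive-field arithmetic (illustrative, not the module itself):

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded 1-D dilated convolution: the k taps of w are spaced
    `dilation` samples apart, so the kernel covers dilation*(k-1)+1 inputs."""
    k = len(w)
    span = dilation * (k - 1)
    pad = span // 2
    xp = np.pad(x, (pad, span - pad))
    return np.array([sum(w[j] * xp[i + j * dilation] for j in range(k))
                     for i in range(len(x))])

def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += d * (k - 1)
    return rf

x = np.arange(6.0)
y = dilated_conv1d(x, [0.0, 1.0, 0.0], dilation=1)  # identity kernel: y == x
```

Three 3-tap layers with dilations 1, 2, 4 already see 15 input samples, which is why hybrid dilation schedules are a cheap way to gather multi-scale context at the network bottleneck.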
Affiliation(s)
- Xiufeng Zhang
- School of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian, Liaoning, China
- Lingzhuo Tian
- School of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian, Liaoning, China
- Shengjin Guo
- School of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian, Liaoning, China
- Yansong Liu
- School of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian, Liaoning, China
10
Pang C, Lu X, Liu X, Zhang R, Lyu L. IIAM: Intra and Inter Attention With Mutual Consistency Learning Network for Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:5971-5983. [PMID: 38985557] [DOI: 10.1109/jbhi.2024.3426074] [Indexed: 07/12/2024]
Abstract
Medical image segmentation provides a reliable basis for diagnosis analysis and disease treatment by capturing the global and local features of the target region. To learn global features, convolutional neural networks are replaced with pure transformers, or transformer layers are stacked at the deepest layers of convolutional neural networks. Nevertheless, they are deficient in exploring local-global cues at each scale and the interaction among consensual regions in multiple scales, hindering the learning about the changes in size, shape, and position of target objects. To cope with these defects, we propose a novel Intra and Inter Attention with Mutual Consistency Learning Network (IIAM). Concretely, we design an intra attention module to aggregate the CNN-based local features and transformer-based global information on each scale. In addition, to capture the interaction among consensual regions in multiple scales, we devise an inter attention module to explore the cross-scale dependency of the object and its surroundings. Moreover, to reduce the impact of blurred regions in medical images on the final segmentation results, we introduce multiple decoders to estimate the model uncertainty, where we adopt a mutual consistency learning strategy to minimize the output discrepancy during the end-to-end training and weight the outputs of the three decoders as the final segmentation result. Extensive experiments on three benchmark datasets verify the efficacy of our method and demonstrate superior performance of our model to state-of-the-art techniques.
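The multi-decoder idea above — weighting several decoder outputs into one prediction and reading their disagreement as uncertainty — can be illustrated generically. This is a simple ensemble stand-in, not IIAM's mutual consistency loss; the weighting and variance measure are illustrative:

```python
import numpy as np

def ensemble_segmentation(decoder_probs, weights=None):
    """Combine per-pixel foreground probabilities from several decoders.
    Returns the weighted mean prediction and the per-pixel variance across
    decoders; high variance flags regions where the decoders disagree
    (e.g. blurred boundaries)."""
    P = np.stack(decoder_probs)              # (n_decoders, H, W)
    if weights is None:
        weights = np.full(len(P), 1.0 / len(P))
    mean = np.tensordot(weights, P, axes=1)  # weighted per-pixel mean, (H, W)
    var = P.var(axis=0)                      # decoder disagreement per pixel
    return mean, var

# Two maximally disagreeing decoders: uncertainty is high everywhere.
mean, var = ensemble_segmentation([np.zeros((2, 2)), np.ones((2, 2))])
```

A consistency-trained model would add a loss term that drives this variance down during training, which is the intuition behind minimizing output discrepancy between decoders.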
11
Zhao T, Wu H, Leng D, Yao E, Gu S, Yao M, Zhang Q, Wang T, Wu D, Xie L. An artificial intelligence grading system of apical periodontitis in cone-beam computed tomography data. Dentomaxillofac Radiol 2024; 53:447-458. [PMID: 38960866] [DOI: 10.1093/dmfr/twae029] [Received: 02/16/2024] [Revised: 06/02/2024] [Accepted: 06/18/2024] [Indexed: 07/05/2024]
Abstract
OBJECTIVES To assist junior doctors in better diagnosing apical periodontitis (AP), an artificial intelligence AP grading system was developed based on deep learning (DL), and its reliability and accuracy were evaluated. METHODS One hundred and twenty cone-beam computed tomography (CBCT) images were selected to construct a classification dataset with four categories, divided according to the CBCT periapical index (CBCTPAI): normal periapical tissue, CBCTPAI 1-2, CBCTPAI 3-5, and young permanent teeth. Three classic algorithms (ResNet50/101/152) and one self-developed algorithm (PAINet) were compared with each other. PAINet was also compared with two recent Transformer-based models and three attention models. Performance was evaluated by accuracy, precision, recall, balanced F score (F1-score), and the area under the macro-average receiver operating curve (AUC). Reliability was evaluated by Cohen's kappa, comparing the consistency of model-predicted labels with expert opinions. RESULTS PAINet performed best among the four algorithms. The accuracy, precision, recall, F1-score, and AUC on the test set were 0.9333, 0.9415, 0.9333, 0.9336, and 0.9972, respectively. Cohen's kappa was 0.911, representing almost perfect consistency. CONCLUSIONS PAINet can accurately distinguish between normal periapical tissues, CBCTPAI 1-2, CBCTPAI 3-5, and young permanent teeth. Its results were highly consistent with expert opinions. It can help junior doctors diagnose and score AP, reducing their burden, and it can also be deployed in areas that lack experts to provide professional diagnostic opinions.
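Cohen's kappa, used here for reliability, corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e the chance agreement implied by each rater's label frequencies. A self-contained sketch (the example label sequences are illustrative):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two raters over the same items:
    1.0 = perfect agreement, 0.0 = agreement no better than chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1.0 - expected)

k_perfect = cohens_kappa(["x", "y", "x", "y"], ["x", "y", "x", "y"])  # identical raters
k_chance = cohens_kappa(["x", "x", "y", "y"], ["x", "y", "x", "y"])   # chance-level overlap
```

On the common Landis–Koch scale, the study's κ = 0.911 falls in the "almost perfect" band (above 0.81), which is the basis for the abstract's consistency claim.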
Affiliation(s)
- Tianyin Zhao
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Huili Wu
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Diya Leng
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Enhui Yao
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Shuyun Gu
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Minhui Yao
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Qinyu Zhang
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Tong Wang
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Daming Wu
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Lizhe Xie
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
| |
12
Xu R, Wang C, Zhang J, Xu S, Meng W, Zhang X. SkinFormer: Learning Statistical Texture Representation With Transformer for Skin Lesion Segmentation. IEEE J Biomed Health Inform 2024; 28:6008-6018. PMID: 38913520. DOI: 10.1109/jbhi.2024.3417247.
Abstract
Accurate skin lesion segmentation from dermoscopic images is of great importance for skin cancer diagnosis. However, automatic segmentation of melanoma remains a challenging task because it is difficult to incorporate useful texture representations into the learning process. Texture representations comprise not only the local structural information learned by CNNs but also the global statistical texture information of the input image. In this paper, we propose a transformer network (SkinFormer) that efficiently extracts and fuses statistical texture representations for skin lesion segmentation. Specifically, a Kurtosis-guided Statistical Counting Operator is designed to quantify the statistical texture of input features. Building on this operator and the transformer's global attention mechanism, we propose a Statistical Texture Fusion Transformer and a Statistical Texture Enhance Transformer: the former fuses structural texture information with statistical texture information, and the latter enhances the statistical texture of multi-scale features. Extensive experiments on three publicly available skin lesion datasets validate that SkinFormer outperforms other SOTA methods, achieving a 93.2% Dice score on ISIC 2018. SkinFormer can readily be extended to segment 3D images in the future.
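As an aside on the statistic underlying the Kurtosis-guided Statistical Counting Operator: kurtosis is a fourth-order moment of the gray-level distribution that flags heavy-tailed (strongly textured) patches. The sketch below is only the plain excess-kurtosis computation, not the paper's operator, which additionally involves counting and attention; the example data are hypothetical.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (population form); higher
    values indicate heavy-tailed, outlier-dominated gray levels."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0  # constant patch: define kurtosis as 0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / var ** 2 - 3.0

# A uniform gray-level ramp is light-tailed; a patch with one bright
# outlier (a speckle) has strongly positive excess kurtosis.
uniform = list(range(16))
spiky = [0] * 15 + [100]
print(excess_kurtosis(spiky) > excess_kurtosis(uniform))  # True
```

For a discrete uniform distribution over n points the excess kurtosis is -6(n²+1)/(5(n²-1)), about -1.21 for n = 16, which the function reproduces.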
13
Xu C, He S, Li H. An attentional mechanism model for segmenting multiple lesion regions in the diabetic retina. Sci Rep 2024; 14:21354. PMID: 39266650. PMCID: PMC11392929. DOI: 10.1038/s41598-024-72481-1.
Abstract
Diabetic retinopathy (DR), a leading cause of blindness in diabetic patients, necessitates precise lesion segmentation for effective grading. DR multi-lesion segmentation faces two main challenges. On the one hand, retinal lesions vary in location, shape, and size. On the other hand, currently available multi-lesion segmentation models extract minute features insufficiently and are prone to overlooking microaneurysms. To solve these problems, we propose a novel deep learning method: the Multi-Scale Spatial Attention Gate (MSAG) mechanism network. The model takes images of varying scales as input in order to extract a range of semantic information. Our Spatial Attention Gate merges low-level spatial details with high-level semantic content, assigning hierarchical attention weights for accurate segmentation. Incorporating the modified spatial attention gate in the inference stage enhances precision by combining prediction scales hierarchically, improving segmentation accuracy without increasing training costs. Experiments on the public IDRiD and DDR datasets show that the proposed method achieves better performance than other methods.
Affiliation(s)
- Changzhuan Xu, Song He, Hailin Li: Information Branch, Guizhou Provincial People's Hospital, Guizhou, 550001, China
14
Zhou Y, Mei S, Wang J, Xu Q, Zhang Z, Qin S, Feng J, Li C, Xing S, Wang W, Zhang X, Li F, Zhou Q, He Z, Gao Y. Development and validation of a deep learning-based framework for automated lung CT segmentation and acute respiratory distress syndrome prediction: a multicenter cohort study. EClinicalMedicine 2024; 75:102772. PMID: 39170939. PMCID: PMC11338113. DOI: 10.1016/j.eclinm.2024.102772.
Abstract
Background Acute respiratory distress syndrome (ARDS) is a life-threatening condition with a high incidence and mortality rate in intensive care unit (ICU) admissions. Early identification of patients at high risk of developing ARDS is crucial for timely intervention and improved clinical outcomes, but the complex pathophysiology of ARDS makes early prediction challenging. This study aimed to develop an artificial intelligence (AI) model for automated lung lesion segmentation and early prediction of ARDS to facilitate timely intervention in the ICU. Methods A total of 928 ICU patients with chest computed tomography (CT) scans were included from November 2018 to November 2021 at three centers in China. Patients were divided into a retrospective cohort for model development and internal validation, and three independent cohorts for external validation. A deep learning-based framework using the UNet Transformer (UNETR) model was developed to perform segmentation of lung lesions and early prediction of ARDS. We employed various data augmentation techniques from the Medical Open Network for AI (MONAI) framework, enhancing training sample diversity and improving the model's generalization capability. The framework was compared with a DenseNet-based image classification network and evaluated in external and prospective validation cohorts. Segmentation performance was assessed using the Dice coefficient (DC), and prediction performance using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. The contributions of different features to ARDS prediction were visualized using Shapley Explanation Plots. This study was registered with the China Clinical Trial Registration Centre (ChiCTR2200058700). Findings The segmentation task achieved a DC of 0.734 ± 0.137 in the validation set. For the prediction task, the framework achieved AUCs of 0.916 [0.858-0.961], 0.865 [0.774-0.945], 0.901 [0.835-0.955], and 0.876 [0.804-0.936] in the internal validation cohort, external validation cohort I, external validation cohort II, and the prospective validation cohort, respectively, outperforming the DenseNet-based image classification network in prediction accuracy. Moreover, the ARDS prediction model identified lung lesion features and clinical parameters such as C-reactive protein, albumin, bilirubin, platelet count, and age as significant contributors to ARDS prediction. Interpretation The deep learning-based framework using the UNETR model demonstrated high accuracy and robustness in lung lesion segmentation and early ARDS prediction, with good generalization ability and clinical applicability. Funding This study was supported by grants from the Shanghai Renji Hospital Clinical Research Innovation and Cultivation Fund (RJPY-DZX-008) and Shanghai Science and Technology Development Funds (22YF1423300).
Affiliation(s)
- Yang Zhou, Shuya Mei, Jiemin Wang, Qiaoyi Xu, Zhiyun Zhang, Shaojie Qin, Jinhua Feng, Shunpeng Xing, Zhengyu He, Yuan Gao: Department of Critical Care Medicine, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Congye Li, Quanhong Zhou: Department of Critical Care Medicine, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Wei Wang, Xiaolin Zhang, Feng Li: Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
15
Zhuo M, Chen X, Guo J, Qian Q, Xue E, Chen Z. Deep Learning-Based Segmentation and Risk Stratification for Gastrointestinal Stromal Tumors in Transabdominal Ultrasound Imaging. J Ultrasound Med 2024; 43:1661-1672. PMID: 38822195. DOI: 10.1002/jum.16489.
Abstract
PURPOSE To develop a deep neural network system for the automatic segmentation and risk stratification prediction of gastrointestinal stromal tumors (GISTs). METHODS A total of 980 ultrasound (US) images from 245 GIST patients were retrospectively collected and randomly divided (6:2:2) into a training set, a validation set, and an internal test set. Additionally, 188 US images from 47 prospective GIST patients were collected to evaluate the segmentation and diagnostic performance of the model. Five deep learning-based segmentation networks, namely UNet, FCN, DeepLabV3+, Swin Transformer, and SegNeXt, were employed, along with the ResNet18 classification network, to select the most suitable network combination. Segmentation performance was evaluated using the intersection over union (IoU), Dice similarity coefficient (DSC), recall, and precision; classification performance was assessed using accuracy and the area under the receiver operating characteristic curve (AUROC). RESULTS Among the compared models, SegNeXt-ResNet18 exhibited the best segmentation and classification performance. On the internal test set, the proposed model achieved IoU, DSC, precision, and recall values of 82.1%, 90.2%, 91.7%, and 88.8%, respectively, with an accuracy of 87.4% and an AUROC of 92.0% for GIST risk prediction. On the external test set, the corresponding segmentation values were 81.0%, 89.5%, 92.8%, and 86.4%, with an accuracy of 86.7% and an AUROC of 92.5% for risk prediction. CONCLUSION This two-stage SegNeXt-ResNet18 model achieves automatic segmentation and risk stratification prediction for GISTs and demonstrates excellent segmentation and classification performance.
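The IoU and DSC figures quoted in this entry (and in several others below) are simple overlap ratios between a predicted mask and the ground-truth mask. A minimal sketch, assuming binary masks flattened to 0/1 sequences (a simplification of the 2-D pixel grid; not tied to any model here):

```python
def iou_and_dice(pred, truth):
    """Intersection-over-Union and Dice similarity for binary masks.
    pred and truth are equal-length sequences of 0/1 labels."""
    inter = sum(p & t for p, t in zip(pred, truth))
    p_sum, t_sum = sum(pred), sum(truth)
    union = p_sum + t_sum - inter
    iou = inter / union if union else 1.0           # both masks empty: perfect
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    return iou, dice

pred  = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
print(iou_and_dice(pred, truth))  # IoU 0.5, Dice ~0.667
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why papers typically report both moving together.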
Affiliation(s)
- Minling Zhuo, Jingjing Guo, Qingfu Qian, Ensheng Xue, Zhikui Chen: Department of Ultrasound, Fujian Medical University Union Hospital, Fuzhou, China
- Xing Chen: Department of General Surgery, Fujian Medical University Provincial Clinical Medical College, Fujian Provincial Hospital, Fuzhou, China
16
Degala SKB, Tewari RP, Kamra P, Kasiviswanathan U, Pandey R. Segmentation and Estimation of Fetal Biometric Parameters using an Attention Gate Double U-Net with Guided Decoder Architecture. Comput Biol Med 2024; 180:109000. PMID: 39133952. DOI: 10.1016/j.compbiomed.2024.109000.
Abstract
The fetus's health is evaluated with biometric parameters obtained from low-resolution ultrasound images. The accuracy of biometric parameters in existing protocols typically depends on conventional image processing approaches and is hence prone to error. This study introduces the Attention Gate Double U-Net with Guided Decoder (ADU-GD) model, specifically crafted for fetal biometric parameter prediction. The attention network and guided decoder are designed to dynamically merge local features with their global dependencies, enhancing the precision of parameter estimation. ADU-GD displays superior performance, with a Mean Absolute Error (MAE) of 0.99 mm and a segmentation accuracy of 99.1% when benchmarked against well-established models. The proposed model consistently achieved a high Dice index score of about 99.1 ± 0.8, a minimal Hausdorff distance of about 1.01 ± 1.07, and a low Average Symmetric Surface Distance of about 0.25 ± 0.21. In a comprehensive evaluation, ADU-GD outperformed existing deep-learning models such as Double U-Net, DeepLabv3, FCN-32s, PSPNet, SegNet, Trans U-Net, Swin U-Net, Mask-R2CNN, and RDHCformer in terms of MAE for crucial fetal dimensions, including Head Circumference, Abdomen Circumference, Femur Length, and BiParietal Diameter, achieving MAE values of 2.2 mm, 2.6 mm, 0.6 mm, and 1.2 mm, respectively.
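The Hausdorff distance quoted in this entry measures worst-case boundary disagreement between two contours. A brute-force sketch over small 2-D point sets (real evaluations use accelerated implementations and often a 95th-percentile variant; the point sets below are hypothetical):

```python
import math

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets:
    the largest distance from any point in one set to its nearest
    neighbour in the other set."""
    def directed(xs, ys):
        return max(min(math.dist(x, y) for y in ys) for x in xs)
    return max(directed(a, b), directed(b, a))

# Two contours that agree everywhere except one corner displaced by 1 unit.
square  = [(0, 0), (0, 1), (1, 0), (1, 1)]
shifted = [(0, 0), (0, 1), (1, 0), (2, 1)]
print(hausdorff(square, shifted))  # 1.0
```

Because it is a maximum over boundary points, a single outlier pixel dominates the score, which is why it complements area-overlap metrics such as Dice.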
Affiliation(s)
- Sajal Kumar Babu Degala, Ravi Prakash Tewari, Uvanesh Kasiviswanathan, Ramesh Pandey: Department of Applied Mechanics, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, 211004, Uttar Pradesh, India
- Pankaj Kamra: Kamra Ultrasound Centre and United Diagnostics, Prayagraj, 211002, Uttar Pradesh, India
17
Yan S, Yang B, Chen A. A differential network with multiple gated reverse attention for medical image segmentation. Sci Rep 2024; 14:20274. PMID: 39217265. PMCID: PMC11365968. DOI: 10.1038/s41598-024-71194-9.
Abstract
The UNet architecture has achieved great success in medical image segmentation applications, but these models still encounter several challenges. One is the loss of pixel-level information caused by multiple down-sampling steps; additionally, the addition or concatenation used in the decoder can generate redundant information. These limitations weaken localization ability and the complementarity of features at different levels, and can lead to blurred boundaries. Differential features can effectively compensate for these shortcomings and significantly enhance segmentation performance. We therefore propose MGRAD-UNet (multi-gated reverse attention multi-scale differential UNet), based on UNet. A multi-scale differential decoder generates abundant differential features at both the pixel level and the structure level. These features, serving as gate signals, are transmitted to a gate controller and forwarded to a second differential decoder, which is equipped with reverse attention to enhance focus on important regions. The features obtained by the two differential decoders are differenced a second time; the resulting differential feature is sent back to the controller as a control signal and transmitted to the encoder, so that the differential features learned by the two decoders are propagated through the network. The core design of MGRAD-UNet lies in extracting comprehensive and accurate features by caching overall differential features and applying multi-scale differential processing, enabling iterative learning from diverse information. We evaluate MGRAD-UNet against state-of-the-art (SOTA) methods on two public datasets; our method surpasses its competitors and provides a new approach to UNet design.
Affiliation(s)
- Shun Yan, Benquan Yang, Aihua Chen: School of Electronic and Information Engineering, Taizhou University, Taizhou, 318000, Zhejiang, China
18
Huang H, Chen Z, Zou Y, Lu M, Chen C, Song Y, Zhang H, Yan F. Channel prior convolutional attention for medical image segmentation. Comput Biol Med 2024; 178:108784. PMID: 38941900. DOI: 10.1016/j.compbiomed.2024.108784.
Abstract
Medical images often exhibit characteristics such as low contrast and significant variation in organ shape. The generally insufficient adaptive capability of existing attention mechanisms limits further improvement of segmentation performance in medical imaging. This paper proposes an efficient Channel Prior Convolutional Attention (CPCA) method that supports the dynamic distribution of attention weights in both the channel and spatial dimensions. A multi-scale depth-wise convolutional module extracts spatial relationships effectively while preserving the channel prior, enabling CPCA to focus on informative channels and important regions. Based on CPCA, a segmentation network called CPCANet is proposed for medical image segmentation. Validated on two publicly available datasets, CPCANet achieves improved segmentation performance while requiring fewer computational resources than state-of-the-art algorithms. Our code is publicly available at https://github.com/Cuthbert-Huang/CPCANet.
Affiliation(s)
- Hejun Huang, Ying Zou, Ming Lu, Chaoyang Chen, Hongqiang Zhang: School of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
- Zuguo Chen: School of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Youzhi Song: Rucheng County Hospital of Traditional Chinese Medicine, Chenzhou, 424100, China
- Feng Yan: Changsha Nonferrous Metallurgy Design & Research Institute Co., Changsha, 410019, China
19
Xie Q, Chen Y, Liu S, Lu X. SSCFormer: Revisiting ConvNet-Transformer Hybrid Framework From Scale-Wise and Spatial-Channel-Aware Perspectives for Volumetric Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:4830-4841. PMID: 38648142. DOI: 10.1109/jbhi.2024.3392488.
Abstract
Accurate and robust medical image segmentation is crucial for assisting disease diagnosis, making treatment plans, and monitoring disease progression. Adapting to different scale variations and regions of interest is essential for high accuracy in automatic segmentation methods. Existing methods based on the U-shaped architecture tackle the intra- and inter-scale problems with a hierarchical encoder but are restricted in the scope of their multi-scale modeling. In addition, global attention and scaling attention in regions of interest have not been appropriately adopted, especially for salient features. To address these two issues, we propose a ConvNet-Transformer hybrid framework named SSCFormer for accurate and versatile medical image segmentation. An intra-scale ResInception module and an inter-scale transformer bridge are designed to collaboratively capture intra- and inter-scale features, facilitating the interaction of small-scale disparity information at a single stage with large-scale information from multiple stages. Global attention and scaling attention are integrated from a spatial-channel-aware perspective. The proposed SSCFormer is tested on four different medical image segmentation tasks, and comprehensive experimental results show that it outperforms current state-of-the-art methods.
20
Kong G, Wu C, Zhang Z, Yin C, Qin D. M3: using mask-attention and multi-scale for multi-modal brain MRI classification. Front Neuroinform 2024; 18:1403732. PMID: 39139696. PMCID: PMC11320416. DOI: 10.3389/fninf.2024.1403732.
Abstract
Introduction Brain diseases, particularly the classification of gliomas and brain metastases and the prediction of hemorrhagic transformation (HT) in strokes, pose significant challenges in healthcare. Existing methods, relying predominantly on clinical data or imaging-based techniques such as radiomics, often fall short of satisfactory classification accuracy: they fail to adequately capture the nuanced features crucial for accurate diagnosis, hindered by noise and an inability to integrate information across various scales. Methods We propose a novel approach, termed M3, that combines mask attention mechanisms with multi-scale feature fusion for multimodal brain disease classification, aiming to extract features highly relevant to the disease. The extracted features are dimensionally reduced using Principal Component Analysis (PCA) and then classified with a Support Vector Machine (SVM) to obtain the predictive results. Results Our methodology underwent rigorous testing on multi-parametric MRI datasets for both brain tumors and strokes. The results demonstrate a significant improvement in addressing critical clinical challenges, including the classification of gliomas and brain metastases and the prediction of hemorrhagic stroke transformation. Ablation studies further validate the effectiveness of our attention mechanism and feature fusion modules. Discussion These findings underscore the potential of our approach to meet and exceed current clinical diagnostic demands, offering promising prospects for enhancing healthcare outcomes in the diagnosis and treatment of brain diseases.
Affiliation(s)
- Guanqing Kong, Chuanfu Wu, Dawei Qin: Linyi People's Hospital; Linyi Key Laboratory of Health Data Science; Shandong Open Laboratory of Data Innovation Application, Linyi City, Shandong Province, China
- Zongqiu Zhang, Chuansheng Yin: Linyi People's Hospital, Linyi City, Shandong Province, China
21
Pan X, Wang D. GC Snakes: An Efficient and Robust Segmentation Model for Hot Forging Images. Sensors (Basel) 2024; 24:4821. PMID: 39123869. PMCID: PMC11314881. DOI: 10.3390/s24154821.
Abstract
Machine vision is a desirable non-contact measurement method for hot forgings, but image segmentation remains a challenge in performance and robustness owing to the diversity of working conditions for hot forgings. This paper therefore proposes an efficient and robust active contour model and a corresponding image segmentation approach for forging images, verified by experiments measuring geometric parameters of forged parts. Specifically, three types of continuity parameters are defined based on the geometric continuity of equivalent grayscale surfaces of forging images; a new image force and external energy functional are then proposed to form a new active contour model, Geometric Continuity Snakes (GC Snakes), which is more perceptive to the grayscale distribution characteristics of forging images and robustly improves convergence of the active contour. Additionally, a strategy for generating initial control points for GC Snakes is proposed to compose an efficient and robust image segmentation approach. Experimental results show that the proposed GC Snakes has better segmentation performance than existing active contour models for forging images of different temperatures and sizes, providing better performance and efficiency in geometric parameter measurement for hot forgings. The maximum positioning and dimension errors of GC Snakes are 0.5525 mm and 0.3868 mm, respectively, compared with errors of 0.7873 mm and 0.6868 mm for the classical Snakes model.
Affiliation(s)
- Delun Wang: School of Mechanical Engineering, Dalian University of Technology, Dalian 116024, China
22
P V, R TS, S A, S A. Efficient Kidney Tumor Classification and Segmentation with U-Net. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-4. PMID: 40040090. DOI: 10.1109/embc53108.2024.10782559.
Abstract
A novel approach to kidney tumor identification is introduced, integrating kidney tumor classification and segmentation algorithms. The multi-faceted approach commences with the classification of kidney images, adeptly discerning between normal and tumor instances. For images identified as tumor-positive, a sophisticated U-Net-based architecture is employed to achieve precise segmentation, capturing nuanced details of both kidney and tumor regions. Several models, including VGG16, MobileNetV3, and DenseNet50, were tested for the classification stage; among these, MobileNetV3 performs best, with 99.1% accuracy and 99% precision. For segmentation, a novel U-Net model is applied to accurately delineate the kidney and kidney tumor from CT scan data, achieving an average Dice coefficient of 0.9445. In advancing the landscape of kidney tumor analysis, the proposed strategy not only bridges classification and segmentation but also represents a significant step toward refined clinical applications.
|
23
|
Cao J, Wang X, Qu Z, Zhuo L, Li X, Zhang H, Yang Y, Wei W. WDFF-Net: Weighted Dual-Branch Feature Fusion Network for Polyp Segmentation With Object-Aware Attention Mechanism. IEEE J Biomed Health Inform 2024; 28:4118-4131. [PMID: 38536686 DOI: 10.1109/jbhi.2024.3381891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/03/2024]
Abstract
Colon polyps in colonoscopy images exhibit significant differences in color, size, shape, appearance, and location, posing significant challenges to accurate polyp segmentation. In this paper, a Weighted Dual-branch Feature Fusion Network for polyp segmentation, named WDFF-Net, is proposed, adopting HarDNet68 as the backbone. First, a dual-branch feature fusion architecture is constructed, consisting of a shared feature extractor and two feature fusion branches: a Progressive Feature Fusion (PFF) branch and a Scale-aware Feature Fusion (SFF) branch. The branches fuse deep features from multiple layers for different purposes and in different ways. The PFF branch addresses under- and over-segmentation of flat polyps with low edge contrast by iteratively fusing features from low, middle, and high layers. The SFF branch tackles drastic variations in polyp size and shape, especially missed segmentation of small polyps. The two branches are complementary and play different roles in improving segmentation accuracy. Second, an Object-aware Attention Mechanism (OAM) is proposed to enhance features of the target regions and suppress background features that would otherwise interfere with segmentation performance. Third, a weighted dual-branch segmentation loss function is designed, which dynamically assigns weight factors to the loss functions of the two branches to optimize their collaborative training. Experimental results on five public colon polyp datasets demonstrate that the proposed WDFF-Net achieves superior segmentation performance with lower model complexity and faster inference speed, while maintaining good generalization ability.
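One simple way to realize a dynamically weighted dual-branch loss is to weight each branch by its share of the total loss, so the currently harder branch receives more emphasis. The weighting rule below is a hypothetical stand-in, not necessarily WDFF-Net's exact scheme:

```python
import numpy as np

def weighted_dual_branch_loss(loss_pff, loss_sff, eps=1e-8):
    """Combine two branch losses with dynamic weights.

    Hypothetical scheme: each branch's weight is its share of the total
    loss, so training emphasis shifts toward the branch that is currently
    doing worse. (The paper's exact weighting rule may differ.)
    """
    total = loss_pff + loss_sff + eps
    w_pff = loss_pff / total
    w_sff = loss_sff / total
    return w_pff * loss_pff + w_sff * loss_sff
```

With equal branch losses the weights are 0.5 each; when one branch dominates, its loss drives nearly all of the combined objective.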
|
24
|
Yang L, Gu Y, Bian G, Liu Y. MSDE-Net: A Multi-Scale Dual-Encoding Network for Surgical Instrument Segmentation. IEEE J Biomed Health Inform 2024; 28:4072-4083. [PMID: 38117619 DOI: 10.1109/jbhi.2023.3344716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2023]
Abstract
Minimally invasive surgery, which relies on surgical robots and microscopes, demands precise image segmentation to ensure safe and efficient procedures. Nevertheless, achieving accurate segmentation of surgical instruments remains challenging due to the complexity of the surgical environment. To tackle this issue, this paper introduces a novel multiscale dual-encoding segmentation network, termed MSDE-Net, designed to automatically and precisely segment surgical instruments. The proposed MSDE-Net leverages a dual-branch encoder comprising a convolutional neural network (CNN) branch and a transformer branch to effectively extract both local and global features. Moreover, an attention fusion block (AFB) is introduced to ensure effective information complementarity between the dual-branch encoding paths. Additionally, a multilayer context fusion block (MCF) is proposed to enhance the network's capacity to simultaneously extract global and local features. Finally, to extend the scope of global feature information under larger receptive fields, a multi-receptive field fusion (MRF) block is incorporated. Through comprehensive experimental evaluations on two publicly available datasets for surgical instrument segmentation, the proposed MSDE-Net demonstrates superior performance compared to existing methods.
|
25
|
Zhang H, Cai Z. ConvNextUNet: A small-region attentioned model for cardiac MRI segmentation. Comput Biol Med 2024; 177:108592. [PMID: 38781642 DOI: 10.1016/j.compbiomed.2024.108592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 04/08/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024]
Abstract
Cardiac MRI segmentation is a significant research area in medical image processing, holding immense clinical and scientific importance in assisting the diagnosis and treatment of heart diseases. Currently, existing cardiac MRI segmentation algorithms are often constrained by specific datasets and conditions, leading to a notable decrease in segmentation performance when applied to diverse datasets. These limitations affect the algorithm's overall performance and generalization capabilities. Inspired by ConvNext, we introduce a two-dimensional cardiac MRI segmentation U-shaped network called ConvNextUNet. It is the first application of a combination of ConvNext and the U-shaped architecture in the field of cardiac MRI segmentation. Firstly, we incorporate up-sampling modules into the original ConvNext architecture and combine it with the U-shaped framework to achieve accurate reconstruction. Secondly, we integrate Input Stem into ConvNext, and introduce attention mechanisms along the bridging path. By merging features extracted from both the encoder and decoder, a probability distribution is obtained through linear and nonlinear transformations, serving as attention weights, thereby enhancing the signal of the same region of interest. The resulting attention weights are applied to the decoder features, highlighting the region of interest. This allows the model to simultaneously consider local context and global details during the learning phase, fully leveraging the advantages of both global and local perception for a more comprehensive understanding of cardiac anatomical structures. Consequently, the model demonstrates a clear advantage and robust generalization capability, especially in small-region segmentation. Experimental results on the ACDC, LVQuan19, and RVSC datasets confirm that the ConvNextUNet model outperforms the current state-of-the-art models, particularly in small-region segmentation tasks. 
Furthermore, we conducted cross-dataset training and testing experiments, which revealed that the pre-trained model can accurately segment diverse cardiac datasets, showcasing its powerful generalization capabilities. The source code of this project is available at https://github.com/Zemin-Cai/ConvNextUNet.
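The bridging-path attention described above (merge encoder and decoder features, apply linear and nonlinear transforms, and use the resulting probabilities as weights on the decoder features) reads like a standard additive attention gate. A numpy sketch under that reading, with toy projection matrices `w_e`, `w_d` and scoring vector `psi` standing in for learned layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(enc, dec, w_e, w_d, psi):
    """Additive attention over a skip connection (numpy sketch).

    enc, dec: (C, H, W) encoder/decoder feature maps.
    w_e, w_d: (Ci, C) linear projections into a shared space.
    psi: (Ci,) scoring vector producing one logit per pixel.
    Returns decoder features reweighted by a per-pixel map in (0, 1).
    """
    e = np.einsum('ic,chw->ihw', w_e, enc)   # project encoder stream
    d = np.einsum('ic,chw->ihw', w_d, dec)   # project decoder stream
    g = np.maximum(e + d, 0.0)               # nonlinearity (ReLU)
    alpha = sigmoid(np.einsum('i,ihw->hw', psi, g))  # per-pixel attention
    return dec * alpha[None]                 # highlight regions of interest
```

Pixels where both streams respond strongly get weights near 1, amplifying the shared region of interest; elsewhere the decoder signal is damped.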
Affiliation(s)
- Huiyi Zhang
- The Department of Electronic Engineering, Shantou University, Shantou, Guangdong 515063, PR China; Key Laboratory of Digital Signal and Image Processing of Guangdong Province, Shantou, Guangdong 515063, PR China
- Zemin Cai
- The Department of Electronic Engineering, Shantou University, Shantou, Guangdong 515063, PR China; Key Laboratory of Digital Signal and Image Processing of Guangdong Province, Shantou, Guangdong 515063, PR China.
|
26
|
Zhao J, Liu L, Yang X, Cui Y, Li D, Zhang H, Zhang K. A medical image segmentation method for rectal tumors based on multi-scale feature retention and multiple attention mechanisms. Med Phys 2024; 51:3275-3291. [PMID: 38569054 DOI: 10.1002/mp.17044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Revised: 01/16/2024] [Accepted: 01/22/2024] [Indexed: 04/05/2024] Open
Abstract
BACKGROUND With the continuous development of deep learning algorithms in the field of medical imaging, models for medical image processing based on convolutional neural networks have made great progress. Since medical images of rectal tumors are characterized by specific morphological features and complex edges that differ from natural images, achieving good segmentation results often requires richer semantic features. PURPOSE Enhanced hardware and deeper networks have improved the efficiency of feature extraction and utilization in most models to some extent. However, problems with detail loss and difficult feature extraction remain, arising from the extraction of high-level semantic features in deep networks. METHODS In this work, a novel medical image segmentation model is proposed for Magnetic Resonance Imaging (MRI) segmentation of rectal tumors. The model builds a backbone architecture around skip-connection-based feature fusion and addresses detail loss and low segmentation accuracy with three novel modules: Multi-scale Feature Retention (MFR), Multi-branch Cross-channel Attention (MCA), and Coordinate Attention (CA). RESULTS Compared with existing methods, the proposed model segments the tumor region more effectively, achieving 97.4% Dice and 94.9% mIoU, and exhibits excellent segmentation performance and computational speed. CONCLUSIONS The proposed model improves the accuracy of both lesion-region and tumor-edge segmentation. In particular, determining the lesion region can help doctors locate the tumor in clinical diagnosis, and accurate segmentation of the tumor edge can assist doctors in judging the necessity and feasibility of surgery.
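Coordinate Attention (CA), one of the three modules named above, pools along the two spatial axes separately so positional information along each axis is preserved. A stripped-down sketch without the learned 1x1 convolutions of the full formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x):
    """Simplified coordinate attention (no learned weights).

    x: (C, H, W). Direction-aware descriptors are pooled along each
    spatial axis, turned into per-row and per-column gates, and applied
    multiplicatively, so position along H and W is preserved.
    """
    pool_h = x.mean(axis=2)                # (C, H): average over width
    pool_w = x.mean(axis=1)                # (C, W): average over height
    gate_h = sigmoid(pool_h)[:, :, None]   # (C, H, 1) row gates
    gate_w = sigmoid(pool_w)[:, None, :]   # (C, 1, W) column gates
    return x * gate_h * gate_w
```

Unlike plain channel attention (a single scalar gate per channel), the output here is modulated by where along each axis a channel responds, which is what makes the mechanism useful for localizing small regions.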
Affiliation(s)
- Jumin Zhao
- College of Information and Computer, Taiyuan University of Technology, Jinzhong, China
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, China
- Intelligent Perception Engineering Technology Center of Shanxi, Taiyuan, China
- Shanxi Province Engineering Technology Research Center of Spatial Information Network, Taiyuan, China
- Linjun Liu
- College of Information and Computer, Taiyuan University of Technology, Jinzhong, China
- Xiaotang Yang
- Department of Radiology, Shanxi Province Cancer Hospital, Shanxi Medical University, Taiyuan, China
- Yanfen Cui
- Department of Radiology, Shanxi Province Cancer Hospital, Shanxi Medical University, Taiyuan, China
- Dengao Li
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, China
- Intelligent Perception Engineering Technology Center of Shanxi, Taiyuan, China
- Shanxi Province Engineering Technology Research Center of Spatial Information Network, Taiyuan, China
- College of Data Science, Taiyuan University of Technology, Jinzhong, China
- Huiting Zhang
- College of Data Science, Taiyuan University of Technology, Jinzhong, China
- Kenan Zhang
- College of Information and Computer, Taiyuan University of Technology, Jinzhong, China
|
27
|
Pei C, Wu F, Yang M, Pan L, Ding W, Dong J, Huang L, Zhuang X. Multi-Source Domain Adaptation for Medical Image Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:1640-1651. [PMID: 38133966 DOI: 10.1109/tmi.2023.3346285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2023]
Abstract
Unsupervised domain adaptation (UDA) aims to mitigate the performance drop of models tested on a target domain, caused by the domain shift between source and target. Most UDA segmentation methods focus on the scenario of a single source domain. In practice, however, gold-standard data may be available from multiple sources (domains), and such multi-source training data can provide more information for knowledge transfer; how best to exploit them for domain adaptation remains underexplored. This work investigates multi-source UDA and proposes a new framework for medical image segmentation. First, we employ a multi-level adversarial learning scheme to adapt features at different levels between each source domain and the target, improving segmentation performance. Then, we propose a multi-model consistency loss to transfer the learned multi-source knowledge to the target domain simultaneously. Finally, we validated the proposed framework on two applications, i.e., multi-modality cardiac segmentation and cross-modality liver segmentation. The results show that our method delivers promising performance and compares favorably with state-of-the-art approaches.
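A multi-model consistency loss can be as simple as penalizing each per-source model's deviation from the ensemble mean prediction on the target domain; the formulation below is a simplified stand-in for the paper's term:

```python
import numpy as np

def multi_model_consistency_loss(probs):
    """Mean squared deviation of each model's prediction from the ensemble mean.

    probs: list of (C, H, W) softmax outputs, one per source-domain model,
    all evaluated on the same target-domain image. Minimizing this term
    pushes the per-source models to agree on the target domain
    (a simplified stand-in for the paper's consistency loss).
    """
    stack = np.stack(probs)                       # (M, C, H, W)
    mean = stack.mean(axis=0, keepdims=True)      # ensemble average
    return float(((stack - mean) ** 2).mean())
```

The loss is zero exactly when all models produce identical predictions, and grows with their disagreement.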
|
28
|
Seeböck P, Orlando JI, Michl M, Mai J, Schmidt-Erfurth U, Bogunović H. Anomaly guided segmentation: Introducing semantic context for lesion segmentation in retinal OCT using weak context supervision from anomaly detection. Med Image Anal 2024; 93:103104. [PMID: 38350222 DOI: 10.1016/j.media.2024.103104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2022] [Revised: 12/01/2023] [Accepted: 02/05/2024] [Indexed: 02/15/2024]
Abstract
Automated lesion detection in retinal optical coherence tomography (OCT) scans has shown promise for several clinical applications, including diagnosis, monitoring, and guidance of treatment decisions. However, segmentation models still struggle to achieve the desired results for some complex lesions or datasets that commonly occur in real-world settings, e.g., due to variability of lesion phenotypes, image quality, or disease appearance. While several techniques have been proposed to improve them, one line of research that has not yet been investigated is the incorporation of additional semantic context through anomaly detection models. In this study we experimentally show that incorporating weak anomaly labels into standard segmentation models consistently improves lesion segmentation results. This can be done relatively easily by detecting anomalies with a separate model and then adding the resulting output masks as an extra class when training the segmentation model, providing additional semantic context without requiring extra manual labels. We empirically validated this strategy on two in-house and two publicly available retinal OCT datasets for multiple lesion targets, demonstrating the potential of this generic anomaly-guided segmentation approach as an extra tool for improving lesion detection models.
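The strategy described, adding a separate anomaly detector's output mask as an extra training class, reduces to a small label-map transformation; `anomaly_class` is an illustrative label id:

```python
import numpy as np

def add_anomaly_class(label_map, anomaly_mask, anomaly_class):
    """Add a weak anomaly mask as an extra training class.

    label_map: (H, W) integer lesion labels (0 = background).
    anomaly_mask: (H, W) boolean output of a separate anomaly detector.
    Pixels flagged as anomalous but not already labelled become a new
    class, providing extra semantic context without manual annotation.
    """
    out = label_map.copy()
    out[anomaly_mask & (label_map == 0)] = anomaly_class
    return out
```

Existing lesion labels take priority; only unlabelled anomalous pixels receive the weak class, so the manual annotations are never overwritten.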
Affiliation(s)
- Philipp Seeböck
- Lab for Ophthalmic Image Analysis, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria; Computational Imaging Research Lab, Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, Austria.
- José Ignacio Orlando
- Lab for Ophthalmic Image Analysis, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria; Yatiris Group at PLADEMA Institute, CONICET, Universidad Nacional del Centro de la Provincia de Buenos Aires, Gral. Pinto 399, Tandil, Buenos Aires, Argentina
- Martin Michl
- Lab for Ophthalmic Image Analysis, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria
- Julia Mai
- Lab for Ophthalmic Image Analysis, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria
- Ursula Schmidt-Erfurth
- Lab for Ophthalmic Image Analysis, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria
- Hrvoje Bogunović
- Lab for Ophthalmic Image Analysis, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria.
|
29
|
Shen C, Roth HR, Hayashi Y, Oda M, Sato G, Miyamoto T, Rueckert D, Mori K. Anatomical attention can help to segment the dilated pancreatic duct in abdominal CT. Int J Comput Assist Radiol Surg 2024:10.1007/s11548-023-03049-z. [PMID: 38498132 DOI: 10.1007/s11548-023-03049-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2023] [Accepted: 12/13/2023] [Indexed: 03/20/2024]
Abstract
PURPOSE Pancreatic duct dilation is associated with an increased risk of pancreatic cancer, the most lethal malignancy with the lowest 5-year relative survival rate. Automatic segmentation of the dilated pancreatic duct from contrast-enhanced CT scans would facilitate early diagnosis. However, pancreatic duct segmentation poses challenges due to its small anatomical structure and poor contrast in abdominal CT. In this work, we investigate an anatomical attention strategy to address this issue. METHODS Our proposed anatomical attention strategy consists of two steps: pancreas localization and pancreatic duct segmentation. A coarse pancreas segmentation mask is used to guide the fully convolutional networks (FCNs) to concentrate on the pancreatic anatomy and disregard irrelevant features. We further apply a multi-scale aggregation scheme to leverage information from different scales, and integrate a tubular structure enhancement as an additional input channel of the FCN. RESULTS We performed extensive experiments on 30 contrast-enhanced abdominal CT volumes. To evaluate pancreatic duct segmentation performance, we employed four measures: the Dice similarity coefficient (DSC), sensitivity, normalized surface distance, and 95th-percentile Hausdorff distance. The average DSC reaches 55.7%, surpassing other pancreatic duct segmentation methods that rely solely on single-phase CT scans. CONCLUSIONS We proposed an anatomical-attention-based strategy for dilated pancreatic duct segmentation that significantly outperforms earlier approaches. The attention mechanism helps the network focus on the pancreas region, while the tubular structure enhancement enables the FCNs to capture vessel-like structures. The proposed technique might be applied to other tube-like structure segmentation tasks within targeted anatomies.
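Feeding anatomical guidance to an FCN as extra input channels, as with the tubular-structure enhancement above, can be sketched as a simple channel stack (the coarse-mask guidance in the paper may be applied differently, e.g. by masking rather than stacking):

```python
import numpy as np

def build_input_volume(ct, coarse_pancreas_mask, tubular_map):
    """Stack a CT volume with anatomical guidance as extra input channels.

    ct: (D, H, W) intensity volume.
    coarse_pancreas_mask: (D, H, W) binary mask from a first-stage localizer.
    tubular_map: (D, H, W) tubular-structure enhancement response.
    Channel stacking is one simple way to inject such priors into an FCN.
    """
    return np.stack(
        [ct, coarse_pancreas_mask.astype(ct.dtype), tubular_map], axis=0
    )  # (3, D, H, W)
```

The downstream network then sees the priors as ordinary channels and can learn how much to trust each one.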
Affiliation(s)
- Chen Shen
- Graduate School of Informatics, Nagoya University, Furo-cho, Nagoya, Aichi, 4648601, Japan
- Holger R Roth
- NVIDIA Corporation, San Tomas Expy, Santa Clara, CA, 95051, USA
- Yuichiro Hayashi
- Graduate School of Informatics, Nagoya University, Furo-cho, Nagoya, Aichi, 4648601, Japan
- Masahiro Oda
- Graduate School of Informatics, Nagoya University, Furo-cho, Nagoya, Aichi, 4648601, Japan
- Information Strategy Office, Information and Communications, Nagoya University, Furo-cho, Nagoya, Aichi, 4648601, Japan
- Gen Sato
- Chiba Kensei Hospital, Makuhari-cho, Chiba, Chiba, 2620032, Japan
- Tadaaki Miyamoto
- Chiba Kensei Hospital, Makuhari-cho, Chiba, Chiba, 2620032, Japan
- Daniel Rueckert
- Department of Computing, Imperial College London, Exhibition Road, London, SW7 2AZ, UK
- Klinikum rechts der Isar, Technical University of Munich, Ismaninger Str. 22, Munich, 81675, Germany
- Kensaku Mori
- Graduate School of Informatics, Nagoya University, Furo-cho, Nagoya, Aichi, 4648601, Japan.
- Research Center for Medical Bigdata, National Institute of Informatics, 2-1-2 Hitotsubashi, Tokyo, 1018430, Japan.
|
30
|
Chen J, Huang G, Yuan X, Zhong G, Zheng Z, Pun CM, Zhu J, Huang Z. Quaternion Cross-Modality Spatial Learning for Multi-Modal Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:1412-1423. [PMID: 38145537 DOI: 10.1109/jbhi.2023.3346529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2023]
Abstract
Recently, Deep Neural Networks (DNNs) have had a large impact on image processing, including medical image segmentation, and the real-valued convolutions of DNNs have been extensively utilized in multi-modal medical image segmentation to accurately segment lesions. However, the weighted-summation operation in such convolutions limits the ability to maintain the spatial dependence that is crucial for identifying different lesion distributions. In this paper, we propose a novel Quaternion Cross-modality Spatial Learning (Q-CSL) approach that explores spatial information while considering the linkage between multi-modal images. Specifically, we introduce quaternions to represent data and coordinates that contain spatial information, and propose a Quaternion Spatial-association Convolution to learn this spatial information. Subsequently, the proposed De-level Quaternion Cross-modality Fusion (De-QCF) module excavates inner-space features and fuses cross-modality spatial dependencies. Our experimental results demonstrate that our approach performs well compared with competitive methods, with only 0.01061 M parameters and 9.95 G FLOPs.
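Quaternion layers replace the real-valued weighted sum with the Hamilton product, which couples the four components of each quaternion-valued feature instead of treating them as independent channels; the product itself is standard:

```python
import numpy as np

def hamilton_product(q1, q2):
    """Hamilton product of two quaternions q = (w, x, y, z).

    Quaternion convolutions build their weighting on this product, which
    ties the four components together and so preserves inter-component
    (e.g. cross-modality or spatial) relationships that an independent
    per-channel weighted sum would discard.
    """
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,  # real part
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,  # i
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,  # j
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,  # k
    ])
```

The product is non-commutative (i*j = k but j*i = -k), which is exactly the structure that lets quaternion weights encode rotations and cross-component associations.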
|
31
|
Fu B, Peng Y, He J, Tian C, Sun X, Wang R. HmsU-Net: A hybrid multi-scale U-net based on a CNN and transformer for medical image segmentation. Comput Biol Med 2024; 170:108013. [PMID: 38271837 DOI: 10.1016/j.compbiomed.2024.108013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Revised: 12/26/2023] [Accepted: 01/18/2024] [Indexed: 01/27/2024]
Abstract
Accurate medical image segmentation is of great significance for subsequent diagnosis and analysis. The acquisition of multi-scale information plays an important role in segmenting regions of interest of different sizes. With the emergence of Transformers, numerous networks adopted hybrid structures incorporating Transformers and CNNs to learn multi-scale information. However, the majority of research has focused on the design and composition of CNN and Transformer structures, neglecting the inconsistencies in feature learning between Transformer and CNN. This oversight has resulted in the hybrid network's performance not being fully realized. In this work, we proposed a novel hybrid multi-scale segmentation network named HmsU-Net, which effectively fused multi-scale features. Specifically, HmsU-Net employed a parallel design incorporating both CNN and Transformer architectures. To address the inconsistency in feature learning between CNN and Transformer within the same stage, we proposed the multi-scale feature fusion module. For feature fusion across different stages, we introduced the cross-attention module. Comprehensive experiments conducted on various datasets demonstrate that our approach surpasses current state-of-the-art methods.
Affiliation(s)
- Bangkang Fu
- Medical College, Guizhou University, Guizhou 550000, China; Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
- Yunsong Peng
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
- Junjie He
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
- Chong Tian
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
- Xinhuan Sun
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China
- Rongpin Wang
- Department of Medical Imaging, International Exemplary Cooperation Base of Precision Imaging for Diagnosis and Treatment, Guizhou Provincial People's Hospital, Guizhou 550002, China.
|
32
|
Papanastasiou G, Dikaios N, Huang J, Wang C, Yang G. Is Attention all You Need in Medical Image Analysis? A Review. IEEE J Biomed Health Inform 2024; 28:1398-1411. [PMID: 38157463 DOI: 10.1109/jbhi.2023.3348436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2024]
Abstract
Medical imaging is a key component of clinical diagnosis, treatment planning, and clinical trial design, accounting for almost 90% of all healthcare data. CNNs have achieved performance gains in medical image analysis (MIA) over recent years. CNNs can efficiently model local pixel interactions and be trained on small-scale MI data. Despite these important advances, typical CNNs have relatively limited capability to model "global" pixel interactions, which restricts their ability to generalise to out-of-distribution data with different "global" information. The recent progress of Artificial Intelligence gave rise to Transformers, which can learn global relationships from data. However, full Transformer models need to be trained on large-scale data and involve tremendous computational complexity. Attention and Transformer compartments ("Transf/Attention"), which retain the ability to model global relationships, have been proposed as lighter alternatives to full Transformers. Recently, there is an increasing trend to cross-pollinate complementary local-global properties from CNN and Transf/Attention architectures, which has led to a new era of hybrid models. The past years have witnessed substantial growth in hybrid CNN-Transf/Attention models across diverse MIA problems. In this systematic review, we survey existing hybrid CNN-Transf/Attention models, review and unravel key architectural designs, analyse breakthroughs, and evaluate current and future opportunities as well as challenges. We also introduce an analysis framework on generalisation opportunities of scientific and clinical impact, based on which new data-driven domain generalisation and adaptation methods can be stimulated.
|
33
|
Ao Y, Shi W, Ji B, Miao Y, He W, Jiang Z. MS-TCNet: An effective Transformer-CNN combined network using multi-scale feature learning for 3D medical image segmentation. Comput Biol Med 2024; 170:108057. [PMID: 38301516 DOI: 10.1016/j.compbiomed.2024.108057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2023] [Revised: 12/31/2023] [Accepted: 01/26/2024] [Indexed: 02/03/2024]
Abstract
Medical image segmentation is a fundamental research problem in medical image processing. Recently, Transformers have achieved highly competitive performance in computer vision, and many methods combining Transformers with convolutional neural networks (CNNs) have emerged for segmenting medical images. However, these methods cannot effectively capture multi-scale features in medical images, even though the texture and contextual information embedded in multi-scale features is extremely beneficial for segmentation. To alleviate this limitation, we propose a novel Transformer-CNN combined network using multi-scale feature learning for three-dimensional (3D) medical image segmentation, called MS-TCNet. The proposed model utilizes a shunted Transformer and a CNN to construct an encoder and pyramid decoder, enabling feature learning at six different scale levels and capturing multi-scale features with refinement at each level. Additionally, we propose a novel lightweight multi-scale feature fusion (MSFF) module that fully fuses the different-scale semantic features generated by the pyramid decoder for each segmentation class, yielding more accurate segmentation output. We conducted experiments on three widely used 3D medical image segmentation datasets. The experimental results indicate that our method outperforms state-of-the-art medical image segmentation methods, suggesting its effectiveness, robustness, and superiority. Meanwhile, our model has fewer parameters and lower computational complexity than conventional 3D segmentation networks. The results confirm that the model is capable of effective multi-scale feature learning and that the learned multi-scale features improve segmentation performance. We open-sourced our code, which can be found at https://github.com/AustinYuAo/MS-TCNet.
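A minimal stand-in for fusing pyramid-decoder outputs of different scales: upsample each scale to the finest resolution and combine with per-scale weights. Nearest-neighbor upsampling and uniform weights are simplifying assumptions; the paper's MSFF module is more elaborate:

```python
import numpy as np

def fuse_multiscale(features, weights=None):
    """Fuse per-scale maps by nearest-neighbor upsampling and weighted sum.

    features: list of 2D maps whose heights/widths divide those of the
    largest map (as in a pyramid decoder). Returns one map at the finest
    resolution. A minimal stand-in for a multi-scale fusion module.
    """
    target_h = max(f.shape[0] for f in features)
    target_w = max(f.shape[1] for f in features)
    if weights is None:
        weights = [1.0 / len(features)] * len(features)  # uniform fusion
    fused = np.zeros((target_h, target_w))
    for f, w in zip(features, weights):
        # nearest-neighbor upsampling by integer repetition
        up = np.repeat(np.repeat(f, target_h // f.shape[0], axis=0),
                       target_w // f.shape[1], axis=1)
        fused += w * up
    return fused
```

In practice the weights would be learned per scale (and per class), and the upsampling replaced by learned or interpolated resizing.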
Affiliation(s)
- Yu Ao
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China
- Weili Shi
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China; Zhongshan Institute of Changchun University of Science and Technology, Zhongshan, 528437, China
- Bai Ji
- Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, 130061, China
- Yu Miao
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China; Zhongshan Institute of Changchun University of Science and Technology, Zhongshan, 528437, China
- Wei He
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China; Zhongshan Institute of Changchun University of Science and Technology, Zhongshan, 528437, China
- Zhengang Jiang
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China; Zhongshan Institute of Changchun University of Science and Technology, Zhongshan, 528437, China.
|
34
|
Luo X, Zhang H, Huang X, Gong H, Zhang J. DBNet-SI: Dual branch network of shift window attention and inception structure for skin lesion segmentation. Comput Biol Med 2024; 170:108090. [PMID: 38320341 DOI: 10.1016/j.compbiomed.2024.108090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2023] [Revised: 12/27/2023] [Accepted: 01/27/2024] [Indexed: 02/08/2024]
Abstract
The U-shaped convolutional neural network (CNN) has attained remarkable achievements in skin lesion segmentation. However, given the inherent locality of convolution, this architecture cannot effectively capture long-range pixel dependencies and multiscale global contextual information. Moreover, repeated convolution and downsampling operations can readily cause the omission of intricate local fine-grained details. In this paper, we propose DBNet-SI, a U-shaped network equipped with a dual-branch module that combines shift-window attention and Inception structures (MSI). First, the MSI module better captures multiscale global contextual information and long-range pixel dependencies. Specifically, a cross-branch bidirectional interaction module within MSI enables information complementarity between the two branches in the channel and spatial dimensions, so MSI can extract distinguishing and comprehensive features to accurately identify skin lesion boundaries. Second, we devise a progressive feature enhancement and information compensation module (PFEIC), which progressively compensates for fine-grained features through reconstructed skip connections and integrated global context attention modules. Experimental results show the superior segmentation performance of DBNet-SI compared with other deep learning models on the ISIC2017 and ISIC2018 datasets. Ablation studies demonstrate that our model can effectively extract rich multiscale global contextual information and compensate for the loss of local details.
Affiliation(s)
- Xuqiong Luo: School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha 410114, China
- Hao Zhang: School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha 410114, China
- Xiaofei Huang: School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha 410114, China
- Hongfang Gong: School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha 410114, China
- Jin Zhang: School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China

35
Shi J, Wang Z, Ruan S, Zhao M, Zhu Z, Kan H, An H, Xue X, Yan B. Rethinking automatic segmentation of gross target volume from a decoupling perspective. Comput Med Imaging Graph 2024; 112:102323. [PMID: 38171254 DOI: 10.1016/j.compmedimag.2023.102323]
Abstract
Accurate and reliable segmentation of the Gross Target Volume (GTV) is critical in cancer Radiation Therapy (RT) planning, but manual delineation is time-consuming and subject to inter-observer variation. Recently, deep learning methods have achieved remarkable success in medical image segmentation. However, due to low image contrast and the extreme pixel imbalance between the GTV and adjacent tissues, most existing methods obtain limited performance on automatic GTV segmentation. In this paper, we propose a Heterogeneous Cascade Framework (HCF) from a decoupling perspective, which decomposes GTV segmentation into independent recognition and segmentation subtasks. The former screens out the abnormal slices containing GTV, while the latter performs pixel-wise segmentation of these slices. With the decoupled two-stage framework, we can efficiently filter normal slices to reduce false positives. To further improve segmentation performance, we design a multi-level Spatial Alignment Network (SANet) based on the feature pyramid structure, which introduces a spatial alignment module into the decoder to compensate for the information loss caused by downsampling. Moreover, we propose a Combined Regularization (CR) loss and a Balance-Sampling Strategy (BSS) to alleviate the pixel imbalance problem and improve network convergence. Extensive experiments on two public datasets of the StructSeg2019 challenge demonstrate that our method outperforms state-of-the-art methods, with significant advantages in reducing false positives and accurately segmenting small objects. The code is available at https://github.com/shijun18/GTV_AutoSeg.
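The decoupled two-stage idea (recognize abnormal slices first, then segment only those) can be sketched in a few lines. The classifier and segmenter below are toy stand-ins of our own, not the paper's networks; the point is only the cascade's control flow, which zeros out every slice the recognizer rejects.

```python
import numpy as np

def recognize_slices(volume, classifier):
    # Stage 1: keep only the slice indices the classifier flags as abnormal.
    return [i for i, s in enumerate(volume) if classifier(s)]

def segment_volume(volume, classifier, segmenter):
    """Decoupled cascade: normal slices are filtered out first, so the
    segmenter never produces false positives on them."""
    masks = np.zeros_like(volume)
    for i in recognize_slices(volume, classifier):
        masks[i] = segmenter(volume[i])
    return masks

# Toy stand-ins: a slice is "abnormal" if any pixel exceeds a threshold,
# and segmentation simply thresholds the slice.
volume = np.zeros((4, 8, 8))
volume[2, 3:5, 3:5] = 1.0
classifier = lambda s: s.max() > 0.5
segmenter = lambda s: (s > 0.5).astype(float)
masks = segment_volume(volume, classifier, segmenter)
print(int(masks.sum()))  # 4: only slice 2 contributes foreground
```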
Affiliation(s)
- Jun Shi: School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
- Zhaohui Wang: School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
- Shulan Ruan: School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
- Minfan Zhao: School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
- Ziqi Zhu: School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
- Hongyu Kan: School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
- Hong An: School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China; Laoshan Laboratory, Qingdao, 266221, China
- Xudong Xue: Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430074, China
- Bing Yan: Department of Radiation Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, China

36
Wang J, Liang J, Xiao Y, Zhou JT, Fang Z, Yang F. TaiChiNet: Negative-Positive Cross-Attention Network for Breast Lesion Segmentation in Ultrasound Images. IEEE J Biomed Health Inform 2024; 28:1516-1527. [PMID: 38206781 DOI: 10.1109/jbhi.2024.3352984]
Abstract
Breast lesion segmentation in ultrasound images is essential for computer-aided breast-cancer diagnosis. To improve segmentation performance, most approaches design sophisticated deep-learning models by mining the patterns of foreground lesions and normal backgrounds simultaneously or by unilaterally enhancing foreground lesions via various focal losses. However, the potential of normal backgrounds is underutilized, even though compacting the feature representation of all normal backgrounds could reduce false positives. From a novel viewpoint of bilateral enhancement, we propose a negative-positive cross-attention network that concentrates on normal backgrounds and foreground lesions, respectively. Derived from the complementing opposites of bipolarity in TaiChi, the network is denoted TaiChiNet; it consists of a negative normal-background path and a positive foreground-lesion path. To transmit information across the two paths, a cross-attention module, a complementary MLP-head, and a complementary loss are built for deep-layer features, shallow-layer features, and mutual-learning supervision, respectively. To the best of our knowledge, this is the first work to formulate breast lesion segmentation as a mutual supervision task from the foreground-lesion and normal-background views. Experimental results demonstrate the effectiveness of TaiChiNet on two breast lesion segmentation datasets with a lightweight architecture. Furthermore, extensive experiments on thyroid nodule segmentation and retinal optic cup/disc segmentation datasets indicate the application potential of TaiChiNet.
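The bilateral (mutual) supervision idea can be sketched as a combined loss: the positive path is supervised with the lesion mask, while the negative path is supervised with the inverted background mask. This is a minimal numpy illustration under our own assumptions (plain BCE, a single weighting factor `alpha`), not TaiChiNet's actual complementary loss.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    # Mean binary cross-entropy between probabilities and a binary target.
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

def bilateral_loss(pos_pred, neg_pred, mask, alpha=0.5):
    """Positive path supervised with the lesion mask, negative path with the
    inverted (background) mask; the weighted sum couples the two views."""
    return bce(pos_pred, mask) + alpha * bce(neg_pred, 1.0 - mask)

mask = np.zeros((8, 8)); mask[2:5, 2:5] = 1.0
pos_pred = np.clip(mask, 0.01, 0.99)        # near-perfect lesion prediction
neg_pred = np.clip(1.0 - mask, 0.01, 0.99)  # near-perfect background prediction
loss = bilateral_loss(pos_pred, neg_pred, mask)
print(loss < 0.1)  # True: both views agree with their supervision
```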
37
Zhong S, Zhou H, Zheng Z, Ma Z, Zhang F, Duan J. Hierarchical attention-guided multiscale aggregation network for infrared small target detection. Neural Netw 2024; 171:485-496. [PMID: 38157732 DOI: 10.1016/j.neunet.2023.12.036]
Abstract
Man-made flying objects in the sky and ships in the ocean can be regarded as small infrared targets, and methods for tracking them have received widespread attention in recent years. In search of a more efficient method for infrared small target recognition, we propose a hierarchical attention-guided multiscale aggregation network (HAMANet) in this paper. The proposed HAMANet mainly consists of a compound guide multilayer perceptron (CG-MLP) block embedded in the backbone network, a spatial-interactive attention module (SiAM), a pixel-interactive attention module (PiAM), and a contextual fusion module (CFM). The CG-MLP mixes information along the width, height, and channel axes, which yields a better segmentation effect while reducing computational complexity. SiAM improves global semantic information exchange by increasing the connections between different channels, while PiAM improves the extraction of local key features by enhancing information exchange at the pixel level. CFM fuses low-level positional information and high-level channel information of the target through coding to improve network stability and target feature utilization. Compared with other state-of-the-art methods on public infrared small target datasets, our proposed HAMANet achieves high detection accuracy and a low false-alarm rate.
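Axis-wise MLP mixing, the general mechanism behind blocks like CG-MLP, can be sketched as three small linear maps applied along the channel, height, and width axes in turn. This is a generic illustration assuming sequential per-axis mixing; the actual CG-MLP design is more elaborate, and the weight shapes here are our own choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def axis_mix(x, W, axis):
    # Linear mixing along one axis via tensordot; the contracted axis is
    # produced last, so move it back to its original position.
    return np.moveaxis(np.tensordot(x, W, axes=([axis], [0])), -1, axis)

def axial_mlp_block(x, Wc, Wh, Ww):
    """Sequentially mixes the channel, height, and width axes with small
    per-axis weight matrices instead of full self-attention."""
    x = axis_mix(x, Wc, 0)   # channel-axis mixing
    x = axis_mix(x, Wh, 1)   # height-axis mixing
    x = axis_mix(x, Ww, 2)   # width-axis mixing
    return x

C, H, W = 4, 8, 8
x = rng.normal(size=(C, H, W))
y = axial_mlp_block(x, rng.normal(size=(C, C)),
                    rng.normal(size=(H, H)), rng.normal(size=(W, W)))
print(y.shape)  # (4, 8, 8)
```

Each per-axis matrix has only `n*n` parameters for an axis of length `n`, which is where the computational saving over full attention comes from.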
Affiliation(s)
- Shunshun Zhong: State Key Laboratory of Precision Manufacturing for Extreme Service Performance, College of Mechanical and Electrical Engineering, Central South University, Changsha 410083, China
- Haibo Zhou: State Key Laboratory of Precision Manufacturing for Extreme Service Performance, College of Mechanical and Electrical Engineering, Central South University, Changsha 410083, China
- Zhongxu Zheng: College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410003, China
- Zhu Ma: State Key Laboratory of Precision Manufacturing for Extreme Service Performance, College of Mechanical and Electrical Engineering, Central South University, Changsha 410083, China
- Fan Zhang: School of Automation, Central South University, Changsha 410083, China
- Ji'an Duan: State Key Laboratory of Precision Manufacturing for Extreme Service Performance, College of Mechanical and Electrical Engineering, Central South University, Changsha 410083, China

38
Gao X, Jiang B, Wang X, Huang L, Tu Z. Chest x-ray diagnosis via spatial-channel high-order attention representation learning. Phys Med Biol 2024; 69:045026. [PMID: 38347732 DOI: 10.1088/1361-6560/ad2014]
Abstract
Objective. Chest x-ray image representation and learning is an important problem in the computer-aided diagnosis area. Existing methods usually adopt CNNs or Transformers for feature representation learning and focus on learning effective representations for chest x-ray images. Although good performance can be obtained, these works are still limited, mainly because they ignore the correlations among channels and pay little attention to local context-aware feature representation of the chest x-ray image. Approach. To address these problems, we propose a novel spatial-channel high-order attention model (SCHA) for chest x-ray image representation and diagnosis. The proposed network architecture mainly contains three modules, i.e. CEBN, SHAM and CHAM. Specifically, we first introduce a context-enhanced backbone network that employs multi-head self-attention to extract initial features from the input chest x-ray images. Then, we develop a novel SCHA which contains both spatial and channel high-order attention learning branches. For the spatial branch, we develop a novel local biased self-attention mechanism that can capture both local and long-range global dependencies of positions to learn rich context-aware representations. For the channel branch, we employ Brownian Distance Covariance to encode the correlation information of channels and regard it as the image representation. Finally, the two learning branches are integrated for the final multi-label diagnosis classification and prediction. Main results. Experiments on the commonly used ChestX-ray14 and CheXpert datasets demonstrate that our proposed SCHA approach obtains better performance than many related approaches. Significance. This study provides a more discriminative method for chest x-ray classification and a technique for computer-aided diagnosis.
Affiliation(s)
- Xinyue Gao: School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Bo Jiang: School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Xixi Wang: School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Lili Huang: School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China
- Zhengzheng Tu: School of Computer Science and Technology, Anhui University, Hefei 230601, People's Republic of China

39
Ling Y, Wang Y, Dai W, Yu J, Liang P, Kong D. MTANet: Multi-Task Attention Network for Automatic Medical Image Segmentation and Classification. IEEE Trans Med Imaging 2024; 43:674-685. [PMID: 37725719 DOI: 10.1109/tmi.2023.3317088]
Abstract
Medical image segmentation and classification are two key steps in computer-aided clinical diagnosis. Regions of interest are usually segmented first to extract useful features for subsequent disease classification. However, these methods are computationally complex and time-consuming. In this paper, we propose a one-stage multi-task attention network (MTANet) that efficiently classifies objects in an image while generating a high-quality segmentation mask for each medical object. A reverse addition attention module is designed in the segmentation task to fuse areas in the global map and boundary cues in high-resolution features, and an attention bottleneck module is used in the classification task for image-feature and clinical-feature fusion. We evaluated the performance of MTANet against CNN-based and transformer-based architectures across three imaging modalities and tasks: the CVC-ClinicDB dataset for polyp segmentation, the ISIC-2018 dataset for skin lesion segmentation, and our private ultrasound dataset for liver tumor segmentation and classification. Our proposed model outperformed state-of-the-art models on all three datasets and was superior to all 25 radiologists for liver tumor diagnosis.
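Reverse attention, the family of mechanisms the reverse addition attention module belongs to, can be sketched as gating high-resolution features with the *complement* of the coarse prediction and then adding the coarse map back. This is a generic numpy sketch of reverse attention under our own simplifications, not the exact MTANet module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_addition_attention(global_map, hires_feat):
    """The reversed (1 - sigmoid) global map highlights regions the coarse
    prediction is not confident about, steering high-resolution features
    toward boundary cues; adding the coarse map back preserves the
    confident interior."""
    reverse = 1.0 - sigmoid(global_map)          # emphasis on uncertain areas
    refined = hires_feat * reverse[None, :, :]   # gate every channel
    return refined.sum(axis=0) + global_map      # fuse channels, re-add coarse map

rng = np.random.default_rng(2)
coarse = rng.normal(size=(16, 16))               # coarse global prediction map
hires = rng.normal(size=(8, 16, 16))             # high-resolution features
out = reverse_addition_attention(coarse, hires)
print(out.shape)  # (16, 16)
```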
40
Zhang D, Fan X, Kang X, Tian S, Xiao G, Yu L, Wu W. Class key feature extraction and fusion for 2D medical image segmentation. Med Phys 2024; 51:1263-1276. [PMID: 37552522 DOI: 10.1002/mp.16636]
Abstract
BACKGROUND The size variation, complex semantic environment and high similarity in medical images often prevent deep learning models from achieving good performance. PURPOSE To overcome these problems and improve the model segmentation performance and generalizability. METHODS We propose the key class feature reconstruction module (KCRM), which ranks channel weights and selects key features (KFs) that contribute more to the segmentation results for each class. Meanwhile, KCRM reconstructs all local features to establish the dependence relationship from local features to KFs. In addition, we propose the spatial gating module (SGM), which employs KFs to generate two spatial maps to suppress irrelevant regions, strengthening the ability to locate semantic objects. Finally, we enable the model to adapt to size variations by diversifying the receptive field. RESULTS We integrate these modules into class key feature extraction and fusion network (CKFFNet) and validate its performance on three public medical datasets: CHAOS, UW-Madison, and ISIC2017. The experimental results show that our method achieves better segmentation results and generalizability than those of mainstream methods. CONCLUSION Through quantitative and qualitative research, the proposed module improves the segmentation results and enhances the model generalizability, making it suitable for application and expansion.
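The core of the KCRM idea, ranking channel weights and keeping the channels that contribute most, can be shown in a short sketch. The importance measure here (mean absolute activation, a global-pooling proxy) is our own stand-in for the module's learned channel weights.

```python
import numpy as np

def select_key_features(feat, k):
    """Rank channels by a global-average-pooling importance score and keep
    the k channels that contribute most, mimicking per-class key-feature
    selection."""
    weights = np.abs(feat).mean(axis=(1, 2))   # (C,) channel importance
    order = np.argsort(weights)[::-1]          # descending importance
    keep = np.sort(order[:k])                  # indices of the key channels
    return feat[keep], keep

rng = np.random.default_rng(3)
feat = rng.normal(size=(8, 4, 4))
feat[5] *= 10.0                                # make channel 5 clearly dominant
kf, idx = select_key_features(feat, 3)
print(kf.shape, 5 in idx)  # (3, 4, 4) True
```

In the paper's framework the selected key features also serve as anchors: all local features are reconstructed against them, and spatial gates derived from them suppress irrelevant regions.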
Affiliation(s)
- Dezhi Zhang: Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Clinical Research Center for Dermatologic Diseases, Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
- Xin Fan: College of Software, Xinjiang University, Urumqi, Xinjiang, China
- Xiaojing Kang: Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Clinical Research Center for Dermatologic Diseases, Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
- Shengwei Tian: College of Software, Xinjiang University, Urumqi, Xinjiang, China; Key Laboratory of Software Engineering Technology, College of Software, Xinjiang University, Urumqi, China
- Guangli Xiao: College of Software, Xinjiang University, Urumqi, Xinjiang, China
- Long Yu: College of Network Center, Xinjiang University, Urumqi, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Weidong Wu: Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Clinical Research Center for Dermatologic Diseases, Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China

41
Zhao J, Sun L, Sun Z, Zhou X, Si H, Zhang D. MSEF-Net: Multi-scale edge fusion network for lumbosacral plexus segmentation with MR image. Artif Intell Med 2024; 148:102771. [PMID: 38325928 DOI: 10.1016/j.artmed.2024.102771]
Abstract
Nerve damage in spinal areas is a common cause of disability and paralysis. Lumbosacral plexus segmentation from magnetic resonance imaging (MRI) scans plays an important role in computer-aided diagnosis and surgery of spinal nerve lesions. Due to the complex structure and low contrast of the lumbosacral plexus, it is difficult to delineate edge regions accurately. To address this issue, we propose a Multi-Scale Edge Fusion Network (MSEF-Net) that fully enhances edge features in the encoder and adaptively fuses multi-scale features in the decoder. Specifically, to highlight edge structure features, we propose an edge feature fusion module (EFFM) that combines Sobel operator edge detection with an edge-guided attention module (EAM). To adaptively fuse the multi-scale feature maps in the decoder, we introduce an adaptive multi-scale fusion module (AMSF). Our proposed MSEF-Net was evaluated on a collected spinal MRI dataset of 89 patients (2848 MR images in total). Experimental results demonstrate that MSEF-Net is effective for lumbosacral plexus segmentation in MR images when compared with several state-of-the-art segmentation methods.
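The Sobel half of the EFFM is standard image processing and easy to demonstrate: two fixed 3x3 kernels estimate horizontal and vertical gradients, and their magnitude forms an edge map. The naive valid-mode convolution below is a self-contained sketch; how the map is fused with learned features is specific to MSEF-Net and not reproduced here.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d(img, kernel):
    # Naive valid-mode 2D correlation, no padding (enough for a sketch).
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def sobel_edge_map(img):
    """Gradient magnitude from the two Sobel responses; edge-fusion designs
    combine such a map with learned features to sharpen boundaries."""
    gx, gy = conv2d(img, SOBEL_X), conv2d(img, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

img = np.zeros((8, 8)); img[:, 4:] = 1.0           # vertical step edge
edges = sobel_edge_map(img)
print(edges.max() > 0, edges[:, 0].max() == 0)     # True True: edge found, flat region quiet
```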
Affiliation(s)
- Junyong Zhao: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, the Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing 211106, China
- Liang Sun: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, the Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing 211106, China; Nanjing University of Aeronautics and Astronautics Shenzhen Research Institute, Shenzhen 518063, China
- Zhi Sun: Department of Medical Imaging, Shandong Provincial Hospital, Jinan 250021, China
- Xin Zhou: Department of Orthopedics, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Haipeng Si: Department of Orthopedics, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Daoqiang Zhang: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, the Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing 211106, China; Nanjing University of Aeronautics and Astronautics Shenzhen Research Institute, Shenzhen 518063, China

42
Khan R, Zaman A, Chen C, Xiao C, Zhong W, Liu Y, Hassan H, Su L, Xie W, Kang Y, Huang B. MLAU-Net: Deep supervised attention and hybrid loss strategies for enhanced segmentation of low-resolution kidney ultrasound. Digit Health 2024; 10:20552076241291306. [PMID: 39559387 PMCID: PMC11571257 DOI: 10.1177/20552076241291306]
Abstract
Objective The precise segmentation of kidneys from 2D ultrasound (US) images is crucial for diagnosing and monitoring kidney diseases. However, achieving detailed segmentation is difficult due to US images' low signal-to-noise ratio and low-contrast object boundaries. Methods This paper presents a deep supervised attention approach with multi-loss functions (MLAU-Net) for US segmentation. The MLAU-Net model combines the benefits of attention mechanisms and deep supervision to improve segmentation accuracy. The attention mechanism allows the model to selectively focus on relevant regions of the kidney and ignore irrelevant background information, while deep supervision captures the high-dimensional structure of the kidney in US images. Results We conducted experiments on two datasets to evaluate the MLAU-Net model's performance. The annotated Wuerzburg Dynamic Kidney Ultrasound (WD-KUS) dataset contains kidney US images from 176 patients, totaling 44,880 images split into training and testing sets. The second dataset, the Open Kidney Dataset, has over 500 B-mode abdominal US images. Upon testing and comparison with state-of-the-art U-Net-series segmentation frameworks, the proposed approach achieved dice, accuracy, specificity, Hausdorff distance (HD95), recall, and Average Symmetric Surface Distance (ASSD) scores of 90.2%, 98.26%, 98.93%, 8.90 mm, 91.78%, and 2.87 mm, respectively, which demonstrates the potential clinical value of our work. Conclusion The proposed MLAU-Net model has the potential to be applied to other medical image segmentation tasks that face similar challenges of low signal-to-noise ratios and low-contrast object boundaries.
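A common shape for a deep supervised hybrid loss is to apply the same Dice-plus-BCE objective to the prediction at every decoder level and average the results. The sketch below illustrates that pattern with equal weights; the paper's exact loss terms and weighting are not specified here, so treat the composition as an assumption.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    # Soft Dice loss: 1 - 2|P*T| / (|P| + |T|).
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

def deep_supervised_hybrid_loss(preds, target, w_dice=0.5, w_bce=0.5):
    """Deep supervision: the same hybrid Dice+BCE loss is applied to the
    prediction of every decoder level and the results are averaged."""
    losses = [w_dice * dice_loss(p, target) + w_bce * bce_loss(p, target)
              for p in preds]
    return float(np.mean(losses))

mask = np.zeros((8, 8)); mask[2:6, 2:6] = 1.0
good = np.clip(mask + 0.02, 0, 0.98)            # near-perfect prediction
bad = np.full_like(mask, 0.5)                   # uninformative prediction
assert deep_supervised_hybrid_loss([good, good], mask) < \
       deep_supervised_hybrid_loss([bad, bad], mask)
print("hybrid loss ranks predictions as expected")
```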
Affiliation(s)
- Rashid Khan: College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China; College of Applied Sciences, Shenzhen University, Shenzhen, China; Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen, China
- Asim Zaman: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China; School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Chao Chen: College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China; College of Applied Sciences, Shenzhen University, Shenzhen, China
- Chuda Xiao: Wuerzburg Dynamics Inc., Shenzhen, China
- Wen Zhong: Department of Urology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yang Liu: Department of Urology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Haseeb Hassan: College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Liyilei Su: College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China; College of Applied Sciences, Shenzhen University, Shenzhen, China; Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen, China
- Weiguo Xie: Wuerzburg Dynamics Inc., Shenzhen, China
- Yan Kang: College of Applied Sciences, Shenzhen University, Shenzhen, China; College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Bingding Huang: College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China

43
Zhao Y, Li J, Ren L, Chen Z. DTAN: Diffusion-based Text Attention Network for medical image segmentation. Comput Biol Med 2024; 168:107728. [PMID: 37984203 DOI: 10.1016/j.compbiomed.2023.107728]
Abstract
In the current era, diffusion models have emerged as a groundbreaking force in the realm of medical image segmentation. Against this backdrop, we introduce the Diffusion Text-Attention Network (DTAN), a pioneering segmentation framework that amalgamates the principles of text attention with diffusion models to enhance the precision and integrity of medical image segmentation. Our proposed DTAN architecture is designed to steer the segmentation process towards areas of interest by leveraging a text attention mechanism. This mechanism is adept at identifying and zeroing in on the regions of significance, thus improving the accuracy and robustness of the segmentation. In parallel, the integration of a diffusion model serves to diminish the influence of noise and irrelevant background data in medical images, thereby improving the quality of the segmentation results. The diffusion model is instrumental in filtering out extraneous factors, allowing the network to more effectively capture the nuances and characteristics of the target regions, which in turn enhances segmentation precision. We have subjected DTAN to rigorous evaluation across three datasets: Kvasir-Sessile, Kvasir-SEG, and GlaS. Our focus was particularly drawn to the Kvasir-Sessile dataset due to its relevance to clinical applications. When benchmarked against other state-of-the-art methods, our approach demonstrated significant improvements on the Kvasir-Sessile dataset, with a 2.77% increase in mean Intersection over Union (mIoU) and a 3.06% increase in mean Dice Similarity Coefficient (mDSC). These results provide strong evidence of the DTAN's generalizability and robustness, and its distinct advantages in the task of medical image segmentation.
Affiliation(s)
- Yiyang Zhao: School of Information and Electronic Engineering, Shandong Technology and Business University, Yantai, China
- Jinjiang Li: School of Information and Electronic Engineering, Shandong Technology and Business University, Yantai, China
- Lu Ren: School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Zheng Chen: School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China

44
Pinto-Coelho L. How Artificial Intelligence Is Shaping Medical Imaging Technology: A Survey of Innovations and Applications. Bioengineering (Basel) 2023; 10:1435. [PMID: 38136026 PMCID: PMC10740686 DOI: 10.3390/bioengineering10121435]
Abstract
The integration of artificial intelligence (AI) into medical imaging has ushered in an era of transformation in healthcare. This literature review explores the latest innovations and applications of AI in the field, highlighting its profound impact on medical diagnosis and patient care. The innovation segment explores cutting-edge developments in AI, such as deep learning algorithms, convolutional neural networks, and generative adversarial networks, which have significantly improved the accuracy and efficiency of medical image analysis. These innovations have enabled rapid and accurate detection of abnormalities, from identifying tumors during radiological examinations to detecting early signs of eye disease in retinal images. The article also surveys various applications of AI in medical imaging, including radiology, pathology, cardiology, and more. AI-based diagnostic tools not only speed up the interpretation of complex images but also improve early detection of disease, ultimately delivering better outcomes for patients. Additionally, AI-based image processing facilitates personalized treatment plans, thereby optimizing healthcare delivery. This literature review underscores the paradigm shift that AI has brought to medical imaging and its role in revolutionizing diagnosis and patient care. By combining cutting-edge AI techniques and their practical applications, it is clear that AI will continue shaping the future of healthcare in profound and positive ways.
Affiliation(s)
- Luís Pinto-Coelho: ISEP—School of Engineering, Polytechnic Institute of Porto, 4200-465 Porto, Portugal; INESCTEC, Campus of the Engineering Faculty of the University of Porto, 4200-465 Porto, Portugal

45
Li S, Feng Y, Xu H, Miao Y, Lin Z, Liu H, Xu Y, Li F. CAENet: Contrast adaptively enhanced network for medical image segmentation based on a differentiable pooling function. Comput Biol Med 2023; 167:107578. [PMID: 37918260 DOI: 10.1016/j.compbiomed.2023.107578]
Abstract
Pixel differences between classes with low contrast in medical image semantic segmentation tasks often lead to confusion in category classification, posing a typical challenge for recognition of small targets. To address this challenge, we propose a Contrastive Adaptive Augmented Semantic Segmentation Network with a differentiable pooling function. Firstly, an Adaptive Contrast Augmentation module is constructed to automatically extract local high-frequency information, thereby enhancing image details and accentuating the differences between classes. Subsequently, the Frequency-Efficient Channel Attention mechanism is designed to select useful features in the encoding phase, where multifrequency information is employed to extract channel features. One-dimensional convolutional cross-channel interactions are adopted to reduce model complexity. Finally, a differentiable approximation of max pooling is introduced in order to replace standard max pooling, strengthening the connectivity between neurons and reducing information loss caused by downsampling. We evaluated the effectiveness of our proposed method through several ablation experiments and comparison experiments under homogeneous conditions. The experimental results demonstrate that our method competes favorably with other state-of-the-art networks on five medical image datasets, including four public medical image datasets and one clinical image dataset. It can be effectively applied to medical image segmentation.
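A standard way to build a differentiable approximation of max pooling, in the spirit of the pooling function described above, is a softmax-weighted average over the pooling window: as the temperature parameter grows, the result approaches the hard maximum while remaining smooth everywhere. The sketch below is a generic illustration of this family, not necessarily the exact function used in CAENet; the parameter name `beta` is ours.

```python
import numpy as np

def soft_max_pool(x, beta=10.0):
    """Softmax-weighted average over a window: as beta grows it approaches
    hard max pooling, yet stays differentiable everywhere (unlike the hard
    max, whose gradient flows to a single element)."""
    w = np.exp(beta * (x - x.max()))       # numerically stabilized weights
    return float((w * x).sum() / w.sum())

window = np.array([[0.1, 0.9],
                   [0.3, 0.2]])
soft = soft_max_pool(window, beta=10.0)
hard = float(window.max())
print(soft <= hard, soft > window.mean())  # True True
```

Because every element keeps a nonzero weight, gradients reach all neurons in the window, which matches the abstract's motivation of strengthening connectivity and reducing downsampling information loss.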
Collapse
Affiliation(s)
- Shengke Li
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, 529020, Guangdong, China; School of Engineering, Guangzhou College of Technology and Business, Foshan, 528100, Guangdong, China
| | - Yue Feng
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, 529020, Guangdong, China.
| | - Hong Xu
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, 529020, Guangdong, China; Victoria University, Melbourne, 8001, Australia
| | - Yuan Miao
- Victoria University, Melbourne, 8001, Australia
| | - Zhuosheng Lin
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, 529020, Guangdong, China
| | - Huilin Liu
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
| | - Ying Xu
- Laboratory of TCM Four Processing, Shanghai University of TCM, Shanghai, 201203, China
| | - Fufeng Li
- Laboratory of TCM Four Processing, Shanghai University of TCM, Shanghai, 201203, China.
| |
|
46
|
Peng K, Li Y, Xia Q, Liu T, Shi X, Chen D, Li L, Zhao H, Xiao H. MSMCNet: Differential context drives accurate localization and edge smoothing of lesions for medical image segmentation. Comput Biol Med 2023; 167:107624. [PMID: 37922605 DOI: 10.1016/j.compbiomed.2023.107624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Revised: 09/24/2023] [Accepted: 10/23/2023] [Indexed: 11/07/2023]
Abstract
Medical image segmentation plays a crucial role in clinical assistance for diagnosis. The UNet-based network architecture has achieved tremendous success in the field of medical image segmentation. However, most methods employ element-wise addition or channel concatenation to fuse features, resulting in weaker differentiation of feature information and excessive redundancy. Consequently, this leads to issues such as inaccurate lesion localization and blurred boundaries in segmentation. To alleviate these problems, the Multi-scale Subtraction and Multi-key Context Conversion Network (MSMCNet) is proposed for medical image segmentation. By constructing differentiated contextual representations, MSMCNet emphasizes vital information and achieves precise medical image segmentation through accurate lesion localization and enhanced boundary perception. Specifically, the differentiated contextual representations are constructed by the proposed Multi-scale Non-crossover Subtraction (MSNS) module and Multi-key Context Conversion Module (MCCM). The MSNS module utilizes the context encoded by the MCCM and redistributes the values of feature-map pixels. Extensive experiments were conducted on widely used public datasets, including the ISIC-2018, COVID-19-CT-Seg, and Kvasir datasets, as well as a privately constructed traumatic brain injury dataset. The experimental results demonstrate that the proposed MSMCNet outperforms state-of-the-art medical image segmentation methods across different evaluation metrics.
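The core intuition of subtraction-based fusion, as opposed to addition or concatenation, can be sketched as follows (a minimal illustration of the general technique, not MSMCNet's actual MSNS module): subtracting two feature maps cancels their shared, redundant responses and keeps only the complementary detail that distinguishes them.

```python
import numpy as np

def subtraction_unit(fa: np.ndarray, fb: np.ndarray) -> np.ndarray:
    """Element-wise subtraction unit: emphasizes where two feature maps
    disagree, suppressing redundant (shared) activations that addition
    or concatenation would carry through unchanged."""
    return np.abs(fa - fb)

fa = np.array([[1.0, 0.5], [0.2, 0.8]])
fb = np.array([[1.0, 0.1], [0.6, 0.8]])
diff = subtraction_unit(fa, fb)
# identical responses cancel to zero; only complementary detail survives
assert diff[0, 0] == 0.0 and np.isclose(diff[0, 1], 0.4)
```

In a real network the two inputs would be feature maps from different scales or stages, and the difference map would then be re-weighted by learned convolutions.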
Affiliation(s)
- Ke Peng
- College of Artificial Intelligent, Chongqing University of Technology, Chongqing 401135, China
| | - Yulin Li
- College of Artificial Intelligent, Chongqing University of Technology, Chongqing 401135, China
| | - Qingling Xia
- College of Artificial Intelligent, Chongqing University of Technology, Chongqing 401135, China; Department of Radiology, Chongqing University Cancer Hospital, School of Medicine, Chongqing University, Chongqing, 400030, China.
| | - Tianqi Liu
- College of Artificial Intelligent, Chongqing University of Technology, Chongqing 401135, China
| | - Xinyi Shi
- College of Artificial Intelligent, Chongqing University of Technology, Chongqing 401135, China
| | - Diyou Chen
- Institute for Traffic Medicine, Daping Hospital, Army Medical University, Chongqing 400042, China; Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, China
| | - Li Li
- College of Artificial Intelligent, Chongqing University of Technology, Chongqing 401135, China
| | - Hui Zhao
- Institute for Traffic Medicine, Daping Hospital, Army Medical University, Chongqing 400042, China
| | - Hanguang Xiao
- College of Artificial Intelligent, Chongqing University of Technology, Chongqing 401135, China.
| |
|
47
|
Wu S, Cao Y, Li X, Liu Q, Ye Y, Liu X, Zeng L, Tian M. Attention-guided multi-scale context aggregation network for multi-modal brain glioma segmentation. Med Phys 2023; 50:7629-7640. [PMID: 37151131 DOI: 10.1002/mp.16452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2022] [Revised: 03/17/2023] [Accepted: 03/20/2023] [Indexed: 05/09/2023] Open
Abstract
BACKGROUND Accurate segmentation of brain glioma is a critical prerequisite for clinical diagnosis, surgical planning and treatment evaluation. In the current clinical workflow, physicians typically delineate brain tumor subregions slice by slice, which is susceptible to inter-rater variability and time-consuming. Besides, even though convolutional neural networks (CNNs) are driving progress, the performance of standard models still has room for improvement. PURPOSE To address these issues, this paper proposes an attention-guided multi-scale context aggregation network (AMCA-Net) for the accurate segmentation of brain glioma in multi-modal magnetic resonance imaging (MRI) images. METHODS AMCA-Net extracts multi-scale features from the MRI images and fuses the extracted discriminative features via a self-attention mechanism for brain glioma segmentation. Feature extraction is performed via a series of down-sampling and convolution layers, and global context information guidance (GCIG) modules are developed to fuse the extracted features with contextual features. At the end of the down-sampling path, a multi-scale fusion (MSF) module is designed to exploit and combine all the extracted multi-scale features. Each of the GCIG and MSF modules contains a channel attention (CA) module that adaptively recalibrates feature responses and emphasizes the most relevant features. Finally, multiple predictions at different resolutions are fused through weightings given by a multi-resolution adaptation (MRA) module, instead of averaging or max-pooling, to improve the final segmentation results. RESULTS The datasets used in this paper are publicly accessible: the Multimodal Brain Tumor Segmentation Challenges 2018 (BraTS2018) and 2019 (BraTS2019). BraTS2018 contains 285 patient cases and BraTS2019 contains 335 cases. Experiments show that AMCA-Net has better or comparable performance against other state-of-the-art models. On the BraTS2018 dataset, the Dice score and Hausdorff 95 were 90.4% and 10.2 mm for the whole tumor region (WT), 83.9% and 7.4 mm for the tumor core region (TC), and 80.2% and 4.3 mm for the enhancing tumor region (ET); on the BraTS2019 dataset, they were 91.0% and 10.7 mm for the WT, 84.2% and 8.4 mm for the TC, and 80.1% and 4.8 mm for the ET. CONCLUSIONS The proposed AMCA-Net performs comparably to several state-of-the-art neural network models in identifying the peritumoral edema, enhancing tumor, and necrotic and non-enhancing tumor core of brain glioma, which has great potential for clinical practice. In future research, we will further explore the feasibility of applying AMCA-Net to other similar segmentation tasks.
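A channel attention module of the kind described can be sketched in the squeeze-and-excitation style (an assumption for illustration; the abstract does not specify AMCA-Net's exact CA design, and the weights `w1`/`w2` here are random stand-ins for learned parameters): each channel is summarized by global average pooling, passed through a small bottleneck, and rescaled by a sigmoid gate.

```python
import numpy as np

def channel_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """SE-style channel attention.

    feat: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) are the
    bottleneck weights (learned in a real network, random here).
    Returns feat with each channel rescaled by a gate in (0, 1).
    """
    squeeze = feat.mean(axis=(1, 2))               # (C,) global channel descriptor
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate per channel
    return feat * gate[:, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1, w2 = rng.standard_normal((2, 8)), rng.standard_normal((8, 2))
out = channel_attention(feat, w1, w2)
assert out.shape == feat.shape
```

Because the gate lies strictly in (0, 1), attention can only attenuate channels relative to one another; the network learns which channels to suppress and which to keep near full strength.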
Affiliation(s)
- Shaozhi Wu
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Yunjian Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Xinke Li
- West China School of Medicine, Sichuan University, Chengdu, China
| | - Qiyu Liu
- Radiology Department, Mianyang Central Hospital, Mianyang, China
| | - Yuyun Ye
- Department of Electrical and Computer Engineering, University of Tulsa, Tulsa, Oklahoma, USA
| | - Xingang Liu
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Liaoyuan Zeng
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Miao Tian
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
| |
|
48
|
Wang Z, Zhu J, Fu S, Ye Y. Context fusion network with multi-scale-aware skip connection and twin-split attention for liver tumor segmentation. Med Biol Eng Comput 2023; 61:3167-3180. [PMID: 37470963 DOI: 10.1007/s11517-023-02876-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2023] [Accepted: 06/20/2023] [Indexed: 07/21/2023]
Abstract
Manually annotating liver tumor contours is a time-consuming and labor-intensive task for clinicians; automated segmentation is therefore urgently needed in clinical diagnosis. However, automatic segmentation methods face challenges due to the heterogeneity, fuzzy boundaries, and irregularity of tumor tissue. In this paper, a novel deep learning-based approach with a multi-scale-aware (MSA) module and a twin-split attention (TSA) module is proposed for tumor segmentation. The MSA module bridges the semantic gap and reduces the loss of detailed information, while the TSA module recalibrates the channel response of the feature map. Finally, tumors can be counted from a 3D perspective based on the segmentation results for cancer grading. Extensive experiments conducted on the LiTS2017 dataset show the effectiveness of the proposed method, which achieves a Dice index of 85.97% and a Jaccard index of 81.56%, surpassing the state of the art. In addition, the proposed method achieved a Dice index of 83.67% and a Jaccard index of 80.11% on the 3Dircadb dataset, further reflecting its robustness and generalization ability.
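The Dice and Jaccard indices reported above are standard overlap metrics for binary segmentation masks; for masks A (prediction) and B (ground truth), Dice = 2|A∩B| / (|A| + |B|) and Jaccard = |A∩B| / |A∪B|. A minimal computation:

```python
import numpy as np

def dice_jaccard(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Overlap metrics for boolean masks: Dice = 2|A∩B|/(|A|+|B|),
    Jaccard = |A∩B|/|A∪B|. Both are 1.0 for a perfect match."""
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return float(dice), float(jaccard)

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
d, j = dice_jaccard(pred, gt)
# 2 overlapping pixels, 3 in each mask, 4 in the union
assert abs(d - 2 / 3) < 1e-9 and abs(j - 0.5) < 1e-9
```

The two metrics are monotonically related (Dice = 2J / (1 + J)), which is why papers typically report both but rank methods identically under either.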
Affiliation(s)
- Zhendong Wang
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Jiehua Zhu
- Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA, 30460, USA
| | - Shujun Fu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Yangbo Ye
- Department of Mathematics, The University of Iowa, Iowa City, IA, 52242, USA.
| |
|
49
|
Zhang X. Image denoising and segmentation model construction based on IWOA-PCNN. Sci Rep 2023; 13:19848. [PMID: 37963960 PMCID: PMC10645996 DOI: 10.1038/s41598-023-47089-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Accepted: 11/08/2023] [Indexed: 11/16/2023] Open
Abstract
This research proposes a method to improve the pulse coupled neural network (PCNN), which has a complex structure and unsatisfactory performance in image denoising and image segmentation. A multi-strategy collaborative improvement of the whale optimization algorithm (WOA) is proposed, yielding an improved whale optimization algorithm (IWOA). The IWOA is used to find the optimal parameter values of the PCNN. With these components combined, the IWOA-PCNN model achieved the best image denoising performance, producing crisper images that preserve more information. Images processed by IWOA-PCNN have an average PSNR of 35.87 and an average MSE of 0.24. The average processing time for noisy images is 24.80 s, which is 7.30 s and 7.76 s faster than the WTGAN and IGA-NLM models, respectively. Additionally, the average NU value is 0.947, and the average D value exceeds 1000. These findings demonstrate that the proposed method successfully enhances the PCNN, improving its capability for image denoising and image segmentation, which can in part encourage the application and advancement of the PCNN.
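The PSNR and MSE figures quoted above are the standard denoising quality metrics: MSE is the mean squared pixel error against the clean reference, and PSNR = 10·log10(peak² / MSE) in decibels, so higher PSNR means less residual noise. A minimal computation (assuming 8-bit images with peak value 255):

```python
import numpy as np

def psnr(clean: np.ndarray, denoised: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).
    Higher is better; typical denoising results fall in the 25-40 dB range."""
    mse = np.mean((clean.astype(float) - denoised.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

clean = np.full((4, 4), 100.0)
noisy = clean + 5.0  # uniform error of 5 per pixel -> MSE = 25
assert abs(psnr(clean, noisy) - 10 * np.log10(255 ** 2 / 25)) < 1e-9
```

Note that PSNR requires a clean reference image, so in denoising benchmarks the noise is added synthetically to known ground-truth images.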
Affiliation(s)
- Xiaojun Zhang
- College of Software Technology, Henan Finance University, Zhengzhou, 450000, China.
| |
|
50
|
Chen B, Jin J, Liu H, Yang Z, Zhu H, Wang Y, Lin J, Wang S, Chen S. Trends and hotspots in research on medical images with deep learning: a bibliometric analysis from 2013 to 2023. Front Artif Intell 2023; 6:1289669. [PMID: 38028662 PMCID: PMC10665961 DOI: 10.3389/frai.2023.1289669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2023] [Accepted: 10/27/2023] [Indexed: 12/01/2023] Open
Abstract
Background With the rapid development of the internet, the improvement of computer capabilities, and the continuous advancement of algorithms, deep learning has developed rapidly in recent years and has been widely applied in many fields. Previous studies have shown that deep learning performs excellently in image processing, and deep learning-based medical image processing may help solve the difficulties faced by traditional medical image processing. This technology has attracted the attention of many scholars in the fields of computer science and medicine. This study summarizes the knowledge structure of deep learning-based medical image processing research through bibliometric analysis and explores the research hotspots and possible development trends in this field. Methods The Web of Science Core Collection database was searched using the terms "deep learning," "medical image processing," and their synonyms. CiteSpace was used for visual analysis of authors, institutions, countries, keywords, co-cited references, co-cited authors, and co-cited journals. Results The analysis was conducted on 562 highly cited papers retrieved from the database. The annual publication volume shows an upward trend. Pheng-Ann Heng, Hao Chen, and Klaus Hermann Maier-Hein are among the active authors in this field. The Chinese Academy of Sciences has the highest number of publications, while the institution with the highest centrality is Stanford University. The United States has the highest number of publications, followed by China. The most frequent keyword is "Deep Learning," and the highest-centrality keyword is "Algorithm." The most cited author is Kaiming He, and the author with the highest centrality is Yoshua Bengio. Conclusion The application of deep learning in medical image processing is becoming increasingly common, with many active authors, institutions, and countries in this field. Current research in medical image processing mainly focuses on deep learning, convolutional neural networks, classification, diagnosis, segmentation, images, algorithms, and artificial intelligence. The research focus and trends are gradually shifting toward more complex and systematic directions, and deep learning technology will continue to play an important role.
Affiliation(s)
- Borui Chen
- First School of Clinical Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
| | - Jing Jin
- College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
| | - Haichao Liu
- College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
| | - Zhengyu Yang
- College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
| | - Haoming Zhu
- College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
| | - Yu Wang
- First School of Clinical Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
| | - Jianping Lin
- The School of Health, Fujian Medical University, Fuzhou, China
| | - Shizhong Wang
- The School of Health, Fujian Medical University, Fuzhou, China
| | - Shaoqing Chen
- College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
| |
|