1. Wang H, Ahn E, Bi L, Kim J. Self-supervised multi-modality learning for multi-label skin lesion classification. Comput Methods Programs Biomed 2025; 265:108729. [PMID: 40184849] [DOI: 10.1016/j.cmpb.2025.108729]
Abstract
BACKGROUND: The clinical diagnosis of skin lesions involves the analysis of dermoscopic and clinical modalities. Dermoscopic images provide detailed views of surface structures, while clinical images offer complementary macroscopic information. Clinicians frequently use the seven-point checklist as an auxiliary tool for diagnosing melanoma and identifying lesion attributes. Supervised deep learning approaches, such as convolutional neural networks, have performed well using dermoscopic and clinical modalities (multi-modality) and further enhanced classification by predicting seven skin lesion attributes (multi-label). However, the performance of these approaches relies on the availability of large-scale labeled data, which are costly and time-consuming to obtain, even more so when annotating multiple attributes. METHODS: To reduce the dependency on large labeled datasets, we propose a self-supervised learning (SSL) algorithm for multi-modality multi-label skin lesion classification. Compared with single-modality SSL, our algorithm enables multi-modality SSL by maximizing the similarities between paired dermoscopic and clinical images from different views. We introduce a novel multi-modal and multi-label SSL strategy that generates surrogate pseudo-multi-labels for seven skin lesion attributes through clustering analysis. A label-relation-aware module is proposed to refine each pseudo-label embedding, capturing the interrelationships between pseudo-multi-labels. We further illustrate the interrelationships of skin lesion attributes and their relationships with clinical diagnoses using an attention visualization technique. RESULTS: The proposed algorithm was validated using the well-benchmarked seven-point skin lesion dataset. Our results demonstrate that our method outperforms state-of-the-art SSL counterparts. Improvements in the area under the receiver operating characteristic curve, precision, sensitivity, and specificity were observed across various lesion attributes and melanoma diagnoses. CONCLUSIONS: Our self-supervised learning algorithm offers a robust and efficient solution for multi-modality multi-label skin lesion classification, reducing the reliance on large-scale labeled data. By effectively capturing and leveraging the complementary information between dermoscopic and clinical images and the interrelationships between lesion attributes, our approach holds the potential to improve clinical diagnostic accuracy in dermatology.
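A minimal PyTorch-style sketch of the core multi-modality SSL idea described above: an InfoNCE-style loss that pulls paired dermoscopic and clinical embeddings together and pushes unpaired ones apart. The encoder outputs, temperature, and batch size are illustrative assumptions, not the authors' implementation, and the pseudo-multi-label clustering step is omitted.
```python
# Hedged sketch (not the paper's exact formulation): cross-modal contrastive loss
# between paired dermoscopic and clinical image embeddings.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(z_derm, z_clin, temperature=0.1):
    """z_derm, z_clin: (B, D) embeddings of paired dermoscopic/clinical images."""
    z_derm = F.normalize(z_derm, dim=1)
    z_clin = F.normalize(z_clin, dim=1)
    logits = z_derm @ z_clin.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_derm.size(0), device=z_derm.device)
    # Symmetric loss: each dermoscopic view should match its clinical pair and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Example usage with random embeddings standing in for encoder outputs.
loss = cross_modal_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```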
Affiliation(s)
- Hao Wang
- School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, NSW 2006, Australia; Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Euijoon Ahn
- College of Science and Engineering, James Cook University, Cairns, QLD 4870, Australia.
- Lei Bi
- Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Jinman Kim
- School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, NSW 2006, Australia.
2. Liu Y, Yuan D, Xu Z, Zhan Y, Zhang H, Lu J, Lukasiewicz T. Pixel level deep reinforcement learning for accurate and robust medical image segmentation. Sci Rep 2025; 15:8213. [PMID: 40064951] [PMCID: PMC11894052] [DOI: 10.1038/s41598-025-92117-2]
Abstract
Existing deep learning methods have achieved significant success in medical image segmentation. However, this success largely relies on stacking advanced modules and architectures, which has created a path dependency. This path dependency is unsustainable, as it leads to increasingly larger model parameters and higher deployment costs. To break this path dependency, we introduce deep reinforcement learning to enhance segmentation performance. However, current deep reinforcement learning methods face challenges such as high training cost, independent iterative processes, and high uncertainty of segmentation masks. Consequently, we propose a Pixel-level Deep Reinforcement Learning model with pixel-by-pixel Mask Generation (PixelDRL-MG) for more accurate and robust medical image segmentation. PixelDRL-MG adopts a dynamic iterative update policy, directly segmenting the regions of interest without requiring user interaction or coarse segmentation masks. We propose a Pixel-level Asynchronous Advantage Actor-Critic (PA3C) strategy that treats each pixel as an agent whose state (foreground or background) is iteratively updated through direct actions. Our experiments on two commonly used medical image segmentation datasets demonstrate that PixelDRL-MG achieves superior segmentation performance compared with state-of-the-art segmentation baselines (especially at boundaries) while using significantly fewer model parameters. We also conducted detailed ablation studies to enhance understanding and facilitate practical application. Additionally, PixelDRL-MG performs well in low-resource settings (i.e., 50-shot or 100-shot), making it an ideal choice for real-world scenarios.
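A toy sketch of the pixel-as-agent idea: a small policy network proposes, per pixel, a keep-or-flip action that iteratively updates a binary mask. This only illustrates the mechanism; the paper's PA3C training (rewards, asynchronous actor-critic updates) is not shown, and all names here are hypothetical.
```python
# Illustrative only: pixels as agents that iteratively flip their foreground/background
# state according to a policy network's per-pixel action map.
import torch
import torch.nn as nn

class PixelPolicy(nn.Module):
    """Predicts, per pixel, logits over two actions: keep state (0) or flip it (1)."""
    def __init__(self, in_ch=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )
    def forward(self, image, mask):
        return self.net(torch.cat([image, mask], dim=1))   # (B, 2, H, W)

image = torch.randn(1, 1, 64, 64)
mask = torch.zeros(1, 1, 64, 64)                  # initial per-pixel states
policy = PixelPolicy()
for _ in range(3):                                # a few refinement iterations
    logits = policy(image, mask)
    action = logits.argmax(dim=1, keepdim=True)   # 1 = flip the pixel's state
    mask = torch.where(action.bool(), 1.0 - mask, mask)
```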
Affiliation(s)
- Yunxin Liu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
- Tianjin Key Laboratory of Bioelectromagnetic Technology and Intelligent Health, Hebei University of Technology, Tianjin, China
- Di Yuan
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
- Tianjin Key Laboratory of Bioelectromagnetic Technology and Intelligent Health, Hebei University of Technology, Tianjin, China
- Zhenghua Xu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China.
- Tianjin Key Laboratory of Bioelectromagnetic Technology and Intelligent Health, Hebei University of Technology, Tianjin, China.
- Yuefu Zhan
- The Third People's Hospital of Longgang District Shenzhen, Shenzhen, China.
- The Seventh People's Hospital of Chongqing, No. 1, Village 1, Lijiatuo Labor Union, Banan District, Chongqing, China.
- Longgang Institute of Medical Imaging, Shantou University Medical College, Shenzhen, China.
- Hainan Women and Children's Medical Center, Hainan, China.
- Hongwei Zhang
- BigBear (Tianjin) Medical Technology Co., Ltd, Tianjin, China
- Jun Lu
- BigBear (Tianjin) Medical Technology Co., Ltd, Tianjin, China
- Thomas Lukasiewicz
- Institute of Logic and Computation, Vienna University of Technology, Vienna, Austria
- Department of Computer Science, University of Oxford, Oxford, United Kingdom
3. Sreng S, Ramesh P, Nam Phuong PD, Binte Abdul Gani NF, Chua J, Nongpiur ME, Aung T, Husain R, Schmetterer L, Wong D. Wide-field OCT volumetric segmentation using semi-supervised CNN and transformer integration. Sci Rep 2025; 15:6676. [PMID: 39994298] [PMCID: PMC11850926] [DOI: 10.1038/s41598-025-89476-1]
Abstract
Wide-field optical coherence tomography (OCT) imaging can enable monitoring of peripheral changes in the retina, beyond the conventional fields of view used in current clinical OCT imaging systems. However, wide-field scans can present significant challenges for retinal layer segmentation. Deep convolutional neural networks (CNNs) have shown strong performance in medical imaging segmentation but typically require large-scale, high-quality, pixel-level annotated datasets to be effectively developed. To address this challenge, we propose an advanced semi-supervised learning framework that combines the detailed capabilities of convolutional networks with the broader perspective of transformers. This method efficiently leverages labelled and unlabelled data to reduce dependence on extensive, manually annotated datasets. We evaluated the model performance on a dataset of 74 volumetric OCT scans, each acquired using a prototype swept-source OCT system following a wide-field scan protocol with a 15 × 9 mm field of view, comprising 11,750 labelled and 29,016 unlabelled images. Wide-field retinal layer segmentation using the semi-supervised approach shows significant improvements (P-value < 0.001) of up to 11% over a U-Net baseline model. Comparisons with a clinical spectral-domain OCT system revealed significant correlations of up to 0.91 (P-value < 0.001) in retinal layer thickness measurements. These findings highlight the effectiveness of semi-supervised learning with cross-teaching between CNNs and transformers for automated OCT layer segmentation.
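A hedged sketch of cross-teaching on unlabelled scans: each branch's hard pseudo-labels supervise the other branch. The CNN and transformer segmenters are stand-ins (random logits below), and the loss weighting used in the actual framework is omitted.
```python
# Illustrative cross-teaching loss between two segmentation branches on the same
# unlabeled batch; model outputs are simulated with random tensors.
import torch
import torch.nn.functional as F

def cross_teaching_loss(logits_cnn, logits_transformer):
    """logits_*: (B, C, H, W) predictions from the two branches."""
    pseudo_cnn = logits_cnn.argmax(dim=1).detach()          # pseudo-labels from the CNN
    pseudo_tr = logits_transformer.argmax(dim=1).detach()   # pseudo-labels from the transformer
    # Each branch learns from the other's pseudo-labels (gradients blocked on targets).
    return F.cross_entropy(logits_cnn, pseudo_tr) + F.cross_entropy(logits_transformer, pseudo_cnn)

loss = cross_teaching_loss(torch.randn(2, 4, 96, 96), torch.randn(2, 4, 96, 96))
```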
Affiliation(s)
- Syna Sreng
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore City, Singapore
- SERI-NTU Advanced Ocular Engineering (STANCE) Program, Singapore City, Singapore
- Padmini Ramesh
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore City, Singapore
- SERI-NTU Advanced Ocular Engineering (STANCE) Program, Singapore City, Singapore
- Pham Duc Nam Phuong
- School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore City, Singapore
- Jacqueline Chua
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore City, Singapore
- SERI-NTU Advanced Ocular Engineering (STANCE) Program, Singapore City, Singapore
- Ophthalmology and Visual Sciences Academic Clinical Program (Eye ACP), Duke-NUS Medical School, Singapore City, Singapore
- Tin Aung
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore City, Singapore
- Rahat Husain
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore City, Singapore
- Leopold Schmetterer
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore City, Singapore
- SERI-NTU Advanced Ocular Engineering (STANCE) Program, Singapore City, Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore City, Singapore
- Ophthalmology and Visual Sciences Academic Clinical Program (Eye ACP), Duke-NUS Medical School, Singapore City, Singapore
- School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore City, Singapore
- Centre for Medical Physics and Biomedical Engineering, Nanyang Technological University (NTU), Singapore City, Singapore
- Department of Clinical Pharmacology, Medical University of Vienna, Vienna, Austria
- Institute of Molecular and Clinical Ophthalmology, Basel, Switzerland
- Fondation Ophtalmologique Adolphe De Rothschild, Paris, France
- Damon Wong
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore City, Singapore.
- SERI-NTU Advanced Ocular Engineering (STANCE) Program, Singapore City, Singapore.
- Ophthalmology and Visual Sciences Academic Clinical Program (Eye ACP), Duke-NUS Medical School, Singapore City, Singapore.
- Institute of Molecular and Clinical Ophthalmology, Basel, Switzerland.
4. Cheng H, Zhang Y, Xu H, Li D, Zhong Z, Zhao Y, Yan Z. MSGU-Net: a lightweight multi-scale ghost U-Net for image segmentation. Front Neurorobot 2025; 18:1480055. [PMID: 39834695] [PMCID: PMC11743674] [DOI: 10.3389/fnbot.2024.1480055]
Abstract
U-Net and its variants have been widely used in the field of image segmentation. In this paper, a lightweight multi-scale Ghost U-Net (MSGU-Net) network architecture is proposed, which can efficiently and quickly process image segmentation tasks while generating high-quality object masks. The pyramid structure (SPP-Inception) module and ghost module are seamlessly integrated in a lightweight manner. The network is equipped with an efficient local attention (ELA) mechanism and an attention gate mechanism, designed to accurately identify the region of interest (ROI). The SPP-Inception module and ghost module work in tandem to effectively merge multi-scale information derived from low-level features, high-level features, and decoder masks at each stage. Comparative experiments were conducted between the proposed MSGU-Net and state-of-the-art networks on the ISIC2017 and ISIC2018 datasets. In short, compared to the baseline U-Net, our model achieves superior segmentation performance while reducing parameter and computation costs by 96.08% and 92.59%, respectively. Moreover, MSGU-Net can serve as a lightweight deep neural network suitable for deployment across a range of intelligent devices and mobile platforms, offering considerable potential for widespread adoption.
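For readers unfamiliar with ghost modules, the sketch below shows the general GhostNet-style construction the abstract refers to: a few intrinsic feature maps from a dense convolution plus cheap depthwise "ghost" maps. Channel counts and layer choices are illustrative assumptions, not MSGU-Net's exact configuration.
```python
# Minimal Ghost-module sketch (after the GhostNet idea): dense 1x1 conv for a few
# intrinsic maps, cheap depthwise conv for the remaining "ghost" maps.
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch, out_ch, ratio=2):
        super().__init__()
        primary_ch = out_ch // ratio
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, 1, bias=False),
            nn.BatchNorm2d(primary_ch), nn.ReLU(inplace=True),
        )
        # Cheap depthwise conv generates the remaining "ghost" feature maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, out_ch - primary_ch, 3, padding=1,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(out_ch - primary_ch), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

out = GhostModule(32, 64)(torch.randn(1, 32, 56, 56))  # -> (1, 64, 56, 56)
```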
Affiliation(s)
- Hua Cheng
- Chengdu Civil Aviation Information Technology Co., Ltd, Chengdu, China
- Yang Zhang
- Chengdu Civil Aviation Information Technology Co., Ltd, Chengdu, China
- Huangxin Xu
- The College of Artificial Intelligence, Shenyang Aerospace University, Shenyang, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Dingliang Li
- Chengdu Civil Aviation Information Technology Co., Ltd, Chengdu, China
- Zejian Zhong
- Chengdu Civil Aviation Information Technology Co., Ltd, Chengdu, China
- Yinchuan Zhao
- Chengdu Civil Aviation Information Technology Co., Ltd, Chengdu, China
- Zhuo Yan
- The College of Artificial Intelligence, Shenyang Aerospace University, Shenyang, China
5. Wu C, Chen Q, Wang H, Guan Y, Mian Z, Huang C, Ruan C, Song Q, Jiang H, Pan J, Li X. A review of deep learning approaches for multimodal image segmentation of liver cancer. J Appl Clin Med Phys 2024; 25:e14540. [PMID: 39374312] [PMCID: PMC11633801] [DOI: 10.1002/acm2.14540]
Abstract
This review examines the recent developments in deep learning (DL) techniques applied to multimodal fusion image segmentation for liver cancer. Hepatocellular carcinoma is a highly dangerous malignant tumor that requires accurate image segmentation for effective treatment and disease monitoring. Multimodal image fusion has the potential to offer more comprehensive information and more precise segmentation, and DL techniques have achieved remarkable progress in this domain. This paper starts with an introduction to liver cancer, then explains the preprocessing and fusion methods for multimodal images, and then explores the application of DL methods in this area. Various DL architectures, such as convolutional neural networks (CNNs) and U-Net, are discussed, along with their benefits for multimodal image fusion segmentation. Furthermore, the evaluation metrics and datasets currently used to measure the performance of segmentation models are reviewed. While reviewing the progress, the challenges of current research, such as data imbalance, model generalization, and model interpretability, are emphasized, and future research directions are suggested. The application of DL in multimodal image segmentation for liver cancer is transforming the field of medical imaging and is expected to further enhance the accuracy and efficiency of clinical decision making. This review provides useful insights and guidance for medical practitioners.
Affiliation(s)
- Chaopeng Wu
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
- Qiyao Chen
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
- Haoyu Wang
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
- Yu Guan
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
- Zhangyang Mian
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
- Cong Huang
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
- Changli Ruan
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
- Qibin Song
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
- Hao Jiang
- School of Electronic Information, Wuhan University, Wuhan, Hubei, China
- Jinghui Pan
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
- School of Electronic Information, Wuhan University, Wuhan, Hubei, China
- Xiangpan Li
- Department of Radiation Oncology, Renmin Hospital, Wuhan University, Wuhan, Hubei, China
6. Zhao K, Wu X, Xiao Y, Jiang S, Yu P, Wang Y, Wang Q. PlanText: Gradually Masked Guidance to Align Image Phenotypes with Trait Descriptions for Plant Disease Texts. Plant Phenomics 2024; 6:0272. [PMID: 39600967] [PMCID: PMC11589250] [DOI: 10.34133/plantphenomics.0272]
Abstract
Plant diseases are a critical driver of the global food crisis. The integration of advanced artificial intelligence technologies can substantially enhance plant disease diagnostics. However, current methods for early and complex detection remain challenging. Employing multimodal technologies, akin to medical artificial intelligence diagnostics that combine diverse data types, may offer a more effective solution. Presently, the reliance on single-modal data predominates in plant disease research, which limits the scope for early and detailed diagnosis. Consequently, developing text modality generation techniques is essential for overcoming the limitations in plant disease recognition. To this end, we propose a method for aligning plant phenotypes with trait descriptions, which generates diagnostic text by progressively masking disease images. First, for training and validation, we annotate 5,728 disease phenotype images with expert diagnostic text and provide annotated text and trait labels for 210,000 disease images. Then, we propose a PhenoTrait text description model, which consists of global and heterogeneous feature encoders as well as switching-attention decoders, for accurate context-aware output. Next, to generate more phenotypically appropriate descriptions, we adopt three stages of embedding image features into semantic structures, which generate characterizations that preserve trait features. Finally, our experimental results show that our model outperforms several frontier models in multiple trait descriptions, including the larger models GPT-4 and GPT-4o. Our code and dataset are available at https://plantext.samlab.cn/.
Affiliation(s)
- Kejun Zhao
- State Key Laboratory of Public Big Data, School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Xingcai Wu
- State Key Laboratory of Public Big Data, School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Yuanyuan Xiao
- State Key Laboratory of Public Big Data, School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Sijun Jiang
- State Key Laboratory of Public Big Data, School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Peijia Yu
- State Key Laboratory of Public Big Data, School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Yazhou Wang
- School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China
- Qi Wang
- State Key Laboratory of Public Big Data, School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
7. Li Z, Li H, Ralescu AL, Dillman JR, Altaye M, Cecil KM, Parikh NA, He L. Joint self-supervised and supervised contrastive learning for multimodal MRI data: Towards predicting abnormal neurodevelopment. Artif Intell Med 2024; 157:102993. [PMID: 39369634] [PMCID: PMC11560553] [DOI: 10.1016/j.artmed.2024.102993]
Abstract
The integration of different imaging modalities, such as structural, diffusion tensor, and functional magnetic resonance imaging, with deep learning models has yielded promising outcomes in discerning phenotypic characteristics and enhancing disease diagnosis. The development of such a technique hinges on the efficient fusion of heterogeneous multimodal features, which initially reside within distinct representation spaces. Naively fusing the multimodal features does not adequately capture the complementary information and could even produce redundancy. In this work, we present a novel joint self-supervised and supervised contrastive learning method to learn the robust latent feature representation from multimodal MRI data, allowing the projection of heterogeneous features into a shared common space, and thereby amalgamating both complementary and analogous information across various modalities and among similar subjects. We performed a comparative analysis between our proposed method and alternative deep multimodal learning approaches. Through extensive experiments on two independent datasets, the results demonstrated that our method is significantly superior to several other deep multimodal learning methods in predicting abnormal neurodevelopment. Our method has the capability to facilitate computer-aided diagnosis within clinical practice, harnessing the power of multimodal data. The source code of the proposed model is publicly accessible on GitHub: https://github.com/leonzyzy/Contrastive-Network.
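A small sketch of the supervised contrastive component of such a joint objective: fused multimodal embeddings of subjects sharing a label are treated as positives. This is a generic formulation for illustration; the paper's joint self-supervised term and projection heads are not reproduced, and all names are assumptions.
```python
# Hedged sketch of a supervised contrastive loss over fused multimodal embeddings.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """z: (N, D) fused embeddings; labels: (N,) class labels."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim.masked_fill_(self_mask, float('-inf'))               # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Average log-probability over each anchor's positives; anchors without positives are skipped.
    masked_log_prob = log_prob.masked_fill(~pos_mask, 0.0)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -masked_log_prob.sum(dim=1) / pos_counts
    return loss[pos_mask.any(dim=1)].mean()

loss = supervised_contrastive_loss(torch.randn(6, 64), torch.tensor([0, 0, 1, 1, 2, 2]))
```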
Affiliation(s)
- Zhiyuan Li
- Imaging Research Center, Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Computer Science, University of Cincinnati, Cincinnati, OH, USA
- Hailong Li
- Imaging Research Center, Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Artificial Intelligence Imaging Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Neurodevelopmental Disorders Prevention Center, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Radiology, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Anca L Ralescu
- Department of Computer Science, University of Cincinnati, Cincinnati, OH, USA
- Jonathan R Dillman
- Imaging Research Center, Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Artificial Intelligence Imaging Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Radiology, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Mekibib Altaye
- Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Kim M Cecil
- Imaging Research Center, Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Radiology, University of Cincinnati College of Medicine, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Nehal A Parikh
- Neurodevelopmental Disorders Prevention Center, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Lili He
- Imaging Research Center, Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Artificial Intelligence Imaging Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Neurodevelopmental Disorders Prevention Center, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Radiology, University of Cincinnati College of Medicine, Cincinnati, OH, USA.
8. Li W, Bian R, Zhao W, Xu W, Yang H. Diversity matters: Cross-head mutual mean-teaching for semi-supervised medical image segmentation. Med Image Anal 2024; 97:103302. [PMID: 39154618] [DOI: 10.1016/j.media.2024.103302]
Abstract
Semi-supervised medical image segmentation (SSMIS) has witnessed substantial advancements by leveraging limited labeled data and abundant unlabeled data. Nevertheless, existing state-of-the-art (SOTA) methods encounter challenges in accurately predicting labels for the unlabeled data, giving rise to disruptive noise during training and susceptibility to erroneous information overfitting. Moreover, applying perturbations to inaccurate predictions further impedes consistent learning. To address these concerns, we propose a novel cross-head mutual mean-teaching network (CMMT-Net) incorporating weak-strong data augmentations, thereby benefiting both co-training and consistency learning. More concretely, our CMMT-Net extends the cross-head co-training paradigm by introducing two auxiliary mean teacher models, which yield more accurate predictions and provide supplementary supervision. The predictions derived from weakly augmented samples generated by one mean teacher are leveraged to guide the training of another student with strongly augmented samples. Furthermore, two distinct yet synergistic data perturbations at the pixel and region levels are introduced. We propose mutual virtual adversarial training (MVAT) to smooth the decision boundary and enhance feature representations, and a cross-set CutMix strategy to generate more diverse training samples for capturing inherent structural data information. Notably, CMMT-Net simultaneously implements data, feature, and network perturbations, amplifying model diversity and generalization performance. Experimental results on three publicly available datasets indicate that our approach yields remarkable improvements over previous SOTA methods across various semi-supervised scenarios. The code is available at https://github.com/Leesoon1984/CMMT-Net.
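An illustrative mean-teacher fragment showing the weak-to-strong supervision pattern the abstract describes: an EMA teacher pseudo-labels weakly augmented inputs, and the student is trained on strongly augmented ones. The tiny model and noise-based augmentations are placeholders, not CMMT-Net's cross-head design.
```python
# Sketch of the mean-teacher pattern: EMA teacher, weak-view pseudo-labels, strong-view student loss.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Conv2d(1, 2, 3, padding=1)           # stand-in segmentation head
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

def ema_update(teacher, student, momentum=0.99):
    # Teacher weights track an exponential moving average of the student's.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps.detach(), alpha=1 - momentum)

x = torch.randn(2, 1, 64, 64)
weak, strong = x + 0.01 * torch.randn_like(x), x + 0.1 * torch.randn_like(x)
with torch.no_grad():
    pseudo = teacher(weak).argmax(dim=1)           # teacher pseudo-labels from the weak view
loss = F.cross_entropy(student(strong), pseudo)    # student learns on the strong view
loss.backward()
ema_update(teacher, student)
```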
Affiliation(s)
- Wei Li
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- Ruifeng Bian
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- Wenyi Zhao
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- Weijin Xu
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- Huihua Yang
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China.
9. You C, Dai W, Liu F, Min Y, Dvornek NC, Li X, Clifton DA, Staib L, Duncan JS. Mine Your Own Anatomy: Revisiting Medical Image Segmentation With Extremely Limited Labels. IEEE Trans Pattern Anal Mach Intell 2024; PP:11136-11151. [PMID: 39269798] [PMCID: PMC11903367] [DOI: 10.1109/tpami.2024.3461321]
Abstract
Recent studies on contrastive learning have achieved remarkable performance solely by leveraging few labels in the context of medical image segmentation. Existing methods mainly focus on instance discrimination and invariant mapping (i.e., pulling positive samples closer and pushing negative samples apart in the feature space). However, they face three common pitfalls: (1) tailness: medical image data usually follow an implicit long-tail class distribution, so blindly leveraging all pixels in training can lead to data imbalance issues and deteriorated performance; (2) consistency: it remains unclear whether a segmentation model has learned meaningful and yet consistent anatomical features, owing to the intra-class variations between different anatomical features; and (3) diversity: the intra-slice correlations within the entire dataset have received significantly less attention. This motivates us to seek a principled approach for strategically making use of the dataset itself to discover similar yet distinct samples from different anatomical views. In this paper, we introduce a novel semi-supervised 2D medical image segmentation framework termed Mine yOur owN Anatomy (MONA), and make three contributions. First, prior work argues that every pixel matters equally to model training; we observe empirically that this alone is unlikely to define meaningful anatomical features, mainly due to a lack of supervision signal. We show two simple solutions towards learning invariances: the use of stronger data augmentations and nearest neighbors. Second, we construct a set of objectives that encourage the model to decompose medical images into a collection of anatomical features in an unsupervised manner. Lastly, we demonstrate, both empirically and theoretically, the efficacy of MONA on three benchmark datasets, achieving new state-of-the-art results under different semi-supervised labeled settings. MONA makes minimal assumptions on domain expertise and hence constitutes a practical and versatile solution in medical image analysis. We provide PyTorch-like pseudo-code in the supplementary material.
10. Chai L, Xue S, Tang D, Liu J, Sun N, Liu X. TLF: Triple learning framework for intracranial aneurysms segmentation from unreliable labeled CTA scans. Comput Med Imaging Graph 2024; 116:102421. [PMID: 39084165] [DOI: 10.1016/j.compmedimag.2024.102421]
Abstract
Intracranial aneurysm (IA) is a prevalent disease that poses a significant threat to human health. The use of computed tomography angiography (CTA) as a diagnostic tool for IAs remains time-consuming and challenging. Deep neural networks (DNNs) have made significant advancements in the field of medical image segmentation. Nevertheless, training large-scale DNNs demands substantial quantities of high-quality labeled data, making the annotation of numerous brain CTA scans a challenging endeavor. To address these challenges and effectively develop a robust IA segmentation model from a large amount of unlabeled training data, we propose a triple learning framework (TLF). The framework primarily consists of three learning paradigms: pseudo-supervised learning, contrastive learning, and confident learning. This paper introduces an enhanced mean teacher model and a voxel-selective strategy to conduct pseudo-supervised learning on unreliably labeled training data. Concurrently, we construct positive and negative training pairs within the high-level semantic feature space to improve the overall learning efficiency of the TLF through contrastive learning. In addition, a multi-scale confident learning strategy is proposed to correct unreliable labels, which enables the acquisition of broader local structural information instead of relying on individual voxels. To evaluate the effectiveness of our method, we conducted extensive experiments on a self-built database of hundreds of brain CTA scans with IAs. Experimental results demonstrate that our method can effectively learn a robust CTA-based IA segmentation model using unreliably labeled data, outperforming state-of-the-art methods in terms of segmentation accuracy. Codes are released at https://github.com/XueShuangqian/TLF.
Affiliation(s)
- Lei Chai
- Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Shuangqian Xue
- Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Daodao Tang
- Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Jixin Liu
- Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Ning Sun
- Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China.
- Xiujuan Liu
- Department of Radiology, Zhuhai People's Hospital (Zhuhai Clinical Medical College of Jinan University), Zhuhai 519000, China
11. Guo W, Jin S, Li Y, Jiang Y. The dynamic-static dual-branch deep neural network for urban speeding hotspot identification using street view image data. Accid Anal Prev 2024; 203:107636. [PMID: 38776837] [DOI: 10.1016/j.aap.2024.107636]
Abstract
The visual information in the road environment can influence drivers' perception and judgment, often resulting in frequent speeding incidents. Identifying speeding hotspots in cities can prevent potential speeding incidents, thereby improving traffic safety. We propose the Dual-Branch Contextual Dynamic-Static Feature Fusion Network, based on static panoramic images and dynamically changing sequence data, to capture global features of the macro scene of an area and dynamically changing information from the micro view for more accurate urban speeding hotspot identification. For the static branch, we propose the Multi-scale Contextual Feature Aggregation Network for learning global spatial contextual association information. In the dynamic branch, we construct the Multi-view Dynamic Feature Fusion Network to capture the dynamically changing features of a scene from a continuous sequence of street view images. Additionally, we design the Dynamic-Static Feature Correlation Fusion Structure to correlate and fuse dynamic and static features. The experimental results show that the model performs well, with an overall recognition accuracy of 99.4%. The ablation experiments show that the recognition performance after fusing dynamic and static features is better than that of either the static or the dynamic branch alone. The proposed model also shows better performance than other deep learning models. In addition, we combine image processing methods and different Class Activation Mapping (CAM) methods to extract speeding-frequency visual features from the model's perception results. The results show that more accurate speeding-frequency features can be obtained by using LayerCAM and GradCAM-Plus for static global scenes and dynamic local sequences, respectively. In the static global scene, the speeding-frequency features are mainly concentrated on the buildings and green layout on both sides of the road, while in the dynamic scene, the speeding-frequency features shift with scene changes and are mainly concentrated on the dynamically changing transition areas of greenery, roads, and surrounding buildings. The code and model used for identifying urban speeding hotspots in this study are available at: https://github.com/gwt-ZJU/DCDSFF-Net.
Affiliation(s)
- Wentong Guo
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
- Sheng Jin
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China; Zhongyuan Institute, Zhejiang University, Zhengzhou 450000, China.
- Yiding Li
- Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450003, China
- Yang Jiang
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
12. Zeng B, Chen L, Zheng Y, Chen X. Adaptive Multi-Dimensional Weighted Network With Category-Aware Contrastive Learning for Fine-Grained Hand Bone Segmentation. IEEE J Biomed Health Inform 2024; 28:3985-3996. [PMID: 38640043] [DOI: 10.1109/jbhi.2024.3391387]
Abstract
Accurately delineating and categorizing individual hand bones in 3D ultrasound (US) is a promising technology for precise digital diagnostic analysis. However, this is a challenging task due to the inherent imaging limitations of US and the insignificant feature differences among numerous bones. In this study, we propose a novel deep learning-based solution for pediatric hand bone segmentation in 3D US. Our method is unique in that it allows for effective detailed feature mining through an adaptive multi-dimensional weighting attention mechanism. It innovatively implements a category-aware contrastive learning method to highlight inter-class semantic feature differences, thereby enhancing the category discrimination performance of the model. Extensive experiments on challenging pediatric clinical 3D hand US datasets show the outstanding performance of the proposed method in segmenting thirty-eight bone structures, with an average Dice coefficient of 90.0%. The results outperform other state-of-the-art methods, demonstrating its effectiveness in fine-grained hand bone segmentation. Our method will be released globally as a plugin in 3D Slicer, providing an innovative and reliable tool for relevant clinical applications.
13. Sun H, Wei J, Yuan W, Li R. Semi-supervised multi-modal medical image segmentation with unified translation. Comput Biol Med 2024; 176:108570. [PMID: 38749326] [DOI: 10.1016/j.compbiomed.2024.108570]
Abstract
The two major challenges for deep-learning-based medical image segmentation are multi-modality and the lack of expert annotations. Existing semi-supervised segmentation models can mitigate the problem of insufficient annotations by utilizing a small amount of labeled data. However, most of these models are limited to single-modal data and cannot exploit the complementary information from multi-modal medical images. A few semi-supervised multi-modal models have been proposed recently, but they have rigid structures and require additional training steps for each modality. In this work, we propose a novel flexible method, semi-supervised multi-modal medical image segmentation with unified translation (SMSUT), and a unique semi-supervised procedure that can leverage multi-modal information to improve semi-supervised segmentation performance. Our architecture capitalizes on unified translation to extract complementary information from multi-modal data, compelling the network to focus on the disparities and salient features among the modalities. Furthermore, we impose constraints on the model at both the pixel and feature levels to cope with the lack of annotation information and the diverse representations within semi-supervised multi-modal data. We introduce a novel training procedure tailored for semi-supervised multi-modal medical image analysis by integrating the concept of conditional translation. Our method adapts seamlessly to varying numbers of distinct modalities in the training data. Experiments show that our model outperforms semi-supervised segmentation counterparts on public datasets, demonstrating its high performance and the transferability of the proposed method. The code will be openly available at https://github.com/Sue1347/SMSUT-MedicalImgSegmentation.
Affiliation(s)
- Huajun Sun
- South China University of Technology, Guangzhou, 510006, China.
- Jia Wei
- South China University of Technology, Guangzhou, 510006, China.
- Wenguang Yuan
- Huawei Cloud BU EI Innovation Laboratory, Dongguan, 523000, China.
- Rui Li
- Rochester Institute of Technology, Rochester, NY 14623, USA.
14. Long J, Ren Y, Yang C, Ren P, Zeng Z. MDT: semi-supervised medical image segmentation with mixup-decoupling training. Phys Med Biol 2024; 69:065012. [PMID: 38324897] [DOI: 10.1088/1361-6560/ad2715]
Abstract
Objective. In the field of medicine, semi-supervised segmentation algorithms hold crucial research significance while also facing substantial challenges, primarily due to the extreme scarcity of expert-level annotated medical image data. However, many existing semi-supervised methods still process labeled and unlabeled data in inconsistent ways, which can lead to knowledge learned from labeled data being discarded to some extent. This not only lacks a variety of perturbations to explore potential robust information in unlabeled data but also ignores the confirmation bias and class imbalance issues in pseudo-labeling methods. Approach. To solve these problems, this paper proposes a semi-supervised medical image segmentation method 'mixup-decoupling training (MDT)' that combines the idea of consistency and pseudo-labeling. Firstly, MDT introduces a new perturbation strategy 'mixup-decoupling' to fully regularize training data. It not only mixes labeled and unlabeled data at the data level but also performs decoupling operations between the output predictions of mixed target data and labeled data at the feature level to obtain strong version predictions of unlabeled data. Then it establishes a dual learning paradigm based on consistency and pseudo-labeling. Secondly, MDT employs a novel categorical entropy filtering approach to pick high-confidence pseudo-labels for unlabeled data, facilitating more refined supervision. Main results. This paper compares MDT with other advanced semi-supervised methods on 2D and 3D datasets separately. A large number of experimental results show that MDT achieves competitive segmentation performance and outperforms other state-of-the-art semi-supervised segmentation methods. Significance. This paper proposes a semi-supervised medical image segmentation method MDT, which greatly reduces the demand for manually labeled data and eases the difficulty of data annotation to a great extent. In addition, MDT not only outperforms many advanced semi-supervised image segmentation methods in quantitative and qualitative experimental results, but also provides a new and developable idea for semi-supervised learning and computer-aided diagnosis technology research.
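A simplified sketch of the data-level mixing behind mixup-decoupling: a labelled and an unlabelled batch are convexly combined, and the mixed prediction can later be decoupled against the labelled part. Shapes, the Beta parameter, and the decoupling comment are illustrative assumptions only, not the full MDT procedure.
```python
# Toy sketch of labeled/unlabeled mixup; the decoupling step is described in comments.
import torch

def mixup_labeled_unlabeled(x_labeled, x_unlabeled, alpha=0.5):
    """Convexly combine a labelled and an unlabelled batch with a Beta-sampled weight."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * x_labeled + (1 - lam) * x_unlabeled, lam

x_l, x_u = torch.randn(4, 1, 96, 96), torch.randn(4, 1, 96, 96)
x_mix, lam = mixup_labeled_unlabeled(x_l, x_u)
# Decoupling (conceptually): pred(x_mix) ≈ lam * pred(x_l) + (1 - lam) * pred(x_u),
# so a "strong" prediction for the unlabelled part can be recovered as
# (pred(x_mix) - lam * pred(x_l)) / (1 - lam) and used for consistency/pseudo-labelling.
```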
Affiliation(s)
- Jianwu Long
- College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, People's Republic of China
- Yan Ren
- College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, People's Republic of China
- Chengxin Yang
- College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, People's Republic of China
- Pengcheng Ren
- College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, People's Republic of China
- Ziqin Zeng
- College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, People's Republic of China
15. Xu Z, Wang S, Xu G, Liu Y, Yu M, Zhang H, Lukasiewicz T, Gu J. Automatic data augmentation for medical image segmentation using Adaptive Sequence-length based Deep Reinforcement Learning. Comput Biol Med 2024; 169:107877. [PMID: 38157774] [DOI: 10.1016/j.compbiomed.2023.107877]
Abstract
Although existing deep reinforcement learning-based approaches have achieved some success in image augmentation tasks, their effectiveness and adequacy for data augmentation in intelligent medical image analysis are still unsatisfactory. Therefore, we propose a novel Adaptive Sequence-length based Deep Reinforcement Learning (ASDRL) model for Automatic Data Augmentation (AutoAug) in intelligent medical image analysis. The improvements of ASDRL-AutoAug are two-fold: (i) To remedy the problem of some augmented images being invalid, we construct a more accurate reward function based on different variations of the augmentation trajectories. This reward function assesses the validity of each augmentation transformation more accurately by introducing different information about the validity of the augmented images. (ii) Then, to alleviate the problem of insufficient augmentation, we further propose a more intelligent automatic stopping mechanism (ASM). ASM feeds a stop signal to the agent automatically by judging the adequacy of image augmentation. This ensures that each transformation before stopping the augmentation can smoothly improve the model performance. Extensive experimental results on three medical image segmentation datasets show that (i) ASDRL-AutoAug greatly outperforms the state-of-the-art data augmentation methods in medical image segmentation tasks, (ii) the proposed improvements are both effective and essential for ASDRL-AutoAug to achieve superior performance, and the new reward evaluates the transformations more accurately than existing reward functions, and (iii) we also demonstrate that ASDRL-AutoAug is adaptive for different images in terms of sequence length, as well as generalizable across different segmentation models.
Affiliation(s)
- Zhenghua Xu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
- Shengxin Wang
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
- Gang Xu
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Yunxin Liu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
- Miao Yu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China.
- Hongwei Zhang
- School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
- Thomas Lukasiewicz
- Institute of Logic and Computation, Vienna University of Technology, Vienna, Austria; Department of Computer Science, University of Oxford, Oxford, United Kingdom
- Junhua Gu
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
16. Zhang J, Zhang S, Shen X, Lukasiewicz T, Xu Z. Multi-ConDoS: Multimodal Contrastive Domain Sharing Generative Adversarial Networks for Self-Supervised Medical Image Segmentation. IEEE Trans Med Imaging 2024; 43:76-95. [PMID: 37379176] [DOI: 10.1109/tmi.2023.3290356]
Abstract
Existing self-supervised medical image segmentation usually encounters the domain shift problem (i.e., the input distribution of pre-training is different from that of fine-tuning) and/or the multimodality problem (i.e., it is based on single-modal data only and cannot utilize the fruitful multimodal information of medical images). To solve these problems, in this work, we propose multimodal contrastive domain sharing (Multi-ConDoS) generative adversarial networks to achieve effective multimodal contrastive self-supervised medical image segmentation. Compared to existing self-supervised approaches, Multi-ConDoS has the following three advantages: (i) it utilizes multimodal medical images to learn more comprehensive object features via multimodal contrastive learning; (ii) domain translation is achieved by integrating the cyclic learning strategy of CycleGAN and the cross-domain translation loss of Pix2Pix; and (iii) novel domain-sharing layers are introduced to learn not only domain-specific but also domain-sharing information from the multimodal medical images. Extensive experiments on two publicly available multimodal medical image segmentation datasets show that, with only 5% (resp., 10%) of labeled data, Multi-ConDoS not only greatly outperforms the state-of-the-art self-supervised and semi-supervised medical image segmentation baselines with the same ratio of labeled data, but also achieves similar (sometimes even better) performance as fully supervised segmentation methods with 50% (resp., 100%) of labeled data, which proves that our work can achieve superior segmentation performance with a very low labeling workload. Furthermore, ablation studies show that the above three improvements are all effective and essential for Multi-ConDoS to achieve this superior performance.
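A hedged sketch of the two translation losses mentioned above, combining a CycleGAN-style cycle-consistency term with a Pix2Pix-style paired L1 term; the generators, loss weights, and the adversarial/contrastive parts of Multi-ConDoS are placeholders or omitted.
```python
# Illustrative cycle-consistency + paired translation losses between two modalities.
import torch
import torch.nn as nn

G_ab = nn.Conv2d(1, 1, 3, padding=1)   # modality A -> B (stand-in generator)
G_ba = nn.Conv2d(1, 1, 3, padding=1)   # modality B -> A

def translation_losses(x_a, x_b, lambda_cyc=10.0, lambda_pix=100.0):
    fake_b, fake_a = G_ab(x_a), G_ba(x_b)
    # Cycle consistency: A -> B -> A (and B -> A -> B) should reconstruct the input.
    cycle = (G_ba(fake_b) - x_a).abs().mean() + (G_ab(fake_a) - x_b).abs().mean()
    # Pix2Pix-style paired loss: the translated image should match the registered target modality.
    paired = (fake_b - x_b).abs().mean() + (fake_a - x_a).abs().mean()
    return lambda_cyc * cycle + lambda_pix * paired

loss = translation_losses(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))
```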
17. Xu Z, Tang J, Qi C, Yao D, Liu C, Zhan Y, Lukasiewicz T. Cross-domain attention-guided generative data augmentation for medical image analysis with limited data. Comput Biol Med 2024; 168:107744. [PMID: 38006826] [DOI: 10.1016/j.compbiomed.2023.107744]
Abstract
Data augmentation is widely applied to medical image analysis tasks with limited datasets, imbalanced classes, and insufficient annotations. However, traditional augmentation techniques cannot supply extra information, making diagnostic performance unsatisfactory. GAN-based generative methods have thus been proposed to obtain additional useful information and realize more effective data augmentation; but existing generative data augmentation techniques mainly encounter two problems: (i) current generative data augmentation lacks the capability to use cross-domain differential information to extend limited datasets; and (ii) existing generative methods cannot provide effective supervised information in medical image segmentation tasks. To solve these problems, we propose an attention-guided cross-domain tumor image generation model (CDA-GAN) with an information enhancement strategy. CDA-GAN can generate diverse samples to expand the scale of datasets, improving the performance of medical image diagnosis and treatment tasks. In particular, we incorporate channel attention into a CycleGAN-based cross-domain generation network that captures inter-domain information and generates positive or negative samples of brain tumors. In addition, we propose a semi-supervised spatial attention strategy to guide the spatial information of features at the pixel level in tumor generation. Furthermore, we add spectral normalization to prevent the discriminator from mode collapse and to stabilize the training procedure. Finally, to resolve an inapplicability problem in the segmentation task, we further propose an application strategy that uses this data augmentation model to achieve more accurate medical image segmentation with limited data. Experimental studies on two public brain tumor datasets (BraTS and TCIA) show that the proposed CDA-GAN model greatly outperforms state-of-the-art generative data augmentation in both practical medical image classification and segmentation tasks; e.g., CDA-GAN is 0.50%, 1.72%, 2.05%, and 0.21% better than the best SOTA baseline in terms of ACC, AUC, Recall, and F1, respectively, in the classification task on BraTS, while its improvements over the best SOTA baseline in terms of Dice, Sens, HD95, and mIOU in the segmentation task on TCIA are 2.50%, 0.90%, 14.96%, and 4.18%, respectively.
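As an illustration of channel attention inside a generator, the sketch below uses a squeeze-and-excitation style block; the paper's exact attention design, reduction ratio, and placement within the CycleGAN backbone may differ.
```python
# Illustrative squeeze-and-excitation style channel attention over feature maps.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))             # global average pool -> per-channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)    # reweight feature maps channel-wise

out = ChannelAttention(32)(torch.randn(1, 32, 64, 64))
```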
Affiliation(s)
- Zhenghua Xu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
- Jiaqi Tang
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
- Chang Qi
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China; Institute of Logic and Computation, Vienna University of Technology, Vienna, Austria.
- Dan Yao
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
- Caihua Liu
- College of Computer Science and Technology, Civil Aviation University of China, Tianjin, China
- Yuefu Zhan
- Department of Radiology, Hainan Women and Children's Medical Center, Haikou, China
- Thomas Lukasiewicz
- Institute of Logic and Computation, Vienna University of Technology, Vienna, Austria; Department of Computer Science, University of Oxford, Oxford, United Kingdom
18. Deng Z, Huang G, Yuan X, Zhong G, Lin T, Pun CM, Huang Z, Liang Z. QMLS: quaternion mutual learning strategy for multi-modal brain tumor segmentation. Phys Med Biol 2023; 69:015014. [PMID: 38061066] [DOI: 10.1088/1361-6560/ad135e]
Abstract
Objective. Due to non-invasive imaging and the multimodality of magnetic resonance imaging (MRI), MRI-based multi-modal brain tumor segmentation (MBTS) studies have attracted increasing attention in recent years. With the great success of convolutional neural networks in various computer vision tasks, many MBTS models have been proposed to address the technical challenges of MBTS. However, limited data collection is common in MBTS tasks, so existing studies typically have difficulty fully exploring the multi-modal MRI images to mine complementary information among different modalities. Approach. We propose a novel quaternion mutual learning strategy (QMLS), which consists of a voxel-wise lesion knowledge mutual learning (VLKML) mechanism and a quaternion multi-modal feature learning (QMFL) module. Specifically, the VLKML mechanism allows the networks to converge to a robust minimum so that aggressive data augmentation techniques can be applied to fully expand the limited data. In particular, the quaternion-valued QMFL module treats the different modalities as components of quaternions to sufficiently learn complementary information among modalities in the hypercomplex domain while reducing the number of parameters by about 75%. Main results. Extensive experiments on the BraTS 2020 and BraTS 2019 datasets indicate that QMLS achieves superior results to current popular methods at lower computational cost. Significance. We propose a novel algorithm for the brain tumor segmentation task that achieves better performance with fewer parameters, which facilitates the clinical application of automatic brain tumor segmentation.
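The roughly 75% parameter reduction from quaternion-valued convolution can be seen in a generic Hamilton-product convolution, sketched below in PyTorch with the four MRI modalities mapped to the four quaternion components. This illustrates quaternion convolution in general, not the authors' QMFL implementation; all names are hypothetical.

```python
# Hypothetical sketch of a quaternion 2-D convolution: four real kernels are
# shared across the Hamilton product instead of one large real-valued kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuaternionConv2d(nn.Module):
    def __init__(self, in_q: int, out_q: int, kernel_size: int = 3, padding: int = 1):
        super().__init__()
        # One real-valued kernel per quaternion component (r, i, j, k).
        self.w = nn.ParameterList(
            [nn.Parameter(torch.randn(out_q, in_q, kernel_size, kernel_size) * 0.02)
             for _ in range(4)]
        )
        self.padding = padding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 4*in_q, H, W) -> split into the four quaternion components.
        r, i, j, k = torch.chunk(x, 4, dim=1)
        wr, wi, wj, wk = self.w

        def conv(t, w):
            return F.conv2d(t, w, padding=self.padding)

        out_r = conv(r, wr) - conv(i, wi) - conv(j, wj) - conv(k, wk)
        out_i = conv(r, wi) + conv(i, wr) + conv(j, wk) - conv(k, wj)
        out_j = conv(r, wj) - conv(i, wk) + conv(j, wr) + conv(k, wi)
        out_k = conv(r, wk) + conv(i, wj) - conv(j, wi) + conv(k, wr)
        return torch.cat([out_r, out_i, out_j, out_k], dim=1)


# Four MRI modalities (e.g. T1, T1ce, T2, FLAIR), each with 8 feature channels.
x = torch.randn(1, 4 * 8, 64, 64)
print(QuaternionConv2d(8, 16)(x).shape)  # torch.Size([1, 64, 64, 64])
```

Because the four component kernels are reused in the Hamilton product, the layer stores 4 x out_q x in_q x k x k weights instead of the 16 x out_q x in_q x k x k a real-valued convolution of the same width would need, which is where the 75% figure comes from.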
Collapse
Affiliation(s)
- Zhengnan Deng
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, People's Republic of China
| | - Guoheng Huang
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, People's Republic of China
| | - Xiaochen Yuan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, People's Republic of China
| | - Guo Zhong
- School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, 510006, People's Republic of China
| | - Tongxu Lin
- School of Automation, Guangdong University of Technology, Guangzhou, 510006, People's Republic of China
| | - Chi-Man Pun
- Department of Computer and Information Science, University of Macau, Macao, People's Republic of China
| | - Zhixin Huang
- Department of Neurology, Guangdong Second Provincial General Hospital, Guangzhou, 510317, People's Republic of China
| | - Zhixin Liang
- Department of Nuclear Medicine, Jinshazhou Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510168, People's Republic of China
| |
Collapse
|
19
|
Wei Q, Tan N, Xiong S, Luo W, Xia H, Luo B. Deep Learning Methods in Medical Image-Based Hepatocellular Carcinoma Diagnosis: A Systematic Review and Meta-Analysis. Cancers (Basel) 2023; 15:5701. [PMID: 38067404 PMCID: PMC10705136 DOI: 10.3390/cancers15235701] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2023] [Revised: 11/25/2023] [Accepted: 11/29/2023] [Indexed: 06/24/2024] Open
Abstract
(1) Background: The aim of our research was to systematically review papers specifically focused on the hepatocellular carcinoma (HCC) diagnostic performance of deep learning (DL) methods based on medical images. (2) Materials: To identify related studies, a comprehensive search was conducted in prominent databases, including Embase, IEEE, PubMed, Web of Science, and the Cochrane Library. The search was limited to studies published before 3 July 2023. The inclusion criteria consisted of studies that either developed or utilized DL methods to diagnose HCC using medical images. To extract data, binary information on diagnostic accuracy was collected to determine the outcomes of interest, namely, the sensitivity, specificity, and area under the curve (AUC). (3) Results: Among the forty-eight initially identified eligible studies, thirty studies were included in the meta-analysis. The pooled sensitivity was 89% (95% CI: 87-91), the specificity was 90% (95% CI: 87-92), and the AUC was 0.95 (95% CI: 0.93-0.97). Analyses of subgroups based on medical image methods (contrast-enhanced and non-contrast-enhanced images), imaging modalities (ultrasound, magnetic resonance imaging, and computed tomography), and comparisons between DL methods and clinicians consistently showed the acceptable diagnostic performance of DL models. The publication bias and high heterogeneity observed between studies and subgroups can potentially result in an overestimation of the diagnostic accuracy of DL methods in medical imaging. (4) Conclusions: To improve future studies, it would be advantageous to establish more rigorous reporting standards that specifically address the challenges associated with DL research in this particular field.
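As context for the pooled estimates reported above, the snippet below shows a deliberately simplified fixed-effect pooling of study sensitivities on the logit scale. Diagnostic meta-analyses such as this one normally use bivariate random-effects models, so this is only an illustration of inverse-variance pooling; the study counts are invented.

```python
# Illustrative sketch only (not the review's statistics code): fixed-effect
# inverse-variance pooling of sensitivities on the logit scale.
import math

# (true positives, false negatives) per hypothetical study
studies = [(90, 10), (45, 8), (120, 20)]

logits, weights = [], []
for tp, fn in studies:
    sens = tp / (tp + fn)
    var = 1.0 / tp + 1.0 / fn            # approximate variance of the logit
    logits.append(math.log(sens / (1 - sens)))
    weights.append(1.0 / var)

pooled_logit = sum(w * l for w, l in zip(weights, logits)) / sum(weights)
pooled_sens = 1.0 / (1.0 + math.exp(-pooled_logit))
print(f"pooled sensitivity ≈ {pooled_sens:.2%}")
```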
Collapse
Affiliation(s)
- Qiuxia Wei
- Department of Ultrasound, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 West Yanjiang Road, Guangzhou 510120, China; (Q.W.); (S.X.); (W.L.)
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 West Yanjiang Road, Guangzhou 510120, China
| | - Nengren Tan
- School of Electronic and Information Engineering, Guangxi Normal University, 15 Qixing District, Guilin 541004, China;
| | - Shiyu Xiong
- Department of Ultrasound, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 West Yanjiang Road, Guangzhou 510120, China; (Q.W.); (S.X.); (W.L.)
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 West Yanjiang Road, Guangzhou 510120, China
| | - Wanrong Luo
- Department of Ultrasound, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 West Yanjiang Road, Guangzhou 510120, China; (Q.W.); (S.X.); (W.L.)
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 West Yanjiang Road, Guangzhou 510120, China
| | - Haiying Xia
- School of Electronic and Information Engineering, Guangxi Normal University, 15 Qixing District, Guilin 541004, China;
| | - Baoming Luo
- Department of Ultrasound, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 West Yanjiang Road, Guangzhou 510120, China; (Q.W.); (S.X.); (W.L.)
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 West Yanjiang Road, Guangzhou 510120, China
| |
Collapse
|
20
|
Li K, Zhang G, Li K, Li J, Wang J, Yang Y. Dual CNN cross-teaching semi-supervised segmentation network with multi-kernels and global contrastive loss in ACDC. Med Biol Eng Comput 2023; 61:3409-3417. [PMID: 37684494 DOI: 10.1007/s11517-023-02920-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023]
Abstract
Cross-teaching between a Convolutional Neural Network (CNN) and a Transformer has been successful in semi-supervised learning; however, the information interaction between local and global relations ignores medium-scale semantic features, and the information produced during feature encoding is not fully utilized. To solve these problems, we propose a new semi-supervised segmentation network. Based on the principle that convolutions with different kernel sizes capture complementary information, we design a dual-CNN cross-supervised network with different kernel sizes under cross-teaching. We introduce global feature contrastive learning and generate contrastive samples with the help of the dual-CNN architecture to make efficient use of the encoded features. We conducted extensive experiments on the Automated Cardiac Diagnosis Challenge (ACDC) dataset to evaluate our approach. Our method achieves an average Dice Similarity Coefficient (DSC) of 87.2% and a Hausdorff distance of 6.1 mm with 10% labeled data, a significant improvement over many current popular models. Supervised learning is performed on the labeled data, and dual-CNN cross-teaching supervised learning is performed on the unlabeled data. All data are mapped by the two CNNs into features, which are used for contrastive learning to optimize the parameters.
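The cross-teaching signal described above can be pictured with a generic sketch: two CNNs with different kernel sizes are trained with a supervised loss on labeled images and with each other's pseudo-labels on unlabeled images. The PyTorch snippet below is a toy version of that training objective (without the global contrastive term), not the authors' network; the models, sizes, and loss weight are hypothetical.

```python
# Hypothetical sketch of dual-CNN cross-teaching with hard pseudo-labels.
import torch
import torch.nn as nn


def make_cnn(kernel_size: int, num_classes: int = 4) -> nn.Module:
    pad = kernel_size // 2
    return nn.Sequential(
        nn.Conv2d(1, 16, kernel_size, padding=pad), nn.ReLU(inplace=True),
        nn.Conv2d(16, num_classes, kernel_size, padding=pad),
    )


net_a, net_b = make_cnn(3), make_cnn(7)        # complementary receptive fields
ce = nn.CrossEntropyLoss()

labeled, labels = torch.randn(2, 1, 64, 64), torch.randint(0, 4, (2, 64, 64))
unlabeled = torch.randn(2, 1, 64, 64)

# Supervised loss on labeled data for both networks.
sup = ce(net_a(labeled), labels) + ce(net_b(labeled), labels)

# Cross-teaching: each network learns from the other's hard pseudo-labels.
logits_a, logits_b = net_a(unlabeled), net_b(unlabeled)
pseudo_a, pseudo_b = logits_a.argmax(1).detach(), logits_b.argmax(1).detach()
cross = ce(logits_a, pseudo_b) + ce(logits_b, pseudo_a)

loss = sup + 0.5 * cross
loss.backward()
print(float(loss))
```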
Collapse
Affiliation(s)
- Keming Li
- School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan, China
| | - Guangyuan Zhang
- School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan, China
| | - Kefeng Li
- School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan, China.
| | - Jindi Li
- School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan, China
| | - Jiaqi Wang
- School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan, China
| | - Yumin Yang
- School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan, China
| |
Collapse
|
21
|
Yu M, Guo M, Zhang S, Zhan Y, Zhao M, Lukasiewicz T, Xu Z. RIRGAN: An end-to-end lightweight multi-task learning method for brain MRI super-resolution and denoising. Comput Biol Med 2023; 167:107632. [PMID: 39491379 DOI: 10.1016/j.compbiomed.2023.107632] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 10/05/2023] [Accepted: 10/23/2023] [Indexed: 11/05/2024]
Abstract
A common problem in deep-learning-based low-level vision for medical images is that most research is based on single-task learning (STL), which addresses only one of the two situations of low resolution or high noise. Our motivation is to design a model that can perform both super-resolution (SR) and denoising (DN) simultaneously, in order to cope with the practical situation in which medical images suffer from both low resolution and high noise. By improving an existing single image super-resolution (SISR) network and introducing the idea of multi-task learning (MTL), we propose an end-to-end lightweight MTL generative adversarial network (GAN), RIRGAN, which uses residual-in-residual blocks (RIR-Blocks) for feature extraction and can concurrently accomplish the SR and DN tasks. The generator in RIRGAN is composed of several residual groups with a long skip connection (LSC), which helps form a very deep network and enables the network to focus on learning high-frequency (HF) information. The introduction of a discriminator based on a relativistic average discriminator (RaD) greatly improves the discriminator's ability and gives the generated images more realistic details. Meanwhile, the use of a hybrid loss function not only ensures that RIRGAN has MTL capability, but also enables RIRGAN to balance attention between quantitative evaluation metrics and qualitative human visual assessment. The experimental results show that the quality of the images restored by RIRGAN is superior to that of STL-based SR and DN methods in both subjective perception and objective evaluation metrics when processing low-level-vision medical images. RIRGAN is thus better aligned with the practical requirements of medical practice.
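For readers unfamiliar with the relativistic average discriminator mentioned above, the snippet below sketches the RaD losses in their common binary cross-entropy form: each sample is judged relative to the average critic score of the opposite class. The toy critic and tensors are placeholders, and this is a generic RaGAN-style loss, not RIRGAN's exact formulation.

```python
# Hypothetical sketch of relativistic average discriminator (RaD) losses.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()


def rad_losses(critic: nn.Module, real: torch.Tensor, fake: torch.Tensor):
    c_real, c_fake = critic(real), critic(fake)
    ones, zeros = torch.ones_like(c_real), torch.zeros_like(c_real)
    # Discriminator: real should look "more real than the average fake", and vice versa.
    # In a real training loop the fake batch would be detached for this update.
    d_loss = bce(c_real - c_fake.mean(), ones) + bce(c_fake - c_real.mean(), zeros)
    # Generator: reverse the roles (fake should look more real than the average real).
    g_loss = bce(c_real - c_fake.mean(), zeros) + bce(c_fake - c_real.mean(), ones)
    return d_loss, g_loss


critic = nn.Sequential(nn.Flatten(), nn.Linear(1 * 32 * 32, 1))
real, fake = torch.randn(4, 1, 32, 32), torch.randn(4, 1, 32, 32)
d_loss, g_loss = rad_losses(critic, real, fake)
print(float(d_loss), float(g_loss))
```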
Collapse
Affiliation(s)
- Miao Yu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
| | - Miaomiao Guo
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
| | - Shuai Zhang
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China.
| | - Yuefu Zhan
- Department of Radiology, Hainan Women and Children's Medical Center, Haikou, China
| | - Mingkang Zhao
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
| | - Thomas Lukasiewicz
- Institute of Logic and Computation, Vienna University of Technology, Vienna, Austria; Department of Computer Science, University of Oxford, Oxford, United Kingdom
| | - Zhenghua Xu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China.
| |
Collapse
|
22
|
Xu Z, Zhang X, Zhang H, Liu Y, Zhan Y, Lukasiewicz T. EFPN: Effective medical image detection using feature pyramid fusion enhancement. Comput Biol Med 2023; 163:107149. [PMID: 37348265 DOI: 10.1016/j.compbiomed.2023.107149] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Revised: 05/15/2023] [Accepted: 06/07/2023] [Indexed: 06/24/2023]
Abstract
Feature pyramid networks (FPNs) are widely used in existing deep detection models to help them exploit multi-scale features. However, FPN-based deep detection models face two multi-scale feature fusion problems in medical image detection tasks: insufficient multi-scale feature fusion and equal importance assigned to multi-scale features. Therefore, in this work, we propose a new enhanced backbone model, EFPN, to overcome these problems and help existing FPN-based detection models achieve much better medical image detection performance. We first introduce an additional top-down pyramid to help the detection networks fuse deeper multi-scale information; then, a scale enhancement module is developed that uses kernels of different sizes to generate more diverse multi-scale features. Finally, we propose a feature fusion attention module to estimate and assign different importance weights to features with different depths and scales. Extensive experiments are conducted on two public lesion detection datasets of different medical image modalities (X-ray and MRI). On the mAP and mR evaluation metrics, EFPN-based Faster R-CNNs improve by 1.55% and 4.3% on the PenD (X-ray) dataset, and by 2.74% and 3.1% on the BraTS (MRI) dataset, respectively. EFPN-based Faster R-CNNs thus achieve much better performance than the state-of-the-art baselines in medical image detection tasks. All three proposed improvements are essential and effective for EFPN to achieve superior performance; moreover, besides Faster R-CNNs, EFPN can easily be applied to other deep models to significantly enhance their performance in medical image detection tasks.
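The scale-enhancement idea (kernels of different sizes producing more diverse multi-scale features, merged with learned importance weights) can be sketched generically as below. This illustrates the concept only, not the EFPN implementation; the branch kernel sizes and the softmax weighting are assumptions made for the example.

```python
# Hypothetical sketch: parallel convolutions with different kernel sizes on one
# FPN level, combined with learnable branch weights.
import torch
import torch.nn as nn


class ScaleEnhancement(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes]
        )
        # Learnable importance weights, one per branch (softmax-normalized).
        self.logits = nn.Parameter(torch.zeros(len(kernel_sizes)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * branch(x) for wi, branch in zip(w, self.branches))


feat = torch.randn(1, 256, 32, 32)        # one pyramid level
print(ScaleEnhancement(256)(feat).shape)  # torch.Size([1, 256, 32, 32])
```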
Collapse
Affiliation(s)
- Zhenghua Xu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China.
| | - Xudong Zhang
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China
| | - Hexiang Zhang
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China.
| | - Yunxin Liu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China
| | - Yuefu Zhan
- Department of Radiology, Hainan Women and Children's Medical Center, Haikou, China.
| | - Thomas Lukasiewicz
- Institute of Logic and Computation, TU Wien, Vienna, Austria; Department of Computer Science, University of Oxford, Oxford, United Kingdom
| |
Collapse
|
23
|
Liu H, Zhuang Y, Song E, Xu X, Ma G, Cetinkaya C, Hung CC. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations. Med Phys 2023; 50:5460-5478. [PMID: 36864700 DOI: 10.1002/mp.16338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2022] [Revised: 02/07/2023] [Accepted: 02/22/2023] [Indexed: 03/04/2023] Open
Abstract
BACKGROUND Multi-modal learning is widely adopted to learn the latent complementary information between different modalities in multi-modal medical image segmentation tasks. Nevertheless, traditional multi-modal learning methods require spatially well-aligned and paired multi-modal images for supervised training and therefore cannot leverage unpaired multi-modal images with spatial misalignment and modality discrepancy. To train accurate multi-modal segmentation networks using easily accessible and low-cost unpaired multi-modal images in clinical practice, unpaired multi-modal learning has recently received considerable attention. PURPOSE Existing unpaired multi-modal learning methods usually focus on the intensity distribution gap but ignore the scale variation problem between different modalities. Besides, existing methods frequently employ shared convolutional kernels to capture common patterns in all modalities, but these are typically inefficient at learning global contextual information. On the other hand, existing methods rely heavily on a large number of labeled unpaired multi-modal scans for training, which ignores the practical scenario in which labeled data are limited. To solve the above problems, we propose a modality-collaborative convolution and transformer hybrid network (MCTHNet) using semi-supervised learning for unpaired multi-modal segmentation with limited annotations, which not only collaboratively learns modality-specific and modality-invariant representations, but can also automatically leverage extensive unlabeled scans to improve performance. METHODS We make three main contributions in the proposed method. First, to alleviate the intensity distribution gap and scale variation problems across modalities, we develop a modality-specific scale-aware convolution (MSSC) module that can adaptively adjust the receptive field sizes and feature normalization parameters according to the input. Second, we propose a modality-invariant vision transformer (MIViT) module as the shared bottleneck layer for all modalities, which implicitly incorporates convolution-like local operations with the global processing of transformers to learn generalizable modality-invariant representations. Third, we design a multi-modal cross pseudo supervision (MCPS) method for semi-supervised learning, which enforces consistency between the pseudo segmentation maps generated by two perturbed networks to acquire abundant annotation information from unlabeled unpaired multi-modal scans. RESULTS Extensive experiments are performed on two unpaired CT and MR segmentation datasets, including a cardiac substructure dataset derived from the MMWHS-2017 dataset and an abdominal multi-organ dataset consisting of the BTCV and CHAOS datasets. The results show that our proposed method significantly outperforms existing state-of-the-art methods under various labeling ratios and achieves segmentation performance close to that of single-modal methods trained with fully labeled data while leveraging only a small portion of labeled data. Specifically, when the labeling ratio is 25%, our method achieves overall mean DSC values of 78.56% and 76.18% in cardiac and abdominal segmentation, respectively, improving the average DSC value of the two tasks by 12.84% compared with single-modal U-Net models. CONCLUSIONS Our proposed method is beneficial for reducing the annotation burden of unpaired multi-modal medical images in clinical applications.
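One ingredient of the MSSC idea above, modality-specific feature normalization combined with convolution weights shared across modalities, can be sketched as follows. This is a simplified stand-in (one BatchNorm per modality) rather than the paper's adaptive receptive-field module; the class and argument names are hypothetical.

```python
# Hypothetical sketch: convolution shared across CT/MR, normalization kept
# modality-specific.
import torch
import torch.nn as nn


class SharedConvModalityNorm(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, num_modalities: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)   # shared weights
        self.norms = nn.ModuleList(
            [nn.BatchNorm2d(out_ch) for _ in range(num_modalities)]
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, modality: int) -> torch.Tensor:
        # Pick the normalization statistics belonging to this modality.
        return self.act(self.norms[modality](self.conv(x)))


layer = SharedConvModalityNorm(1, 16)
ct, mr = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
print(layer(ct, modality=0).shape, layer(mr, modality=1).shape)
```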
Collapse
Affiliation(s)
- Hong Liu
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Yuzhou Zhuang
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Enmin Song
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Xiangyang Xu
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Guangzhi Ma
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Coskun Cetinkaya
- Center for Machine Vision and Security Research, Kennesaw State University, Kennesaw, Georgia, USA
| | - Chih-Cheng Hung
- Center for Machine Vision and Security Research, Kennesaw State University, Kennesaw, Georgia, USA
| |
Collapse
|
24
|
Qinhong D, Yue H, Wendong B, Yukun D, Huan Y, Yongming X. MAS-Net: Multi-modal Assistant Segmentation Network For Lumbar Intervertebral Disc. Phys Med Biol 2023; 68:175044. [PMID: 37567228 DOI: 10.1088/1361-6560/acef9f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2023] [Accepted: 08/11/2023] [Indexed: 08/13/2023]
Abstract
Objective. Despite advancements in medical imaging technology, the diagnosis and localization of lumbar disc diseases still rely heavily on the expertise and experience of medical professionals. This process is often time-consuming, labor-intensive, and susceptible to subjective factors. Automatic localization and segmentation of the lumbar intervertebral disc (LID) is the first and critical step in the intelligent diagnosis of lumbar disc diseases. However, owing to the complexity of the vertebral body and the ambiguity of the soft-tissue boundaries of the LID, accurate and intelligent segmentation of LIDs remains challenging. This study aims to accurately and intelligently segment and locate LIDs by fully utilizing multi-modal lumbar magnetic resonance images (MRIs). Approach. A novel multi-modal assistant segmentation network (MAS-Net) is proposed in this paper. The architecture consists of four key components: the multi-branch fusion encoder (MBFE), the cross-modality correlation evaluation (CMCE) module, the channel fusion transformer (CFT), and the selective kernel (SK) based decoder. The MBFE module captures and integrates features from the various modalities, while the CMCE module facilitates the fusion process between the MBFE and the decoder. The CFT module selectively guides the flow of information between the MBFE and the decoder and effectively utilizes skip connections from multiple layers. The SK module computes the significance of each channel using global pooling operations and applies weights to the input feature maps to improve the model's recognition of important features. Main results. The proposed MAS-Net achieved a Dice coefficient of 93.08% on the IVD3Seg dataset and 93.22% on the DualModalDisc dataset, outperforming the current state-of-the-art network, accurately segmenting the LIDs, and generating a 3D model that can precisely display them. Significance. MAS-Net automates the diagnostic process and addresses challenges faced by doctors. By simplifying and enhancing the clarity of the visual representation, multi-modal MRI allows for better information complementation and LID segmentation. By successfully integrating data from the various modalities, the accuracy of LID segmentation is improved.
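The SK-style channel selection described above (global pooling to score channels, then weighting the feature maps) can be illustrated with a generic selective-kernel block that chooses, per channel, between a 3x3 and a 5x5 branch. This is a textbook-style sketch, not the MAS-Net decoder; the kernel sizes and reduction ratio are assumptions.

```python
# Hypothetical sketch of a selective-kernel (SK) style block.
import torch
import torch.nn as nn


class SelectiveKernel(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        hidden = max(channels // reduction, 4)
        self.squeeze = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(inplace=True))
        self.select = nn.Linear(hidden, channels * 2)   # weights for 2 branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u3, u5 = self.branch3(x), self.branch5(x)
        b, c, _, _ = x.shape
        s = (u3 + u5).mean(dim=(2, 3))                  # global average pooling
        logits = self.select(self.squeeze(s)).view(b, 2, c)
        a = torch.softmax(logits, dim=1)                # per-channel branch weights
        return a[:, 0].view(b, c, 1, 1) * u3 + a[:, 1].view(b, c, 1, 1) * u5


x = torch.randn(1, 32, 48, 48)
print(SelectiveKernel(32)(x).shape)  # torch.Size([1, 32, 48, 48])
```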
Collapse
Affiliation(s)
- Du Qinhong
- Department of Computer Science and Technology, Qingdao University, Qingdao, People's Republic of China
| | - He Yue
- Department of Computer Science and Technology, Qingdao University, Qingdao, People's Republic of China
| | - Bu Wendong
- Department of Computer Science and Technology, Qingdao University, Qingdao, People's Republic of China
| | - Du Yukun
- Department of Spinal Surgery, The Affiliated Hospital of Qingdao University, Qingdao, People's Republic of China
| | - Yang Huan
- Department of Computer Science and Technology, Qingdao University, Qingdao, People's Republic of China
| | - Xi Yongming
- Department of Spinal Surgery, The Affiliated Hospital of Qingdao University, Qingdao, People's Republic of China
| |
Collapse
|
25
|
Yuan D, Xu Z, Tian B, Wang H, Zhan Y, Lukasiewicz T. μ-Net: Medical image segmentation using efficient and effective deep supervision. Comput Biol Med 2023; 160:106963. [PMID: 37150087 DOI: 10.1016/j.compbiomed.2023.106963] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2022] [Revised: 03/30/2023] [Accepted: 04/18/2023] [Indexed: 05/09/2023]
Abstract
Although existing deep supervised solutions have achieved great successes in medical image segmentation, they have the following shortcomings: (i) the semantic difference problem: because they are obtained by very different convolution or deconvolution processes, the intermediate masks and predictions in deep supervised baselines usually contain semantics of different depths, which hinders the models' learning capabilities; (ii) the low learning efficiency problem: additional supervision signals inevitably make model training more time-consuming. Therefore, in this work, we first propose two deep supervised learning strategies, U-Net-Deep and U-Net-Auto, to overcome the semantic difference problem. Then, to resolve the low learning efficiency problem, building upon the above two strategies, we further propose a new deep supervised segmentation model, called μ-Net, which achieves not only effective but also efficient deep supervised medical image segmentation by introducing a tied-weight decoder that generates pseudo-labels with more diverse information and also speeds up convergence during training. Finally, three different types of μ-Net-based deep supervision strategies are explored, and a Similarity Principle of Deep Supervision is derived to guide future research in deep supervised learning. Experimental studies on four public benchmark datasets show that μ-Net greatly outperforms all the state-of-the-art baselines, including the state-of-the-art deeply supervised segmentation models, in terms of both effectiveness and efficiency. Ablation studies confirm the soundness of the proposed Similarity Principle of Deep Supervision, the necessity and effectiveness of the tied-weight decoder, and the benefit of using both the segmentation and reconstruction pseudo-labels for deep supervised learning.
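As background for the deep supervision setting discussed above, the snippet below shows a generic deeply supervised segmentation loss: auxiliary logits from intermediate decoder depths are upsampled and supervised together with the final prediction, with decaying weights. μ-Net's tied-weight decoder and pseudo-label generation are specific to the paper and not reproduced here; the weights and toy tensors are assumptions.

```python
# Hypothetical sketch of a generic deep-supervision loss for segmentation.
import torch
import torch.nn as nn
import torch.nn.functional as F

ce = nn.CrossEntropyLoss()


def deep_supervised_loss(outputs, target, weights=(1.0, 0.5, 0.25)):
    """outputs: list of logits, full-resolution final prediction first."""
    loss = 0.0
    for w, logits in zip(weights, outputs):
        # Upsample coarser auxiliary logits to the target resolution.
        logits = F.interpolate(logits, size=target.shape[-2:], mode="bilinear",
                               align_corners=False)
        loss = loss + w * ce(logits, target)
    return loss


# Toy multi-scale logits (4 classes) at full, half, and quarter resolution.
target = torch.randint(0, 4, (2, 64, 64))
outputs = [torch.randn(2, 4, 64, 64, requires_grad=True),
           torch.randn(2, 4, 32, 32, requires_grad=True),
           torch.randn(2, 4, 16, 16, requires_grad=True)]
loss = deep_supervised_loss(outputs, target)
loss.backward()
print(float(loss))
```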
Collapse
Affiliation(s)
- Di Yuan
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
| | - Zhenghua Xu
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China.
| | - Biao Tian
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
| | - Hening Wang
- State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China
| | - Yuefu Zhan
- Department of Radiology, Hainan Women and Children's Medical Center, Haikou, China
| | - Thomas Lukasiewicz
- Institute of Logic and Computation, TU Wien, Vienna, Austria; Department of Computer Science, University of Oxford, Oxford, United Kingdom
| |
Collapse
|