1. Xie X, Zhang X, Tang X, Zhao J, Xiong D, Ouyang L, Yang B, Zhou H, Ling BWK, Teo KL. MACTFusion: Lightweight Cross Transformer for Adaptive Multimodal Medical Image Fusion. IEEE J Biomed Health Inform 2025; 29:3317-3328. [PMID: 38640042] [DOI: 10.1109/jbhi.2024.3391620]
Abstract
Multimodal medical image fusion aims to integrate complementary information from different modalities of medical images. Deep learning methods, especially recent vision Transformers, have effectively improved image fusion performance. However, Transformers have limitations in image fusion, such as a lack of local feature extraction and cross-modal feature interaction, which results in insufficient multimodal feature extraction and integration. In addition, the computational cost of Transformers is high. To address these challenges, in this work we develop an adaptive cross-modal fusion strategy for unsupervised multimodal medical image fusion. Specifically, we propose a novel lightweight cross Transformer based on a cross multi-axis attention mechanism. It includes cross-window attention and cross-grid attention to mine and integrate both local and global interactions of multimodal features. The cross Transformer is further guided by a spatial adaptation fusion module, which allows the model to focus on the most relevant information. Moreover, we design a special feature extraction module that combines multiple gradient residual dense convolutional layers and Transformer layers to obtain local features from coarse to fine and to capture global features. The proposed strategy significantly boosts fusion performance while minimizing computational costs. Extensive experiments, including clinical brain tumor image fusion, show that our model achieves clearer texture details and better visual quality than other state-of-the-art fusion methods.
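For orientation, here is a minimal PyTorch sketch of the cross-window attention idea described above: queries are drawn from one modality's windowed features and keys/values from the other's. The class name, shapes, and window folding are illustrative assumptions, not the authors' implementation; cross-grid attention would replace the window partition with a strided grid partition.

```python
# Hedged sketch of cross-modal windowed attention; names are illustrative.
import torch
import torch.nn as nn

class CrossWindowAttention(nn.Module):
    """Queries come from one modality, keys/values from the other,
    computed within non-overlapping spatial windows."""

    def __init__(self, dim: int, window: int = 8, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.window, self.heads = window, heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def _windows(self, x):
        # (B, C, H, W) -> (B * nWindows, window*window, C)
        B, C, H, W = x.shape
        w = self.window
        x = x.view(B, C, H // w, w, W // w, w)
        return x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)

    def forward(self, xa, xb):
        # xa, xb: (B, C, H, W) features of the two modalities;
        # H and W are assumed divisible by the window size
        B, C, H, W = xa.shape
        wa, wb = self._windows(xa), self._windows(xb)
        n, t, _ = wa.shape
        d = C // self.heads
        q = self.q(wa).view(n, t, self.heads, d).transpose(1, 2)
        k, v = self.kv(wb).chunk(2, dim=-1)
        k = k.view(n, t, self.heads, d).transpose(1, 2)
        v = v.view(n, t, self.heads, d).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) / d ** 0.5
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(n, t, C)
        out = self.proj(out)
        w = self.window
        out = out.view(B, H // w, W // w, w, w, C)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

# usage sketch: fused = CrossWindowAttention(dim=64)(feat_mri, feat_pet)
```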
2. Lin Q, Guo S, Zhang H, Gao Z. Causal recurrent intervention for cross-modal cardiac image segmentation. Comput Med Imaging Graph 2025; 123:102549. [PMID: 40279865] [DOI: 10.1016/j.compmedimag.2025.102549]
Abstract
Cross-modal cardiac image segmentation is essential for cardiac disease analysis. In diagnosis, it enables clinicians to obtain more precise information about cardiac structure or function by leveraging specific imaging modalities. For instance, cardiovascular pathologies such as myocardial infarction and congenital heart defects require precise cross-modal characterization to guide clinical decisions. The growing adoption of cross-modal segmentation in clinical research underscores its technical value, yet annotating cardiac images with multiple slices is time-consuming and labor-intensive, making it difficult to meet clinical and deep learning demands. To reduce the need for labels, cross-modal approaches could leverage general knowledge from multiple modalities. However, implementing a cross-modal method remains challenging due to cross-domain confounding. This challenge arises from the intricate effects of modality and view alterations between images, including inconsistent high-dimensional features. The confounding complicates the causality between the observation (image) and the prediction (label), thereby weakening the domain-invariant representation. Existing disentanglement methods face difficulties in addressing the confounding because they depict the relationships between latent factors insufficiently. This paper proposes the causal recurrent intervention (CRI) method to overcome this challenge. It establishes a structural causal model that allows individual domains to maintain causal consistency through interventions. The CRI method integrates diverse high-dimensional variations into a single causal relationship by embedding image slices into a sequence. It further distinguishes stable and dynamic factors from the sequence, subsequently separating the stable factor into modal and view factors and establishing causal connections between them. It then learns the dynamic factor and the view factor from the observation to obtain the label. Experimental results on cross-modal cardiac images of 1697 examples show that the CRI method delivers promising cross-modal cardiac image segmentation performance.
Affiliation(s)
- Qixin Lin
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China.
- Saidi Guo
- School of Cyberspace Security, Zhengzhou University, Zhengzhou, China.
- Heye Zhang
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China; Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, China.
- Zhifan Gao
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China.
3. Tsampras T, Karamanidou T, Papanastasiou G, Stavropoulos TG. Deep learning for cardiac imaging: focus on myocardial diseases, a narrative review. Hellenic J Cardiol 2025; 81:18-24. [PMID: 39662734] [DOI: 10.1016/j.hjc.2024.12.002]
Abstract
The integration of computational technologies into cardiology has significantly advanced the diagnosis and management of cardiovascular diseases. Computational cardiology, particularly through cardiovascular imaging and informatics, enables precise diagnosis of myocardial diseases utilizing techniques such as echocardiography, cardiac magnetic resonance imaging, and computed tomography. Early-stage disease classification, especially in asymptomatic patients, benefits from these advancements, potentially altering disease progression and improving patient outcomes. Automatic segmentation of myocardial tissue using deep learning (DL) algorithms improves efficiency and consistency in analyzing large patient populations. Radiomic analysis can reveal subtle disease characteristics from medical images, enhance disease detection, enable patient stratification, and facilitate monitoring of disease progression and treatment response. Radiomic biomarkers have already demonstrated high diagnostic accuracy in distinguishing myocardial pathologies and promise treatment individualization in cardiology, earlier disease detection, and disease monitoring. In this context, this narrative review explores the current state of the art in DL applications in medical imaging (CT, CMR, echocardiography, and SPECT), focusing on automatic segmentation, radiomic feature phenotyping, and prediction of myocardial diseases, while also discussing challenges in the integration of DL models into clinical practice.
4. Yang H, Yang M, Chen J, Yao G, Zou Q, Jia L. Multimodal deep learning approaches for precision oncology: a comprehensive review. Brief Bioinform 2024; 26:bbae699. [PMID: 39757116] [DOI: 10.1093/bib/bbae699]
Abstract
The burgeoning accumulation of large-scale biomedical data in oncology, alongside significant strides in deep learning (DL) technologies, has established multimodal DL (MDL) as a cornerstone of precision oncology. This review provides an overview of MDL applications in this field, based on an extensive literature survey. In total, 651 articles published before September 2024 are included. We first outline publicly available multimodal datasets that support cancer research. Then, we discuss key DL training methods, data representation techniques, and fusion strategies for integrating multimodal data. The review also examines MDL applications in tumor segmentation, detection, diagnosis, prognosis, treatment selection, and therapy response monitoring. Finally, we critically assess the limitations of current approaches and propose directions for future research. By synthesizing current progress and identifying challenges, this review aims to guide future efforts in leveraging MDL to advance precision oncology.
Affiliation(s)
- Huan Yang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Chengdian Road, Kecheng District, Quzhou 324000, Zhejiang, China
- Minglei Yang
- Department of Pathology, The First Affiliated Hospital of Zhengzhou University, Jianshe Dong Road, Erqi District, Zhengzhou 450052, Henan, China
- Jiani Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Chengdian Road, Kecheng District, Quzhou 324000, Zhejiang, China
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Ligong Road, Jimei District, Xiamen 361024, Fujian, China
- Guocong Yao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Chengdian Road, Kecheng District, Quzhou 324000, Zhejiang, China
- School of Computer and Information Engineering, Henan University, Jinming Avenue, Longting District, Kaifeng 475001, Henan, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Chengdian Road, Kecheng District, Quzhou 324000, Zhejiang, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Section 2, North Jianshe Road, Chenghua District, Chengdu 610054, Sichuan, China
- Linpei Jia
- Department of Nephrology, Xuanwu Hospital, Capital Medical University, Changchun Street, Xicheng District, Beijing 100053, China
5. Sabati M, Yang M, Chauhan A. Editorial for "Collaborative Learning for Annotation-Efficient Volumetric MR Image Segmentation". J Magn Reson Imaging 2024; 60:1615-1616. [PMID: 38258419] [DOI: 10.1002/jmri.29212]
Affiliation(s)
- Mohammad Sabati
- Hoglund Biomedical Imaging Center, University of Kansas Medical Center, Kansas City, Kansas, USA
- Bioengineering Program, School of Engineering, University of Kansas, Lawrence, Kansas, USA
- Mingrui Yang
- Department of Biomedical Engineering, Program of Advanced Musculoskeletal Imaging, Cleveland Clinic, Cleveland, Ohio, USA
- Anil Chauhan
- Department of Radiology, University of Kansas Medical Center, Kansas City, Kansas, USA
6. Chen L, Bian Y, Zeng J, Meng Q, Zhu W, Shi F, Shao C, Chen X, Xiang D. Style Consistency Unsupervised Domain Adaptation Medical Image Segmentation. IEEE Trans Image Process 2024; 33:4882-4895. [PMID: 39236126] [DOI: 10.1109/tip.2024.3451934]
Abstract
Unsupervised domain adaptation for medical image segmentation aims to segment unlabeled target-domain images using labeled source-domain images. However, different medical imaging modalities lead to a large domain shift between their images, such that models well trained on one imaging modality often fail to segment images from another. In this paper, to mitigate the domain shift between the source and target domains, a style consistency unsupervised domain adaptation image segmentation method is proposed. First, a local phase-enhanced style fusion method is designed to mitigate domain shift and produce locally enhanced organs of interest. Second, a phase consistency discriminator is constructed to distinguish the phase consistency of domain-invariant features between the source and target domains, so as to enhance the disentanglement of the domain-invariant and style encoders and the removal of domain-specific features from the domain-invariant encoder. Third, a style consistency estimation method is proposed to obtain inconsistency maps from intermediate synthesized target-domain images with different styles, in order to measure difficult regions, mitigate the domain shift between synthesized and real target-domain images, and improve the integrity of the organs of interest. Fourth, style consistency entropy is defined for target-domain images to further improve the integrity of the organs of interest by concentrating on inconsistent regions. Comprehensive experiments have been performed on an in-house dataset and a publicly available dataset. The experimental results demonstrate the superiority of our framework over state-of-the-art methods.
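A hedged sketch of the style consistency estimation idea: segment several style variants of the same synthesized target image and mark the pixels where their predictions disagree. The function name and the entropy-based disagreement measure are assumptions, not the paper's exact formulation.

```python
# Illustrative inconsistency-map computation across style variants.
import torch

def inconsistency_map(probs: torch.Tensor) -> torch.Tensor:
    """probs: (S, B, C, H, W) softmax outputs for S style variants of the
    same image. Returns a (B, H, W) map that is high where styles disagree."""
    mean_p = probs.mean(dim=0)                           # average prediction
    # entropy of the averaged prediction: high where the variants conflict
    return -(mean_p * (mean_p + 1e-8).log()).sum(dim=1)

# usage sketch: stack predictions of the same target image under 3 styles,
# then up-weight the segmentation loss where the inconsistency is high.
# probs = torch.stack([model(x_s1), model(x_s2), model(x_s3)])
```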
7. Liu Z, Kainth K, Zhou A, Deyer TW, Fayad ZA, Greenspan H, Mei X. A review of self-supervised, generative, and few-shot deep learning methods for data-limited magnetic resonance imaging segmentation. NMR Biomed 2024; 37:e5143. [PMID: 38523402] [DOI: 10.1002/nbm.5143]
Abstract
Magnetic resonance imaging (MRI) is a ubiquitous medical imaging technology with applications in disease diagnostics, intervention, and treatment planning. Accurate MRI segmentation is critical for diagnosing abnormalities, monitoring diseases, and deciding on a course of treatment. With the advent of advanced deep learning frameworks, fully automated and accurate MRI segmentation is advancing. Traditional supervised deep learning techniques have advanced tremendously, reaching clinical-level accuracy in the field of segmentation. However, these algorithms still require a large amount of annotated data, which is oftentimes unavailable or impractical. One way to circumvent this issue is to utilize algorithms that exploit a limited amount of labeled data. This paper aims to review such state-of-the-art algorithms that use a limited number of annotated samples. We explain the fundamental principles of self-supervised learning, generative models, few-shot learning, and semi-supervised learning and summarize their applications in cardiac, abdominal, and brain MRI segmentation. Throughout this review, we highlight algorithms that can be employed based on the quantity of annotated data available. We also present a comprehensive list of notable publicly available MRI segmentation datasets. To conclude, we discuss possible future directions of the field, including emerging algorithms such as contrastive language-image pretraining and potential combinations across the methods discussed, which can further increase the efficacy of image segmentation with limited labels.
Affiliation(s)
- Zelong Liu
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Komal Kainth
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Alexander Zhou
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Timothy W Deyer
- East River Medical Imaging, New York, New York, USA
- Department of Radiology, Cornell Medicine, New York, New York, USA
- Zahi A Fayad
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Hayit Greenspan
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Xueyan Mei
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
8. Sun H, Wei J, Yuan W, Li R. Semi-supervised multi-modal medical image segmentation with unified translation. Comput Biol Med 2024; 176:108570. [PMID: 38749326] [DOI: 10.1016/j.compbiomed.2024.108570]
Abstract
The two major challenges to deep-learning-based medical image segmentation are multi-modality and a lack of expert annotations. Existing semi-supervised segmentation models can mitigate the problem of insufficient annotations by utilizing a small amount of labeled data. However, most of these models are limited to single-modal data and cannot exploit the complementary information in multi-modal medical images. A few semi-supervised multi-modal models have been proposed recently, but they have rigid structures and require additional training steps for each modality. In this work, we propose a novel flexible method, semi-supervised multi-modal medical image segmentation with unified translation (SMSUT), and a unique semi-supervised procedure that can leverage multi-modal information to improve semi-supervised segmentation performance. Our architecture capitalizes on unified translation to extract complementary information from multi-modal data, which compels the network to focus on the disparities and salient features among the modalities. Furthermore, we impose constraints on the model at both the pixel and feature levels to cope with the lack of annotation information and the diverse representations within semi-supervised multi-modal data. We introduce a novel training procedure tailored for semi-supervised multi-modal medical image analysis by integrating the concept of conditional translation. Our method adapts seamlessly to varying numbers of distinct modalities in the training data. Experiments show that our model outperforms semi-supervised segmentation counterparts on public datasets, which demonstrates our network's capabilities and the transferability of the proposed method. The code will be openly available at https://github.com/Sue1347/SMSUT-MedicalImgSegmentation.
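The pixel- and feature-level constraints could look roughly like the following sketch, assuming a student/teacher-style pair of predictions on unlabeled multi-modal images; the names and loss weights are illustrative, not the SMSUT implementation.

```python
# Hedged sketch of combined pixel-level and feature-level consistency.
import torch
import torch.nn.functional as F

def consistency_losses(student_logits, teacher_logits,
                       student_feat, teacher_feat):
    """Pixel level: predictions on unlabeled images should agree.
    Feature level: intermediate representations should align."""
    pixel = F.mse_loss(student_logits.softmax(dim=1),
                       teacher_logits.softmax(dim=1).detach())
    feat = 1 - F.cosine_similarity(student_feat.flatten(1),
                                   teacher_feat.flatten(1).detach()).mean()
    return pixel, feat

# total = sup_loss + lambda_p * pixel + lambda_f * feat  (lambdas tuned per task)
```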
Affiliation(s)
- Huajun Sun
- South China University of Technology, Guangzhou, 510006, China.
- Jia Wei
- South China University of Technology, Guangzhou, 510006, China.
- Wenguang Yuan
- Huawei Cloud BU EI Innovation Laboratory, Dongguan, 523000, China.
- Rui Li
- Rochester Institute of Technology, Rochester, NY 14623, USA.
9. Papanastasiou G, Dikaios N, Huang J, Wang C, Yang G. Is Attention all You Need in Medical Image Analysis? A Review. IEEE J Biomed Health Inform 2024; 28:1398-1411. [PMID: 38157463] [DOI: 10.1109/jbhi.2023.3348436]
Abstract
Medical imaging is a key component in clinical diagnosis, treatment planning and clinical trial design, accounting for almost 90% of all healthcare data. CNNs have achieved performance gains in medical image analysis (MIA) in recent years. CNNs can efficiently model local pixel interactions and can be trained on small-scale MI data. Despite these important advances, typical CNNs have relatively limited capabilities in modelling "global" pixel interactions, which restricts their generalisation ability to out-of-distribution data with different "global" information. The recent progress of artificial intelligence gave rise to Transformers, which can learn global relationships from data. However, full Transformer models need to be trained on large-scale data and involve tremendous computational complexity. Attention and Transformer compartments ("Transf/Attention"), which maintain the properties needed for modelling global relationships, have been proposed as lighter alternatives to full Transformers. Recently, there has been an increasing trend to cross-pollinate complementary local-global properties from CNN and Transf/Attention architectures, which has led to a new era of hybrid models. The past years have witnessed substantial growth in hybrid CNN-Transf/Attention models across diverse MIA problems. In this systematic review, we survey existing hybrid CNN-Transf/Attention models, review and unravel key architectural designs, analyse breakthroughs, and evaluate current and future opportunities as well as challenges. We also introduce an analysis framework on generalisation opportunities of scientific and clinical impact, based on which new data-driven domain generalisation and adaptation methods can be stimulated.
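As a rough illustration of the hybrid pattern this review surveys, the sketch below stacks a convolution (local pixel interactions) with self-attention over flattened tokens (global interactions); sizes and names are illustrative assumptions, not any specific surveyed model.

```python
# Minimal hybrid CNN + attention block: local conv features, global mixing.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, ch: int = 64, heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(               # local pixel interactions
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.norm = nn.LayerNorm(ch)

    def forward(self, x):                        # x: (B, C, H, W)
        x = self.conv(x)
        B, C, H, W = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (B, H*W, C) global tokens
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)    # residual global mixing
        return tokens.transpose(1, 2).view(B, C, H, W)
```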
10. Shao M, Cheng C, Hu C, Zheng J, Zhang B, Wang T, Jin G, Liu Z, Zuo C. Semisupervised 3D segmentation of pancreatic tumors in positron emission tomography/computed tomography images using a mutual information minimization and cross-fusion strategy. Quant Imaging Med Surg 2024; 14:1747-1765. [PMID: 38415108] [PMCID: PMC10895119] [DOI: 10.21037/qims-23-1153]
Abstract
Background: Accurate segmentation of pancreatic cancer tumors using positron emission tomography/computed tomography (PET/CT) multimodal images is crucial for clinical diagnosis and prognosis evaluation. However, deep learning methods for automated medical image segmentation require a substantial amount of manually labeled data, making it time-consuming and labor-intensive. Moreover, addition or simple stitching of multimodal images leads to redundant information, failing to fully exploit the complementary information of multimodal images. Therefore, we developed a semisupervised multimodal network that leverages limited labeled samples and introduces a cross-fusion and mutual information minimization (MIM) strategy for PET/CT 3D segmentation of pancreatic tumors.
Methods: Our approach combined a cross multimodal fusion (CMF) module with a cross-attention mechanism. The complementary multimodal features were fused to form a multifeature set to enhance the effectiveness of feature extraction while preserving specific features of each modal image. In addition, we designed an MIM module to mitigate redundant high-level modal information and compute the latent loss of PET and CT. Finally, our method employed the uncertainty-aware mean teacher semi-supervised framework to segment regions of interest from PET/CT images using a small amount of labeled data and a large amount of unlabeled data.
Results: We evaluated our combined MIM and CMF semisupervised segmentation network (MIM-CMFNet) on a private dataset of pancreatic cancer, yielding an average Dice coefficient of 73.14%, an average Jaccard index score of 60.56%, and an average 95% Hausdorff distance (95HD) of 6.30 mm. In addition, to verify the broad applicability of our method, we used a public dataset of head and neck cancer, yielding an average Dice coefficient of 68.71%, an average Jaccard index score of 57.72%, and an average 95HD of 7.88 mm.
Conclusions: The experimental results demonstrate the superiority of our MIM-CMFNet over existing semisupervised techniques. Our approach can achieve a performance similar to that of fully supervised segmentation methods while significantly reducing the data annotation cost by 80%, suggesting it is highly practicable for clinical application.
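The metrics quoted above can be reproduced for binary 3D masks roughly as follows. This is a simplified sketch, not the authors' evaluation code: surfaces are approximated by erosion, masks are assumed non-empty, and spacing handling is basic.

```python
# Sketch of Dice, Jaccard, and 95% Hausdorff distance for binary 3D masks.
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial import cKDTree

def dice_jaccard(pred: np.ndarray, gt: np.ndarray):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return dice, jaccard

def _surface(mask: np.ndarray, spacing) -> np.ndarray:
    # surface voxels = mask minus its erosion, scaled to physical units
    mask = mask.astype(bool)
    return np.argwhere(mask & ~binary_erosion(mask)) * np.asarray(spacing)

def hd95(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th percentile of symmetric surface-to-surface distances."""
    sp, sg = _surface(pred, spacing), _surface(gt, spacing)
    d_pg = cKDTree(sg).query(sp)[0]   # pred surface -> nearest gt surface
    d_gp = cKDTree(sp).query(sg)[0]
    return float(np.percentile(np.hstack([d_pg, d_gp]), 95))
```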
Affiliation(s)
- Min Shao
- School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Department of Medical Imaging, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China
- Chao Cheng
- Department of Nuclear Medicine, the First Affiliated Hospital (Changhai Hospital) of Naval Medical University, Shanghai, China
- Chengyuan Hu
- Department of AI Algorithm, Shenzhen Poros Technology Co., Ltd., Shenzhen, China
- Jian Zheng
- School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Department of Medical Imaging, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China
- Bo Zhang
- Department of Radiology, the Second Affiliated Hospital of Soochow University, Suzhou, China
- Tao Wang
- Department of Nuclear Medicine, the First Affiliated Hospital (Changhai Hospital) of Naval Medical University, Shanghai, China
- Gang Jin
- Department of Hepatobiliary Pancreatic Surgery, the First Affiliated Hospital (Changhai Hospital) of Naval Medical University, Shanghai, China
- Zhaobang Liu
- School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Department of Medical Imaging, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China
- Changjing Zuo
- Department of Nuclear Medicine, the First Affiliated Hospital (Changhai Hospital) of Naval Medical University, Shanghai, China
11. Wu H, Zhang B, Chen C, Qin J. Federated Semi-Supervised Medical Image Segmentation via Prototype-Based Pseudo-Labeling and Contrastive Learning. IEEE Trans Med Imaging 2024; 43:649-661. [PMID: 37703140] [DOI: 10.1109/tmi.2023.3314430]
Abstract
Existing federated learning works mainly focus on the fully supervised training setting. In realistic scenarios, however, most clinical sites can only provide data without annotations due to a lack of resources or expertise. In this work, we are concerned with the practical yet challenging problem of federated semi-supervised segmentation (FSSS), where only several clients hold labeled data and the other clients can provide only unlabeled data. We make an early attempt to tackle this problem and propose a novel FSSS method with prototype-based pseudo-labeling and contrastive learning. First, we transmit a labeled-aggregated model, obtained based on prototype similarity, to each unlabeled client, to work together with the global model for debiased pseudo-label generation via a consistency- and entropy-aware selection strategy. Second, we transfer image-level prototypes from labeled datasets to unlabeled clients and conduct prototypical contrastive learning on unlabeled models to enhance their discriminative power. Finally, we perform dynamic model aggregation with a consistency-aware aggregation strategy to dynamically adjust the aggregation weight of each local model. We evaluate our method on COVID-19 X-ray infected region segmentation, COVID-19 CT infected region segmentation and colorectal polyp segmentation, and experimental results consistently demonstrate the effectiveness of the proposed method. Codes are available at https://github.com/zhangbaiming/FedSemiSeg.
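A simplified sketch of prototype-based pseudo-labeling: class prototypes are averaged from labeled pixel features, and unlabeled pixels take the label of their most similar prototype, with low-confidence pixels ignored. The threshold and names are assumptions, and the paper's consistency- and entropy-aware selection is richer than this.

```python
# Hedged sketch of prototype-based pseudo-labeling for segmentation features.
import torch
import torch.nn.functional as F

def class_prototypes(feats: torch.Tensor, labels: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    """feats: (N, D) pixel features; labels: (N,) int labels.
    Assumes every class appears at least once in the labeled batch."""
    protos = torch.stack([feats[labels == c].mean(dim=0)
                          for c in range(num_classes)])
    return F.normalize(protos, dim=1)             # (C, D)

def pseudo_label(feats: torch.Tensor, protos: torch.Tensor,
                 threshold: float = 0.8) -> torch.Tensor:
    sim = F.normalize(feats, dim=1) @ protos.t()  # cosine similarity (N, C)
    conf, labels = sim.max(dim=1)
    labels[conf < threshold] = -1                 # ignore low-confidence pixels
    return labels
```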
12. Milosevic M, Jin Q, Singh A, Amal S. Applications of AI in multi-modal imaging for cardiovascular disease. Front Radiol 2024; 3:1294068. [PMID: 38283302] [PMCID: PMC10811170] [DOI: 10.3389/fradi.2023.1294068]
Abstract
Data for healthcare is diverse and includes many different modalities. Traditional approaches to artificial intelligence for cardiovascular disease were typically limited to single modalities. With the proliferation of diverse datasets and new methods in AI, we are now able to integrate different modalities, such as magnetic resonance scans, computed tomography scans, echocardiography, x-rays, and electronic health records. In this paper, we review research from the last 5 years in applications of AI to multi-modal imaging. There have been many promising results in registration, segmentation, and fusion of different magnetic resonance imaging modalities with each other and with computed tomography scans, but many challenges still need to be addressed. Only a few papers have addressed modalities such as x-ray, echocardiography, or non-imaging modalities. As for prediction or classification tasks, only a couple of papers have used multiple modalities in the cardiovascular domain. Furthermore, no models have been implemented or tested in real-world cardiovascular clinical settings.
Affiliation(s)
- Marko Milosevic
- Roux Institute, Northeastern University, Portland, ME, United States
- Qingchu Jin
- Roux Institute, Northeastern University, Portland, ME, United States
- Akarsh Singh
- College of Engineering, Northeastern University, Boston, MA, United States
- Saeed Amal
- Roux Institute, Northeastern University, Portland, ME, United States
13. Xie Q, Li Y, He N, Ning M, Ma K, Wang G, Lian Y, Zheng Y. Unsupervised Domain Adaptation for Medical Image Segmentation by Disentanglement Learning and Self-Training. IEEE Trans Med Imaging 2024; 43:4-14. [PMID: 35853072] [DOI: 10.1109/tmi.2022.3192303]
Abstract
Unsupervised domain adaptation (UDA), which aims to enhance the segmentation performance of deep models on unlabeled data, has recently drawn much attention. In this paper, we propose a novel UDA method (namely DLaST) for medical image segmentation via disentanglement learning and self-training. Disentanglement learning factorizes an image into domain-invariant anatomy and domain-specific modality components. To make the best of disentanglement learning, we propose a novel shape constraint to boost adaptation performance. The self-training strategy further adaptively improves the segmentation performance of the model on the target domain through adversarial learning and pseudo-labeling, which implicitly facilitates feature alignment in the anatomy space. Experimental results demonstrate that the proposed method outperforms state-of-the-art UDA methods for medical image segmentation on three public datasets, i.e., a cardiac dataset, an abdominal dataset and a brain dataset. The code will be released soon.
14. Xing F, Yang X, Cornish TC, Ghosh D. Learning with limited target data to detect cells in cross-modality images. Med Image Anal 2023; 90:102969. [PMID: 37802010] [DOI: 10.1016/j.media.2023.102969]
Abstract
Deep neural networks have achieved excellent cell or nucleus quantification performance in microscopy images, but they often suffer from performance degradation when applied to cross-modality imaging data. Unsupervised domain adaptation (UDA) based on generative adversarial networks (GANs) has recently improved the performance of cross-modality medical image quantification. However, current GAN-based UDA methods typically require abundant target data for model training, which is often very expensive or even impossible to obtain for real applications. In this paper, we study a more realistic yet challenging UDA situation, where (unlabeled) target training data is limited, a setting into which previous work on cell identification has seldom delved. We first enhance a dual GAN with task-specific modeling, which provides additional supervision signals to assist with generator learning. We explore both single-directional and bidirectional task-augmented GANs for domain adaptation. Then, we further improve the GAN by introducing a differentiable, stochastic data augmentation module to explicitly reduce discriminator overfitting. We examine source-, target-, and dual-domain data augmentation for GAN enhancement, as well as joint task and data augmentation in a unified GAN-based UDA framework. We evaluate the framework for cell detection on multiple public and in-house microscopy image datasets, which are acquired with different imaging modalities, staining protocols and/or tissue preparations. The experiments demonstrate that our method significantly boosts performance when compared with the reference baseline, and it is superior to or on par with fully supervised models that are trained with real target annotations. In addition, our method outperforms recent state-of-the-art UDA approaches by a large margin on different datasets.
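The differentiable, stochastic augmentation idea can be sketched in a DiffAugment-like form: the same kind of random, gradient-preserving transform is applied to both real and generated images before the discriminator. The specific transforms below are illustrative assumptions, not the authors' module.

```python
# Hedged sketch of differentiable stochastic augmentation for a GAN
# discriminator: gradients flow through the augmented images.
import torch

def diff_augment(x: torch.Tensor) -> torch.Tensor:
    """x: (B, C, H, W) images in [-1, 1]. Random brightness + translation."""
    b = x.size(0)
    # per-sample brightness shift (differentiable w.r.t. x)
    x = x + (torch.rand(b, 1, 1, 1, device=x.device) - 0.5) * 0.4
    # small random spatial shift; torch.roll keeps gradients w.r.t. x
    dy, dx = torch.randint(-4, 5, (2,)).tolist()
    return torch.roll(x, shifts=(dy, dx), dims=(2, 3))

# usage sketch inside the adversarial loss:
# d_real = D(diff_augment(real_images))
# d_fake = D(diff_augment(G(z)))
```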
Affiliation(s)
- Fuyong Xing
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, 13001 E 17th Pl, Aurora, CO 80045, USA.
- Xinyi Yang
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, 13001 E 17th Pl, Aurora, CO 80045, USA
- Toby C Cornish
- Department of Pathology, University of Colorado Anschutz Medical Campus, 13001 E 17th Pl, Aurora, CO 80045, USA
- Debashis Ghosh
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, 13001 E 17th Pl, Aurora, CO 80045, USA
15. Abulnaga SM, Dey N, Young SI, Pan E, Hobgood KI, Wang CJ, Grant PE, Turk EA, Golland P. Shape-aware Segmentation of the Placenta in BOLD Fetal MRI Time Series. J Mach Learn Biomed Imaging 2023; 2:527-546. [PMID: 39469044] [PMCID: PMC11514310] [DOI: 10.59275/j.melba.2023-g3f8]
Abstract
Blood oxygen level dependent (BOLD) MRI time series with maternal hyperoxia can assess placental oxygenation and function. Measuring precise BOLD changes in the placenta requires accurate temporal placental segmentation and is confounded by fetal and maternal motion, contractions, and hyperoxia-induced intensity changes. Current BOLD placenta segmentation methods warp a manually annotated subject-specific template to the entire time series. However, as the placenta is a thin, elongated, and highly non-rigid organ subject to large deformations and obfuscated edges, existing work cannot accurately segment the placental shape, especially near boundaries. In this work, we propose a machine learning segmentation framework for placental BOLD MRI and apply it to segmenting each volume in a time series. We use a placental-boundary weighted loss formulation and perform a comprehensive evaluation across several popular segmentation objectives. Our model is trained and tested on a cohort of 91 subjects containing healthy fetuses, fetuses with fetal growth restriction, and mothers with high BMI. Biomedically, our model performs reliably in segmenting volumes at both normoxic and hyperoxic time points in the BOLD series. We further find that boundary weighting increases placental segmentation performance by 8.3% and 6.0% Dice coefficient for the cross-entropy and signed distance transform objectives, respectively.
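A minimal sketch of a boundary-weighted loss in the spirit of the paper: a morphological gradient of the one-hot labels marks boundary pixels, which then receive a larger cross-entropy weight. The weighting scheme and weight value are assumptions, not the authors' exact formulation.

```python
# Hedged sketch of boundary-weighted cross-entropy for segmentation.
import torch
import torch.nn.functional as F

def boundary_weighted_ce(logits: torch.Tensor, target: torch.Tensor,
                         boundary_w: float = 5.0) -> torch.Tensor:
    """logits: (B, C, H, W); target: (B, H, W) integer labels."""
    onehot = F.one_hot(target, logits.size(1)).permute(0, 3, 1, 2).float()
    dilated = F.max_pool2d(onehot, 3, stride=1, padding=1)
    eroded = -F.max_pool2d(-onehot, 3, stride=1, padding=1)
    boundary = (dilated - eroded).amax(dim=1)    # 1 on label boundaries
    weights = 1.0 + boundary_w * boundary        # up-weight boundary pixels
    ce = F.cross_entropy(logits, target, reduction="none")
    return (weights * ce).mean()
```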
Affiliation(s)
- S Mazdak Abulnaga
- CSAIL/EECS, Massachusetts Institute of Technology, Cambridge, MA, USA; MGH/HST Martinos Center for Biomedical Imaging, Harvard Medical School, Boston, MA, USA
- Neel Dey
- CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sean I Young
- MGH/HST Martinos Center for Biomedical Imaging, Harvard Medical School, Boston, MA, USA; CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
- Eileen Pan
- CSAIL/EECS, Massachusetts Institute of Technology, Cambridge, MA, USA
- Clinton J Wang
- CSAIL/EECS, Massachusetts Institute of Technology, Cambridge, MA, USA
- P Ellen Grant
- Fetal-Neonatal Neuroimaging and Developmental Science Center, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Esra Abaci Turk
- Fetal-Neonatal Neuroimaging and Developmental Science Center, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Polina Golland
- CSAIL/EECS, Massachusetts Institute of Technology, Cambridge, MA, USA
16. Gu R, Wang G, Lu J, Zhang J, Lei W, Chen Y, Liao W, Zhang S, Li K, Metaxas DN, Zhang S. CDDSA: Contrastive domain disentanglement and style augmentation for generalizable medical image segmentation. Med Image Anal 2023; 89:102904. [PMID: 37506556] [DOI: 10.1016/j.media.2023.102904]
Abstract
Generalization to previously unseen images with potential domain shifts is essential for clinically applicable medical image segmentation. Disentangling domain-specific and domain-invariant features is key for Domain Generalization (DG). However, existing DG methods struggle to achieve effective disentanglement. To address this problem, we propose an efficient framework called Contrastive Domain Disentanglement and Style Augmentation (CDDSA) for generalizable medical image segmentation. First, a disentanglement network decomposes the image into a domain-invariant anatomical representation and a domain-specific style code, where the former is sent for further segmentation that is not affected by domain shift, and the disentanglement is regularized by a decoder that combines the anatomical representation and style code to reconstruct the original image. Second, to achieve better disentanglement, a contrastive loss is proposed to encourage the style codes from the same domain and from different domains to be compact and divergent, respectively. Finally, to further improve generalizability, we propose a style augmentation strategy to synthesize images with various unseen styles in real time while maintaining anatomical information. Comprehensive experiments on a public multi-site fundus image dataset and an in-house multi-site Nasopharyngeal Carcinoma Magnetic Resonance Image (NPC-MRI) dataset show that the proposed CDDSA achieves remarkable generalizability across different domains, and it outperforms several state-of-the-art methods in generalizable segmentation. Code is available at https://github.com/HiLab-git/DAG4MIA.
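The style-code contrastive loss could be sketched in InfoNCE form, pulling same-domain codes together and pushing different-domain codes apart. Shapes, temperature, and batching are assumptions (each domain needs at least two samples per batch), not the CDDSA implementation.

```python
# Hedged sketch of a contrastive loss over per-image style codes.
import torch
import torch.nn.functional as F

def style_contrastive_loss(codes: torch.Tensor, domains: torch.Tensor,
                           tau: float = 0.1) -> torch.Tensor:
    """codes: (N, D) style codes; domains: (N,) integer domain ids."""
    z = F.normalize(codes, dim=1)
    sim = z @ z.t() / tau                          # (N, N) similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (domains[:, None] == domains[None, :]) & ~eye
    # log-softmax over all other samples; average over same-domain pairs
    logp = sim.masked_fill(eye, float("-inf")).log_softmax(dim=1)
    return -(logp[pos]).mean()
```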
Affiliation(s)
- Ran Gu
- School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Guotai Wang
- School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China; Shanghai AI Lab, Shanghai, China.
- Jiangshan Lu
- School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Jingyang Zhang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Wenhui Lei
- School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai AI Lab, Shanghai, China
- Yinan Chen
- SenseTime Research, Shanghai, China; West China Hospital-SenseTime Joint Lab, West China Biomedical Big Data Center, Sichuan University, Chengdu, China
- Wenjun Liao
- Department of Radiation Oncology, Sichuan Cancer Hospital and Institute, University of Electronic Science and Technology of China, Chengdu, China
- Shichuan Zhang
- Department of Radiation Oncology, Sichuan Cancer Hospital and Institute, University of Electronic Science and Technology of China, Chengdu, China
- Kang Li
- West China Hospital-SenseTime Joint Lab, West China Biomedical Big Data Center, Sichuan University, Chengdu, China
- Dimitris N Metaxas
- Department of Computer Science, Rutgers University, Piscataway NJ 08854, USA
- Shaoting Zhang
- School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China; SenseTime Research, Shanghai, China; Shanghai AI Lab, Shanghai, China.
17. Dai Y, Zou B, Zhu C, Li Y, Chen Z, Ji Z, Kui X, Zhang W. DE-JANet: A unified network based on dual encoder and joint attention for Alzheimer's disease classification using multi-modal data. Comput Biol Med 2023; 165:107396. [PMID: 37703717] [DOI: 10.1016/j.compbiomed.2023.107396]
Abstract
Structural magnetic resonance imaging (sMRI), which can reflect cerebral atrophy, plays an important role in the early detection of Alzheimer's disease (AD). However, the information provided by analyzing only the morphological changes in sMRI is relatively limited, and the assessment of the atrophy degree is subjective. Therefore, it is meaningful to combine sMRI with other clinical information to acquire complementary diagnostic information and achieve a more accurate classification of AD. Nevertheless, how to fuse these multi-modal data effectively is still challenging. In this paper, we propose DE-JANet, a unified AD classification network that integrates sMRI image data with non-image clinical data, such as age and Mini-Mental State Examination (MMSE) score, for more effective multi-modal analysis. DE-JANet consists of three key components: (1) a dual encoder module for extracting low-level features from the image and non-image data according to specific encoding regularity, (2) a joint attention module for fusing multi-modal features, and (3) a token classification module for performing AD-related classification according to the fused multi-modal features. DE-JANet is evaluated on the ADNI dataset, with a mean accuracy of 0.9722 for AD classification and 0.9538 for mild cognitive impairment (MCI) classification, which is superior to existing methods and indicates advanced performance on AD-related diagnosis tasks.
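A rough sketch of the joint attention fusion pattern described above: image tokens from an sMRI encoder attend to tokens embedded from non-image clinical values such as age and MMSE score. All names and dimensions are illustrative assumptions, not the DE-JANet architecture.

```python
# Hedged sketch of image/clinical-token fusion via cross-attention.
import torch
import torch.nn as nn

class JointAttentionFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.embed_clin = nn.Linear(1, dim)      # one token per clinical value
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls_head = nn.Linear(dim, 3)        # e.g. CN / MCI / AD

    def forward(self, img_tokens: torch.Tensor,
                clinical: torch.Tensor) -> torch.Tensor:
        # img_tokens: (B, T, dim) from an image encoder
        # clinical: (B, n_vars) scalar variables such as age and MMSE
        clin_tokens = self.embed_clin(clinical.unsqueeze(-1))  # (B, n_vars, dim)
        fused, _ = self.attn(img_tokens, clin_tokens, clin_tokens)
        pooled = (img_tokens + fused).mean(dim=1)              # residual + pool
        return self.cls_head(pooled)
```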
Affiliation(s)
- Yulan Dai
- School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Beiji Zou
- School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Chengzhang Zhu
- School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China.
- Yang Li
- School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Zhi Chen
- School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Zexin Ji
- School of Computer Science and Engineering, Central South University, Changsha, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, China
- Xiaoyan Kui
- School of Computer Science and Engineering, Central South University, Changsha, China
- Wensheng Zhang
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
18. Li L, Ding W, Huang L, Zhuang X, Grau V. Multi-modality cardiac image computing: A survey. Med Image Anal 2023; 88:102869. [PMID: 37384950] [DOI: 10.1016/j.media.2023.102869]
Abstract
Multi-modality cardiac imaging plays a key role in the management of patients with cardiovascular diseases. It allows a combination of complementary anatomical, morphological and functional information, increases diagnosis accuracy, and improves the efficacy of cardiovascular interventions and clinical outcomes. Fully automated processing and quantitative analysis of multi-modality cardiac images could have a direct impact on clinical research and evidence-based patient management. However, these require overcoming significant challenges, including inter-modality misalignment and finding optimal methods to integrate information from different modalities. This paper aims to provide a comprehensive review of multi-modality imaging in cardiology, the computing methods, the validation strategies, the related clinical workflows and future perspectives. For the computing methodologies, we focus on three tasks, i.e., registration, fusion and segmentation, which generally involve multi-modality imaging data, either combining information from different modalities or transferring information across modalities. The review highlights that multi-modality cardiac imaging data has the potential of wide applicability in the clinic, such as trans-aortic valve implantation guidance, myocardial viability assessment, and catheter ablation therapy and its patient selection. Nevertheless, many challenges remain unsolved, such as missing modalities, modality selection, the combination of imaging and non-imaging data, and uniform analysis and representation of different modalities. There is also work to do in defining how the well-developed techniques fit into clinical workflows and how much additional and relevant information they introduce. These problems are likely to remain an active field of research, and the questions they raise will need to be answered in the future.
Affiliation(s)
- Lei Li
- Department of Engineering Science, University of Oxford, Oxford, UK.
- Wangbin Ding
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
- Liqin Huang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
- Xiahai Zhuang
- School of Data Science, Fudan University, Shanghai, China
- Vicente Grau
- Department of Engineering Science, University of Oxford, Oxford, UK
19. Hadler T, Ammann C, Wetzl J, Viezzer D, Gröschel J, Fenski M, Abazi E, Lange S, Hennemuth A, Schulz-Menger J. Lazy Luna: Extendible software for multilevel reader comparison in cardiovascular magnetic resonance imaging. Comput Methods Programs Biomed 2023; 238:107615. [PMID: 37257373] [DOI: 10.1016/j.cmpb.2023.107615]
Abstract
Background and objectives: Cardiovascular Magnetic Resonance (CMR) imaging is a growing field with increasing diagnostic utility in clinical routine. Quantitative diagnostic parameters are typically calculated based on contours or points provided by readers, e.g. natural intelligences (NI) such as clinicians or researchers, and artificial intelligences (AI). As clinical applications multiply, evaluating the precision and reproducibility of quantitative parameters becomes increasingly important. Although segmentation challenges for AIs and guidelines for clinicians provide quality assessments and regulation, the methods ought to be combined and streamlined for clinical applications. The goal of the developed software, Lazy Luna (LL), is to offer a flexible evaluation tool that is readily extendible to new sequences and scientific endeavours.
Methods: An interface was designed for LL, which allows for comparing annotated CMR images. Geometric objects ensure precise calculations of metric values and clinical results regardless of whether annotations originate from AIs or NIs. A graphical user interface (GUI) is provided to make the software available to non-programmers. The GUI allows for an interactive inspection of image datasets as well as implementing tracing procedures, which follow statistical reader differences in clinical results to their origins in individual image contours. The backend software builds on a set of meta-classes, which can be extended to new imaging sequences and clinical parameters. Following an agile development procedure with clinical feedback allows for a quick implementation of new classes, figures and tables for evaluation.
Results: Two application cases present LL's extendibility to clinical evaluation and AI development contexts. The first concerns T1 parametric mapping images segmented by two expert readers. Quantitative result differences are traced to reveal the typical segmentation dissimilarities from which these differences originate. The meta-classes are extended to this new application scenario. The second applies to the open source Late Gadolinium Enhancement (LGE) quantification challenge for AI developers "Emidec", which illustrates LL's usability as open source software.
Conclusion: The presented software Lazy Luna allows for an automated multilevel comparison of readers as well as identifying qualitative reasons for statistical reader differences. The open source software LL can be extended to new application cases in the future.
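The multilevel comparison idea can be illustrated with a small sketch: compute an inter-reader agreement metric per case, then surface the worst cases for contour-level inspection. The data layout and function names are assumptions; Lazy Luna's actual meta-classes are considerably richer.

```python
# Hedged sketch of tracing reader disagreements to individual cases.
import numpy as np

def reader_dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    return 2 * np.logical_and(a, b).sum() / max(a.sum() + b.sum(), 1)

def trace_disagreements(cases: dict, top_k: int = 5):
    """cases: {case_id: (reader1_mask, reader2_mask)}. Returns the cases
    with the lowest inter-reader Dice, i.e. candidates for visual review."""
    scores = {cid: reader_dice(m1, m2) for cid, (m1, m2) in cases.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])[:top_k]
```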
Affiliation(s)
- Thomas Hadler
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany.
- Clemens Ammann
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Darian Viezzer
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Jan Gröschel
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany; Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Maximilian Fenski
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Endri Abazi
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Steffen Lange
- Department of Computer Sciences, Hochschule Darmstadt - University of Applied Sciences, Darmstadt, Germany
- Anja Hennemuth
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany; Institute of Cardiovascular Computer-assisted Medicine, Charité - Universitätsmedizin Berlin, Berlin, Germany; Fraunhofer MEVIS, Bremen, Germany
- Jeanette Schulz-Menger
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany; Siemens Healthineers, Erlangen, Germany; Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany; Department of Cardiology and Nephrology, HELIOS Hospital Berlin-Buch, Berlin, Germany
20. A comprehensive survey on design and application of autoencoder in deep learning. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110176]
21. Li Y, Sun X, Wang S, Li X, Qin Y, Pan J, Chen P. MDST: multi-domain sparse-view CT reconstruction based on convolution and swin transformer. Phys Med Biol 2023; 68:095019. [PMID: 36889004] [DOI: 10.1088/1361-6560/acc2ab]
Abstract
Objective: Sparse-view computed tomography (SVCT), which can reduce the radiation doses administered to patients and hasten data acquisition, has become an area of particular interest to researchers. Most existing deep learning-based image reconstruction methods are based on convolutional neural networks (CNNs). Due to the locality of convolution and continuous sampling operations, existing approaches cannot fully model global context feature dependencies, which makes CNN-based approaches less efficient in modeling computed tomography (CT) images with varied structural information.
Approach: To overcome these challenges, this paper develops a novel multi-domain optimization network based on convolution and swin transformer (MDST). MDST uses the swin transformer block as the main building block in both the projection (residual) domain and image (residual) domain sub-networks, which model global and local features of the projections and reconstructed images. MDST consists of two modules for initial reconstruction and residual-assisted reconstruction, respectively. The sparse sinogram is first expanded in the initial reconstruction module with a projection-domain sub-network. Then, sparse-view artifacts are effectively suppressed by an image-domain sub-network. Finally, a residual-assisted reconstruction module corrects inconsistencies in the initial reconstruction, further preserving image details.
Main results: Extensive experiments on CT lymph node datasets and real walnut datasets show that MDST can effectively alleviate the loss of fine details caused by information attenuation and improve the reconstruction quality of medical images.
Significance: The MDST network is robust and can effectively reconstruct images from projections with different noise levels. Unlike the currently prevalent CNN-based networks, MDST uses a transformer as its main backbone, which demonstrates the potential of transformers in SVCT reconstruction.
Collapse
Affiliation(s)
- Yu Li
- Department of Information and Communication Engineering, North University of China, Taiyuan, People's Republic of China
- The State Key Lab for Electronic Testing Technology, North University of China, People's Republic of China
| | - XueQin Sun
- Department of Information and Communication Engineering, North University of China, Taiyuan, People's Republic of China
- The State Key Lab for Electronic Testing Technology, North University of China, People's Republic of China
| | - SuKai Wang
- Department of Information and Communication Engineering, North University of China, Taiyuan, People's Republic of China
- The State Key Lab for Electronic Testing Technology, North University of China, People's Republic of China
| | - XuRu Li
- Department of Information and Communication Engineering, North University of China, Taiyuan, People's Republic of China
- The State Key Lab for Electronic Testing Technology, North University of China, People's Republic of China
| | - YingWei Qin
- Department of Information and Communication Engineering, North University of China, Taiyuan, People's Republic of China
- The State Key Lab for Electronic Testing Technology, North University of China, People's Republic of China
| | - JinXiao Pan
- Department of Information and Communication Engineering, North University of China, Taiyuan, People's Republic of China
- The State Key Lab for Electronic Testing Technology, North University of China, People's Republic of China
| | - Ping Chen
- Department of Information and Communication Engineering, North University of China, Taiyuan, People's Republic of China
- The State Key Lab for Electronic Testing Technology, North University of China, People's Republic of China
| |
Collapse
|
22
|
Liu Z, Wei J, Li R, Zhou J. Learning multi-modal brain tumor segmentation from privileged semi-paired MRI images with curriculum disentanglement learning. Comput Biol Med 2023; 159:106927. [PMID: 37105113 DOI: 10.1016/j.compbiomed.2023.106927] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2022] [Revised: 04/02/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023]
Abstract
Since the brain is the human body's primary command and control center, brain cancer is one of the most dangerous cancers. Automatic segmentation of brain tumors from multi-modal images is important in diagnosis and treatment. Due to the difficulty of obtaining multi-modal paired images in clinical practice, recent studies segment brain tumors relying solely on unpaired images, discarding the available paired images. Although these models avoid the dependence on paired images, they cannot fully exploit the complementary information from different modalities, resulting in low unimodal segmentation accuracy. Hence, this work studies unimodal segmentation with privileged semi-paired images, i.e., limited paired images are introduced into the training phase. Specifically, we present a novel two-step (intra-modality and inter-modality) curriculum disentanglement learning framework. The modality-specific style codes describe the attenuation of tissue features and image contrast, while the modality-invariant content codes contain anatomical and functional information extracted from the input images. In addition, we address the problem of incomplete decoupling by introducing constraints on the style and content spaces. Experiments on the BraTS2020 dataset show that our model outperforms competing models on unimodal segmentation, achieving average Dice scores of 82.91%, 72.62%, and 54.80% for WT (the whole tumor), TC (the tumor core), and ET (the enhancing tumor), respectively. Finally, we further evaluate our model's variable multi-modal brain tumor segmentation performance by introducing a fusion block (TFusion). The experimental results reveal that our model achieves the best WT segmentation performance across all 15 possible modality combinations, with 87.31% average accuracy. In summary, we propose a curriculum disentanglement learning framework for unimodal segmentation with privileged semi-paired images. Moreover, the benefits of the improved unimodal segmentation extend to variable multi-modal segmentation, demonstrating that improving unimodal segmentation performance is significant for brain tumor segmentation with missing modalities. Our code is available at https://github.com/scut-cszcl/SpBTS.
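To make the style/content split concrete, here is a minimal, hypothetical PyTorch sketch of a disentangling encoder-decoder: one branch produces a spatial, modality-invariant content map, another produces a compact, modality-specific style vector, and swapping styles between two scans re-renders one anatomy with the other's contrast. The architecture, code sizes, and the swap demo are illustrative assumptions, not the paper's model.

import torch
import torch.nn as nn

class Disentangler(nn.Module):
    """Toy encoder-decoder splitting an image into a modality-invariant
    content map and a modality-specific style vector (illustrative only)."""
    def __init__(self, style_dim=8):
        super().__init__()
        self.content_enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1))
        self.style_enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, style_dim))
        self.decoder = nn.Conv2d(16 + style_dim, 1, 3, padding=1)

    def decode(self, content, style):
        b, _, h, w = content.shape
        s = style.view(b, -1, 1, 1).expand(-1, -1, h, w)   # broadcast style code
        return self.decoder(torch.cat([content, s], dim=1))

    def forward(self, x):
        c, s = self.content_enc(x), self.style_enc(x)
        return self.decode(c, s), c, s

model = Disentangler()
t1, t2 = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)  # two "modalities"
_, c1, _ = model(t1)
_, _, s2 = model(t2)
swapped = model.decode(c1, s2)  # anatomy of t1 re-rendered with the style of t2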
Collapse
Affiliation(s)
- Zecheng Liu
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.
| | - Jia Wei
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.
| | - Rui Li
- Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester, NY, USA.
| | - Jianlong Zhou
- Data Science Institute, University of Technology Sydney, Ultimo, NSW 2007, Australia.
| |
Collapse
|
23
|
Ammann C, Hadler T, Gröschel J, Kolbitsch C, Schulz-Menger J. Multilevel comparison of deep learning models for function quantification in cardiovascular magnetic resonance: On the redundancy of architectural variations. Front Cardiovasc Med 2023; 10:1118499. [PMID: 37144061 PMCID: PMC10151814 DOI: 10.3389/fcvm.2023.1118499] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2022] [Accepted: 03/27/2023] [Indexed: 05/06/2023] Open
Abstract
Background Cardiac function quantification in cardiovascular magnetic resonance requires precise contouring of the heart chambers. This time-consuming task is increasingly being addressed by a plethora of ever more complex deep learning methods. However, only a small fraction of these have made their way from academia into clinical practice. In the quality assessment and control of medical artificial intelligence, the opaque reasoning and associated distinctive errors of neural networks meet an extraordinarily low tolerance for failure. Aim The aim of this study is a multilevel analysis and comparison of the performance of three popular convolutional neural network (CNN) models for cardiac function quantification. Methods U-Net, FCN, and MultiResUNet were trained for the segmentation of the left and right ventricles on short-axis cine images of 119 patients from clinical routine. The training pipeline and hyperparameters were kept constant to isolate the influence of network architecture. CNN performance was evaluated against expert segmentations for 29 test cases on contour level and in terms of quantitative clinical parameters. Multilevel analysis included breakdown of results by slice position, as well as visualization of segmentation deviations and linkage of volume differences to segmentation metrics via correlation plots for qualitative analysis. Results All models showed strong correlation to the expert with respect to quantitative clinical parameters (rz' = 0.978, 0.977, and 0.978 for U-Net, FCN, and MultiResUNet, respectively). The MultiResUNet significantly underestimated ventricular volumes and left ventricular myocardial mass. Segmentation difficulties and failures clustered in basal and apical slices for all CNNs, with the largest volume differences in the basal slices (mean absolute error per slice: 4.2 ± 4.5 ml for basal, 0.9 ± 1.3 ml for midventricular, 0.9 ± 0.9 ml for apical slices). Results for the right ventricle had higher variance and more outliers compared to the left ventricle. Intraclass correlation for clinical parameters was excellent (≥0.91) among the CNNs. Conclusion Modifications to CNN architecture were not critical to the quality of error for our dataset. Despite good overall agreement with the expert, errors accumulated in basal and apical slices for all models.
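The multilevel evaluation described here boils down to contour-level metrics plus clinical volume errors broken down by slice position. Below is a small, self-contained sketch of the two basic ingredients, the Dice coefficient and a per-slice absolute volume error; the array shapes and the voxel spacing are assumed for illustration.

import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0

def per_slice_volume_error(pred, ref, voxel_ml):
    """Absolute volume difference (ml) per short-axis slice.
    pred/ref: (n_slices, H, W) binary masks; voxel_ml: volume of one voxel in ml."""
    return np.abs(pred.sum(axis=(1, 2)) - ref.sum(axis=(1, 2))) * voxel_ml

# toy example: 10 slices of 128x128 with assumed 1.5 x 1.5 x 8 mm voxels
rng = np.random.default_rng(0)
pred = rng.random((10, 128, 128)) > 0.5
ref = rng.random((10, 128, 128)) > 0.5
voxel_ml = 1.5 * 1.5 * 8 / 1000.0   # mm^3 -> ml
print(dice(pred, ref))
print(per_slice_volume_error(pred, ref, voxel_ml))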
Collapse
Affiliation(s)
- Clemens Ammann
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
| | - Thomas Hadler
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
| | - Jan Gröschel
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
| | - Christoph Kolbitsch
- Physikalisch-Technische Bundesanstalt (PTB), Braunschweig and Berlin, Germany
| | - Jeanette Schulz-Menger
- Working Group on CMR, Experimental and Clinical Research Center, a cooperation between the Max Delbrück Center for Molecular Medicine in the Helmholtz Association and Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, Berlin, Germany
- Department of Cardiology and Nephrology, HELIOS Hospital Berlin-Buch, Berlin, Germany
| |
Collapse
|
24
|
Li B, Yang T, Zhao X. NVTrans-UNet: Neighborhood vision transformer based U-Net for multi-modal cardiac MR image segmentation. J Appl Clin Med Phys 2023; 24:e13908. [PMID: 36651634 PMCID: PMC10018676 DOI: 10.1002/acm2.13908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Revised: 10/22/2022] [Accepted: 12/30/2022] [Indexed: 01/19/2023] Open
Abstract
With the rapid development of artificial intelligence and image processing technology, medical imaging has become a critical tool for clinical diagnosis and disease treatment. The extraction and segmentation of regions of interest in cardiac images are crucial to the diagnosis of cardiovascular diseases. Due to the erratic diastolic and systolic motion of the heart, the boundaries in Magnetic Resonance (MR) images are quite fuzzy. Moreover, it is hard to provide complete information using a single modality due to the complex structure of the cardiac image. Furthermore, conventional CNN-based segmentation methods are weak in feature extraction. To overcome these challenges, we propose a multi-modal method for cardiac image segmentation, called NVTrans-UNet. Firstly, we employ the Neighborhood Vision Transformer (NVT) module, which takes advantage of Neighborhood Attention (NA) and inductive biases. It can better extract the local information of the cardiac image while reducing the computational cost. Secondly, we introduce a Multi-modal Gated Fusion (MGF) network, which automatically adjusts the contributions of different modal feature maps and makes full use of multi-modal information. Thirdly, a bottleneck layer with Atrous Spatial Pyramid Pooling (ASPP) is proposed to expand the feature receptive field. Finally, a mixed loss is applied to focus on the fuzzy boundary and achieve accurate segmentation. We evaluated our model on the MyoPS 2020 dataset. The Dice score for myocardial infarction (MI) was 0.642 ± 0.171, and the Dice score for myocardial infarction + edema (MI + ME) was 0.574 ± 0.110. Compared with the baseline, the MI score increases by 11.2% and the MI + ME score by 12.5%. The results show the effectiveness of the proposed NVTrans-UNet in the segmentation of MI and ME.
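The gating idea behind a module like MGF can be illustrated in a few lines: a learned sigmoid gate weighs, per spatial location, how much each modality's feature map contributes to the fused output. This sketch is a generic gated fusion under assumed channel counts, not the published MGF design.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Toy gated fusion of two modality feature maps: a learned gate decides,
    per location and channel, how much each modality contributes."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, feat_a, feat_b):
        g = self.gate(torch.cat([feat_a, feat_b], dim=1))  # gate values in [0, 1]
        return g * feat_a + (1.0 - g) * feat_b

fuse = GatedFusion(32)
fa, fb = torch.randn(1, 32, 40, 40), torch.randn(1, 32, 40, 40)
print(fuse(fa, fb).shape)   # torch.Size([1, 32, 40, 40])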
Collapse
Affiliation(s)
- Bingjie Li
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou, China
| | - Tiejun Yang
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, China
- Key Laboratory of Grain Information Processing and Control (HAUT), Ministry of Education, Zhengzhou, China
- Henan Key Laboratory of Grain Photoelectric Detection and Control (HAUT), Zhengzhou, Henan, China
| | - Xiang Zhao
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou, China
| |
Collapse
|
25
|
Cui R, Yang R, Liu F, Geng H. HD2A-Net: A novel dual gated attention network using comprehensive hybrid dilated convolutions for medical image segmentation. Comput Biol Med 2023; 152:106384. [PMID: 36493731 DOI: 10.1016/j.compbiomed.2022.106384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2022] [Revised: 11/19/2022] [Accepted: 11/28/2022] [Indexed: 12/03/2022]
Abstract
Convolutional neural networks (CNNs) have been widely applied to medical image analysis tasks, especially image segmentation. In recent years, encoder-decoder structures, such as the U-Net, have been proposed. However, these structures do not sufficiently account for multi-scale information transmission or effective modeling of long-range feature dependencies. To improve on existing methods, we propose a novel hybrid dual dilated attention network (HD2A-Net) for lesion region segmentation. In the proposed network, we present the comprehensive hybrid dilated convolution (CHDC) module, which facilitates the transmission of multi-scale information. Based on the CHDC module and attention mechanisms, we design a novel dual dilated gated attention (DDGA) block to enhance the saliency of related regions from a multi-scale perspective. In addition, a dilated dense (DD) block is designed to expand the receptive fields. Ablation studies were performed to verify the proposed blocks, and the interpretability of HD2A-Net was analyzed by visualizing the attention weight maps of its key blocks. Compared to state-of-the-art methods, including CA-Net, DeepLabV3+, and Attention U-Net, HD2A-Net performs significantly better on three publicly available medical image datasets: MAEDE-MAFTOUNI (COVID-19 CT), ISIC-2018 (Melanoma Dermoscopy), and Kvasir-SEG (Gastrointestinal Disease Polyp), reaching Dice scores of 93.16%, 93.63%, and 94.72%, ASSD values of 0.36, 0.69, and 0.52 pixels, and mIoU scores of 88.03%, 88.67%, and 90.33%, respectively.
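The core of a hybrid dilated convolution module is a set of parallel 3x3 convolutions with different dilation rates whose outputs are merged, enlarging the receptive field without downsampling. The following toy block illustrates that pattern; the rates (1, 2, 5), channel counts, and residual merge are assumptions for illustration, not the exact CHDC/DDGA design.

import torch
import torch.nn as nn

class HybridDilatedBlock(nn.Module):
    """Toy hybrid-dilated-convolution block: parallel branches with different
    dilation rates capture multi-scale context; rates (1, 2, 5) avoid gridding."""
    def __init__(self, ch, rates=(1, 2, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)
        self.merge = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.merge(y)   # residual merge of multi-scale features

block = HybridDilatedBlock(16)
print(block(torch.randn(1, 16, 64, 64)).shape)   # torch.Size([1, 16, 64, 64])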
Collapse
Affiliation(s)
- Rongsheng Cui
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin, China
| | - Runzhuo Yang
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin, China
| | - Feng Liu
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin, China; Tianjin Key Laboratory of Optoelectronic Sensor and Sensing Network Technology, Nankai University, Tianjin, China.
| | - Hua Geng
- Department of Pathology, Tianjin Chest Hospital, Tianjin, China
| |
Collapse
|
26
|
Li Y, Wu C, Qi H, Si D, Ding H, Chen H. Motion correction for native myocardial T1 mapping using self-supervised deep learning registration with contrast separation. NMR IN BIOMEDICINE 2022; 35:e4775. [PMID: 35599351 DOI: 10.1002/nbm.4775] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/16/2022] [Revised: 05/15/2022] [Accepted: 05/18/2022] [Indexed: 06/15/2023]
Abstract
In myocardial T1 mapping, undesirable motion poses significant challenges because uncorrected motion can affect T1 estimation accuracy and cause incorrect diagnosis. In this study, we propose and evaluate a motion correction method for myocardial T1 mapping using self-supervised deep learning based registration with contrast separation (SDRAP). A sparse coding based method was first proposed to separate the contrast component from T1-weighted (T1w) images. Then, a self-supervised deep neural network with cross-correlation (SDRAP-CC) or mutual information as the registration similarity measurement was developed to register contrast separated images, after which signal fitting was performed on the motion corrected T1w images to generate motion corrected T1 maps. The registration network was trained and tested in 80 healthy volunteers with images acquired using the modified Look-Locker inversion recovery (MOLLI) sequence. The proposed SDRAP was compared with the free form deformation (FFD) registration method regarding (1) Dice similarity coefficient (DSC) and mean boundary error (MBE) of myocardium contours, (2) T1 value and standard deviation (SD) of T1 fitting, (3) subjective evaluation score for overall image quality and motion artifact level, and (4) computation time. Results showed that SDRAP-CC achieved the highest DSC of 85.0 ± 3.9% and the lowest MBE of 0.92 ± 0.25 mm among the methods compared. Additionally, SDRAP-CC performed the best by resulting in lower SD value (28.1 ± 17.6 ms) and higher subjective image quality scores (3.30 ± 0.79 for overall quality and 3.53 ± 0.68 for motion artifact) evaluated by a cardiologist. The proposed SDRAP took only 0.52 s to register one slice of MOLLI images, achieving about sevenfold acceleration over FFD (3.7 s/slice).
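A self-supervised registration network of this kind is typically trained by warping the moving image with a predicted displacement field and maximizing an image similarity measure such as normalized cross-correlation, so no ground-truth deformations are needed. The sketch below shows a differentiable warp plus a global NCC loss; it is a generic illustration of that training signal (SDRAP additionally separates contrast first, and its exact similarity formulation may differ), and the zero-initialized field merely stands in for a network output.

import torch
import torch.nn.functional as F

def warp(image, flow):
    """Warp a 2D image (B,1,H,W) with a displacement field (B,2,H,W) given
    in normalized [-1, 1] grid units (differentiable via grid_sample)."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing='ij')
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    grid = base + flow.permute(0, 2, 3, 1)
    return F.grid_sample(image, grid, align_corners=True)

def ncc_loss(fixed, moved, eps=1e-5):
    """Negative global normalized cross-correlation (maximize similarity)."""
    f = fixed - fixed.mean(dim=(2, 3), keepdim=True)
    m = moved - moved.mean(dim=(2, 3), keepdim=True)
    ncc = (f * m).sum(dim=(2, 3)) / (
        f.pow(2).sum(dim=(2, 3)).sqrt() * m.pow(2).sum(dim=(2, 3)).sqrt() + eps)
    return -ncc.mean()

fixed, moving = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
flow = torch.zeros(2, 2, 64, 64, requires_grad=True)  # stands in for a network output
loss = ncc_loss(fixed, warp(moving, flow))
loss.backward()   # gradients reach the displacement field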
Collapse
Affiliation(s)
- Yuze Li
- Center for Biomedical Imaging Research (CBIR), School of Medicine, Tsinghua University, Beijing, China
| | - Chunyan Wu
- Center for Biomedical Imaging Research (CBIR), School of Medicine, Tsinghua University, Beijing, China
| | - Haikun Qi
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
| | - Dongyue Si
- Center for Biomedical Imaging Research (CBIR), School of Medicine, Tsinghua University, Beijing, China
| | - Haiyan Ding
- Center for Biomedical Imaging Research (CBIR), School of Medicine, Tsinghua University, Beijing, China
| | - Huijun Chen
- Center for Biomedical Imaging Research (CBIR), School of Medicine, Tsinghua University, Beijing, China
| |
Collapse
|
27
|
Liu X, Sanchez P, Thermos S, O'Neil AQ, Tsaftaris SA. Learning disentangled representations in the imaging domain. Med Image Anal 2022; 80:102516. [PMID: 35751992 DOI: 10.1016/j.media.2022.102516] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2021] [Revised: 04/05/2022] [Accepted: 06/10/2022] [Indexed: 12/12/2022]
Abstract
Disentangled representation learning has been proposed as an approach to learning general representations even in the absence of, or with limited, supervision. A good general representation can be fine-tuned for new target tasks using modest amounts of data, or used directly in unseen domains achieving remarkable performance in the corresponding task. This alleviation of the data and annotation requirements offers tantalising prospects for applications in computer vision and healthcare. In this tutorial paper, we motivate the need for disentangled representations, revisit key concepts, and describe practical building blocks and criteria for learning such representations. We survey applications in medical imaging emphasising choices made in exemplar key works, and then discuss links to computer vision applications. We conclude by presenting limitations, challenges, and opportunities.
Collapse
Affiliation(s)
- Xiao Liu
- School of Engineering, The University of Edinburgh, Edinburgh EH9 3FG, UK.
| | - Pedro Sanchez
- School of Engineering, The University of Edinburgh, Edinburgh EH9 3FG, UK
| | - Spyridon Thermos
- School of Engineering, The University of Edinburgh, Edinburgh EH9 3FG, UK
| | - Alison Q O'Neil
- School of Engineering, The University of Edinburgh, Edinburgh EH9 3FG, UK; Canon Medical Research Europe, Edinburgh EH6 5NP, UK
| | - Sotirios A Tsaftaris
- School of Engineering, The University of Edinburgh, Edinburgh EH9 3FG, UK; The Alan Turing Institute, London NW1 2DB, UK
| |
Collapse
|
28
|
Zhao SX, Chen Y, Yang KF, Luo Y, Ma BY, Li YJ. A Local and Global Feature Disentangled Network: Toward Classification of Benign-Malignant Thyroid Nodules From Ultrasound Image. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:1497-1509. [PMID: 34990353 DOI: 10.1109/tmi.2022.3140797] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Thyroid nodules are one of the most common nodular lesions. The incidence of thyroid cancer has increased rapidly in the past three decades, making it one of the most frequently diagnosed cancers. As a non-invasive imaging modality, ultrasonography can identify benign and malignant thyroid nodules, and it can be used for large-scale screening. In this study, inspired by the domain knowledge of sonographers when diagnosing ultrasound images, a local and global feature disentangled network (LoGo-Net) is proposed to classify benign and malignant thyroid nodules. This model imitates the dual-pathway structure of human vision and establishes a new feature extraction method to improve the recognition performance of nodules. We use the tissue-anatomy disentangled (TAD) block to connect the dual pathways, which decouples the cues of local and global features based on the self-attention mechanism. To verify the effectiveness of the model, we constructed a large-scale dataset and conducted extensive experiments. The results show that our method achieves an accuracy of 89.33%, which has the potential to support the clinical practice of doctors, including early cancer screening in remote or resource-poor areas.
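A dual-pathway design of this kind can be coupled with standard cross-attention: tokens from the local pathway query tokens from a coarser global pathway, letting local features borrow global context. The sketch below is a generic coupler under assumed dimensions, not the published TAD block.

import torch
import torch.nn as nn

class LocalGlobalCoupler(nn.Module):
    """Toy dual-pathway coupler: local feature-map tokens attend to tokens
    from a (downsampled) global pathway via multi-head attention."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_feat, global_feat):
        # (B, C, H, W) -> (B, H*W, C) token sequences
        b, c, h, w = local_feat.shape
        q = local_feat.flatten(2).transpose(1, 2)
        kv = global_feat.flatten(2).transpose(1, 2)
        out, _ = self.attn(q, kv, kv)        # local queries, global keys/values
        out = self.norm(q + out)             # residual connection + norm
        return out.transpose(1, 2).view(b, c, h, w)

coupler = LocalGlobalCoupler()
loc = torch.randn(1, 64, 28, 28)
glb = torch.randn(1, 64, 7, 7)    # coarser global pathway
print(coupler(loc, glb).shape)    # torch.Size([1, 64, 28, 28])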
Collapse
|
29
|
Wang C, Yang G, Papanastasiou G. Unsupervised Image Registration towards Enhancing Performance and Explainability in Cardiac and Brain Image Analysis. SENSORS (BASEL, SWITZERLAND) 2022; 22:2125. [PMID: 35336295 PMCID: PMC8951078 DOI: 10.3390/s22062125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 03/01/2022] [Accepted: 03/07/2022] [Indexed: 02/04/2023]
Abstract
Magnetic Resonance Imaging (MRI) typically recruits multiple sequences (defined here as "modalities"). As each modality is designed to offer different anatomical and functional clinical information, there are evident disparities in the imaging content across modalities. Inter- and intra-modality affine and non-rigid image registration is an essential medical image analysis process in clinical imaging, for example when imaging biomarkers need to be derived and clinically evaluated across different MRI modalities, time phases, and slices. Although commonly needed in real clinical scenarios, affine and non-rigid image registration has not been extensively investigated using a single unsupervised model architecture. In our work, we present an unsupervised deep learning registration methodology that can accurately model affine and non-rigid transformations simultaneously. Moreover, inverse-consistency is a fundamental inter-modality registration property that is not considered in deep learning registration algorithms. To address inverse consistency, our methodology performs bi-directional cross-modality image synthesis to learn modality-invariant latent representations, and involves two factorised transformation networks (one per encoder-decoder channel) and an inverse-consistency loss to learn topology-preserving anatomical transformations. Overall, our model (named "FIRE") shows improved performance against the reference standard baseline method (i.e., Symmetric Normalization implemented using the ANTs toolbox) in experiments on multi-modality brain 2D and 3D MRI and intra-modality cardiac 4D MRI data. We focus on explaining model-data components to enhance model explainability in medical image registration. In computational time experiments, we show that the FIRE model operates in a memory-saving mode, as it can inherently learn topology-preserving image registration directly in the training phase. We therefore demonstrate an efficient and versatile registration technique that can have merit in multi-modal image registration in the clinical setting.
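Inverse consistency can be enforced by composing the forward and backward displacement fields and penalizing any deviation of the composition from the identity map. Here is a small sketch of such a loss for 2D fields in normalized grid units; this composition-based formulation is a common generic choice and is only assumed to approximate the loss used in FIRE.

import torch
import torch.nn.functional as F

def compose(flow_ab, flow_ba):
    """Compose two displacement fields (B,2,H,W, normalized units):
    apply flow_ba first, then flow_ab sampled at the displaced locations."""
    b, _, h, w = flow_ab.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing='ij')
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    grid = base + flow_ba.permute(0, 2, 3, 1)
    warped_ab = F.grid_sample(flow_ab, grid, align_corners=True)
    return flow_ba + warped_ab

def inverse_consistency_loss(flow_ab, flow_ba):
    """Penalize deviation of the composed forward/backward field from identity."""
    return compose(flow_ab, flow_ba).pow(2).mean()

fab = torch.randn(1, 2, 32, 32) * 0.01
fba = -fab                                  # a (nearly) inverse pair
print(inverse_consistency_loss(fab, fba))   # close to zero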
Collapse
Affiliation(s)
- Chengjia Wang
- Edinburgh Imaging Facility QMRI, Centre for Cardiovascular Science, University of Edinburgh, Edinburgh EH16 4TJ, UK;
| | - Guang Yang
- Faculty of Medicine, National Heart & Lung Institute, Imperial College London, London SW7 2BX, UK
| | - Giorgos Papanastasiou
- Edinburgh Imaging Facility QMRI, Centre for Cardiovascular Science, University of Edinburgh, Edinburgh EH16 4TJ, UK;
- School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK
| |
Collapse
|
30
|
Decomposing normal and abnormal features of medical images for content-based image retrieval of glioma imaging. Med Image Anal 2021; 74:102227. [PMID: 34543911 DOI: 10.1016/j.media.2021.102227] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2021] [Revised: 09/05/2021] [Accepted: 09/06/2021] [Indexed: 11/20/2022]
Abstract
In medical imaging, the characteristics purely derived from a disease should reflect the extent to which abnormal findings deviate from the normal features. Indeed, physicians often need corresponding images without the abnormal findings of interest or, conversely, images that contain similar abnormal findings regardless of normal anatomical context. This is called comparative diagnostic reading of medical images, which is essential for a correct diagnosis. To support comparative diagnostic reading, content-based image retrieval (CBIR) that can selectively utilize normal and abnormal features in medical images as two separable semantic components will be useful. In this study, we propose a neural network architecture to decompose the semantic components of medical images into two latent codes: a normal anatomy code and an abnormal anatomy code. The normal anatomy code represents the counterfactual normal anatomy that should have existed if the sample were healthy, whereas the abnormal anatomy code captures abnormal changes that reflect deviation from the normal baseline. By calculating similarity based on either the normal or abnormal anatomy codes, or the combination of the two, our algorithm can retrieve images according to the selected semantic component from a dataset consisting of brain magnetic resonance images of gliomas. Moreover, it can utilize a synthetic query vector combining normal and abnormal anatomy codes from two different query images. To evaluate whether the retrieved images are acquired according to the targeted semantic component, the overlap of the ground-truth labels is calculated as a metric of semantic consistency. Our algorithm provides a flexible CBIR framework for handling the decomposed features, with qualitatively and quantitatively remarkable results.
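Retrieval over decomposed codes is then straightforward: depending on the clinical question, similarity is computed on the normal code, the abnormal code, or their combination. The toy example below ranks a database by cosine similarity of the selected component, with random 16-dimensional codes standing in for learned latents.

import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def retrieve(query, database, component='both', k=3):
    """Rank database entries by similarity of the selected semantic component.
    Each entry is (normal_code, abnormal_code); 'both' concatenates the codes."""
    def select(codes):
        n, a = codes
        return {'normal': n, 'abnormal': a,
                'both': np.concatenate([n, a])}[component]
    q = select(query)
    scores = [cosine(q, select(entry)) for entry in database]
    return np.argsort(scores)[::-1][:k]   # indices of the top-k matches

rng = np.random.default_rng(0)
db = [(rng.standard_normal(16), rng.standard_normal(16)) for _ in range(100)]
query = (db[7][0], db[42][1])   # normal anatomy of one case, lesion of another
print(retrieve(query, db, component='abnormal'))   # should rank case 42 first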
Collapse
|
31
|
Representation Disentanglement for Multi-modal Brain MRI Analysis. INFORMATION PROCESSING IN MEDICAL IMAGING : PROCEEDINGS OF THE ... CONFERENCE 2021; 12729:321-333. [PMID: 35173402 DOI: 10.1007/978-3-030-78191-0_25] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
Multi-modal MRIs are widely used in neuroimaging applications since different MR sequences provide complementary information about brain structures. Recent works have suggested that multi-modal deep learning analysis can benefit from explicitly disentangling anatomical (shape) and modality (appearance) information into separate image representations. In this work, we challenge mainstream strategies by showing that they do not naturally lead to representation disentanglement, both in theory and in practice. To address this issue, we propose a margin loss that regularizes the similarity in relationships of the representations across subjects and modalities. To enable robust training, we further use a conditional convolution to design a single model for encoding images of all modalities. Lastly, we propose a fusion function to combine the disentangled anatomical representations as a set of modality-invariant features for downstream tasks. We evaluate the proposed method on three multi-modal neuroimaging datasets. Experiments show that our proposed method can achieve superior disentangled representations compared to existing disentanglement strategies. Results also indicate that the fused anatomical representation has potential in the downstream tasks of zero-dose PET reconstruction and brain tumor segmentation.
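A margin loss over representation relationships can be sketched as follows: anatomical codes of the same subject seen in different modalities should be more similar than codes from different subjects, by at least a margin. This hinge-on-the-gap formulation and all sizes are illustrative assumptions, not the paper's exact regularizer.

import torch
import torch.nn.functional as F

def margin_disentangle_loss(z, subject_ids, margin=0.2):
    """Toy margin loss: anatomical codes of the same subject (different
    modalities) must be closer than codes from different subjects by a margin.
    z: (N, D) anatomical codes; subject_ids: (N,) long tensor."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t()                          # pairwise cosine similarities
    same = subject_ids.unsqueeze(0) == subject_ids.unsqueeze(1)
    eye = torch.eye(len(z), dtype=torch.bool)
    pos = sim[same & ~eye]                   # same subject, other modality
    neg = sim[~same]                         # different subjects
    # hinge on the gap between mean positive and mean negative similarity
    return F.relu(margin - (pos.mean() - neg.mean()))

z = torch.randn(8, 32, requires_grad=True)
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])   # two modalities per subject
loss = margin_disentangle_loss(z, ids)
loss.backward()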
Collapse
|