1. Liu J, Shen N, Wang W, Li X, Wang W, Yuan Y, Tian Y, Luo G, Wang K. Lightweight cross-resolution coarse-to-fine network for efficient deformable medical image registration. Med Phys 2025. PMID: 40280883. DOI: 10.1002/mp.17827.
Abstract
BACKGROUND Accurate and efficient deformable medical image registration is crucial in medical image analysis. While recent deep learning-based registration methods have achieved state-of-the-art accuracy, they often suffer from extensive network parameters and slow inference times, leading to inefficiency. Efforts to reduce model size and input resolution can improve computational efficiency but frequently result in suboptimal accuracy. PURPOSE To address the trade-off between high accuracy and efficiency, we propose a Lightweight Cross-Resolution Coarse-to-Fine registration framework, termed LightCRCF. METHODS Our method is built on an ultra-lightweight U-Net architecture with only 0.1 million parameters, offering remarkable efficiency. To mitigate accuracy degradation resulting from fewer parameters while preserving the lightweight nature of the networks, LightCRCF introduces three key innovations as follows: (1) selecting an efficient cross-resolution coarse-to-fine (C2F) registration strategy and integrating it into the lightweight network to progressively decompose the deformation fields into multiresolution subfields to capture fine-grained deformations; (2) a Texture-aware Reparameterization (TaRep) module that integrates Sobel and Laplacian operators to extract rich textural information; (3) a Group-flow Reparameterization (GfRep) module that captures diverse deformation modes by decomposing the deformation field into multiple groups. Furthermore, we introduce a structural reparameterization technique that enhances training accuracy through multibranch structures of the TaRep and GfRep modules, while maintaining efficient inference by equivalently transforming these multibranch structures into single-branch standard convolutions. RESULTS We evaluate LightCRCF against various methods on the three public MRI datasets (LPBA, OASIS, and ACDC) and one CT dataset (abdomen CT). Following the previous data division methods, the LPBA dataset comprises 30 training image pairs and nine testing image pairs. For the OASIS dataset, the training, validation, and testing data consist of 1275, 110, and 660 image pairs, respectively. Similarly, for the ACDC dataset, the training, validation, and testing data include 180, 20, and 100 image pairs, respectively. For intersubject registration of the abdomen CT dataset, there are 380 training pairs, six validation pairs, and 42 testing pairs. Compared to state-of-the-art C2F methods, LightCRCF achieves comparable accuracy scores (DSC, HD95, and MSE), while demonstrating significantly superior performance across all efficiency metrics (Params, VRAM, FLOPs, and inference time). Relative to efficiency-first approaches, LightCRCF significantly outperforms these methods in accuracy metrics. CONCLUSIONS Our LightCRCF method offers a favorable trade-off between accuracy and efficiency, maintaining high accuracy while achieving superior efficiency, thereby highlighting its potential for clinical applications. The code will be available at https://github.com/PerceptionComputingLab/LightCRCF.
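The structural reparameterization step rests on the linearity of convolution: parallel branches that share a kernel size can be folded into one kernel for inference. Below is a hedged NumPy/SciPy sketch of that folding; the Sobel and Laplacian kernels echo the TaRep description, but the branch weighting and shapes are illustrative, not the authors' implementation.

```python
# Hedged sketch: folding parallel "texture" branches into one kernel at inference.
# Kernel choices (Sobel/Laplacian) follow the TaRep description; the exact
# branch weighting and tensor shapes in LightCRCF may differ.
import numpy as np
from scipy.signal import convolve2d

identity = np.zeros((3, 3)); identity[1, 1] = 1.0                 # plain 3x3 conv branch
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)   # gradient branch
laplace = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], float)     # second-derivative branch

x = np.random.rand(64, 64)

# Training-time multi-branch output: sum of the three parallel convolutions.
multi_branch = sum(convolve2d(x, k, mode="same") for k in (identity, sobel_x, laplace))

# Inference-time single branch: convolve once with the summed (reparameterized) kernel.
merged_kernel = identity + sobel_x + laplace
single_branch = convolve2d(x, merged_kernel, mode="same")

assert np.allclose(multi_branch, single_branch)   # branches fold exactly, by linearity
```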
Affiliation(s)
- Jun Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Nuo Shen
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wenyi Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Xiangyu Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wei Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen, Shenzhen, Guangdong, China
- Yongfeng Yuan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Ye Tian
- Department of Cardiology, The First Affiliated Hospital, Cardiovascular Institute, Harbin Medical University, Harbin, Heilongjiang, China
- Gongning Luo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Kuanquan Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China

2. Zhang J, Zeng X. M2OCNN: Many-to-One Collaboration Neural Networks for simultaneously multi-modal medical image synthesis and fusion. Comput Methods Programs Biomed 2025;261:108612. PMID: 39908634. DOI: 10.1016/j.cmpb.2025.108612.
Abstract
BACKGROUND AND OBJECTIVE Acquiring comprehensive information from multi-modal medical images remains a challenge in clinical diagnostics and treatment, due to complex inter-modal dependencies and missing modalities. While cross-modal medical image synthesis (CMIS) and multi-modal medical image fusion (MMIF) address certain issues, existing methods typically treat these as separate tasks, lacking a unified framework that can generate both synthesized and fused images in the presence of missing modalities. METHODS In this paper, we propose the Many-to-One Collaboration Neural Network (M2OCNN), a unified model designed to simultaneously address CMIS and MMIF. Unlike traditional approaches, M2OCNN treats fusion as a specific form of synthesis and provides a comprehensive solution even when modalities are missing. The network consists of three modules: the Parallel Untangling Hybrid Network, Comprehensive Feature Router, and Series Omni-modal Hybrid Network. Additionally, we introduce a mixed-resolution attention mechanism and two transformer variants, Coarsormer and ReCoarsormer, to suppress high-frequency interference and enhance model performance. M2OCNN outperformed state-of-the-art methods on three multi-modal medical imaging datasets, achieving an average PSNR improvement of 2.4 dB in synthesis tasks and producing high-quality fusion images despite missing modalities. The source code is available at https://github.com/zjno108/M2OCNN. CONCLUSION M2OCNN offers a novel solution by unifying CMIS and MMIF tasks in a single framework, enabling the generation of both synthesized and fused images from a single modality. This approach sets a new direction for research in multi-modal medical imaging, with implications for improving clinical diagnosis and treatment.
Affiliation(s)
- Jian Zhang
- Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunication, Chongqing, 400065, China.
- Xianhua Zeng
- Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunication, Chongqing, 400065, China.

3. Pelcat A, Le Berre A, Ben Hassen W, Debacker C, Charron S, Thirion B, Legrand L, Turc G, Oppenheim C, Benzakoun J. Generative T2*-weighted images as a substitute for true T2*-weighted images on brain MRI in patients with acute stroke. Diagn Interv Imaging 2025:S2211-5684(25)00048-8. PMID: 40113490. DOI: 10.1016/j.diii.2025.03.004.
Abstract
PURPOSE The purpose of this study was to validate a deep learning algorithm that generates T2*-weighted images from diffusion-weighted (DW) images and to compare its performance with that of true T2*-weighted images for hemorrhage detection on MRI in patients with acute stroke. MATERIALS AND METHODS This single-center, retrospective study included DW- and T2*-weighted images obtained less than 48 hours after symptom onset in consecutive patients admitted for acute stroke. Datasets were divided into training (60 %), validation (20 %), and test (20 %) sets, with stratification by stroke type (hemorrhagic/ischemic). A generative adversarial network was trained to produce generative T2*-weighted images using DW images. Concordance between true T2*-weighted images and generative T2*-weighted images for hemorrhage detection was independently graded by two readers into three categories (parenchymal hematoma, hemorrhagic infarct or no hemorrhage), and discordances were resolved by consensus reading. Sensitivity, specificity and accuracy of generative T2*-weighted images were estimated using true T2*-weighted images as the standard of reference. RESULTS A total of 1491 MRI sets from 939 patients (487 women, 452 men) with a median age of 71 years (first quartile, 57; third quartile, 81; range: 21-101) were included. In the test set (n = 300), there were no differences between true T2*-weighted images and generative T2*-weighted images for intraobserver reproducibility (κ = 0.97 [95 % CI: 0.95-0.99] vs. 0.95 [95 % CI: 0.92-0.97]; P = 0.27) and interobserver reproducibility (κ = 0.93 [95 % CI: 0.90-0.97] vs. 0.92 [95 % CI: 0.88-0.96]; P = 0.64). After consensus reading, concordance between true T2*-weighted images and generative T2*-weighted images was excellent (κ = 0.92; 95 % CI: 0.91-0.96). Generative T2*-weighted images achieved 90 % sensitivity (73/81; 95 % CI: 81-96), 97 % specificity (213/219; 95 % CI: 94-99) and 95 % accuracy (286/300; 95 % CI: 92-97) for the diagnosis of any cerebral hemorrhage (hemorrhagic infarct or parenchymal hemorrhage). CONCLUSION Generative T2*-weighted images and true T2*-weighted images have non-different diagnostic performances for hemorrhage detection in patients with acute stroke and may be used to shorten MRI protocols.
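The reported operating characteristics follow directly from the confusion-matrix counts quoted in parentheses; a minimal recomputation (counts taken from the abstract, confidence intervals omitted):

```python
# Recomputing the reported test-set metrics from the stated counts.
tp, fn = 73, 81 - 73         # hemorrhages detected on generative T2* / missed
tn, fp = 213, 219 - 213      # true negatives / false positives
sensitivity = tp / (tp + fn)                      # 73/81  -> ~0.90
specificity = tn / (tn + fp)                      # 213/219 -> ~0.97
accuracy = (tp + tn) / (tp + fn + tn + fp)        # 286/300 -> ~0.95
print(round(sensitivity, 2), round(specificity, 2), round(accuracy, 2))
```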
Affiliation(s)
- Antoine Pelcat
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France
- Alice Le Berre
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
- Wagih Ben Hassen
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
- Clement Debacker
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
- Sylvain Charron
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France
- Bertrand Thirion
- INRIA, CEA, Université Paris-Saclay, MIND Team, 91400 Palaiseau, France
- Laurence Legrand
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
- Guillaume Turc
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, Stroke Team, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neurology, 75014 Paris, France
- Catherine Oppenheim
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
- Joseph Benzakoun
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France.

4. Cao B, Qi G, Zhao J, Zhu P, Hu Q, Gao X. RTF: Recursive TransFusion for Multi-Modal Image Synthesis. IEEE Trans Image Process 2025;34:1573-1587. PMID: 40031796. DOI: 10.1109/tip.2025.3541877.
Abstract
Multi-modal image synthesis is crucial for obtaining complete modalities due to the imaging restrictions in reality. Current methods, primarily CNN-based models, find it challenging to extract global representations because of local inductive bias, leading to synthetic structure deformation or color distortion. Despite the significant global representation ability of transformer in capturing long-range dependencies, its huge parameter size requires considerable training data. Multi-modal synthesis solely based on one of the two structures makes it hard to extract comprehensive information from each modality with limited data. To tackle this dilemma, we propose a simple yet effective Recursive TransFusion (RTF) framework for multi-modal image synthesis. Specifically, we develop a TransFusion unit to integrate local knowledge extracted from the individual modality by connecting a CNN-based local representation block (LRB) and a transformer-based global fusion block (GFB) via a feature translating gate (FTG). Considering the numerous parameters introduced by the transformer, we further unfold a TransFusion unit with recursive constraint repeatedly, forming recursive TransFusion (RTF), which progressively extracts multi-modal information at different depths. Our RTF remarkably reduces network parameters while maintaining superior performance. Extensive experiments validate our superiority against the competing methods on multiple benchmarks. The source code will be available at https://github.com/guoliangq/RTF.
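The parameter saving in RTF comes from unrolling one shared unit several times rather than stacking distinct blocks. The toy sketch below shows only that weight-sharing pattern; the stand-in module is not the actual TransFusion unit (the LRB/GFB/FTG internals are omitted).

```python
import torch
import torch.nn as nn

class ToyRecursiveFusion(nn.Module):
    """One shared 'unit' unrolled T times: parameter count stays constant as depth grows."""
    def __init__(self, channels=32, steps=4):
        super().__init__()
        self.unit = nn.Sequential(                 # stand-in for the TransFusion unit
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.steps = steps

    def forward(self, x):
        for _ in range(self.steps):                # recursive unfolding with shared weights
            x = x + self.unit(x)                   # residual connection keeps the recursion stable
        return x

x = torch.randn(1, 32, 64, 64)
y = ToyRecursiveFusion()(x)
print(sum(p.numel() for p in ToyRecursiveFusion().parameters()))  # independent of `steps`
```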

5. Soltanpour S, Chang A, Madularu D, Kulkarni P, Ferris C, Joslin C. 3D Wasserstein Generative Adversarial Network with Dense U-Net-Based Discriminator for Preclinical fMRI Denoising. J Imaging Inform Med 2025. PMID: 39939477. DOI: 10.1007/s10278-025-01434-5.
Abstract
Functional magnetic resonance imaging (fMRI) is extensively used in clinical and preclinical settings to study brain function; however, fMRI data is inherently noisy due to physiological processes, hardware, and external noise. Denoising is one of the main preprocessing steps in any fMRI analysis pipeline. This process is challenging in preclinical data in comparison to clinical data due to variations in brain geometry, image resolution, and low signal-to-noise ratios. In this paper, we propose a structure-preserving algorithm based on a 3D Wasserstein generative adversarial network with a 3D dense U-net-based discriminator called 3D U-WGAN. We apply a 4D data configuration to effectively denoise temporal and spatial information in analyzing preclinical fMRI data. GAN-based denoising methods often utilize a discriminator to identify significant differences between denoised and noise-free images, focusing on global or local features. To refine the fMRI denoising model, our method employs a 3D dense U-Net discriminator to learn both global and local distinctions. To tackle potential oversmoothing, we introduce an adversarial loss and enhance perceptual similarity by measuring feature space distances. Experiments illustrate that 3D U-WGAN significantly improves image quality in resting-state and task preclinical fMRI data, enhancing the signal-to-noise ratio without introducing the excessive structural changes seen with existing methods. The proposed method outperforms state-of-the-art methods when applied to simulated and real data in an fMRI analysis pipeline.
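The training objective described here pairs a Wasserstein adversarial term with a feature-space perceptual term; a hedged PyTorch sketch of such a combination is shown below (the stand-in networks, loss weighting, and omitted gradient penalty are assumptions, not the authors' exact formulation).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def generator_loss(critic, feats, denoised, clean, lam_perc=1.0):
    """Hedged sketch: Wasserstein adversarial term plus a feature-space perceptual term."""
    adv = -critic(denoised).mean()                     # generator tries to raise the critic score
    perc = F.mse_loss(feats(denoised), feats(clean))   # perceptual similarity measured in feature space
    return adv + lam_perc * perc

def critic_loss(critic, denoised, clean):
    """Wasserstein critic term (gradient penalty / weight clipping omitted for brevity)."""
    return critic(denoised.detach()).mean() - critic(clean).mean()

# Stand-in networks and 2D data just to exercise the losses; the paper's model is 3D/4D.
critic = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 1))
feats = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64))
clean, denoised = torch.rand(4, 1, 32, 32), torch.rand(4, 1, 32, 32)
print(generator_loss(critic, feats, denoised, clean).item(),
      critic_loss(critic, denoised, clean).item())
```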
Affiliation(s)
- Sima Soltanpour
- School of Information Technology, Carleton University, 1125 Colonel By Dr, Ottawa, Ontario, K1S 5B6, Canada.
- Arnold Chang
- Center for Translational NeuroImaging (CTNI), Northeastern University, 360 Huntington Ave, Boston, MA, 02115, USA
- Dan Madularu
- Department of Psychology, Carleton University, 1125 Colonel By Dr, Ottawa, Ontario, K1S 5B6, Canada
- Tessellis Ltd., 350 Legget Drive, Ottawa, Ontario, K2K 0G7, Canada
- Praveen Kulkarni
- Center for Translational NeuroImaging (CTNI), Northeastern University, 360 Huntington Ave, Boston, MA, 02115, USA
- Craig Ferris
- Center for Translational NeuroImaging (CTNI), Northeastern University, 360 Huntington Ave, Boston, MA, 02115, USA
- Chris Joslin
- School of Information Technology, Carleton University, 1125 Colonel By Dr, Ottawa, Ontario, K1S 5B6, Canada

6. Zhang T, Pang H, Wu Y, Xu J, Liu L, Li S, Xia S, Chen R, Liang Z, Qi S. BreathVisionNet: A pulmonary-function-guided CNN-transformer hybrid model for expiratory CT image synthesis. Comput Methods Programs Biomed 2025;259:108516. PMID: 39571504. DOI: 10.1016/j.cmpb.2024.108516.
Abstract
BACKGROUND AND OBJECTIVE Chronic obstructive pulmonary disease (COPD) has high heterogeneity in etiologies and clinical manifestations. Expiratory computed tomography (CT) can effectively assess air trapping, aiding in disease diagnosis. However, due to concerns about radiation exposure and cost, expiratory CT is not routinely performed. Recent work on synthesizing expiratory CT has primarily focused on imaging features while neglecting patient-specific pulmonary function. METHODS To address these issues, we developed a novel model named BreathVisionNet that incorporates pulmonary function data to guide the synthesis of expiratory CT from inspiratory CT. An architecture combining a convolutional neural network and transformer is introduced to leverage the irregular phenotypic distribution in COPD patients. The model can better understand the long-range and global contexts by incorporating global information into the encoder. The utilization of edge information and multi-view data further enhances the quality of the synthesized CT. Parametric response mapping (PRM) can be estimated by using synthesized expiratory CT and inspiratory CT to quantify the COPD phenotypes of normal lung, emphysema, and functional small airway disease (fSAD), including their percentages, spatial distributions, and voxel distribution maps. RESULTS BreathVisionNet outperforms other generative models in terms of synthesized image quality. It achieves a mean absolute error, normalized mean square error, structural similarity index and peak signal-to-noise ratio of 78.207 HU, 0.643, 0.847 and 25.828 dB, respectively. Comparing the predicted and real PRM, the Dice coefficient can reach 0.732 (emphysema) and 0.560 (fSAD). The mean difference between true and predicted fSAD percentages is 4.42 for the development dataset (low radiation dose CT scans), and 9.05 for an independent external validation dataset (routine dose), indicating that the model has great generalizability. A classifier trained on voxel distribution maps can achieve an accuracy of 0.891 in predicting the presence of COPD. CONCLUSIONS BreathVisionNet can accurately synthesize expiratory CT images from inspiratory CT and predict their voxel distribution. The estimated PRM can help to quantify the COPD phenotypes of normal lung, emphysema, and fSAD. This capability provides additional insights into COPD diversity when only inspiratory CT images are available.
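Parametric response mapping labels each co-registered voxel pair from its inspiratory and expiratory attenuation. Below is a hedged sketch using the thresholds most commonly cited for PRM, roughly -950 HU at inspiration and -856 HU at expiration; the paper's exact cutoffs may differ.

```python
import numpy as np

def prm_labels(insp_hu, exp_hu, t_insp=-950.0, t_exp=-856.0):
    """Classify co-registered voxels: 0 = normal, 1 = fSAD, 2 = emphysema.
    Thresholds are the commonly used literature values, assumed here."""
    labels = np.zeros_like(insp_hu, dtype=np.uint8)        # normal by default
    fsad = (insp_hu >= t_insp) & (exp_hu < t_exp)           # air trapping without emphysema
    emph = (insp_hu < t_insp) & (exp_hu < t_exp)            # low attenuation on both phases
    labels[fsad], labels[emph] = 1, 2
    return labels

# toy example; in practice the inputs are co-registered lung-masked CT volumes in HU
insp = np.random.uniform(-1000, -700, size=(4, 4))
exp = np.random.uniform(-1000, -600, size=(4, 4))
print(prm_labels(insp, exp))
# phenotype percentages over a lung mask:
# pct = 100.0 * np.bincount(prm_labels(insp, exp)[mask], minlength=3) / mask.sum()
```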
Affiliation(s)
- Tiande Zhang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China
- Haowen Pang
- School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing, China
- Yanan Wu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
- Jiaxuan Xu
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The National Center for Respiratory Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Lingkai Liu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China
- Shang Li
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China
- Shuyue Xia
- Department of Respiratory and Critical Care Medicine, Central Hospital Affiliated to Shenyang Medical College, Shenyang, China
- Rongchang Chen
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The National Center for Respiratory Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China; Hetao Institute of Guangzhou National Laboratory, Guangzhou China
- Zhenyu Liang
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The National Center for Respiratory Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
- Shouliang Qi
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, China; Department of Respiratory and Critical Care Medicine, Central Hospital Affiliated to Shenyang Medical College, Shenyang, China.

7. Zhong L, Xiao R, Shu H, Zheng K, Li X, Wu Y, Ma J, Feng Q, Yang W. NCCT-to-CECT synthesis with contrast-enhanced knowledge and anatomical perception for multi-organ segmentation in non-contrast CT images. Med Image Anal 2025;100:103397. PMID: 39612807. DOI: 10.1016/j.media.2024.103397.
Abstract
Contrast-enhanced computed tomography (CECT) is routinely used for delineating organs-at-risk (OARs) in radiation therapy planning. The delineated OARs then need to be transferred from CECT to non-contrast CT (NCCT) for dose calculation. Yet, the use of iodinated contrast agents (CA) in CECT and the dose calculation errors caused by the spatial misalignment between NCCT and CECT images pose risks of adverse side effects. A promising solution is synthesizing CECT images from NCCT scans, which can improve the visibility of organs and abnormalities for more effective multi-organ segmentation in NCCT images. However, existing methods neglect the difference between tissues induced by CA and lack the ability to synthesize the details of organ edges and blood vessels. To address these issues, we propose a contrast-enhanced knowledge and anatomical perception network (CKAP-Net) for NCCT-to-CECT synthesis. CKAP-Net leverages a contrast-enhanced knowledge learning network to capture both similarities and dissimilarities in domain characteristics attributable to CA. Specifically, a CA-based perceptual loss function is introduced to enhance the synthesis of CA details. Furthermore, we design a multi-scale anatomical perception transformer that utilizes multi-scale anatomical information from NCCT images, enabling the precise synthesis of tissue details. Our CKAP-Net is evaluated on a multi-center abdominal NCCT-CECT dataset, a head and neck NCCT-CECT dataset, and an NCMRI-CEMRI dataset. It achieves an MAE of 25.96 ± 2.64, an SSIM of 0.855 ± 0.017, and a PSNR of 32.60 ± 0.02 for CECT synthesis, and a DSC of 81.21 ± 4.44 for segmentation on the internal dataset. Extensive experiments demonstrate that CKAP-Net outperforms state-of-the-art CA synthesis methods and has better generalizability across different datasets.
Affiliation(s)
- Liming Zhong
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Guangzhou, 510515, China
- Ruolin Xiao
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Guangzhou, 510515, China
- Hai Shu
- Department of Biostatistics, School of Global Public Health, New York University, NY, USA
- Kaiyi Zheng
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Guangzhou, 510515, China
- Xinming Li
- Department of Radiology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Yuankui Wu
- Department of Medical Imaging Center, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China
- Jianhua Ma
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Qianjin Feng
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Guangzhou, 510515, China
- Wei Yang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Guangzhou, 510515, China.

8. Feng Y, Deng S, Lyu J, Cai J, Wei M, Qin J. Bridging MRI Cross-Modality Synthesis and Multi-Contrast Super-Resolution by Fine-Grained Difference Learning. IEEE Trans Med Imaging 2025;44:373-383. PMID: 39159018. DOI: 10.1109/tmi.2024.3445969.
Abstract
In multi-modal magnetic resonance imaging (MRI), the tasks of imputing or reconstructing the target modality share a common obstacle: the accurate modeling of fine-grained inter-modal differences, which has been sparingly addressed in current literature. These differences stem from two sources: 1) spatial misalignment remaining after coarse registration and 2) structural distinction arising from modality-specific signal manifestations. This paper integrates the previously separate research trajectories of cross-modality synthesis (CMS) and multi-contrast super-resolution (MCSR) to address this pervasive challenge within a unified framework. Connected through generalized down-sampling ratios, this unification not only emphasizes their common goal in reducing structural differences, but also identifies the key task distinguishing MCSR from CMS: modeling the structural distinctions using the limited information from the misaligned target input. Specifically, we propose a composite network architecture with several key components: a label correction module to align the coordinates of multi-modal training pairs, a CMS module serving as the base model, an SR branch to handle target inputs, and a difference projection discriminator for structural distinction-centered adversarial training. When training the SR branch as the generator, the adversarial learning is enhanced with distinction-aware incremental modulation to ensure better-controlled generation. Moreover, the SR branch integrates deformable convolutions to address cross-modal spatial misalignment at the feature level. Experiments conducted on three public datasets demonstrate that our approach effectively balances structural accuracy and realism, exhibiting overall superiority in comprehensive evaluations for both tasks over current state-of-the-art approaches. The code is available at https://github.com/papshare/FGDL.

9. Zhang Y, Li L, Wang J, Yang X, Zhou H, He J, Xie Y, Jiang Y, Sun W, Zhang X, Zhou G, Zhang Z. Texture-preserving diffusion model for CBCT-to-CT synthesis. Med Image Anal 2025;99:103362. PMID: 39393132. DOI: 10.1016/j.media.2024.103362.
Abstract
Cone beam computed tomography (CBCT) serves as a vital imaging modality in diverse clinical applications, but is constrained by inherent limitations such as reduced image quality and increased noise. In contrast, computed tomography (CT) offers superior resolution and tissue contrast. Bridging the gap between these modalities through CBCT-to-CT synthesis becomes imperative. Deep learning techniques have enhanced this synthesis, yet challenges with generative adversarial networks persist. Denoising Diffusion Probabilistic Models have emerged as a promising alternative in image synthesis. In this study, we propose a novel texture-preserving diffusion model for CBCT-to-CT synthesis that incorporates adaptive high-frequency optimization and a dual-mode feature fusion module. Our method aims to enhance high-frequency details, effectively fuse cross-modality features, and preserve fine image structures. Extensive validation demonstrates superior performance over existing methods, showcasing better generalization. The proposed model offers a transformative pathway to augment diagnostic accuracy and refine treatment planning across various clinical settings. This work represents a pivotal step toward non-invasive, safer, and high-quality CBCT-to-CT synthesis, advancing personalized diagnostic imaging practices.
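The high-frequency optimization described above presupposes a way to separate low- and high-frequency image content; one common choice, shown here as an assumption rather than the paper's actual decomposition, is an FFT-based high-pass split.

```python
import numpy as np

def split_frequencies(img, radius=16):
    """Split a 2D slice into low- and high-frequency parts with a circular FFT mask.
    The cutoff radius is illustrative, not taken from the paper."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    low_mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    return low, img - low      # the high-frequency residual carries texture and edges

low, high = split_frequencies(np.random.rand(128, 128))
```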
Affiliation(s)
- Li Li
- JancsiLab, JancsiTech, Hong Kong, China
- Jie Wang
- JancsiLab, JancsiTech, Hong Kong, China
- Jiahui He
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Guangdong, 518055, China; School of Computer Science, Faculty of Science and Engineering, University of Nottingham Ningbo China, Zhejiang 315100, China
- Yaoqin Xie
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Guangdong, 518055, China
- Yuming Jiang
- Department of Radiation Oncology, Wake Forest University School of Medicine, Winston Salem, NC, USA
- Wei Sun
- University of Science and Technology of China, Anhui, 230026, China
- Xinyuan Zhang
- School of Biomedical Engineering, Southern Medical University, Guangdong, China
- Guanqun Zhou
- JancsiLab, JancsiTech, Hong Kong, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Guangdong, 518055, China.
- Zhicheng Zhang
- JancsiLab, JancsiTech, Hong Kong, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Guangdong, 518055, China.

10. Cui J, Zeng P, Zeng X, Xu Y, Wang P, Zhou J, Wang Y, Shen D. Prior Knowledge-Guided Triple-Domain Transformer-GAN for Direct PET Reconstruction From Low-Count Sinograms. IEEE Trans Med Imaging 2024;43:4174-4189. PMID: 38869996. DOI: 10.1109/tmi.2024.3413832.
Abstract
To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, numerous methods have been dedicated to acquiring standard-count PET (SPET) from low-count PET (LPET). However, current methods have failed to take full advantage of the different emphasized information from multiple domains, i.e., the sinogram, image, and frequency domains, resulting in the loss of crucial details. Meanwhile, they overlook the unique inner-structure of the sinograms, thereby failing to fully capture its structural characteristics and relationships. To alleviate these problems, in this paper, we proposed a prior knowledge-guided transformer-GAN that unites triple domains of sinogram, image, and frequency to directly reconstruct SPET images from LPET sinograms, namely PK-TriDo. Our PK-TriDo consists of a Sinogram Inner-Structure-based Denoising Transformer (SISD-Former) to denoise the input LPET sinogram, a Frequency-adapted Image Reconstruction Transformer (FaIR-Former) to reconstruct high-quality SPET images from the denoised sinograms guided by the image domain prior knowledge, and an Adversarial Network (AdvNet) to further enhance the reconstruction quality via adversarial training. Specifically tailored for the PET imaging mechanism, we injected a sinogram embedding module that partitions the sinograms by rows and columns to obtain 1D sequences of angles and distances to faithfully preserve the inner-structure of the sinograms. Moreover, to mitigate high-frequency distortions and enhance reconstruction details, we integrated global-local frequency parsers (GLFPs) into FaIR-Former to calibrate the distributions and proportions of different frequency bands, thus compelling the network to preserve high-frequency details. Evaluations on three datasets with different dose levels and imaging scenarios demonstrated that our PK-TriDo outperforms the state-of-the-art methods.
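The sinogram embedding is described as partitioning the sinogram by rows (angles) and columns (distances) into 1D sequences; a hedged tensor-level sketch of such a partitioning follows, with all dimensions and projection layers assumed for illustration rather than taken from PK-TriDo.

```python
import torch
import torch.nn as nn

angles, dists, dim = 180, 256, 64                # assumed sinogram size and token width
sino = torch.randn(1, angles, dists)             # (batch, angle, distance)

row_proj = nn.Linear(dists, dim)                 # each angle row becomes one token
col_proj = nn.Linear(angles, dim)                # each distance column becomes one token

angle_tokens = row_proj(sino)                    # (1, angles, dim): 1D sequence over angles
dist_tokens = col_proj(sino.transpose(1, 2))     # (1, dists, dim): 1D sequence over distances
tokens = torch.cat([angle_tokens, dist_tokens], dim=1)   # sequence fed to a transformer encoder
print(tokens.shape)
```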

11. Ozyoruk KB, Harmon SA, Lay NS, Yilmaz EC, Bagci U, Citrin DE, Wood BJ, Pinto PA, Choyke PL, Turkbey B. AI-ADC: Channel and Spatial Attention-Based Contrastive Learning to Generate ADC Maps from T2W MRI for Prostate Cancer Detection. J Pers Med 2024;14:1047. PMID: 39452554. PMCID: PMC11508265. DOI: 10.3390/jpm14101047.
Abstract
BACKGROUND/OBJECTIVES Apparent Diffusion Coefficient (ADC) maps in prostate MRI can reveal tumor characteristics, but their accuracy can be compromised by artifacts related with patient motion or rectal gas associated distortions. To address these challenges, we propose a novel approach that utilizes a Generative Adversarial Network to synthesize ADC maps from T2-weighted magnetic resonance images (T2W MRI). METHODS By leveraging contrastive learning, our model accurately maps axial T2W MRI to ADC maps within the cropped region of the prostate organ boundary, capturing subtle variations and intricate structural details by learning similar and dissimilar pairs from two imaging modalities. We trained our model on a comprehensive dataset of unpaired T2-weighted images and ADC maps from 506 patients. In evaluating our model, named AI-ADC, we compared it against three state-of-the-art methods: CycleGAN, CUT, and StyTr2. RESULTS Our model demonstrated a higher mean Structural Similarity Index (SSIM) of 0.863 on a test dataset of 3240 2D MRI slices from 195 patients, compared to values of 0.855, 0.797, and 0.824 for CycleGAN, CUT, and StyTr2, respectively. Similarly, our model achieved a significantly lower Fréchet Inception Distance (FID) value of 31.992, compared to values of 43.458, 179.983, and 58.784 for the other three models, indicating its superior performance in generating ADC maps. Furthermore, we evaluated our model on 147 patients from the publicly available ProstateX dataset, where it demonstrated a higher SSIM of 0.647 and a lower FID of 113.876 compared to the other three models. CONCLUSIONS These results highlight the efficacy of our proposed model in generating ADC maps from T2W MRI, showcasing its potential for enhancing clinical diagnostics and radiological workflows.
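The contrastive objective pairs similar and dissimilar patches across the T2W and ADC domains; a generic InfoNCE-style term of that kind is sketched below (patch sampling, encoders, and temperature are assumptions, not the AI-ADC specifics).

```python
import torch
import torch.nn.functional as F

def patch_infonce(query, keys, tau=0.07):
    """query: (N, D) features from generated-ADC patches; keys: (N, D) features from the
    corresponding T2W patches. Matching indices are positives, all other pairs negatives."""
    q = F.normalize(query, dim=1)
    k = F.normalize(keys, dim=1)
    logits = q @ k.t() / tau                      # (N, N) similarity matrix
    targets = torch.arange(q.size(0))             # the positive pair sits on the diagonal
    return F.cross_entropy(logits, targets)

loss = patch_infonce(torch.randn(64, 128), torch.randn(64, 128))
```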
Affiliation(s)
- Kutsev Bengisu Ozyoruk
- Artificial Intelligence Resource, Molecular Imaging Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA; (K.B.O.); (S.A.H.); (N.S.L.); (E.C.Y.); (P.L.C.)
- Stephanie A. Harmon
- Artificial Intelligence Resource, Molecular Imaging Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA; (K.B.O.); (S.A.H.); (N.S.L.); (E.C.Y.); (P.L.C.)
- Nathan S. Lay
- Artificial Intelligence Resource, Molecular Imaging Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA; (K.B.O.); (S.A.H.); (N.S.L.); (E.C.Y.); (P.L.C.)
- Enis C. Yilmaz
- Artificial Intelligence Resource, Molecular Imaging Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA; (K.B.O.); (S.A.H.); (N.S.L.); (E.C.Y.); (P.L.C.)
- Ulas Bagci
- Radiology and Biomedical Engineering Department, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA;
- Deborah E. Citrin
- Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20814, USA;
- Bradford J. Wood
- Center for Interventional Oncology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20814, USA;
- Department of Radiology, Clinical Center, National Institutes of Health, Bethesda, MD 20814, USA
- Peter A. Pinto
- Urologic Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20814, USA;
- Peter L. Choyke
- Artificial Intelligence Resource, Molecular Imaging Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA; (K.B.O.); (S.A.H.); (N.S.L.); (E.C.Y.); (P.L.C.)
- Baris Turkbey
- Artificial Intelligence Resource, Molecular Imaging Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA; (K.B.O.); (S.A.H.); (N.S.L.); (E.C.Y.); (P.L.C.)

12. He Y, Liu Z, Qi M, Ding S, Zhang P, Song F, Ma C, Wu H, Cai R, Feng Y, Zhang H, Zhang T, Zhang G. PST-Diff: Achieving High-Consistency Stain Transfer by Diffusion Models With Pathological and Structural Constraints. IEEE Trans Med Imaging 2024;43:3634-3647. PMID: 39024079. DOI: 10.1109/tmi.2024.3430825.
Abstract
Histopathological examinations heavily rely on hematoxylin and eosin (HE) and immunohistochemistry (IHC) staining. IHC staining can offer more accurate diagnostic details but it brings significant financial and time costs. Furthermore, either re-staining HE-stained slides or using adjacent slides for IHC may compromise the accuracy of pathological diagnosis due to information loss. To address these challenges, we develop PST-Diff, a method for generating virtual IHC images from HE images based on diffusion models, which allows pathologists to simultaneously view multiple staining results from the same tissue slide. To maintain the pathological consistency of the stain transfer, we propose the asymmetric attention mechanism (AAM) and latent transfer (LT) module in PST-Diff. Specifically, the AAM can retain more local pathological information of the source domain images, while ensuring the model's flexibility in generating virtual stained images that closely conform to the target domain. Subsequently, the LT module transfers the implicit representations across different domains, effectively alleviating the bias introduced by direct connection and further enhancing the pathological consistency of PST-Diff. Furthermore, to maintain the structural consistency of the stain transfer, the conditional frequency guidance (CFG) module is proposed to precisely control image generation and preserve structural details according to the frequency recovery process. To conclude, the pathological and structural consistency constraints provide PST-Diff with effectiveness and superior generalization in generating stable and functionally pathological IHC images with the best evaluation score. In general, PST-Diff offers prospective application in clinical virtual staining and pathological image analysis.

13. Cao S, Hu Z, Xie X, Wang Y, Yu J, Yang B, Shi Z, Wu G. Integrated diagnosis of glioma based on magnetic resonance images with incomplete ground truth labels. Comput Biol Med 2024;180:108968. PMID: 39106670. DOI: 10.1016/j.compbiomed.2024.108968.
Abstract
BACKGROUND Since the 2016 WHO guidelines, glioma diagnosis has entered an era of integrated diagnosis, combining tissue pathology and molecular pathology. The WHO has focused on promoting the application of molecular diagnosis in the classification of central nervous system tumors. Genetic information such as IDH1 and 1p/19q are important molecular markers, and pathological grading is also a key clinical indicator. However, obtaining genetic pathology labels is more costly than conventional MRI images, resulting in a large number of missing labels in realistic modeling. METHOD We propose a training strategy based on label encoding and a corresponding loss function to enable the model to effectively utilize data with missing labels. Additionally, we integrate a graph model with genes and pathology-related clinical prior knowledge into the ResNet backbone to further improve the efficacy of diagnosis. Ten-fold cross-validation experiments were conducted on a large dataset of 1072 patients. RESULTS The classification area under the curve (AUC) values are 0.93, 0.91, and 0.90 for IDH1, 1p/19q status, and grade (LGG/HGG), respectively. When the label miss rate reached 59.3 %, the method improved the AUC by 0.09, 0.10, and 0.04 for IDH1, 1p/19q, and pathological grade, respectively, compared to the same backbone without the missing label strategy. CONCLUSIONS Our method effectively utilizes data with missing labels and integrates clinical prior knowledge, resulting in improved diagnostic performance for glioma genetic and pathological markers, even with high rates of missing labels.
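The key training idea, letting samples with missing genetic or pathology annotations still contribute, can be illustrated with a masked multi-task loss in which an absent label is encoded by a sentinel value and excluded from that task's loss term (a hedged sketch, not the authors' exact label encoding).

```python
import torch
import torch.nn.functional as F

MISSING = -1  # sentinel used to encode an absent IDH1 / 1p19q / grade label

def masked_multitask_loss(logits, labels):
    """logits: dict of task -> (B, C) tensors; labels: dict of task -> (B,) tensors of
    class indices or MISSING. Each task averages only over its labelled samples."""
    total, n_terms = 0.0, 0
    for task in logits:
        keep = labels[task] != MISSING
        if keep.any():
            total = total + F.cross_entropy(logits[task][keep], labels[task][keep])
            n_terms += 1
    return total / max(n_terms, 1)

# example: grade is labelled for all 4 samples, IDH1 only for two of them
logits = {"idh1": torch.randn(4, 2), "grade": torch.randn(4, 2)}
labels = {"idh1": torch.tensor([1, MISSING, 0, MISSING]), "grade": torch.tensor([0, 1, 1, 0])}
print(masked_multitask_loss(logits, labels))
```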
Affiliation(s)
- Shiwen Cao
- School of Information Science and Technology, Fudan University, Shanghai, China
- Zhaoyu Hu
- School of Information Science and Technology, Fudan University, Shanghai, China
- Xuan Xie
- School of Information Science and Technology, Fudan University, Shanghai, China
- Yuanyuan Wang
- School of Information Science and Technology, Fudan University, Shanghai, China
- Jinhua Yu
- School of Information Science and Technology, Fudan University, Shanghai, China
- Bojie Yang
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China.
- Zhifeng Shi
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China.
- Guoqing Wu
- School of Information Science and Technology, Fudan University, Shanghai, China.

14. Chaudhary MFA, Gerard SE, Christensen GE, Cooper CB, Schroeder JD, Hoffman EA, Reinhardt JM. LungViT: Ensembling Cascade of Texture Sensitive Hierarchical Vision Transformers for Cross-Volume Chest CT Image-to-Image Translation. IEEE Trans Med Imaging 2024;43:2448-2465. PMID: 38373126. PMCID: PMC11227912. DOI: 10.1109/tmi.2024.3367321.
Abstract
Chest computed tomography (CT) at inspiration is often complemented by an expiratory CT to identify peripheral airways disease. Additionally, co-registered inspiratory-expiratory volumes can be used to derive various markers of lung function. Expiratory CT scans, however, may not be acquired due to dose or scan time considerations or may be inadequate due to motion or insufficient exhale, leading to a missed opportunity to evaluate underlying small airways disease. Here, we propose LungViT, a generative adversarial learning approach using hierarchical vision transformers for translating inspiratory CT intensities to corresponding expiratory CT intensities. LungViT addresses several limitations of the traditional generative models including slicewise discontinuities, limited size of generated volumes, and their inability to model texture transfer at volumetric level. We propose a shifted-window hierarchical vision transformer architecture with squeeze-and-excitation decoder blocks for modeling dependencies between features. We also propose a multiview texture similarity distance metric for texture and style transfer in 3D. To incorporate global information into the training process and refine the output of our model, we use ensemble cascading. LungViT is able to generate large 3D volumes of size 320×320×320. We train and validate our model using a diverse cohort of 1500 subjects with varying disease severity. To assess model generalizability beyond the development set biases, we evaluate our model on an out-of-distribution external validation set of 200 subjects. Clinical validation on internal and external testing sets shows that synthetic volumes could be reliably adopted for deriving clinical endpoints of chronic obstructive pulmonary disease.
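The squeeze-and-excitation decoder blocks are a standard channel-attention construct; a compact 3D version is sketched below (the reduction ratio and placement are assumptions, not LungViT's exact configuration).

```python
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    """Squeeze-and-excitation over the channels of a 3D feature map (hedged sketch)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),                       # squeeze: global average per channel
            nn.Conv3d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.gate(x)                            # reweight channels of the decoder feature

feat = torch.randn(1, 32, 8, 16, 16)
out = SEBlock3D(32)(feat)          # same shape, channel-recalibrated
```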

15. Meng X, Sun K, Xu J, He X, Shen D. Multi-Modal Modality-Masked Diffusion Network for Brain MRI Synthesis With Random Modality Missing. IEEE Trans Med Imaging 2024;43:2587-2598. PMID: 38393846. DOI: 10.1109/tmi.2024.3368664.
Abstract
Synthesis of unavailable imaging modalities from available ones can generate modality-specific complementary information and enable multi-modality based medical images diagnosis or treatment. Existing generative methods for medical image synthesis are usually based on cross-modal translation between acquired and missing modalities. These methods are usually dedicated to specific missing modality and perform synthesis in one shot, which cannot deal with varying number of missing modalities flexibly and construct the mapping across modalities effectively. To address the above issues, in this paper, we propose a unified Multi-modal Modality-masked Diffusion Network (M2DN), tackling multi-modal synthesis from the perspective of "progressive whole-modality inpainting", instead of "cross-modal translation". Specifically, our M2DN considers the missing modalities as random noise and takes all the modalities as a unity in each reverse diffusion step. The proposed joint synthesis scheme performs synthesis for the missing modalities and self-reconstruction for the available ones, which not only enables synthesis for arbitrary missing scenarios, but also facilitates the construction of common latent space and enhances the model representation ability. Besides, we introduce a modality-mask scheme to encode availability status of each incoming modality explicitly in a binary mask, which is adopted as condition for the diffusion model to further enhance the synthesis performance of our M2DN for arbitrary missing scenarios. We carry out experiments on two public brain MRI datasets for synthesis and downstream segmentation tasks. Experimental results demonstrate that our M2DN outperforms the state-of-the-art models significantly and shows great generalizability for arbitrary missing modalities.
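The modality-mask scheme, treating absent modalities as noise and encoding availability as a binary condition, can be pictured with a few tensor operations (the channel layout and noise initialization here are assumptions for illustration, not M2DN's implementation).

```python
import torch

def prepare_input(modalities, available):
    """modalities: (B, M, H, W) stack of MRI contrasts; available: (B, M) binary mask.
    Missing channels are replaced with Gaussian noise, and the mask itself is passed
    along as the diffusion model's condition (hedged sketch of the described setup)."""
    mask = available[:, :, None, None].float()
    noisy_missing = torch.randn_like(modalities)
    x0 = modalities * mask + noisy_missing * (1 - mask)   # known channels + noise-initialized channels
    return x0, available.float()                          # condition on the availability status

imgs = torch.randn(2, 4, 128, 128)                        # e.g., T1, T1ce, T2, FLAIR
avail = torch.tensor([[1, 1, 0, 1], [1, 0, 0, 1]])        # two subjects with different missing sets
x0, cond = prepare_input(imgs, avail)
```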

16. Lu X, Liang X, Liu W, Miao X, Guan X. ReeGAN: MRI image edge-preserving synthesis based on GANs trained with misaligned data. Med Biol Eng Comput 2024;62:1851-1868. PMID: 38396277. DOI: 10.1007/s11517-024-03035-w.
Abstract
As a crucial medical examination technique, different modalities of magnetic resonance imaging (MRI) complement each other, offering multi-angle and multi-dimensional insights into the body's internal information. Therefore, research on MRI cross-modality conversion is of great significance, and many innovative techniques have been explored. However, most methods are trained on well-aligned data, and the impact of misaligned data has not received sufficient attention. Additionally, many methods focus on transforming the entire image and ignore crucial edge information. To address these challenges, we propose a generative adversarial network based on multi-feature fusion, which effectively preserves edge information while training on noisy data. Notably, we consider images with limited range random transformations as noisy labels and use an additional small auxiliary registration network to help the generator adapt to the noise distribution. Moreover, we inject auxiliary edge information to improve the quality of synthesized target modality images. Our goal is to find the best solution for cross-modality conversion. Comprehensive experiments and ablation studies demonstrate the effectiveness of the proposed method.
Affiliation(s)
- Xiangjiang Lu
- Guangxi Key Lab of Multi-Source Information Mining & Security, School of Computer Science and Engineering & School of Software, Guangxi Normal University, Guilin, 541004, China.
- Xiaoshuang Liang
- Guangxi Key Lab of Multi-Source Information Mining & Security, School of Computer Science and Engineering & School of Software, Guangxi Normal University, Guilin, 541004, China
- Wenjing Liu
- Guangxi Key Lab of Multi-Source Information Mining & Security, School of Computer Science and Engineering & School of Software, Guangxi Normal University, Guilin, 541004, China
- Xiuxia Miao
- Guangxi Key Lab of Multi-Source Information Mining & Security, School of Computer Science and Engineering & School of Software, Guangxi Normal University, Guilin, 541004, China
- Xianglong Guan
- Guangxi Key Lab of Multi-Source Information Mining & Security, School of Computer Science and Engineering & School of Software, Guangxi Normal University, Guilin, 541004, China

17. Huang Y, Zhang X, Hu Y, Johnston AR, Jones CK, Zbijewski WB, Siewerdsen JH, Helm PA, Witham TF, Uneri A. Deformable registration of preoperative MR and intraoperative long-length tomosynthesis images for guidance of spine surgery via image synthesis. Comput Med Imaging Graph 2024;114:102365. PMID: 38471330. DOI: 10.1016/j.compmedimag.2024.102365.
Abstract
PURPOSE Improved integration and use of preoperative imaging during surgery hold significant potential for enhancing treatment planning and instrument guidance through surgical navigation. Despite its prevalent use in diagnostic settings, MR imaging is rarely used for navigation in spine surgery. This study aims to leverage MR imaging for intraoperative visualization of spine anatomy, particularly in cases where CT imaging is unavailable or when minimizing radiation exposure is essential, such as in pediatric surgery. METHODS This work presents a method for deformable 3D-2D registration of preoperative MR images with a novel intraoperative long-length tomosynthesis imaging modality (viz., Long-Film [LF]). A conditional generative adversarial network is used to translate MR images to an intermediate bone image suitable for registration, followed by a model-based 3D-2D registration algorithm to deformably map the synthesized images to LF images. The algorithm's performance was evaluated on cadaveric specimens with implanted markers and controlled deformation, and in clinical images of patients undergoing spine surgery as part of a large-scale clinical study on LF imaging. RESULTS The proposed method yielded a median 2D projection distance error of 2.0 mm (interquartile range [IQR]: 1.1-3.3 mm) and a 3D target registration error of 1.5 mm (IQR: 0.8-2.1 mm) in cadaver studies. Notably, the multi-scale approach exhibited significantly higher accuracy compared to rigid solutions and effectively managed the challenges posed by piecewise rigid spine deformation. The robustness and consistency of the method were evaluated on clinical images, yielding no outliers on vertebrae without surgical instrumentation and 3% outliers on vertebrae with instrumentation. CONCLUSIONS This work constitutes the first reported approach for deformable MR to LF registration based on deep image synthesis. The proposed framework provides access to the preoperative annotations and planning information during surgery and enables surgical navigation within the context of MR images and/or dual-plane LF images.
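The 3D target registration error reported for the cadaver studies is the distance between implanted-marker positions after mapping through the estimated deformation; recomputing such a TRE is straightforward (a generic sketch, not the study's analysis code).

```python
import numpy as np

def target_registration_error(mapped_pts, reference_pts):
    """Median and IQR of Euclidean distances between registered and ground-truth markers (mm)."""
    d = np.linalg.norm(mapped_pts - reference_pts, axis=1)
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    return med, (q1, q3)

mapped = np.random.rand(10, 3) * 100           # marker positions after MR-to-LF registration
truth = mapped + np.random.randn(10, 3)        # implanted-marker ground truth (synthetic here)
print(target_registration_error(mapped, truth))
```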
Affiliation(s)
- Yixuan Huang
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
- Xiaoxuan Zhang
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
- Yicheng Hu
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, United States
- Ashley R Johnston
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
- Craig K Jones
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, United States
- Wojciech B Zbijewski
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
- Jeffrey H Siewerdsen
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States; Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Timothy F Witham
- Department of Neurosurgery, Johns Hopkins Medicine, Baltimore, MD, United States
- Ali Uneri
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States.

18. Li L, Yu J, Li Y, Wei J, Fan R, Wu D, Ye Y. Multi-sequence generative adversarial network: better generation for enhanced magnetic resonance imaging images. Front Comput Neurosci 2024;18:1365238. PMID: 38841427. PMCID: PMC11151883. DOI: 10.3389/fncom.2024.1365238.
Abstract
Introduction MRI is one of the most commonly used diagnostic methods in clinical practice, especially for brain diseases. MRI comprises many sequences, but T1CE images can only be obtained with contrast agents. Many patients (such as cancer patients) must undergo multiple MRI sequences for diagnosis, especially the contrast-enhanced magnetic resonance sequence. However, some patients, such as pregnant women and children, cannot easily receive contrast agents to obtain enhanced sequences, and contrast agents carry many adverse reactions that can pose a significant risk. With the continuous development of deep learning, generative adversarial networks have made it possible to extract features from one type of image to generate another. Methods We propose a generative adversarial network model with multimodal inputs and end-to-end decoding based on the pix2pix model. We used four evaluation metrics, NMSE, RMSE, SSIM, and PSNR, to assess the effectiveness of the generative model. Results Statistical analysis comparing our proposed model with pix2pix found significant differences between the two: our model outperformed pix2pix, with higher SSIM and PSNR and lower NMSE and RMSE. We also found that using T1W and T2W images as input performed better than other combinations, providing new ideas for subsequent work on generating contrast-enhanced magnetic resonance sequence images. Using our model, contrast-enhanced magnetic resonance sequence images can be generated from non-enhanced sequences. Discussion This has significant implications, as it can greatly reduce the use of contrast agents and protect populations such as pregnant women and children for whom contrast agents are contraindicated. Additionally, contrast agents are relatively expensive, so this generation method may bring substantial economic benefits.
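For readers unfamiliar with the reported metrics, a minimal Python sketch of NMSE, RMSE, and PSNR on toy arrays is given below; SSIM is normally computed with a library routine such as skimage.metrics.structural_similarity. The arrays and data range are illustrative, not the paper's data.

import numpy as np

def nmse(ref, gen):
    # Normalized mean squared error relative to the reference energy.
    return np.sum((ref - gen) ** 2) / np.sum(ref ** 2)

def rmse(ref, gen):
    return np.sqrt(np.mean((ref - gen) ** 2))

def psnr(ref, gen, data_range=1.0):
    mse = np.mean((ref - gen) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy example: a synthetic "T1CE" slice and a noisy stand-in for a generated one.
rng = np.random.default_rng(0)
ref = rng.random((128, 128))
gen = np.clip(ref + 0.05 * rng.standard_normal(ref.shape), 0, 1)
print(nmse(ref, gen), rmse(ref, gen), psnr(ref, gen))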
Affiliation(s)
- Leizi Li
- South China Normal University-Panyu Central Hospital Joint Laboratory of Basic and Translational Medical Research, Guangzhou Panyu Central Hospital, Guangzhou, China
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring and Guangdong Provincial Engineering Technology Research Center for Drug and Food Biological Resources Processing and Comprehensive Utilization, School of Life Sciences, South China Normal University, Guangzhou, China
- Jingchun Yu
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring and Guangdong Provincial Engineering Technology Research Center for Drug and Food Biological Resources Processing and Comprehensive Utilization, School of Life Sciences, South China Normal University, Guangzhou, China
- Yijin Li
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring and Guangdong Provincial Engineering Technology Research Center for Drug and Food Biological Resources Processing and Comprehensive Utilization, School of Life Sciences, South China Normal University, Guangzhou, China
- Jinbo Wei
- South China Normal University-Panyu Central Hospital Joint Laboratory of Basic and Translational Medical Research, Guangzhou Panyu Central Hospital, Guangzhou, China
- Ruifang Fan
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring and Guangdong Provincial Engineering Technology Research Center for Drug and Food Biological Resources Processing and Comprehensive Utilization, School of Life Sciences, South China Normal University, Guangzhou, China
- Dieen Wu
- South China Normal University-Panyu Central Hospital Joint Laboratory of Basic and Translational Medical Research, Guangzhou Panyu Central Hospital, Guangzhou, China
- Yufeng Ye
- South China Normal University-Panyu Central Hospital Joint Laboratory of Basic and Translational Medical Research, Guangzhou Panyu Central Hospital, Guangzhou, China
- Medical Imaging Institute of Panyu, Guangzhou, China
|
19
|
Dalmaz O, Mirza MU, Elmas G, Ozbey M, Dar SUH, Ceyani E, Oguz KK, Avestimehr S, Çukur T. One model to unite them all: Personalized federated learning of multi-contrast MRI synthesis. Med Image Anal 2024; 94:103121. [PMID: 38402791 DOI: 10.1016/j.media.2024.103121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 02/27/2024]
Abstract
Curation of large, diverse MRI datasets via multi-institutional collaborations can help improve learning of generalizable synthesis models that reliably translate source- onto target-contrast images. To facilitate collaborations, federated learning (FL) adopts decentralized model training while mitigating privacy concerns by avoiding sharing of imaging data. However, conventional FL methods can be impaired by the inherent heterogeneity in the data distribution, with domain shifts evident within and across imaging sites. Here we introduce the first personalized FL method for MRI Synthesis (pFLSynth) that improves reliability against data heterogeneity via model specialization to individual sites and synthesis tasks (i.e., source-target contrasts). To do this, pFLSynth leverages an adversarial model equipped with novel personalization blocks that control the statistics of generated feature maps across the spatial/channel dimensions, given latent variables specific to sites and tasks. To further promote communication efficiency and site specialization, partial network aggregation is employed over later generator stages while earlier generator stages and the discriminator are trained locally. As such, pFLSynth enables multi-task training of multi-site synthesis models with high generalization performance across sites and tasks. Comprehensive experiments demonstrate the superior performance and reliability of pFLSynth in MRI synthesis against prior federated methods.
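The partial network aggregation described above can be pictured with a small, hypothetical sketch: only parameters whose names carry a designated "shared" prefix are averaged across sites, while the remaining (personalized) parameters stay local. Names and shapes below are illustrative and do not reflect the pFLSynth code.

import numpy as np

def partial_fedavg(site_params, shared_prefixes=("gen.late.",)):
    # site_params: list of dicts {param_name: np.ndarray}, one per site.
    # Average only parameters whose names start with a shared prefix and
    # broadcast the average back; all other parameters remain site-specific.
    shared_names = [k for k in site_params[0] if k.startswith(shared_prefixes)]
    averaged = {k: np.mean([p[k] for p in site_params], axis=0) for k in shared_names}
    for p in site_params:
        p.update(averaged)
    return site_params

# Two hypothetical sites: early generator stages and discriminator stay local.
site_a = {"gen.early.w": np.ones(3), "gen.late.w": np.ones(3), "disc.w": np.ones(3)}
site_b = {"gen.early.w": np.zeros(3), "gen.late.w": np.zeros(3), "disc.w": np.zeros(3)}
partial_fedavg([site_a, site_b])
print(site_a["gen.late.w"], site_a["gen.early.w"])  # shared layers averaged, local kept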
Affiliation(s)
- Onat Dalmaz
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara 06800, Turkey
- Muhammad U Mirza
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara 06800, Turkey
- Gokberk Elmas
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara 06800, Turkey
- Muzaffer Ozbey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara 06800, Turkey
- Salman U H Dar
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara 06800, Turkey
- Emir Ceyani
- Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089, USA
- Kader K Oguz
- Department of Radiology, University of California, Davis Medical Center, Sacramento, CA 95817, USA
- Salman Avestimehr
- Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089, USA
- Tolga Çukur
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara 06800, Turkey; Neuroscience Program, Bilkent University, Ankara 06800, Turkey.
|
20
|
Jiang M, Wang S, Song Z, Song L, Wang Y, Zhu C, Zheng Q. Cross2SynNet: cross-device-cross-modal synthesis of routine brain MRI sequences from CT with brain lesion. MAGMA 2024; 37:241-256. [PMID: 38315352 DOI: 10.1007/s10334-023-01145-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Revised: 11/28/2023] [Accepted: 12/27/2023] [Indexed: 02/07/2024]
Abstract
OBJECTIVES CT and MR are often needed together to determine the location and extent of brain lesions and improve diagnosis. However, patients with acute brain diseases cannot complete an MRI examination within a short time. The aim of this study is to devise a cross-device and cross-modal medical image synthesis (MIS) method, Cross2SynNet, for synthesizing the routine brain MRI sequences T1WI, T2WI, FLAIR, and DWI from CT in the presence of stroke and brain tumors. MATERIALS AND METHODS For this retrospective study, the participants covered four different diseases: cerebral ischemic stroke (CIS-cohort), cerebral hemorrhage (CH-cohort), meningioma (M-cohort), and glioma (G-cohort). The MIS model Cross2SynNet was established on the basic architecture of a conditional generative adversarial network (CGAN), in which a fully convolutional Transformer (FCT) module was adopted in the generator to capture the short- and long-range dependencies between healthy and pathological tissues, and an edge loss function was used to minimize the difference in gradient magnitude between the synthetic image and the ground truth. Three metrics, mean square error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM), were used for evaluation. RESULTS A total of 230 participants (mean patient age, 59.77 years ± 13.63 [standard deviation]; 163 men [71%] and 67 women [29%]) were included: the CIS-cohort (95 participants between Dec 2019 and Feb 2022), CH-cohort (69 participants between Jan 2020 and Dec 2021), M-cohort (40 participants between Sep 2018 and Dec 2021), and G-cohort (26 participants between Sep 2019 and Dec 2021). Cross2SynNet achieved averaged values of MSE = 0.008, PSNR = 21.728, and SSIM = 0.758 when synthesizing MRIs from CT, outperforming CycleGAN, pix2pix, RegGAN, Pix2PixHD, and ResViT. Cross2SynNet could synthesize the brain lesion on pseudo DWI even when the CT image did not exhibit a clear signal in acute ischemic stroke patients. CONCLUSIONS Cross2SynNet can achieve routine brain MRI synthesis of T1WI, T2WI, FLAIR, and DWI from CT with promising performance in the presence of brain lesions from stroke and brain tumors.
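The edge loss mentioned above penalizes differences in gradient magnitude. A generic version of such a loss, on NumPy arrays with central-difference gradients, might look like the sketch below; the actual Cross2SynNet formulation may differ in detail.

import numpy as np

def edge_loss(gen, target):
    # Gradient magnitude via central differences (np.gradient), then the L1
    # difference between the two magnitude maps.
    gy_g, gx_g = np.gradient(gen)
    gy_t, gx_t = np.gradient(target)
    mag_g = np.sqrt(gx_g ** 2 + gy_g ** 2)
    mag_t = np.sqrt(gx_t ** 2 + gy_t ** 2)
    return np.mean(np.abs(mag_g - mag_t))

rng = np.random.default_rng(1)
target = rng.random((64, 64))
gen = target + 0.1 * rng.standard_normal((64, 64))
print(edge_loss(gen, target))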
Affiliation(s)
- Minbo Jiang
- School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005, Shandong, China
- Shuai Wang
- Department of Radiology, Binzhou Medical University Hospital, Binzhou, 256603, China
- Zhiwei Song
- School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005, Shandong, China
- Limei Song
- School of Medical Imaging, Weifang Medical University, Weifang, 261000, China
- Yi Wang
- School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005, Shandong, China
- Chuanzhen Zhu
- School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005, Shandong, China
- Qiang Zheng
- School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005, Shandong, China.
|
21
|
Bottani S, Thibeau-Sutre E, Maire A, Ströer S, Dormont D, Colliot O, Burgos N. Contrast-enhanced to non-contrast-enhanced image translation to exploit a clinical data warehouse of T1-weighted brain MRI. BMC Med Imaging 2024; 24:67. [PMID: 38504179 PMCID: PMC10953143 DOI: 10.1186/s12880-024-01242-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Accepted: 03/07/2024] [Indexed: 03/21/2024] Open
Abstract
BACKGROUND Clinical data warehouses provide access to massive amounts of medical images, but these images are often heterogeneous. They can for instance include images acquired both with or without the injection of a gadolinium-based contrast agent. Harmonizing such data sets is thus fundamental to guarantee unbiased results, for example when performing differential diagnosis. Furthermore, classical neuroimaging software tools for feature extraction are typically applied only to images without gadolinium. The objective of this work is to evaluate how image translation can be useful to exploit a highly heterogeneous data set containing both contrast-enhanced and non-contrast-enhanced images from a clinical data warehouse. METHODS We propose and compare different 3D U-Net and conditional GAN models to convert contrast-enhanced T1-weighted (T1ce) into non-contrast-enhanced (T1nce) brain MRI. These models were trained using 230 image pairs and tested on 77 image pairs from the clinical data warehouse of the Greater Paris area. RESULTS Validation using standard image similarity measures demonstrated that the similarity between real and synthetic T1nce images was higher than between real T1nce and T1ce images for all the models compared. The best performing models were further validated on a segmentation task. We showed that tissue volumes extracted from synthetic T1nce images were closer to those of real T1nce images than volumes extracted from T1ce images. CONCLUSION We showed that deep learning models initially developed with research quality data could synthesize T1nce from T1ce images of clinical quality and that reliable features could be extracted from the synthetic images, thus demonstrating the ability of such methods to help exploit a data set coming from a clinical data warehouse.
Affiliation(s)
- Simona Bottani
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France
- Elina Thibeau-Sutre
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France
- Aurélien Maire
- Innovation & Données - Département des Services Numériques, AP-HP, Paris, 75013, France
- Sebastian Ströer
- Hôpital Pitié Salpêtrière, Department of Neuroradiology, AP-HP, Paris, 75012, France
- Didier Dormont
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, DMU DIAMENT, Paris, 75013, France
- Olivier Colliot
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France
- Ninon Burgos
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France.
|
22
|
Zhang D, Wang C, Chen T, Chen W, Shen Y. Scalable Swin Transformer network for brain tumor segmentation from incomplete MRI modalities. Artif Intell Med 2024; 149:102788. [PMID: 38462288 DOI: 10.1016/j.artmed.2024.102788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2023] [Revised: 12/19/2023] [Accepted: 01/25/2024] [Indexed: 03/12/2024]
Abstract
BACKGROUND Deep learning methods have shown great potential in processing multi-modal Magnetic Resonance Imaging (MRI) data, enabling improved accuracy in brain tumor segmentation. However, the performance of these methods can suffer when dealing with incomplete modalities, which is a common issue in clinical practice. Existing solutions, such as missing modality synthesis, knowledge distillation, and architecture-based methods, suffer from drawbacks such as long training times, high model complexity, and poor scalability. METHOD This paper proposes IMS2Trans, a novel lightweight scalable Swin Transformer network that utilizes a single encoder to extract latent feature maps from all available modalities. This unified feature extraction process enables efficient information sharing and fusion among the modalities, resulting in efficiency without compromising segmentation performance even in the presence of missing modalities. RESULTS The model is evaluated against popular benchmarks on two brain tumor segmentation datasets with incomplete modalities, BraTS 2018 and BraTS 2020. On the BraTS 2018 dataset, our model achieved higher average Dice similarity coefficient (DSC) scores for the whole tumor, tumor core, and enhancing tumor regions (86.57, 75.67, and 58.28, respectively), in comparison with a state-of-the-art model, i.e. mmFormer (86.45, 75.51, and 57.79, respectively). Similarly, on the BraTS 2020 dataset, our model scored higher DSC scores in these three brain tumor regions (87.33, 79.09, and 62.11, respectively) compared to mmFormer (86.17, 78.34, and 60.36, respectively). We also conducted a Wilcoxon test on the experimental results, and the resulting p-value confirmed that our model's improvement was statistically significant. Moreover, our model exhibits significantly reduced complexity with only 4.47 M parameters, 121.89 G FLOPs, and a model size of 77.13 MB, whereas mmFormer comprises 34.96 M parameters, 265.79 G FLOPs, and a model size of 559.74 MB. These indicate that our model, although lightweight with significantly fewer parameters, is still able to achieve better performance than a state-of-the-art model. CONCLUSION By leveraging a single encoder for processing the available modalities, IMS2Trans offers notable scalability advantages over methods that rely on multiple encoders. This streamlined approach eliminates the need for maintaining separate encoders for each modality, resulting in a lightweight and scalable network architecture. The source code of IMS2Trans and the associated weights are both publicly available at https://github.com/hudscomdz/IMS2Trans.
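Efficiency figures such as "4.47 M parameters" and a model size in MB are straightforward to reproduce for any PyTorch model; the toy network below is only a stand-in to show the bookkeeping (FLOPs figures are usually obtained with a separate profiling utility).

import torch.nn as nn

# A toy stand-in network (not IMS2Trans), used only to illustrate the counting.
model = nn.Sequential(
    nn.Conv3d(4, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(16, 2, kernel_size=1),
)

n_params = sum(p.numel() for p in model.parameters())
size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024 ** 2
print(f"{n_params / 1e6:.4f} M parameters, ~{size_mb:.3f} MB (fp32 weights)")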
Affiliation(s)
- Dongsong Zhang
- School of Big Data and Artificial Intelligence, Xinyang College, Xinyang, 464000, Henan, China; School of Computing and Engineering, University of Huddersfield, Huddersfield, HD13DH, UK
- Changjian Wang
- National Key Laboratory of Parallel and Distributed Computing, Changsha, 410073, Hunan, China
- Tianhua Chen
- School of Computing and Engineering, University of Huddersfield, Huddersfield, HD13DH, UK
- Weidao Chen
- Beijing Infervision Technology Co., Ltd., Beijing, 100020, China
- Yiqing Shen
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, MD, USA.
|
23
|
Li W, Liu J, Wang S, Feng C. MTFN: multi-temporal feature fusing network with co-attention for DCE-MRI synthesis. BMC Med Imaging 2024; 24:47. [PMID: 38373915 PMCID: PMC10875895 DOI: 10.1186/s12880-024-01201-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2022] [Accepted: 01/15/2024] [Indexed: 02/21/2024] Open
Abstract
BACKGROUND Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) plays an important role in the diagnosis and treatment of breast cancer. However, obtaining the complete set of eight temporal images of DCE-MRI requires a long scanning time, which causes patient discomfort during the scan. Therefore, to reduce this time, the multi-temporal feature fusing network with co-attention (MTFN) is proposed to generate the eighth temporal image of DCE-MRI, enabling its acquisition without scanning. METHODS In this paper, we propose MTFN for DCE-MRI synthesis, in which the co-attention module fully fuses the features of the first and third temporal images to obtain hybrid features. The co-attention explores long-range dependencies, not just relationships between pixels. The hybrid features are therefore more helpful for generating the eighth temporal image. RESULTS We conduct experiments on a private breast DCE-MRI dataset from hospitals and the multimodal Brain Tumor Segmentation Challenge 2018 dataset (BraTS 2018). Compared with existing methods, our method shows improved results and generates more realistic images. We also use the synthetic images to classify the molecular subtype of breast cancer: the accuracy on the original eighth temporal images and the generated images is 89.53% and 92.46%, respectively, an improvement of about 3%, and the classification results verify the practicability of the synthetic images. CONCLUSIONS Subjective evaluation and objective image quality metrics show the effectiveness of our method, which obtains comprehensive and useful information. The improvement in classification accuracy demonstrates that the images generated by our method are practical.
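A minimal cross-attention ("co-attention") between two temporal feature maps can be sketched as below, with queries taken from one image's features and keys/values from the other; the dimensions and single-head form are illustrative and not the MTFN implementation.

import torch
import torch.nn.functional as F

def co_attention(feat_a, feat_b, dim=64):
    # Minimal cross-attention: queries from one temporal feature map,
    # keys/values from the other. feat_*: (B, N, C) token sequences.
    # In a real model these projections would be persistent modules;
    # here they are created ad hoc just to show the shapes.
    wq = torch.nn.Linear(feat_a.shape[-1], dim)
    wk = torch.nn.Linear(feat_b.shape[-1], dim)
    wv = torch.nn.Linear(feat_b.shape[-1], dim)
    q, k, v = wq(feat_a), wk(feat_b), wv(feat_b)
    attn = F.softmax(q @ k.transpose(-2, -1) / dim ** 0.5, dim=-1)
    return attn @ v  # hybrid features attending from A to B

first = torch.randn(1, 256, 32)   # features of the 1st temporal image
third = torch.randn(1, 256, 32)   # features of the 3rd temporal image
hybrid = co_attention(first, third)
print(hybrid.shape)  # torch.Size([1, 256, 64])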
Affiliation(s)
- Wei Li
- Key Laboratory of Intelligent Computing in Medical Image MIIC, Northeastern University, Shenyang, China
- Jiaye Liu
- School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Shanshan Wang
- School of Computer Science and Engineering, Northeastern University, Shenyang, China.
- Chaolu Feng
- Key Laboratory of Intelligent Computing in Medical Image MIIC, Northeastern University, Shenyang, China
|
24
|
Hognon C, Conze PH, Bourbonne V, Gallinato O, Colin T, Jaouen V, Visvikis D. Contrastive image adaptation for acquisition shift reduction in medical imaging. Artif Intell Med 2024; 148:102747. [PMID: 38325919 DOI: 10.1016/j.artmed.2023.102747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2022] [Revised: 10/21/2023] [Accepted: 12/10/2023] [Indexed: 02/09/2024]
Abstract
The domain shift, or acquisition shift in medical imaging, is responsible for potentially harmful differences between development and deployment conditions of medical image analysis techniques. There is a growing need in the community for advanced methods that could mitigate this issue better than conventional approaches. In this paper, we consider configurations in which we can expose a learning-based pixel level adaptor to a large variability of unlabeled images during its training, i.e. sufficient to span the acquisition shift expected during the training or testing of a downstream task model. We leverage the ability of convolutional architectures to efficiently learn domain-agnostic features and train a many-to-one unsupervised mapping between a source collection of heterogeneous images from multiple unknown domains subjected to the acquisition shift and a homogeneous subset of this source set of lower cardinality, potentially constituted of a single image. To this end, we propose a new cycle-free image-to-image architecture based on a combination of three loss functions: a contrastive PatchNCE loss, an adversarial loss and an edge preserving loss allowing for rich domain adaptation to the target image even under strong domain imbalance and low data regimes. Experiments support the interest of the proposed contrastive image adaptation approach for the regularization of downstream deep supervised segmentation and cross-modality synthesis models.
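The contrastive PatchNCE term belongs to the InfoNCE family; a generic patch-level InfoNCE loss, in which each translated patch is pulled toward its corresponding input patch and pushed away from all others in the batch, can be sketched as follows (temperature and feature sizes are illustrative, not the paper's settings).

import torch
import torch.nn.functional as F

def patch_infonce(query, positive, tau=0.07):
    # query, positive: (N, C) patch embeddings; the i-th positive corresponds
    # to the i-th query, all other patches act as negatives.
    query = F.normalize(query, dim=1)
    positive = F.normalize(positive, dim=1)
    logits = query @ positive.t() / tau          # (N, N) similarity matrix
    labels = torch.arange(query.shape[0])        # diagonal entries are positives
    return F.cross_entropy(logits, labels)

q = torch.randn(128, 256)   # patch features from the adapted/translated image
p = torch.randn(128, 256)   # features of the corresponding input patches
print(patch_infonce(q, p))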
Affiliation(s)
- Clément Hognon
- UMR U1101 Inserm LaTIM, IMT Atlantique, Université de Bretagne Occidentale, France; SOPHiA Genetics, Pessac, France
- Pierre-Henri Conze
- UMR U1101 Inserm LaTIM, IMT Atlantique, Université de Bretagne Occidentale, France
- Vincent Bourbonne
- UMR U1101 Inserm LaTIM, IMT Atlantique, Université de Bretagne Occidentale, France
- Vincent Jaouen
- UMR U1101 Inserm LaTIM, IMT Atlantique, Université de Bretagne Occidentale, France.
- Dimitris Visvikis
- UMR U1101 Inserm LaTIM, IMT Atlantique, Université de Bretagne Occidentale, France
|
25
|
Dayarathna S, Islam KT, Uribe S, Yang G, Hayat M, Chen Z. Deep learning based synthesis of MRI, CT and PET: Review and analysis. Med Image Anal 2024; 92:103046. [PMID: 38052145 DOI: 10.1016/j.media.2023.103046] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Revised: 11/14/2023] [Accepted: 11/29/2023] [Indexed: 12/07/2023]
Abstract
Medical image synthesis represents a critical area of research in clinical decision-making, aiming to overcome the challenges associated with acquiring multiple image modalities for an accurate clinical workflow. This approach proves beneficial in estimating an image of a desired modality from a given source modality among the most common medical imaging contrasts, such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and Positron Emission Tomography (PET). However, translating between two image modalities presents difficulties due to the complex and non-linear domain mappings. Deep learning-based generative modelling has exhibited superior performance in synthetic image contrast applications compared to conventional image synthesis methods. This survey comprehensively reviews deep learning-based medical imaging translation from 2018 to 2023 on pseudo-CT, synthetic MR, and synthetic PET. We provide an overview of synthetic contrasts in medical imaging and the most frequently employed deep learning networks for medical image synthesis. Additionally, we conduct a detailed analysis of each synthesis method, focusing on their diverse model designs based on input domains and network architectures. We also analyse novel network architectures, ranging from conventional CNNs to the recent Transformer and Diffusion models. This analysis includes comparing loss functions, available datasets and anatomical regions, and image quality assessments and performance in other downstream tasks. Finally, we discuss the challenges and identify solutions within the literature, suggesting possible future directions. We hope that the insights offered in this survey paper will serve as a valuable roadmap for researchers in the field of medical image synthesis.
Affiliation(s)
- Sanuwani Dayarathna
- Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton VIC 3800, Australia.
- Sergio Uribe
- Department of Medical Imaging and Radiation Sciences, Faculty of Medicine, Monash University, Clayton VIC 3800, Australia
- Guang Yang
- Bioengineering Department and Imperial-X, Imperial College London, W12 7SL, United Kingdom
- Munawar Hayat
- Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton VIC 3800, Australia
- Zhaolin Chen
- Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton VIC 3800, Australia; Monash Biomedical Imaging, Clayton VIC 3800, Australia
|
26
|
Kumar S, Saber H, Charron O, Freeman L, Tamir JI. Correcting synthetic MRI contrast-weighted images using deep learning. Magn Reson Imaging 2024; 106:43-54. [PMID: 38092082 DOI: 10.1016/j.mri.2023.11.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2023] [Revised: 11/30/2023] [Accepted: 11/30/2023] [Indexed: 12/17/2023]
Abstract
Synthetic magnetic resonance imaging (MRI) offers a scanning paradigm where a fast multi-contrast sequence can be used to estimate underlying quantitative tissue parameter maps, which are then used to synthesize any desirable clinical contrast by retrospectively changing scan parameters in silico. Two benefits of this approach are the reduced exam time and the ability to generate arbitrary contrasts offline. However, synthetically generated contrasts are known to deviate from the contrast of experimental scans. The reason for contrast mismatch is the necessary exclusion of some unmodeled physical effects such as partial voluming, diffusion, flow, susceptibility, magnetization transfer, and more. The inclusion of these effects in signal encoding would improve the synthetic images, but would make the quantitative imaging protocol impractical due to long scan times. Therefore, in this work, we propose a novel deep learning approach that generates a multiplicative correction term to capture unmodeled effects and correct the synthetic contrast images to better match experimental contrasts for arbitrary scan parameters. The physics inspired deep learning model implicitly accounts for some unmodeled physical effects occurring during the scan. As a proof of principle, we validate our approach on synthesizing arbitrary inversion recovery fast spin-echo scans using a commercially available 2D multi-contrast sequence. We observe that the proposed correction visually and numerically reduces the mismatch with experimentally collected contrasts compared to conventional synthetic MRI. Finally, we show results of a preliminary reader study and find that the proposed method statistically significantly improves in contrast and SNR as compared to synthetic MR images.
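The core idea of a multiplicative correction term can be written in one line; the sketch below uses random fields purely as placeholders for the network-predicted correction and the experimental contrast, and is not the authors' model.

import numpy as np

rng = np.random.default_rng(2)
synthetic = rng.random((128, 128))                          # physics-model synthetic contrast
correction = 1.0 + 0.1 * rng.standard_normal((128, 128))    # network-predicted multiplicative term (placeholder)
corrected = synthetic * correction                           # corrected synthetic image

# Training would minimize a pixel-wise loss against the experimental contrast:
experimental = synthetic * (1.0 + 0.08 * rng.standard_normal((128, 128)))
loss = np.mean((corrected - experimental) ** 2)
print(loss)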
Affiliation(s)
- Sidharth Kumar
- Chandra Family Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin 78712, TX, USA.
- Hamidreza Saber
- Dell Medical School Department of Neurology, The University of Texas at Austin, Austin 78712, TX, USA; Dell Medical School Department of Neurosurgery, The University of Texas at Austin, Austin 78712, TX, USA
- Odelin Charron
- Dell Medical School Department of Neurology, The University of Texas at Austin, Austin 78712, TX, USA
- Leorah Freeman
- Dell Medical School Department of Neurology, The University of Texas at Austin, Austin 78712, TX, USA; Dell Medical School Department of Diagnostic Medicine, The University of Texas at Austin, Austin 78712, TX, USA
- Jonathan I Tamir
- Chandra Family Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin 78712, TX, USA; Dell Medical School Department of Diagnostic Medicine, The University of Texas at Austin, Austin 78712, TX, USA; Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin 78712, TX, USA
|
27
|
Wang Y, Luo Y, Zu C, Zhan B, Jiao Z, Wu X, Zhou J, Shen D, Zhou L. 3D multi-modality Transformer-GAN for high-quality PET reconstruction. Med Image Anal 2024; 91:102983. [PMID: 37926035 DOI: 10.1016/j.media.2023.102983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2022] [Revised: 08/06/2023] [Accepted: 09/28/2023] [Indexed: 11/07/2023]
Abstract
Positron emission tomography (PET) scans can reveal abnormal metabolic activities of cells and provide favorable information for clinical patient diagnosis. Generally, standard-dose PET (SPET) images contain more diagnostic information than low-dose PET (LPET) images but higher-dose scans can also bring higher potential radiation risks. To reduce the radiation risk while acquiring high-quality PET images, in this paper, we propose a 3D multi-modality edge-aware Transformer-GAN for high-quality SPET reconstruction using the corresponding LPET images and T1 acquisitions from magnetic resonance imaging (T1-MRI). Specifically, to fully excavate the metabolic distributions in LPET and anatomical structural information in T1-MRI, we first use two separate CNN-based encoders to extract local spatial features from the two modalities, respectively, and design a multimodal feature integration module to effectively integrate the two kinds of features given the diverse contributions of features at different locations. Then, as CNNs can describe local spatial information well but have difficulty in modeling long-range dependencies in images, we further apply a Transformer-based encoder to extract global semantic information in the input images and use a CNN decoder to transform the encoded features into SPET images. Finally, a patch-based discriminator is applied to ensure the similarity of patch-wise data distribution between the reconstructed and real images. Considering the importance of edge information in anatomical structures for clinical disease diagnosis, besides voxel-level estimation error and adversarial loss, we also introduce an edge-aware loss to retain more edge detail information in the reconstructed SPET images. Experiments on the phantom dataset and clinical dataset validate that our proposed method can effectively reconstruct high-quality SPET images and outperform current state-of-the-art methods in terms of qualitative and quantitative metrics.
Affiliation(s)
- Yan Wang
- School of Computer Science, Sichuan University, Chengdu, China
- Yanmei Luo
- School of Computer Science, Sichuan University, Chengdu, China
- Chen Zu
- Department of Risk Controlling Research, JD.COM, China
- Bo Zhan
- School of Computer Science, Sichuan University, Chengdu, China
- Zhengyang Jiao
- School of Computer Science, Sichuan University, Chengdu, China
- Xi Wu
- School of Computer Science, Chengdu University of Information Technology, China
- Jiliu Zhou
- School of Computer Science, Sichuan University, Chengdu, China
- Dinggang Shen
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China; Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China.
- Luping Zhou
- School of Electrical and Information Engineering, University of Sydney, Australia.
|
28
|
Ozbey M, Dalmaz O, Dar SUH, Bedel HA, Ozturk S, Gungor A, Cukur T. Unsupervised Medical Image Translation With Adversarial Diffusion Models. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:3524-3539. [PMID: 37379177 DOI: 10.1109/tmi.2023.3290149] [Citation(s) in RCA: 71] [Impact Index Per Article: 35.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/30/2023]
Abstract
Imputation of missing images via source-to-target modality translation can improve diversity in medical imaging protocols. A pervasive approach for synthesizing target images involves one-shot mapping through generative adversarial networks (GAN). Yet, GAN models that implicitly characterize the image distribution can suffer from limited sample fidelity. Here, we propose a novel method based on adversarial diffusion modeling, SynDiff, for improved performance in medical image translation. To capture a direct correlate of the image distribution, SynDiff leverages a conditional diffusion process that progressively maps noise and source images onto the target image. For fast and accurate image sampling during inference, large diffusion steps are taken with adversarial projections in the reverse diffusion direction. To enable training on unpaired datasets, a cycle-consistent architecture is devised with coupled diffusive and non-diffusive modules that bilaterally translate between two modalities. Extensive assessments are reported on the utility of SynDiff against competing GAN and diffusion models in multi-contrast MRI and MRI-CT translation. Our demonstrations indicate that SynDiff offers quantitatively and qualitatively superior performance against competing baselines.
|
29
|
Honkamaa J, Khan U, Koivukoski S, Valkonen M, Latonen L, Ruusuvuori P, Marttinen P. Deformation equivariant cross-modality image synthesis with paired non-aligned training data. Med Image Anal 2023; 90:102940. [PMID: 37666115 DOI: 10.1016/j.media.2023.102940] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Revised: 08/14/2023] [Accepted: 08/18/2023] [Indexed: 09/06/2023]
Abstract
Cross-modality image synthesis is an active research topic with multiple clinically relevant medical applications. Recently, methods allowing training with paired but misaligned data have started to emerge. However, no robust and well-performing methods applicable to a wide range of real-world data sets exist. In this work, we propose a generic solution to the problem of cross-modality image synthesis with paired but non-aligned data by introducing new deformation equivariance encouraging loss functions. The method consists of joint training of an image synthesis network together with separate registration networks and allows adversarial training conditioned on the input even with misaligned data. The work lowers the bar for new clinical applications by allowing effortless training of cross-modality image synthesis networks for more difficult data sets.
Affiliation(s)
- Joel Honkamaa
- Department of Computer Science, Aalto University, Finland.
- Umair Khan
- Institute of Biomedicine, University of Turku, Finland
- Sonja Koivukoski
- Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
- Mira Valkonen
- Faculty of Medicine and Health Technology, Tampere University, Finland
- Leena Latonen
- Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
- Pekka Ruusuvuori
- Institute of Biomedicine, University of Turku, Finland; Faculty of Medicine and Health Technology, Tampere University, Finland
|
30
|
Wang K, Doneva M, Meineke J, Amthor T, Karasan E, Tan F, Tamir JI, Yu SX, Lustig M. High-fidelity direct contrast synthesis from magnetic resonance fingerprinting. Magn Reson Med 2023; 90:2116-2129. [PMID: 37332200 DOI: 10.1002/mrm.29766] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Revised: 05/03/2023] [Accepted: 05/31/2023] [Indexed: 06/20/2023]
Abstract
PURPOSE This work was aimed at proposing a supervised learning-based method that directly synthesizes contrast-weighted images from the Magnetic Resonance Fingerprinting (MRF) data without performing quantitative mapping and spin-dynamics simulations. METHODS To implement our direct contrast synthesis (DCS) method, we deploy a conditional generative adversarial network (GAN) framework with a multi-branch U-Net as the generator and a multilayer CNN (PatchGAN) as the discriminator. We refer to our proposed approach as N-DCSNet. The input MRF data are used to directly synthesize T1-weighted, T2-weighted, and fluid-attenuated inversion recovery (FLAIR) images through supervised training on paired MRF and target spin echo-based contrast-weighted scans. The performance of our proposed method is demonstrated on in vivo MRF scans from healthy volunteers. Quantitative metrics, including normalized root mean square error (nRMSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), learned perceptual image patch similarity (LPIPS), and Fréchet inception distance (FID), were used to evaluate the performance of the proposed method and compare it with others. RESULTS In-vivo experiments demonstrated excellent image quality with respect to that of simulation-based contrast synthesis and previous DCS methods, both visually and according to quantitative metrics. We also demonstrate cases in which our trained model is able to mitigate the in-flow and spiral off-resonance artifacts typically seen in MRF reconstructions, and thus more faithfully represent conventional spin echo-based contrast-weighted images. CONCLUSION We present N-DCSNet to directly synthesize high-fidelity multicontrast MR images from a single MRF acquisition. This method can significantly decrease examination time. By directly training a network to generate contrast-weighted images, our method does not require any model-based simulation and therefore can avoid reconstruction errors due to dictionary matching and contrast simulation (code available at:https://github.com/mikgroup/DCSNet).
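A PatchGAN discriminator of the kind used here outputs a spatial grid of real/fake scores, one per receptive-field patch, rather than a single scalar. The compact PyTorch module below illustrates that structure with arbitrary layer sizes; it is not the N-DCSNet discriminator.

import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    # Compact PatchGAN-style discriminator: a conv stack whose output is a
    # spatial map of logits, each judging one receptive-field patch.
    def __init__(self, in_ch=1, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # patch score map
        )

    def forward(self, x):
        return self.net(x)

d = PatchDiscriminator()
scores = d(torch.randn(1, 1, 256, 256))
print(scores.shape)  # one logit per overlapping patch, here (1, 1, 63, 63)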
Affiliation(s)
- Ke Wang
- Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, California, USA
- International Computer Science Institute, University of California at Berkeley, Berkeley, California, USA
- Ekin Karasan
- Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, California, USA
- Fei Tan
- Bioengineering, UC Berkeley-UCSF, San Francisco, California, USA
- Jonathan I Tamir
- Chandra Family Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas, USA
- Stella X Yu
- Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, California, USA
- International Computer Science Institute, University of California at Berkeley, Berkeley, California, USA
- Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA
|
31
|
Li Y, Zhou T, He K, Zhou Y, Shen D. Multi-Scale Transformer Network With Edge-Aware Pre-Training for Cross-Modality MR Image Synthesis. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:3395-3407. [PMID: 37339020 DOI: 10.1109/tmi.2023.3288001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/22/2023]
Abstract
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones. Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective synthesis model. However, it is often challenging to obtain sufficient paired data for supervised training. In reality, we often have a small number of paired data while a large number of unpaired data. To take advantage of both paired and unpaired data, in this paper, we propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis. Specifically, an Edge-preserving Masked AutoEncoder (Edge-MAE) is first pre-trained in a self-supervised manner to simultaneously perform 1) image imputation for randomly masked patches in each image and 2) whole edge map estimation, which effectively learns both contextual and structural information. Besides, a novel patch-wise loss is proposed to enhance the performance of Edge-MAE by treating different masked patches differently according to the difficulties of their respective imputations. Based on this proposed pre-training, in the subsequent fine-tuning stage, a Dual-scale Selective Fusion (DSF) module is designed (in our MT-Net) to synthesize missing-modality images by integrating multi-scale features extracted from the encoder of the pre-trained Edge-MAE. Furthermore, this pre-trained encoder is also employed to extract high-level features from the synthesized image and corresponding ground-truth image, which are required to be similar (consistent) in the training. Experimental results show that our MT-Net achieves comparable performance to the competing methods even using 70% of all available paired data. Our code will be released at https://github.com/lyhkevin/MT-Net.
|
32
|
Wang J, Wu QMJ, Pourpanah F. DC-cycleGAN: Bidirectional CT-to-MR synthesis from unpaired data. Comput Med Imaging Graph 2023; 108:102249. [PMID: 37290374 DOI: 10.1016/j.compmedimag.2023.102249] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2022] [Revised: 05/02/2023] [Accepted: 05/23/2023] [Indexed: 06/10/2023]
Abstract
Magnetic resonance (MR) and computed tomography (CT) images are two typical types of medical images that provide mutually complementary information for accurate clinical diagnosis and treatment. However, obtaining both images may be limited due to considerations such as cost, radiation dose and modality missing. Recently, medical image synthesis has gained increasing research interest as a way to cope with this limitation. In this paper, we propose a bidirectional learning model, denoted as dual contrast cycleGAN (DC-cycleGAN), to synthesize medical images from unpaired data. Specifically, a dual contrast loss is introduced into the discriminators to indirectly build constraints between real source and synthetic images by taking advantage of samples from the source domain as negative samples and enforcing the synthetic images to fall far away from the source domain. In addition, cross-entropy and the structural similarity index (SSIM) are integrated into DC-cycleGAN in order to consider both the luminance and structure of samples when synthesizing images. The experimental results indicate that DC-cycleGAN is able to produce promising results as compared with other cycleGAN-based medical image synthesis methods such as cycleGAN, RegGAN, DualGAN, and NiceGAN. Code is available at https://github.com/JiayuanWang-JW/DC-cycleGAN.
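The dual contrast idea, using source-domain images as extra negatives for the discriminator, can be caricatured with a standard binary cross-entropy formulation as below; the published DC-cycleGAN loss may be formulated differently.

import torch
import torch.nn.functional as F

def dual_contrast_d_loss(d_real_tgt, d_fake_tgt, d_src):
    # Sketch in the spirit of a dual-contrast discriminator loss: real target
    # images are positives, while both synthesized images and raw source-domain
    # images are pushed away as negatives.
    loss_real = F.binary_cross_entropy_with_logits(d_real_tgt, torch.ones_like(d_real_tgt))
    loss_fake = F.binary_cross_entropy_with_logits(d_fake_tgt, torch.zeros_like(d_fake_tgt))
    loss_src = F.binary_cross_entropy_with_logits(d_src, torch.zeros_like(d_src))
    return loss_real + loss_fake + loss_src

print(dual_contrast_d_loss(torch.randn(4, 1), torch.randn(4, 1), torch.randn(4, 1)))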
Affiliation(s)
- Jiayuan Wang
- Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada.
- Q M Jonathan Wu
- Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada.
- Farhad Pourpanah
- Department of Electrical and Computer Engineering, Queen's University, Kingston, ON, Canada.
|
33
|
Koohi-Moghadam M, Bae KT. Generative AI in Medical Imaging: Applications, Challenges, and Ethics. J Med Syst 2023; 47:94. [PMID: 37651022 DOI: 10.1007/s10916-023-01987-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2023] [Accepted: 08/21/2023] [Indexed: 09/01/2023]
Abstract
Medical imaging plays an important role in the diagnosis and treatment of diseases. Generative artificial intelligence (AI) has shown great potential in enhancing medical imaging tasks such as data augmentation, image synthesis, image-to-image translation, and radiology report generation. This commentary aims to provide an overview of generative AI in medical imaging, discussing applications, challenges, and ethical considerations, while highlighting future research directions in this rapidly evolving field.
Affiliation(s)
- Mohamad Koohi-Moghadam
- Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong.
- Kyongtae Ty Bae
- Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong.
|
34
|
Dashtbani Moghari M, Sanaat A, Young N, Moore K, Zaidi H, Evans A, Fulton RR, Kyme AZ. Reduction of scan duration and radiation dose in cerebral CT perfusion imaging of acute stroke using a recurrent neural network. Phys Med Biol 2023; 68:165005. [PMID: 37327792 DOI: 10.1088/1361-6560/acdf3a] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2022] [Accepted: 06/16/2023] [Indexed: 06/18/2023]
Abstract
Objective. Cerebral CT perfusion (CTP) imaging is most commonly used to diagnose acute ischaemic stroke and support treatment decisions. Shortening CTP scan duration is desirable to reduce the accumulated radiation dose and the risk of patient head movement. In this study, we present a novel application of a stochastic adversarial video prediction approach to reduce CTP imaging acquisition time.Approach. A variational autoencoder and generative adversarial network (VAE-GAN) were implemented in a recurrent framework in three scenarios: to predict the last 8 (24 s), 13 (31.5 s) and 18 (39 s) image frames of the CTP acquisition from the first 25 (36 s), 20 (28.5 s) and 15 (21 s) acquired frames, respectively. The model was trained using 65 stroke cases and tested on 10 unseen cases. Predicted frames were assessed against ground-truth in terms of image quality and haemodynamic maps, bolus shape characteristics and volumetric analysis of lesions.Main results. In all three prediction scenarios, the mean percentage error between the area, full-width-at-half-maximum and maximum enhancement of the predicted and ground-truth bolus curve was less than 4 ± 4%. The best peak signal-to-noise ratio and structural similarity of predicted haemodynamic maps was obtained for cerebral blood volume followed (in order) by cerebral blood flow, mean transit time and time to peak. For the 3 prediction scenarios, average volumetric error of the lesion was overestimated by 7%-15%, 11%-28% and 7%-22% for the infarct, penumbra and hypo-perfused regions, respectively, and the corresponding spatial agreement for these regions was 67%-76%, 76%-86% and 83%-92%.Significance. This study suggests that a recurrent VAE-GAN could potentially be used to predict a portion of CTP frames from truncated acquisitions, preserving the majority of clinical content in the images, and potentially reducing the scan duration and radiation dose simultaneously by 65% and 54.5%, respectively.
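The bolus-shape characteristics reported above (area, full-width-at-half-maximum and maximum enhancement) can be computed from a time-attenuation curve as in the following sketch with a synthetic, Gaussian-shaped bolus; the frame times and curve are hypothetical, not study data.

import numpy as np

def bolus_stats(t, curve):
    # Area, full-width-at-half-maximum and maximum enhancement of a
    # time-attenuation curve sampled at uniformly spaced times t (s).
    peak = curve.max()
    area = np.sum(curve) * (t[1] - t[0])          # simple rectangle-rule area
    above = np.where(curve >= peak / 2.0)[0]      # coarse FWHM estimate
    fwhm = t[above[-1]] - t[above[0]]
    return area, fwhm, peak

t = np.arange(0, 45, 1.5)                                   # hypothetical frame times
curve = 50.0 * np.exp(-0.5 * ((t - 20.0) / 5.0) ** 2)        # Gaussian-like bolus curve
print(bolus_stats(t, curve))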
Affiliation(s)
- Mahdieh Dashtbani Moghari
- School of Biomedical Engineering, Faculty of Engineering and Information Technologies, The University of Sydney, Sydney, Australia
- Amirhossein Sanaat
- Geneva University Hospitals, Division of Nuclear Medicine & Molecular Imaging, CH-1205 Geneva, Switzerland
- Noel Young
- Department of Radiology, Westmead Hospital, Sydney, Australia
- Medical imaging group, School of Medicine, Western Sydney University, Sydney, Australia
- Krystal Moore
- Department of Radiology, Westmead Hospital, Sydney, Australia
- Habib Zaidi
- Geneva University Hospitals, Division of Nuclear Medicine & Molecular Imaging, CH-1205 Geneva, Switzerland
- Andrew Evans
- Department of Aged Care & Stroke, Westmead Hospital, Sydney, Australia
- School of Health Sciences, University of Sydney, Sydney, Australia
- Roger R Fulton
- School of Health Sciences, University of Sydney, Sydney, Australia
- Department of Medical Physics, Westmead Hospital, Sydney, Australia
- The Brain & Mind Centre, The University of Sydney, Sydney, Australia
- Andre Z Kyme
- School of Biomedical Engineering, Faculty of Engineering and Information Technologies, The University of Sydney, Sydney, Australia
- The Brain & Mind Centre, The University of Sydney, Sydney, Australia
|
35
|
Jiao C, Ling D, Bian S, Vassantachart A, Cheng K, Mehta S, Lock D, Zhu Z, Feng M, Thomas H, Scholey JE, Sheng K, Fan Z, Yang W. Contrast-Enhanced Liver Magnetic Resonance Image Synthesis Using Gradient Regularized Multi-Modal Multi-Discrimination Sparse Attention Fusion GAN. Cancers (Basel) 2023; 15:3544. [PMID: 37509207 PMCID: PMC10377331 DOI: 10.3390/cancers15143544] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Revised: 07/03/2023] [Accepted: 07/05/2023] [Indexed: 07/30/2023] Open
Abstract
PURPOSES To provide abdominal contrast-enhanced MR image synthesis, we developed a gradient regularized multi-modal multi-discrimination sparse attention fusion generative adversarial network (GRMM-GAN) to avoid repeated contrast injections to patients and facilitate adaptive monitoring. METHODS With IRB approval, 165 abdominal MR studies from 61 liver cancer patients were retrospectively solicited from our institutional database. Each study included T2, T1 pre-contrast (T1pre), and T1 contrast-enhanced (T1ce) images. The GRMM-GAN synthesis pipeline consists of a sparse attention fusion network, an image gradient regularizer (GR), and a generative adversarial network with multi-discrimination. The studies were randomly divided into 115 for training, 20 for validation, and 30 for testing. The two pre-contrast MR modalities, T2 and T1pre images, were adopted as inputs in the training phase. The T1ce image at the portal venous phase was used as the output. The synthesized T1ce images were compared with the ground truth T1ce images. The evaluation metrics include peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and mean squared error (MSE). A Turing test and experts' contours were used to evaluate the image synthesis quality. RESULTS The proposed GRMM-GAN model achieved a PSNR of 28.56, an SSIM of 0.869, and an MSE of 83.27. The proposed model showed statistically significant improvements in all metrics tested with p-values < 0.05 over the state-of-the-art model comparisons. The average Turing test score was 52.33%, which is close to random guessing, supporting the model's effectiveness for clinical application. In the tumor-specific region analysis, the average tumor contrast-to-noise ratio (CNR) of the synthesized MR images was not statistically significantly different from that of the real MR images. The average DICE from real vs. synthetic images was 0.90, compared to the inter-operator DICE of 0.91. CONCLUSION We demonstrated a novel multi-modal MR image synthesis network, GRMM-GAN, for T1ce MR synthesis based on pre-contrast T1 and T2 MR images. GRMM-GAN shows promise for avoiding repeated contrast injections during radiation therapy treatment.
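A common way to compute a tumor contrast-to-noise ratio of the kind reported is shown below, assuming binary tumor and background masks; this is one of several CNR definitions and is not taken from the paper.

import numpy as np

def contrast_to_noise(image, tumor_mask, background_mask):
    # CNR = |mean(tumor) - mean(background)| / std(background)
    tumor = image[tumor_mask]
    bg = image[background_mask]
    return np.abs(tumor.mean() - bg.mean()) / bg.std()

rng = np.random.default_rng(3)
img = rng.normal(100.0, 5.0, (128, 128))
tumor = np.zeros(img.shape, dtype=bool)
tumor[40:60, 40:60] = True
img[tumor] += 30.0                    # brighter, "enhancing" lesion region
print(contrast_to_noise(img, tumor, ~tumor))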
Affiliation(s)
- Changzhe Jiao
- Department of Radiation Oncology, Keck School of Medicine of USC, Los Angeles, CA 90033, USA (A.V.); (S.M.)
- Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, USA
- Diane Ling
- Department of Radiation Oncology, Keck School of Medicine of USC, Los Angeles, CA 90033, USA (A.V.); (S.M.)
- Shelly Bian
- Department of Radiation Oncology, Keck School of Medicine of USC, Los Angeles, CA 90033, USA (A.V.); (S.M.)
- April Vassantachart
- Department of Radiation Oncology, Keck School of Medicine of USC, Los Angeles, CA 90033, USA (A.V.); (S.M.)
- Karen Cheng
- Department of Radiation Oncology, Keck School of Medicine of USC, Los Angeles, CA 90033, USA (A.V.); (S.M.)
- Shahil Mehta
- Department of Radiation Oncology, Keck School of Medicine of USC, Los Angeles, CA 90033, USA (A.V.); (S.M.)
- Derrick Lock
- Department of Radiation Oncology, Keck School of Medicine of USC, Los Angeles, CA 90033, USA (A.V.); (S.M.)
- Zhenyu Zhu
- Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, China;
- Mary Feng
- Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, USA
- Horatio Thomas
- Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, USA
- Jessica E. Scholey
- Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, USA
- Ke Sheng
- Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, USA
- Zhaoyang Fan
- Department of Radiology, Keck School of Medicine of USC, Los Angeles, CA 90033, USA
- Wensha Yang
- Department of Radiation Oncology, Keck School of Medicine of USC, Los Angeles, CA 90033, USA (A.V.); (S.M.)
- Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, USA
|
36
|
Gong C, Jing C, Chen X, Pun CM, Huang G, Saha A, Nieuwoudt M, Li HX, Hu Y, Wang S. Generative AI for brain image computing and brain network computing: a review. Front Neurosci 2023; 17:1203104. [PMID: 37383107 PMCID: PMC10293625 DOI: 10.3389/fnins.2023.1203104] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2023] [Accepted: 05/22/2023] [Indexed: 06/30/2023] Open
Abstract
Recent years have witnessed a significant advancement in brain imaging techniques that offer a non-invasive approach to mapping the structure and function of the brain. Concurrently, generative artificial intelligence (AI) has experienced substantial growth, involving using existing data to create new content with a similar underlying pattern to real-world data. The integration of these two domains, generative AI in neuroimaging, presents a promising avenue for exploring various fields of brain imaging and brain network computing, particularly in the areas of extracting spatiotemporal brain features and reconstructing the topological connectivity of brain networks. Therefore, this study reviewed the advanced models, tasks, challenges, and prospects of brain imaging and brain network computing techniques and intends to provide a comprehensive picture of current generative AI techniques in brain imaging. This review is focused on novel methodological approaches and applications of related new methods. It discussed fundamental theories and algorithms of four classic generative models and provided a systematic survey and categorization of tasks, including co-registration, super-resolution, enhancement, classification, segmentation, cross-modality, brain network analysis, and brain decoding. This paper also highlighted the challenges and future directions of the latest work with the expectation that future research can be beneficial.
Affiliation(s)
- Changwei Gong: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Department of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Changhong Jing: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Department of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Xuhang Chen: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Department of Computer and Information Science, University of Macau, Macau, China
- Chi Man Pun: Department of Computer and Information Science, University of Macau, Macau, China
- Guoli Huang: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Ashirbani Saha: Department of Oncology and School of Biomedical Engineering, McMaster University, Hamilton, ON, Canada
- Martin Nieuwoudt: Institute for Biomedical Engineering, Stellenbosch University, Stellenbosch, South Africa
- Han-Xiong Li: Department of Systems Engineering, City University of Hong Kong, Hong Kong, China
- Yong Hu: Department of Orthopaedics and Traumatology, The University of Hong Kong, Hong Kong, China
- Shuqiang Wang: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Department of Computer Science, University of Chinese Academy of Sciences, Beijing, China
37
Cai G, Liu H, Zou W, Hu N, Wang J. Registration of 3D medical images based on unsupervised cooperative cascade of deep networks. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
38
Joseph J, Biji I, Babu N, Pournami PN, Jayaraj PB, Puzhakkal N, Sabu C, Patel V. Fan beam CT image synthesis from cone beam CT image using nested residual UNet based conditional generative adversarial network. Phys Eng Sci Med 2023; 46:703-717. [PMID: 36943626 DOI: 10.1007/s13246-023-01244-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Accepted: 03/09/2023] [Indexed: 03/23/2023]
Abstract
A radiotherapy technique called Image-Guided Radiation Therapy adopts frequent imaging throughout a treatment session. Fan Beam Computed Tomography (FBCT)-based planning followed by Cone Beam Computed Tomography (CBCT)-based radiation delivery has drastically improved treatment accuracy. Further reductions in radiation exposure and cost could be achieved if FBCT were replaced with CBCT. This paper proposes a Conditional Generative Adversarial Network (CGAN) for CBCT-to-FBCT synthesis. Specifically, a new architecture called Nested Residual UNet (NR-UNet) is introduced as the generator of the CGAN. A composite loss function, which comprises adversarial loss, Mean Squared Error (MSE), and Gradient Difference Loss (GDL), is used with the generator. The CGAN utilises the inter-slice dependency in the input by taking three consecutive CBCT slices to generate an FBCT slice. The model is trained using Head-and-Neck (H&N) FBCT-CBCT images of 53 cancer patients. The synthetic images exhibited a Peak Signal-to-Noise Ratio of 34.04±0.93 dB, a Structural Similarity Index Measure of 0.9751±0.001, and a Mean Absolute Error of 14.81±4.70 HU. On average, the proposed model improves the Contrast-to-Noise Ratio to four times that of the input CBCT images. The model also minimised the MSE and alleviated blurriness. Compared to the CBCT-based plan, the synthetic image results in a treatment plan closer to the FBCT-based plan. The three-slice to single-slice translation captures three-dimensional contextual information in the input while avoiding the computational complexity associated with a fully three-dimensional image synthesis model. Furthermore, the results demonstrate that the proposed model is superior to the state-of-the-art methods.
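To make the composite objective concrete, the following is a minimal PyTorch sketch of a generator loss that combines adversarial, MSE, and gradient-difference terms for a three-slice CBCT input, in the spirit of the abstract above; the loss weights, the discriminator interface, and the tensor layout are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def gradient_difference_loss(pred, target):
    """L1 difference between image gradients along height and width."""
    dy = (pred[..., 1:, :] - pred[..., :-1, :]) - (target[..., 1:, :] - target[..., :-1, :])
    dx = (pred[..., :, 1:] - pred[..., :, :-1]) - (target[..., :, 1:] - target[..., :, :-1])
    return dy.abs().mean() + dx.abs().mean()

def generator_loss(disc, cbct_3slice, fbct_slice, fake_fbct,
                   w_adv=1.0, w_mse=100.0, w_gdl=10.0):
    """Composite loss: adversarial + MSE + gradient difference (weights are placeholders)."""
    # A conditional discriminator sees the 3-slice CBCT input together with the synthesized slice.
    pred_fake = disc(torch.cat([cbct_3slice, fake_fbct], dim=1))
    adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
    mse = F.mse_loss(fake_fbct, fbct_slice)
    gdl = gradient_difference_loss(fake_fbct, fbct_slice)
    return w_adv * adv + w_mse * mse + w_gdl * gdl
```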
Affiliation(s)
- Jiffy Joseph: Computer Science and Engineering Department, National Institute of Technology Calicut, Kattangal, Calicut, Kerala, 673601, India
- Ivan Biji: Computer Science and Engineering Department, National Institute of Technology Calicut, Kattangal, Calicut, Kerala, 673601, India
- Naveen Babu: Computer Science and Engineering Department, National Institute of Technology Calicut, Kattangal, Calicut, Kerala, 673601, India
- P N Pournami: Computer Science and Engineering Department, National Institute of Technology Calicut, Kattangal, Calicut, Kerala, 673601, India
- P B Jayaraj: Computer Science and Engineering Department, National Institute of Technology Calicut, Kattangal, Calicut, Kerala, 673601, India
- Niyas Puzhakkal: Department of Medical Physics, MVR Cancer Centre & Research Institute, Poolacode, Calicut, Kerala, 673601, India
- Christy Sabu: Computer Science and Engineering Department, National Institute of Technology Calicut, Kattangal, Calicut, Kerala, 673601, India
- Vedkumar Patel: Computer Science and Engineering Department, National Institute of Technology Calicut, Kattangal, Calicut, Kerala, 673601, India
39
Zhou T, Ruan S, Hu H. A literature survey of MR-based brain tumor segmentation with missing modalities. Comput Med Imaging Graph 2023; 104:102167. [PMID: 36584536 DOI: 10.1016/j.compmedimag.2022.102167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 11/01/2022] [Accepted: 12/22/2022] [Indexed: 12/28/2022]
Abstract
Multimodal MR brain tumor segmentation is one of the hottest issues in the medical image processing community. However, acquiring the complete set of MR modalities is not always possible in clinical practice, due to acquisition protocols, image corruption, scanner availability, scanning cost, or allergies to certain contrast materials. The missing information can constrain brain tumor diagnosis, monitoring, treatment planning, and prognosis. Thus, it is highly desirable to develop brain tumor segmentation methods that address the missing-modality problem. Based on recent advancements, in this review, we provide a detailed analysis of the missing-modality issue in MR-based brain tumor segmentation. First, we briefly introduce the biomedical background concerning brain tumors, MR imaging techniques, and the current challenges in brain tumor segmentation. Then, we provide a taxonomy of the state-of-the-art methods with five categories, namely image synthesis-based, latent feature space-based, multi-source correlation-based, knowledge distillation-based, and domain adaptation-based methods. In addition, the principles, architectures, benefits, and limitations of each method are elaborated. Following that, the corresponding datasets and widely used evaluation metrics are described. Finally, we analyze the current challenges and provide a prospect for future development trends. This review aims to provide readers with a thorough knowledge of recent contributions to the field of brain tumor segmentation with missing modalities and to suggest potential future directions.
Affiliation(s)
- Tongxue Zhou: School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China
- Su Ruan: Université de Rouen Normandie, LITIS - QuantIF, Rouen 76183, France
- Haigen Hu: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; Key Laboratory of Visual Media Intelligent Processing Technology of Zhejiang Province, Hangzhou 310023, China
40
Chaitanya K, Erdil E, Karani N, Konukoglu E. Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation. Med Image Anal 2023; 87:102792. [PMID: 37054649 DOI: 10.1016/j.media.2023.102792] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Revised: 11/25/2022] [Accepted: 03/02/2023] [Indexed: 03/13/2023]
Abstract
Supervised deep learning-based methods yield accurate results for medical image segmentation. However, they require large labeled datasets, and obtaining them is a laborious task that requires clinical expertise. Semi/self-supervised learning-based approaches address this limitation by exploiting unlabeled data along with limited annotated data. Recent self-supervised learning methods use contrastive loss to learn good global-level representations from unlabeled images and achieve high performance in classification tasks on popular natural image datasets like ImageNet. In pixel-level prediction tasks such as segmentation, it is crucial to also learn good local-level representations along with global representations to achieve better accuracy. However, the impact of existing local contrastive loss-based methods remains limited for learning good local representations, because similar and dissimilar local regions are defined based on random augmentations and spatial proximity rather than on the semantic label of local regions, owing to the lack of large-scale expert annotations in the semi/self-supervised setting. In this paper, we propose a local contrastive loss to learn good pixel-level features useful for segmentation by exploiting semantic label information obtained from pseudo-labels of unlabeled images alongside limited annotated images with ground truth (GT) labels. In particular, we define the proposed contrastive loss to encourage similar representations for pixels that have the same pseudo-label/GT label while being dissimilar to the representations of pixels with a different pseudo-label/GT label in the dataset. We perform pseudo-label-based self-training and train the network by jointly optimizing the proposed contrastive loss on both labeled and unlabeled sets and the segmentation loss on only the limited labeled set. We evaluated the proposed approach on three public medical datasets of cardiac and prostate anatomies and obtained high segmentation performance with a limited labeled set of one or two 3D volumes. Extensive comparisons with state-of-the-art semi-supervised and data augmentation methods and concurrent contrastive learning methods demonstrate the substantial improvement achieved by the proposed method. The code is made publicly available at https://github.com/krishnabits001/pseudo_label_contrastive_training.
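As a rough illustration of the idea, the sketch below implements a pixel-level supervised contrastive loss driven by pseudo-labels or GT labels in PyTorch; the pixel subsampling, temperature, and normalization details are assumptions and are simpler than the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def local_contrastive_loss(feats, labels, temperature=0.1, num_samples=1024):
    """Pixel-level supervised contrastive loss using pseudo/GT labels.

    feats:  (N, C, H, W) per-pixel embeddings
    labels: (N, H, W)    pseudo-labels or ground-truth labels
    """
    n, c, h, w = feats.shape
    feats = F.normalize(feats, dim=1).permute(0, 2, 3, 1).reshape(-1, c)  # (N*H*W, C)
    labels = labels.reshape(-1)
    idx = torch.randperm(feats.shape[0], device=feats.device)[:num_samples]  # subsample pixels
    f, y = feats[idx], labels[idx]
    sim = f @ f.t() / temperature                      # pairwise similarities
    sim.fill_diagonal_(-1e9)                           # exclude self-pairs from the softmax
    pos = (y[:, None] == y[None, :]).float()
    pos.fill_diagonal_(0)                              # self-pairs are not positives
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-likelihood of same-label pixels per anchor
    loss = -(pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```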
Affiliation(s)
- Krishna Chaitanya: Computer Vision Laboratory, ETH Zurich, Sternwartstrasse 7, Zurich 8092, Switzerland
- Ertunc Erdil: Computer Vision Laboratory, ETH Zurich, Sternwartstrasse 7, Zurich 8092, Switzerland
- Neerav Karani: Computer Vision Laboratory, ETH Zurich, Sternwartstrasse 7, Zurich 8092, Switzerland
- Ender Konukoglu: Computer Vision Laboratory, ETH Zurich, Sternwartstrasse 7, Zurich 8092, Switzerland
41
Osuala R, Kushibar K, Garrucho L, Linardos A, Szafranowska Z, Klein S, Glocker B, Diaz O, Lekadir K. Data synthesis and adversarial networks: A review and meta-analysis in cancer imaging. Med Image Anal 2023; 84:102704. [PMID: 36473414 DOI: 10.1016/j.media.2022.102704] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2021] [Revised: 11/02/2022] [Accepted: 11/21/2022] [Indexed: 11/26/2022]
Abstract
Despite technological and medical advances, the detection, interpretation, and treatment of cancer based on imaging data continue to pose significant challenges. These include inter-observer variability, class imbalance, dataset shifts, inter- and intra-tumour heterogeneity, malignancy determination, and treatment effect uncertainty. Given the recent advancements in image synthesis, Generative Adversarial Networks (GANs), and adversarial training, we assess the potential of these technologies to address a number of key challenges of cancer imaging. We categorise these challenges into (a) data scarcity and imbalance, (b) data access and privacy, (c) data annotation and segmentation, (d) cancer detection and diagnosis, and (e) tumour profiling, treatment planning and monitoring. Based on our analysis of 164 publications that apply adversarial training techniques in the context of cancer imaging, we highlight multiple underexplored solutions with research potential. We further contribute the Synthesis Study Trustworthiness Test (SynTRUST), a meta-analysis framework for assessing the validation rigour of medical image synthesis studies. SynTRUST is based on 26 concrete measures of thoroughness, reproducibility, usefulness, scalability, and tenability. Based on SynTRUST, we analyse 16 of the most promising cancer imaging challenge solutions and observe a high validation rigour in general, but also several desirable improvements. With this work, we strive to bridge the gap between the needs of the clinical cancer imaging community and the current and prospective research on data synthesis and adversarial networks in the artificial intelligence community.
Affiliation(s)
- Richard Osuala: Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Spain
- Kaisar Kushibar: Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Spain
- Lidia Garrucho: Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Spain
- Akis Linardos: Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Spain
- Zuzanna Szafranowska: Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Spain
- Stefan Klein: Biomedical Imaging Group Rotterdam, Department of Radiology & Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
- Ben Glocker: Biomedical Image Analysis Group, Department of Computing, Imperial College London, UK
- Oliver Diaz: Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Spain
- Karim Lekadir: Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Spain
42
Poonkodi S, Kanchana M. 3D-MedTranCSGAN: 3D Medical Image Transformation using CSGAN. Comput Biol Med 2023; 153:106541. [PMID: 36652868 DOI: 10.1016/j.compbiomed.2023.106541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2022] [Revised: 11/30/2022] [Accepted: 01/10/2023] [Indexed: 01/15/2023]
Abstract
Computer vision techniques for transforming medical images are a rapidly growing area with many specific medical applications. This paper proposes an end-to-end 3D medical image transformation model using a CSGAN, named 3D-MedTranCSGAN. The 3D-MedTranCSGAN model integrates non-adversarial loss components with Cyclic Synthesized Generative Adversarial Networks. The proposed model utilizes PatchGAN's discriminator network to penalize the difference between the synthesized image and the original image. The model also computes non-adversarial loss functions such as content, perception, and style transfer losses. 3DCascadeNet, a new generator architecture introduced in the paper, is used to enhance the perceptiveness of the transformed medical image through encoding-decoding pairs. We apply the 3D-MedTranCSGAN model to various tasks without application-specific modifications: PET-to-CT image transformation, CT-to-PET reconstruction, correction of movement artefacts in MR images, and removal of noise in PET images. We found that 3D-MedTranCSGAN outperformed other transformation methods in our experiments. For the first task, the proposed model yields an SSIM of 0.914, a PSNR of 26.12, an MSE of 255.5, a VIF of 0.4862, a UQI of 0.9067, and an LPIPS of 0.2284. For the second task, the corresponding values are 0.9197, 25.7, 257.56, 0.4962, 0.9027, and 0.2262; for the third task, 0.8862, 24.94, 0.4071, 0.6410, and 0.2196; and for the final task, 0.9521, 33.67, 33.57, 0.6091, 0.9255, and 0.0244. Based on this analysis, the proposed model outperforms the other techniques.
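For readers unfamiliar with the PatchGAN component mentioned above, here is a small 2D PyTorch sketch of a patch-based discriminator that scores local patches instead of the whole image; the layer sizes are placeholders, and the paper's model is 3D and is paired with additional content, perceptual, and style losses.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator: outputs a grid of real/fake scores,
    so each score judges one local patch rather than the whole image."""
    def __init__(self, in_channels=2, base=64):
        super().__init__()
        def block(cin, cout, stride):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=4, stride=stride, padding=1),
                nn.InstanceNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            block(base, base * 2, 2),
            block(base * 2, base * 4, 2),
            block(base * 4, base * 8, 1),
            nn.Conv2d(base * 8, 1, 4, stride=1, padding=1),  # patch score map
        )

    def forward(self, x):
        return self.net(x)

# Example: source slice concatenated with a (real or synthesized) target slice.
scores = PatchDiscriminator()(torch.randn(1, 2, 256, 256))  # -> (1, 1, 30, 30)
```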
Affiliation(s)
- S Poonkodi: Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, India
- M Kanchana: Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, India
43
Liu H, Gong G, Zou W, Hu N, Wang J. Topologically preserved registration of 3D CT images with deep networks. Phys Med Biol 2023; 68. [PMID: 36623316 DOI: 10.1088/1361-6560/acb197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Accepted: 01/09/2023] [Indexed: 01/11/2023]
Abstract
Objective. Computed Tomography (CT) image registration makes fast and accurate imaging-based disease diagnosis possible. We aim to develop a framework which can perform accurate local registration of organs in 3D CT images while preserving the topology of the transformation. Approach. In this framework, the Faster R-CNN method is first used to detect local areas containing organs from fixed and moving images, whose results are then registered with a weakly supervised deep neural network. In this network, a novel 3D channel coordinate attention (CA) module is introduced to reduce the loss of position information. The image edge loss and the organ labelling loss are used to weakly supervise the training process of our deep network, which enables the network to focus on registering organs and image structures. An intuitive inverse module is also used to reduce the folding of the deformation field. More specifically, the folding is suppressed directly by simultaneously maximizing forward and backward registration accuracy in the image domain, rather than indirectly by measuring the consistency of forward and inverse deformation fields as usual. Main results. Our method achieves an average Dice similarity coefficient (DSC) of 0.954 and an average Similarity (Sim) of 0.914 on publicly available liver datasets (LiTS for training and Sliver07 for testing) and achieves an average DSC of 0.914 and an average Sim of 0.947 on our home-built left ventricular myocardium (LVM) dataset. Significance. Experimental results show that our proposed method can significantly improve the registration accuracy of organs such as the liver and LVM. Moreover, our inverse module can intuitively improve the inherent topological preservation of transformations.
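The inverse module's idea of directly maximizing both forward and backward registration accuracy can be sketched as a symmetric similarity loss; the 2D warp below is an illustrative assumption and not the authors' code (the paper works in 3D and also uses edge and organ-label supervision).

```python
import torch
import torch.nn.functional as F

def warp(image, disp):
    """Warp a 2D image (N, C, H, W) with a displacement field disp (N, 2, H, W) in pixels."""
    n, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(image.device)   # (2, H, W), x then y
    new = grid.unsqueeze(0) + disp                                  # shifted coordinates
    # normalize to [-1, 1] as required by grid_sample
    new_x = 2.0 * new[:, 0] / max(w - 1, 1) - 1.0
    new_y = 2.0 * new[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(image, torch.stack((new_x, new_y), dim=-1), align_corners=True)

def symmetric_similarity_loss(fixed, moving, disp_fwd, disp_bwd):
    """Penalize both forward (moving->fixed) and backward (fixed->moving) mismatch."""
    fwd = F.mse_loss(warp(moving, disp_fwd), fixed)
    bwd = F.mse_loss(warp(fixed, disp_bwd), moving)
    return fwd + bwd
```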
Affiliation(s)
- Huaying Liu: School of Electronic and Information Engineering, Soochow University, Suzhou 215006, People's Republic of China
- Guanzhong Gong: Department of Radiation Oncology Physics and Technology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan 250000, People's Republic of China
- Wei Zou: School of Electronic and Information Engineering, Soochow University, Suzhou 215006, People's Republic of China
- Nan Hu: School of Electronic and Information Engineering, Soochow University, Suzhou 215006, People's Republic of China
- Jiajun Wang: School of Electronic and Information Engineering, Soochow University, Suzhou 215006, People's Republic of China
44
Yurt M, Dalmaz O, Dar S, Ozbey M, Tinaz B, Oguz K, Cukur T. Semi-Supervised Learning of MRI Synthesis Without Fully-Sampled Ground Truths. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:3895-3906. [PMID: 35969576 DOI: 10.1109/tmi.2022.3199155] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Learning-based translation between MRI contrasts involves supervised deep models trained using high-quality source- and target-contrast images derived from fully-sampled acquisitions, which might be difficult to collect under limitations on scan costs or time. To facilitate curation of training sets, here we introduce the first semi-supervised model for MRI contrast translation (ssGAN) that can be trained directly using undersampled k-space data. To enable semi-supervised learning on undersampled data, ssGAN introduces novel multi-coil losses in image, k-space, and adversarial domains. The multi-coil losses are selectively enforced on acquired k-space samples unlike traditional losses in single-coil synthesis models. Comprehensive experiments on retrospectively undersampled multi-contrast brain MRI datasets are provided. Our results demonstrate that ssGAN yields on par performance to a supervised model, while outperforming single-coil models trained on coil-combined magnitude images. It also outperforms cascaded reconstruction-synthesis models where a supervised synthesis model is trained following self-supervised reconstruction of undersampled data. Thus, ssGAN holds great promise to improve the feasibility of learning-based multi-contrast MRI synthesis.
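A minimal sketch of the selective k-space supervision described above might look as follows, with a per-coil loss evaluated only at acquired sample locations; the tensor shapes, FFT normalization, and function naming are assumptions, and ssGAN additionally applies image-domain and adversarial losses.

```python
import torch

def masked_kspace_loss(synth_image, coil_sens, acquired_kspace, mask):
    """k-space loss enforced only on acquired samples.

    synth_image:     (N, H, W)    complex synthesized target-contrast image
    coil_sens:       (N, C, H, W) complex coil sensitivity maps
    acquired_kspace: (N, C, H, W) complex undersampled measurements
    mask:            (N, 1, H, W) binary sampling mask (1 = acquired)
    """
    coil_images = coil_sens * synth_image.unsqueeze(1)        # project onto each coil
    pred_kspace = torch.fft.fft2(coil_images, norm="ortho")   # per-coil k-space
    diff = (pred_kspace - acquired_kspace) * mask             # compare acquired samples only
    return diff.abs().pow(2).sum() / mask.sum().clamp(min=1)
```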
45
Lee C, Ha EG, Choi YJ, Jeon KJ, Han SS. Synthesis of T2-weighted images from proton density images using a generative adversarial network in a temporomandibular joint magnetic resonance imaging protocol. Imaging Sci Dent 2022; 52:393-398. [PMID: 36605858 PMCID: PMC9807788 DOI: 10.5624/isd.20220125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Revised: 09/02/2022] [Accepted: 09/24/2022] [Indexed: 11/07/2022] Open
Abstract
Purpose: This study proposed a generative adversarial network (GAN) model for T2-weighted image (WI) synthesis from proton density (PD)-WI in a temporomandibular joint (TMJ) magnetic resonance imaging (MRI) protocol. Materials and Methods: From January to November 2019, MRI scans for the TMJ were reviewed and 308 imaging sets were collected. For training, 277 pairs of PD- and T2-WI sagittal TMJ images were used. Transfer learning of the pix2pix GAN model was utilized to generate T2-WI from PD-WI. Model performance was evaluated with the structural similarity index map (SSIM) and peak signal-to-noise ratio (PSNR) indices for 31 predicted T2-WI (pT2). The disc position was clinically diagnosed as anterior disc displacement with or without reduction, and joint effusion as present or absent. The true T2-WI-based diagnosis was regarded as the gold standard, to which pT2-based diagnoses were compared using Cohen's κ coefficient. Results: The mean SSIM and PSNR values were 0.4781 (±0.0522) and 21.30 (±1.51) dB, respectively. The pT2 protocol showed almost perfect agreement (κ=0.81) with the gold standard for disc position. The number of discordant cases was higher for normal disc position (17%) than for anterior displacement with reduction (2%) or without reduction (10%). The effusion diagnosis also showed almost perfect agreement (κ=0.88), with higher concordance for the presence (85%) than for the absence (77%) of effusion. Conclusion: The application of pT2 images in a TMJ MRI protocol is useful for diagnosis, although the image quality of pT2 was not fully satisfactory. Further research is expected to enhance pT2 quality.
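For this kind of evaluation, the image-quality and agreement metrics can be computed with standard libraries; the snippet below is a generic sketch using scikit-image and scikit-learn, and the diagnostic labels are made-up placeholders, not the study's data.

```python
from sklearn.metrics import cohen_kappa_score
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate_synthesis(true_t2, pred_t2, data_range=None):
    """Image-quality metrics for a pair of real and synthesized T2-weighted slices (numpy arrays)."""
    if data_range is None:
        data_range = true_t2.max() - true_t2.min()
    ssim = structural_similarity(true_t2, pred_t2, data_range=data_range)
    psnr = peak_signal_noise_ratio(true_t2, pred_t2, data_range=data_range)
    return ssim, psnr

# Agreement between diagnoses made on true T2 vs. synthesized T2 (hypothetical class labels).
diagnosis_true = ["normal", "adr", "add", "normal", "add"]
diagnosis_pred = ["normal", "adr", "add", "adr", "add"]
kappa = cohen_kappa_score(diagnosis_true, diagnosis_pred)
```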
Affiliation(s)
- Chena Lee: Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Korea
- Eun-Gyu Ha: Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Korea
- Yoon Joo Choi: Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Korea
- Kug Jin Jeon: Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Korea
- Sang-Sun Han: Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Korea
46
A Systematic Literature Review on Applications of GAN-Synthesized Images for Brain MRI. FUTURE INTERNET 2022. [DOI: 10.3390/fi14120351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
With the advances in brain imaging, magnetic resonance imaging (MRI) is evolving as a popular radiological tool in clinical diagnosis. Deep learning (DL) methods can detect abnormalities in brain images without an extensive manual feature extraction process. Generative adversarial network (GAN)-synthesized images have many applications in this field besides augmentation, such as image translation, registration, super-resolution, denoising, motion correction, segmentation, reconstruction, and contrast enhancement. The existing literature was reviewed systematically to understand the role of GAN-synthesized dummy images in brain disease diagnosis. Web of Science and Scopus databases were extensively searched to find relevant studies from the last 6 years to write this systematic literature review (SLR). Predefined inclusion and exclusion criteria helped in filtering the search results. Data extraction is based on related research questions (RQ). This SLR identifies various loss functions used in the above applications and software to process brain MRIs. A comparative study of existing evaluation metrics for GAN-synthesized images helps choose the proper metric for an application. GAN-synthesized images will have a crucial role in the clinical sector in the coming years, and this paper gives a baseline for other researchers in the field.
47
Kim E, Cho HH, Kwon J, Oh YT, Ko ES, Park H. Tumor-Attentive Segmentation-Guided GAN for Synthesizing Breast Contrast-Enhanced MRI Without Contrast Agents. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2022; 11:32-43. [PMID: 36478773 PMCID: PMC9721354 DOI: 10.1109/jtehm.2022.3221918] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Revised: 10/25/2022] [Accepted: 11/10/2022] [Indexed: 11/16/2022]
Abstract
OBJECTIVE Breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a sensitive imaging technique critical for breast cancer diagnosis. However, the administration of contrast agents poses a potential risk. This can be avoided if contrast-enhanced MRI can be obtained without using contrast agents. Thus, we aimed to generate T1-weighted contrast-enhanced MRI (ceT1) images from pre-contrast T1-weighted MRI (preT1) images in the breast. METHODS We proposed a generative adversarial network to synthesize ceT1 from preT1 breast images that adopted a local discriminator and a segmentation task network to focus specifically on the tumor region in addition to the whole breast. The segmentation network performed the related task of segmenting the tumor region, which allowed important tumor-related information to be enhanced. In addition, edge maps were included to provide explicit shape and structural information. Our approach was evaluated and compared with other methods in local (n = 306) and external validation (n = 140) cohorts. Four evaluation metrics, normalized mean squared error (NRMSE), Pearson cross-correlation coefficient (CC), peak signal-to-noise ratio (PSNR), and structural similarity index map (SSIM), were measured for the whole breast and the tumor region. An ablation study was performed to evaluate the incremental benefits of various components in our approach. RESULTS Our approach performed the best with an NRMSE of 25.65, a PSNR of 54.80 dB, an SSIM of 0.91, and a CC of 0.88 on average in the local test set. CONCLUSION Performance gains were replicated in the validation cohort. SIGNIFICANCE We hope that our method will help patients avoid potentially harmful contrast agents. Clinical and Translational Impact Statement: Contrast agents are necessary to obtain DCE-MRI, which is essential in breast cancer diagnosis. However, administration of contrast agents may cause side effects such as nephrogenic systemic fibrosis and the risk of toxic residue deposits. Our approach can generate DCE-MRI without contrast agents using a generative deep neural network. Thus, our approach could help patients avoid potentially harmful contrast agents, resulting in an improved diagnosis and treatment workflow for breast cancer.
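Region-restricted versions of the error and correlation metrics, as reported for the tumor region, can be sketched as follows; normalizing by the intensity range of the real region of interest is one common choice and is an assumption here rather than the paper's exact definition.

```python
import numpy as np

def masked_nrmse_cc(real, synth, mask):
    """NRMSE and Pearson correlation restricted to a region of interest (e.g., a tumor mask)."""
    r = real[mask > 0].astype(np.float64)
    s = synth[mask > 0].astype(np.float64)
    rmse = np.sqrt(np.mean((r - s) ** 2))
    nrmse = rmse / (r.max() - r.min() + 1e-8)   # normalized by the ROI intensity range
    cc = np.corrcoef(r, s)[0, 1]                # Pearson cross-correlation coefficient
    return nrmse, cc
```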
Affiliation(s)
- Eunjin Kim: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea
- Hwan-Ho Cho: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea; Department of Medical Artificial Intelligence, Konyang University, Daejeon 35365, South Korea
- Junmo Kwon: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea
- Young-Tack Oh: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea
- Eun Sook Ko: Department of Radiology, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul 06351, South Korea
- Hyunjin Park: School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon 16419, South Korea; Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon 16419, South Korea
48
Li J, Qu Z, Yang Y, Zhang F, Li M, Hu S. TCGAN: a transformer-enhanced GAN for PET synthetic CT. BIOMEDICAL OPTICS EXPRESS 2022; 13:6003-6018. [PMID: 36733758 PMCID: PMC9872870 DOI: 10.1364/boe.467683] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Revised: 08/06/2022] [Accepted: 10/05/2022] [Indexed: 06/18/2023]
Abstract
Multimodal medical images can be used in a multifaceted approach to resolve a wide range of medical diagnostic problems. However, these images are generally difficult to obtain due to various limitations, such as the cost of capture and patient safety. Medical image synthesis is therefore used in various tasks to obtain better results. Recently, various studies have attempted to use generative adversarial networks for missing-modality image synthesis, making good progress. In this study, we propose a generator based on a combination of a transformer network and a convolutional neural network (CNN). The proposed method combines the advantages of transformers and CNNs to better preserve fine detail. The network is designed for positron emission tomography (PET) to computed tomography synthesis, which can be used for PET attenuation correction. We also experimented on two datasets for magnetic resonance T1- to T2-weighted image synthesis. Based on qualitative and quantitative analyses, our proposed method outperforms the existing methods.
Affiliation(s)
- Jitao Li (these authors contributed equally): College of Information Science and Engineering, Linyi University, Linyi, 276000, China; College of Chemistry and Chemical Engineering, Linyi University, Linyi, 276000, China
- Zongjin Qu (these authors contributed equally): College of Chemistry and Chemical Engineering, Linyi University, Linyi, 276000, China
- Yue Yang: College of Information Science and Engineering, Linyi University, Linyi, 276000, China
- Fuchun Zhang: College of Information Science and Engineering, Linyi University, Linyi, 276000, China
- Meng Li: College of Information Science and Engineering, Linyi University, Linyi, 276000, China
- Shunbo Hu: College of Information Science and Engineering, Linyi University, Linyi, 276000, China
49
Iwayama M, Wu S, Liu C, Yoshida R. Functional Output Regression for Machine Learning in Materials Science. J Chem Inf Model 2022; 62:4837-4851. [PMID: 36216342 PMCID: PMC9597664 DOI: 10.1021/acs.jcim.2c00626] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
In recent years, there has been a rapid growth in the use of machine learning in material science. Conventionally, a trained predictive model describes a scalar output variable, such as thermodynamic, electronic, or mechanical properties, as a function of input descriptors that vectorize the compositional or structural features of any given material, such as molecules, chemical compositions, or crystalline systems. In machine learning of material data, on the other hand, the output variable is often given as a function. For example, when predicting the optical absorption spectrum of a molecule, the output variable is a spectral function defined in the wavelength domain. Alternatively, in predicting the microstructure of a polymer nanocomposite, the output variable is given as an image from an electron microscope, which can be represented as a two- or three-dimensional function in the image coordinate system. In this study, we consider two unified frameworks to handle such multidimensional or functional output regressions, which are applicable to a wide range of predictive analyses in material science. The first approach employs generative adversarial networks, which are known to exhibit outstanding performance in various computer vision tasks such as image generation, style transfer, and video generation. We also present another type of statistical modeling inspired by a statistical methodology referred to as functional data analysis. This is an extension of kernel regression to deal with functional outputs, and its simple mathematical structure makes it effective in modeling even with small amounts of data. We demonstrate the proposed methods through several case studies in materials science.
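The functional-output idea can be illustrated with a very simple kernel smoother that returns whole curves as weighted averages of training curves; this Nadaraya-Watson style sketch is far simpler than the functional regression developed in the paper, and the Gaussian kernel and bandwidth are placeholder choices.

```python
import numpy as np

def functional_kernel_regression(X_train, Y_train, X_query, bandwidth=1.0):
    """Kernel regression with functional (curve-valued) outputs.

    X_train: (n, d) input descriptors (e.g., compositional or structural features)
    Y_train: (n, m) output curves sampled on a common grid (e.g., spectra over wavelength)
    X_query: (q, d) query descriptors
    Returns  (q, m) predicted curves as weighted averages of the training curves.
    """
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)  # squared distances (q, n)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))                          # Gaussian kernel weights
    w = w / w.sum(axis=1, keepdims=True)                              # normalize per query
    return w @ Y_train                                                # (q, m) predicted functions
```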
Affiliation(s)
- Megumi Iwayama: Department of Statistical Science, The Graduate University for Advanced Studies, Tachikawa 190-8562, Japan; Production Management Headquarters, Process Technology Division, Daicel Corporation, Himeji 671-1283, Japan
- Stephen Wu: Department of Statistical Science, The Graduate University for Advanced Studies, Tachikawa 190-8562, Japan; Research Organization of Information and Systems, The Institute of Statistical Mathematics, Tachikawa 190-8562, Japan
- Chang Liu: Research Organization of Information and Systems, The Institute of Statistical Mathematics, Tachikawa 190-8562, Japan
- Ryo Yoshida: Department of Statistical Science, The Graduate University for Advanced Studies, Tachikawa 190-8562, Japan; Research Organization of Information and Systems, The Institute of Statistical Mathematics, Tachikawa 190-8562, Japan; Research and Service Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
50
Dalmaz O, Yurt M, Cukur T. ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:2598-2614. [PMID: 35436184 DOI: 10.1109/tmi.2022.3167808] [Citation(s) in RCA: 121] [Impact Index Per Article: 40.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Generative adversarial models with convolutional neural network (CNN) backbones have recently been established as state-of-the-art in numerous medical image synthesis tasks. However, CNNs are designed to perform local processing with compact filters, and this inductive bias compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that leverages the contextual sensitivity of vision transformers along with the precision of convolution operators and realism of adversarial learning. ResViT's generator employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules. Residual connections in ART blocks promote diversity in captured representations, while a channel compression module distills task-relevant information. A weight sharing strategy is introduced among ART blocks to mitigate computational burden. A unified implementation is introduced to avoid the need to rebuild separate synthesis models for varying source-target modality configurations. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and CT images from MRI. Our results indicate superiority of ResViT against competing CNN- and transformer-based methods in terms of qualitative observations and quantitative metrics.
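As a toy illustration of the hybrid design described above (not ResViT's actual ART block), the sketch below combines a residual convolutional path with a transformer path over flattened spatial tokens, followed by a 1x1 channel-compression convolution that distills the concatenated features.

```python
import torch
import torch.nn as nn

class ARTLikeBlock(nn.Module):
    """Toy hybrid block in the spirit of aggregated residual transformer blocks:
    a residual convolutional path plus a transformer path, with channel compression."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                               dim_feedforward=2 * channels,
                                               batch_first=True)
        self.compress = nn.Conv2d(2 * channels, channels, 1)  # distill concatenated features

    def forward(self, x):
        n, c, h, w = x.shape
        local_feat = x + self.conv(x)                          # residual convolutional path
        tokens = x.flatten(2).transpose(1, 2)                  # (N, H*W, C) spatial tokens
        global_feat = self.attn(tokens).transpose(1, 2).reshape(n, c, h, w)
        return self.compress(torch.cat([local_feat, global_feat], dim=1))

out = ARTLikeBlock()(torch.randn(1, 64, 32, 32))               # -> (1, 64, 32, 32)
```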