1
Ibrahim M, Khalil YA, Amirrajab S, Sun C, Breeuwer M, Pluim J, Elen B, Ertaylan G, Dumontier M. Generative AI for synthetic data across multiple medical modalities: A systematic review of recent developments and challenges. Comput Biol Med 2025; 189:109834. [PMID: 40023073] [DOI: 10.1016/j.compbiomed.2025.109834]
Abstract
This paper presents a comprehensive systematic review of generative models (GANs, VAEs, DMs, and LLMs) used to synthesize various medical data types, including imaging (dermoscopic, mammographic, ultrasound, CT, MRI, and X-ray), text, time-series, and tabular data (EHR). Unlike previous narrowly focused reviews, our study encompasses a broad array of medical data modalities and explores various generative models. Our aim is to offer insights into their current and future applications in medical research, particularly in the context of synthesis applications, generation techniques, and evaluation methods, as well as providing a GitHub repository as a dynamic resource for ongoing collaboration and innovation. Our search strategy queries databases such as Scopus, PubMed, and arXiv, focusing on recent works from January 2021 to November 2023, excluding reviews and perspectives. This period emphasizes recent advancements beyond GANs, which have been extensively covered in previous reviews. The survey also emphasizes conditional generation, an aspect not focused on in similar work. Key contributions include a broad, multi-modality scope that identifies cross-modality insights and opportunities unavailable in single-modality surveys. While core generative techniques are transferable, we find that synthesis methods often lack sufficient integration of patient-specific context, clinical knowledge, and modality-specific requirements tailored to the unique characteristics of medical data. Conditional models leveraging textual conditioning and multimodal synthesis remain underexplored but offer promising directions for innovation. Our findings are structured around three themes: (1) Synthesis applications, highlighting clinically valid synthesis applications and significant gaps in using synthetic data beyond augmentation, such as for validation and evaluation; (2) Generation techniques, identifying gaps in personalization and cross-modality innovation; and (3) Evaluation methods, revealing the absence of standardized benchmarks, the need for large-scale validation, and the importance of privacy-aware, clinically relevant evaluation frameworks. These findings emphasize the need for benchmarking and comparative studies to promote openness and collaboration.
Affiliation(s)
- Mahmoud Ibrahim
- Institute of Data Science, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands; Department of Advanced Computing Sciences, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands; VITO, Belgium.
- Yasmina Al Khalil
- Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Sina Amirrajab
- Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Chang Sun
- Institute of Data Science, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands; Department of Advanced Computing Sciences, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands
- Marcel Breeuwer
- Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Josien Pluim
- Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Michel Dumontier
- Institute of Data Science, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands; Department of Advanced Computing Sciences, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands
2
Pandey PU, Micieli JA, Ong Tone S, Eng KT, Kertes PJ, Wong JCY. Realistic fundus photograph generation for improving automated disease classification. Br J Ophthalmol 2025:bjo-2024-326122. [PMID: 39939121] [DOI: 10.1136/bjo-2024-326122]
Abstract
AIMS This study aims to investigate whether denoising diffusion probabilistic models (DDPMs) could generate realistic retinal images, and if they could be used to improve the performance of a deep convolutional neural network (CNN) ensemble for multiple retinal disease classification, which was previously shown to outperform human experts. METHODS We trained DDPMs to generate retinal fundus images representing diabetic retinopathy, age-related macular degeneration, glaucoma or normal eyes. Eight board-certified ophthalmologists evaluated 96 test images to assess the realism of generated images and classified them based on disease labels. Subsequently, between 100 and 1000 generated images were employed to augment training of deep convolutional ensembles for classifying retinal disease. We measured the accuracy of ophthalmologists in correctly identifying real and generated images. We also measured the classification accuracy, F-score and area under the receiver operating characteristic curve of a trained CNN in classifying retinal diseases from a test set of 100 fundus images. RESULTS Ophthalmologists exhibited a mean accuracy of 61.1% (range: 51.0%-68.8%) in differentiating real and generated images. Augmenting the training set with 238 generated images in the smallest class statistically significantly improved the F-score and accuracy by 5.3% and 5.8%, respectively (p<0.01) in a retinal disease classification task, compared with a baseline model trained only with real images. CONCLUSIONS Latent diffusion models generated highly realistic retinal images, as validated by human experts. Adding generated images to the training set improved performance of a CNN ensemble without requiring additional real patient data.
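For readers unfamiliar with the mechanics behind such generators, the sketch below shows the standard DDPM ancestral-sampling loop in PyTorch. It is a minimal illustration, not the authors' implementation; `eps_model` is a hypothetical trained noise-prediction network and `betas` is the usual variance schedule.

```python
# Minimal sketch of DDPM ancestral sampling, assuming a trained
# noise-prediction network `eps_model` (hypothetical name).
import torch

@torch.no_grad()
def ddpm_sample(eps_model, shape, betas, device="cpu"):
    """Generate images by iterating the DDPM reverse (denoising) process."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)          # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps = eps_model(x, torch.full((shape[0],), t, device=device))
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise    # inject noise except at the final step
    return x
```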
Affiliation(s)
- Prashant U Pandey
- School of Biomedical Engineering, The University of British Columbia, Vancouver, British Columbia, Canada
- Jonathan A Micieli
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Kensington Vision and Research Centre and Kensington Research Institute, Toronto, Ontario, Canada
- Department of Ophthalmology, St. Michael's Hospital, Unity Health, Toronto, Ontario, Canada
- Stephan Ong Tone
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Kensington Vision and Research Centre and Kensington Research Institute, Toronto, Ontario, Canada
- Department of Ophthalmology and Vision Sciences, Sunnybrook Research Institute, Toronto, Ontario, Canada
- John and Liz Tory Eye Centre, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Kenneth T Eng
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- John and Liz Tory Eye Centre, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Peter J Kertes
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Kensington Vision and Research Centre and Kensington Research Institute, Toronto, Ontario, Canada
- John and Liz Tory Eye Centre, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Jovi C Y Wong
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
3
Tourdias T, Bani-Sadr A, Lecler A. Can generative T2*-weighted images replace true T2*-weighted images in brain MRI? Diagn Interv Imaging 2025:S2211-5684(25)00071-3. [PMID: 40204535] [DOI: 10.1016/j.diii.2025.03.011]
Affiliation(s)
- Thomas Tourdias
- Univ. Bordeaux, INSERM, Neurocentre Magendie, U1215, 33000 Bordeaux, France; CHU de Bordeaux, Neuroimagerie Diagnostique et Thérapeutique, 33000 Bordeaux, France.
- Alexandre Bani-Sadr
- Department of Neuroradiology, Neurological Hospital, Hospices Civils de Lyon, 69029 Bron, France; Univ. Lyon 1, CREATIS Laboratory, CNRS 5220 - UMR U1294, 69100 Villeurbanne, France
- Augustin Lecler
- Université Paris Cité, Faculté de Médecine, 75006 Paris, France; Department of Neuroradiology, Hôpital Fondation Adolphe de Rothschild, 75019 Paris, France
4
Liu W, Kim HG. The visual communication using generative artificial intelligence in the context of new media. Sci Rep 2025; 15:11577. [PMID: 40185848] [PMCID: PMC11971437] [DOI: 10.1038/s41598-025-96869-9]
Abstract
The purpose of this work is to explore methods of visual communication based on generative artificial intelligence in the context of new media. This work proposes an image automatic generation and recognition model that integrates the Conditional Generative Adversarial Network (CGAN) with the Transformer algorithm. The generator component of the model takes noise vectors and conditional variables as inputs. Subsequently, a Transformer module is incorporated, where the multi-head self-attention mechanism enables the model to establish complex relationships among different data points. This is further refined through linear transformations and activation functions to enhance feature representations. Ultimately, the self-attention mechanism captures the long-range dependencies within images, facilitating the generation of high-quality images that meet specific conditions. The model's performance is assessed, and the findings show that the accuracy of the proposed model reaches 95.69%, exceeding the baseline Generative Adversarial Network algorithm by more than 4%. Additionally, the Peak Signal-to-Noise Ratio of the model's algorithm is 33 dB, and the Structural Similarity Index is 0.83, indicating higher image generation quality and recognition accuracy. Therefore, the proposed model achieves high recognition and prediction accuracy for generated images, along with higher image quality, promising significant application value for visual communication in the new media era.
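The generator design described above can be sketched roughly as follows; the layer sizes, the 16-token layout, and all names are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of a conditional generator combining a CGAN input scheme with
# a Transformer encoder, loosely following the architecture described above.
import torch
import torch.nn as nn

class CGANTransformerGenerator(nn.Module):
    def __init__(self, noise_dim=100, num_classes=10, embed_dim=128, img_size=32):
        super().__init__()
        self.cond_embed = nn.Embedding(num_classes, embed_dim)   # conditional variable
        self.input_proj = nn.Linear(noise_dim + embed_dim, embed_dim * 16)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True)        # multi-head self-attention
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.to_pixels = nn.Linear(embed_dim * 16, img_size * img_size)
        self.img_size = img_size

    def forward(self, z, labels):
        h = torch.cat([z, self.cond_embed(labels)], dim=1)       # noise + condition
        tokens = self.input_proj(h).view(z.size(0), 16, -1)      # 16 tokens of embed_dim
        tokens = self.transformer(tokens)                        # long-range dependencies
        img = torch.tanh(self.to_pixels(tokens.flatten(1)))
        return img.view(-1, 1, self.img_size, self.img_size)

# Example: generate four class-conditioned samples
g = CGANTransformerGenerator()
imgs = g(torch.randn(4, 100), torch.tensor([0, 1, 2, 3]))
```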
Affiliation(s)
- Weinan Liu
- Graduate School of Advanced Imaging Science, Multimedia and Film, Chung-Ang University, Seoul, 06974, South Korea
- Hyung-Gi Kim
- Graduate School of Advanced Imaging Science, Multimedia and Film, Chung-Ang University, Seoul, 06974, South Korea
5
Brodsky V, Ullah E, Bychkov A, Song AH, Walk EE, Louis P, Rasool G, Singh RS, Mahmood F, Bui MM, Parwani AV. Generative Artificial Intelligence in Anatomic Pathology. Arch Pathol Lab Med 2025; 149:298-318. [PMID: 39836377] [DOI: 10.5858/arpa.2024-0215-ra]
Abstract
CONTEXT.— Generative artificial intelligence (AI) has emerged as a transformative force in various fields, including anatomic pathology, where it offers the potential to significantly enhance diagnostic accuracy, workflow efficiency, and research capabilities. OBJECTIVE.— To explore the applications, benefits, and challenges of generative AI in anatomic pathology, with a focus on its impact on diagnostic processes, workflow efficiency, education, and research. DATA SOURCES.— A comprehensive review of current literature and recent advancements in the application of generative AI within anatomic pathology, categorized into unimodal and multimodal applications, and evaluated for clinical utility, ethical considerations, and future potential. CONCLUSIONS.— Generative AI demonstrates significant promise in various domains of anatomic pathology, including diagnostic accuracy enhanced through AI-driven image analysis, virtual staining, and synthetic data generation; workflow efficiency, with potential for improvement by automating routine tasks, quality control, and reflex testing; education and research, facilitated by AI-generated educational content, synthetic histology images, and advanced data analysis methods; and clinical integration, with preliminary surveys indicating cautious optimism for nondiagnostic AI tasks and growing engagement in academic settings. Ethical and practical challenges require rigorous validation, prompt engineering, federated learning, and synthetic data generation to help ensure trustworthy, reliable, and unbiased AI applications. Generative AI can potentially revolutionize anatomic pathology, enhancing diagnostic accuracy, improving workflow efficiency, and advancing education and research. Successful integration into clinical practice will require continued interdisciplinary collaboration, careful validation, and adherence to ethical standards to ensure the benefits of AI are realized while maintaining the highest standards of patient care.
Affiliation(s)
- Victor Brodsky
- From the Department of Pathology and Immunology, Washington University School of Medicine in St Louis, St Louis, Missouri (Brodsky)
- Ehsan Ullah
- the Department of Surgery, Health New Zealand, Counties Manukau, New Zealand (Ullah)
- Andrey Bychkov
- the Department of Pathology, Kameda Medical Center, Kamogawa City, Chiba Prefecture, Japan (Bychkov)
- the Department of Pathology, Nagasaki University, Nagasaki, Japan (Bychkov)
- Andrew H Song
- the Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts (Song, Mahmood)
- Eric E Walk
- Office of the Chief Medical Officer, PathAI, Boston, Massachusetts (Walk)
- Peter Louis
- the Department of Pathology and Laboratory Medicine, Rutgers Robert Wood Johnson Medical School, New Brunswick, New Jersey (Louis)
- Ghulam Rasool
- the Department of Oncologic Sciences, Morsani College of Medicine and Department of Electrical Engineering, University of South Florida, Tampa (Rasool)
- the Department of Machine Learning, Moffitt Cancer Center and Research Institute, Tampa, Florida (Rasool)
- Department of Machine Learning, Neuro-Oncology, Moffitt Cancer Center and Research Institute, Tampa, Florida (Rasool)
- Rajendra S Singh
- Dermatopathology and Digital Pathology, Summit Health, Berkley Heights, New Jersey (Singh)
- Faisal Mahmood
- the Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts (Song, Mahmood)
- Marilyn M Bui
- Department of Machine Learning, Pathology, Moffitt Cancer Center and Research Institute, Tampa, Florida (Bui)
- Anil V Parwani
- the Department of Pathology, The Ohio State University, Columbus (Parwani)
6
Bluethgen C, Chambon P, Delbrouck JB, van der Sluijs R, Połacin M, Zambrano Chaves JM, Abraham TM, Purohit S, Langlotz CP, Chaudhari AS. A vision-language foundation model for the generation of realistic chest X-ray images. Nat Biomed Eng 2025; 9:494-506. [PMID: 39187663] [PMCID: PMC11861387] [DOI: 10.1038/s41551-024-01246-y]
Abstract
The paucity of high-quality medical imaging datasets could be mitigated by machine learning models that generate compositionally diverse images that faithfully represent medical concepts and pathologies. However, large vision-language models are trained on natural images, and the diversity distribution of the generated images substantially differs from that of medical images. Moreover, medical language involves specific and semantically rich vocabulary. Here we describe a domain-adaptation strategy for large vision-language models that overcomes distributional shifts. Specifically, by leveraging publicly available datasets of chest X-ray images and the corresponding radiology reports, we adapted a latent diffusion model pre-trained on pairs of natural images and text descriptors to generate diverse and visually plausible synthetic chest X-ray images (as confirmed by board-certified radiologists) whose appearance can be controlled with free-form medical text prompts. The domain-adaptation strategy for the text-conditioned synthesis of medical images can be used to augment training datasets and is a viable alternative to the sharing of real medical images for model training and fine-tuning.
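The fine-tuning objective behind this kind of domain adaptation is the standard epsilon-prediction loss of latent diffusion, conditioned on encoded report text. A hedged sketch, with `vae`, `unet`, and `text_encoder` standing in for pretrained components (hypothetical interfaces, not the authors' code), is:

```python
# Sketch of the text-conditioned latent-diffusion fine-tuning objective:
# the U-Net learns to predict the noise added to image latents, conditioned
# on encoded radiology-report text.
import torch
import torch.nn.functional as F

def adaptation_loss(vae, unet, text_encoder, images, report_tokens, alpha_bars):
    with torch.no_grad():
        latents = vae.encode(images)                      # compress X-rays to latent space
        text_emb = text_encoder(report_tokens)            # free-form medical text prompt
    t = torch.randint(0, len(alpha_bars), (latents.size(0),))
    noise = torch.randn_like(latents)
    a = alpha_bars[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * latents + (1 - a).sqrt() * noise   # forward diffusion
    pred = unet(noisy, t, text_emb)                       # text-conditioned denoiser
    return F.mse_loss(pred, noise)                        # epsilon-prediction loss
```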
Affiliation(s)
- Christian Bluethgen
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA.
- Department of Radiology, Stanford University, Palo Alto, CA, USA.
- Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland.
- Pierre Chambon
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
- Jean-Benoit Delbrouck
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
- Rogier van der Sluijs
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
- Małgorzata Połacin
- Department of Radiology, Stanford University, Palo Alto, CA, USA
- Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Juan Manuel Zambrano Chaves
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Curtis P Langlotz
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Akshay S Chaudhari
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
7
Okolie A, Dirrichs T, Huck LC, Nebelung S, Arasteh ST, Nolte T, Han T, Kuhl CK, Truhn D. Accelerating breast MRI acquisition with generative AI models. Eur Radiol 2025; 35:1092-1100. [PMID: 39088043] [PMCID: PMC11782449] [DOI: 10.1007/s00330-024-10853-x]
Abstract
OBJECTIVES To investigate the use of the score-based diffusion model to accelerate breast MRI reconstruction. MATERIALS AND METHODS We trained a score-based model on 9549 MRI examinations of the female breast and employed it to reconstruct undersampled MRI images with undersampling factors of 2, 5, and 20. Images were evaluated by two experienced radiologists who rated the images based on their overall quality and diagnostic value on an independent test set of 100 additional MRI examinations. RESULTS The score-based model produces MRI images of high quality and diagnostic value. Both T1- and T2-weighted MRI images could be reconstructed to a high degree of accuracy. Two radiologists rated the images as almost indistinguishable from the original images (rating 4 or 5 on a scale of 5) in 100% (radiologist 1) and 99% (radiologist 2) of cases when the acceleration factor was 2. This fraction dropped to 88% and 70% for an acceleration factor of 5 and to 5% and 21% with an extreme acceleration factor of 20. CONCLUSION Score-based models can reconstruct MRI images at high fidelity, even at comparatively high acceleration factors, but further work on a larger scale of images is needed to ensure that diagnostic quality holds. CLINICAL RELEVANCE STATEMENT The number of MRI examinations of the breast is expected to rise with MRI screening recommended for women with dense breasts. Accelerated image acquisition methods can help in making this examination more accessible. KEY POINTS Accelerating breast MRI reconstruction remains a significant challenge in clinical settings. Score-based diffusion models can achieve near-perfect reconstruction for moderate undersampling factors. Faster breast MRI scans with maintained image quality could revolutionize clinic workflows and patient experience.
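Score-based reconstruction methods of this kind typically alternate denoising updates with a k-space data-consistency step that re-imposes the acquired measurements. A minimal sketch of that step, with illustrative names and a toy ~5x undersampling mask, is:

```python
# Data-consistency step commonly interleaved with score-based denoising
# in undersampled MRI reconstruction; `mask` marks acquired k-space lines.
import numpy as np

def data_consistency(image, measured_kspace, mask):
    """Replace predicted k-space values with acquired ones where sampled."""
    kspace = np.fft.fft2(image)
    kspace = np.where(mask, measured_kspace, kspace)   # keep measured lines
    return np.fft.ifft2(kspace).real

# Toy example: undersampling factor ~5 (keep every fifth phase-encode line)
rng = np.random.default_rng(0)
image = rng.standard_normal((256, 256))
mask = np.zeros((256, 256), dtype=bool)
mask[::5, :] = True
measured = np.fft.fft2(image) * mask
recon = data_consistency(rng.standard_normal((256, 256)), measured, mask)
```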
Affiliation(s)
- Augustine Okolie
- Department of Radiology, University Hospital RWTH Aachen, Aachen, Germany.
- Timm Dirrichs
- Department of Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Sven Nebelung
- Department of Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Teresa Nolte
- Department of Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Tianyu Han
- Department of Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Daniel Truhn
- Department of Radiology, University Hospital RWTH Aachen, Aachen, Germany
8
Dohmen M, Klemens MA, Baltruschat IM, Truong T, Lenga M. Similarity and quality metrics for MR image-to-image translation. Sci Rep 2025; 15:3853. [PMID: 39890963] [PMCID: PMC11785996] [DOI: 10.1038/s41598-025-87358-0]
Abstract
Image-to-image translation can have a large impact in medical imaging, as images can be synthetically transformed to other modalities, sequence types, higher resolutions or lower noise levels. To ensure patient safety, these methods should be validated by human readers, which requires a considerable amount of time and costs. Quantitative metrics can effectively complement such studies and provide reproducible and objective assessment of synthetic images. If a reference is available, the similarity of MR images is frequently evaluated by SSIM and PSNR metrics, even though these metrics are either insensitive or overly sensitive to specific distortions. When reference images to compare with are not available, non-reference quality metrics can reliably detect specific distortions, such as blurriness. To provide an overview of distortion sensitivity, we quantitatively analyze 11 similarity (reference) and 12 quality (non-reference) metrics for assessing synthetic images. We additionally include a metric on a downstream segmentation task. We investigate the sensitivity to 11 kinds of distortions and typical MR artifacts, and analyze the influence of different normalization methods on each metric and distortion. Finally, we derive recommendations for effective usage of the analyzed similarity and quality metrics for evaluation of image-to-image translation models.
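As a concrete example of the reference-based metrics analyzed here, PSNR and SSIM can be computed with scikit-image as below; the images are toy arrays normalized to [0, 1], not data from the study.

```python
# Minimal example of the reference-based similarity metrics discussed above.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(42)
reference = rng.random((128, 128))
translated = np.clip(reference + 0.05 * rng.standard_normal((128, 128)), 0, 1)

psnr = peak_signal_noise_ratio(reference, translated, data_range=1.0)
ssim = structural_similarity(reference, translated, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.3f}")
```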
9
Sriwatana K, Puttanawarut C, Suwan Y, Achakulvisut T. Explainable Deep Learning for Glaucomatous Visual Field Prediction: Artifact Correction Enhances Transformer Models. Transl Vis Sci Technol 2025; 14:22. [PMID: 39847375] [PMCID: PMC11758932] [DOI: 10.1167/tvst.14.1.22]
Abstract
Purpose The purpose of this study was to develop a deep learning approach that restores artifact-laden optical coherence tomography (OCT) scans and predicts functional loss on the 24-2 Humphrey Visual Field (HVF) test. Methods This cross-sectional, retrospective study used 1674 visual field (VF)-OCT pairs from 951 eyes for training and 429 pairs from 345 eyes for testing. Peripapillary retinal nerve fiber layer (RNFL) thickness map artifacts were corrected using a generative diffusion model. Three convolutional neural networks and 2 transformer-based models were trained on original and artifact-corrected datasets to estimate 54 sensitivity thresholds of the 24-2 HVF test. Results Predictive performances were calculated using root mean square error (RMSE) and mean absolute error (MAE), with explainability evaluated through GradCAM, attention maps, and dimensionality reduction techniques. The Distillation with No Labels (DINO) Vision Transformer (ViT) trained on artifact-corrected datasets achieved the highest accuracy (RMSE = 4.44 decibels [dB], 95% confidence interval [CI]: 4.07-4.82; MAE = 3.46 dB, 95% CI: 3.14-3.79) and the greatest interpretability, showing improvements of 0.15 dB in global RMSE and MAE (P < 0.05) compared to the performance on original maps. Feature maps and visualization tools indicate that artifacts compromise DINO-ViT's predictive ability, which improves with artifact correction. Conclusions Combining self-supervised ViTs with generative artifact correction enhances the correlation between glaucomatous structures and functions. Translational Relevance Our approach offers a comprehensive tool for glaucoma management, facilitates the exploration of structure-function correlations in research, and underscores the importance of addressing artifacts in the clinical interpretation of OCT.
Affiliation(s)
- Kornchanok Sriwatana
- Department of Biomedical Engineering, Faculty of Engineering, Mahidol University, Nakhon Pathom, Thailand
- Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Chanon Puttanawarut
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan, Thailand
- Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Yanin Suwan
- Department of Ophthalmology, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Titipat Achakulvisut
- Department of Biomedical Engineering, Faculty of Engineering, Mahidol University, Nakhon Pathom, Thailand
10
Deshpande R, Kelkar VA, Gotsis D, Kc P, Zeng R, Myers KJ, Brooks FJ, Anastasio MA. Report on the AAPM grand challenge on deep generative modeling for learning medical image statistics. Med Phys 2025; 52:4-20. [PMID: 39447007] [PMCID: PMC11699998] [DOI: 10.1002/mp.17473]
Abstract
BACKGROUND The findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics are reported in this Special Report. PURPOSE The goal of this challenge was to promote the development of deep generative models for medical imaging and to emphasize the need for their domain-relevant assessments via the analysis of relevant image statistics. METHODS As part of this Grand Challenge, a common training dataset and an evaluation procedure were developed for benchmarking deep generative models for medical image synthesis. To create the training dataset, an established 3D virtual breast phantom was adapted. The resulting dataset comprised about 108 000 images of size 512 × 512. For the evaluation of submissions to the Challenge, an ensemble of 10 000 DGM-generated images from each submission was employed. The evaluation procedure consisted of two stages. In the first stage, a preliminary check for memorization and image quality (via the Fréchet Inception Distance [FID]) was performed. Submissions that passed the first stage were then evaluated for the reproducibility of image statistics corresponding to several feature families including texture, morphology, image moments, fractal statistics, and skeleton statistics. A summary measure in this feature space was employed to rank the submissions. Additional analyses of submissions were performed to assess DGM performance specific to individual feature families, the four classes in the training data, and also to identify various artifacts. RESULTS Fifty-eight submissions from 12 unique users were received for this Challenge. Out of these 12 submissions, 9 submissions passed the first stage of evaluation and were eligible for ranking. The top-ranked submission employed a conditional latent diffusion model, whereas the joint runners-up employed a generative adversarial network, followed by another network for image superresolution. In general, we observed that the overall ranking of the top 9 submissions according to our evaluation method (i) did not match the FID-based ranking, and (ii) differed with respect to individual feature families. Another important finding from our additional analyses was that different DGMs demonstrated similar kinds of artifacts. CONCLUSIONS This Grand Challenge highlighted the need for domain-specific evaluation to further DGM design as well as deployment. It also demonstrated that the specification of a DGM may differ depending on its intended use.
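The first-stage FID screening relies on the Fréchet distance between Gaussian fits of real and generated feature embeddings. A self-contained sketch on toy feature vectors (in practice, Inception embeddings of the two image ensembles) is:

```python
# Fréchet distance between Gaussian fits of two feature sets, the quantity
# underlying the FID screening described above.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov1 = np.cov(feats_real, rowvar=False)
    cov2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):            # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean)

rng = np.random.default_rng(0)
fid = frechet_distance(rng.standard_normal((1000, 64)), rng.standard_normal((1000, 64)))
print(f"FID (toy features): {fid:.3f}")
```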
Affiliation(s)
- Rucha Deshpande
- Dept. of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri, USA
- Varun A. Kelkar
- Dept. of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Present address: Analog Devices, Inc., Boston, MA, USA
- Dimitrios Gotsis
- Dept. of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Prabhat Kc
- Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland, USA
- Rongping Zeng
- Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland, USA
- Frank J. Brooks
- Center for Label-free Imaging and Multiscale Biophotonics (CLIMB), University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Dept. of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Mark A. Anastasio
- Dept. of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Center for Label-free Imaging and Multiscale Biophotonics (CLIMB), University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Dept. of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
11
Shah J, Che Y, Sohankar J, Luo J, Li B, Su Y, Wu T. Enhancing Amyloid PET Quantification: MRI-Guided Super-Resolution Using Latent Diffusion Models. Life (Basel) 2024; 14:1580. [PMID: 39768288] [PMCID: PMC11678505] [DOI: 10.3390/life14121580]
Abstract
Amyloid PET imaging plays a crucial role in the diagnosis and research of Alzheimer's disease (AD), allowing non-invasive detection of amyloid-β plaques in the brain. However, the low spatial resolution of PET scans limits the accurate quantification of amyloid deposition due to partial volume effects (PVE). In this study, we propose a novel approach to addressing PVE using a latent diffusion model for resolution recovery (LDM-RR) of PET imaging. We leverage a synthetic data generation pipeline to create high-resolution PET digital phantoms for model training. The proposed LDM-RR model incorporates a weighted combination of L1, L2, and MS-SSIM losses at both noise and image scales to enhance MRI-guided reconstruction. We evaluated the model's performance in improving statistical power for detecting longitudinal changes and enhancing agreement between amyloid PET measurements from different tracers. The results demonstrate that the LDM-RR approach significantly improves PET quantification accuracy, reduces inter-tracer variability, and enhances the detection of subtle changes in amyloid deposition over time. We show that deep learning has the potential to improve PET quantification in AD, effectively contributing to the early detection and monitoring of disease progression.
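The weighted reconstruction loss described above can be sketched as follows, assuming the third-party pytorch_msssim package for the MS-SSIM term; the weights are illustrative, not the authors' values.

```python
# Hedged sketch of a weighted L1 + L2 + MS-SSIM objective at the image scale.
import torch
import torch.nn.functional as F
from pytorch_msssim import ms_ssim

def ldm_rr_loss(pred, target, w_l1=1.0, w_l2=1.0, w_msssim=0.5):
    """Combined reconstruction loss; inputs are image batches in [0, 1]."""
    loss_l1 = F.l1_loss(pred, target)
    loss_l2 = F.mse_loss(pred, target)
    loss_ssim = 1.0 - ms_ssim(pred, target, data_range=1.0)  # maximize structural similarity
    return w_l1 * loss_l1 + w_l2 * loss_l2 + w_msssim * loss_ssim

pred = torch.rand(2, 1, 192, 192, requires_grad=True)
target = torch.rand(2, 1, 192, 192)
loss = ldm_rr_loss(pred, target)
loss.backward()
```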
Affiliation(s)
- Jay Shah
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA; (J.S.); (Y.C.); (B.L.); (T.W.)
- ASU-Mayo Center for Innovative Imaging, Arizona State University, Tempe, AZ 85287, USA
- Yiming Che
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA; (J.S.); (Y.C.); (B.L.); (T.W.)
- ASU-Mayo Center for Innovative Imaging, Arizona State University, Tempe, AZ 85287, USA
- Javad Sohankar
- Banner Alzheimer’s Institute, Banner Health, Phoenix, AZ 85006, USA; (J.S.); (J.L.)
- Ji Luo
- Banner Alzheimer’s Institute, Banner Health, Phoenix, AZ 85006, USA; (J.S.); (J.L.)
- Baoxin Li
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA; (J.S.); (Y.C.); (B.L.); (T.W.)
- ASU-Mayo Center for Innovative Imaging, Arizona State University, Tempe, AZ 85287, USA
- Yi Su
- Banner Alzheimer’s Institute, Banner Health, Phoenix, AZ 85006, USA; (J.S.); (J.L.)
- Teresa Wu
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA; (J.S.); (Y.C.); (B.L.); (T.W.)
- ASU-Mayo Center for Innovative Imaging, Arizona State University, Tempe, AZ 85287, USA
12
Akpinar MH, Sengur A, Salvi M, Seoni S, Faust O, Mir H, Molinari F, Acharya UR. Synthetic Data Generation via Generative Adversarial Networks in Healthcare: A Systematic Review of Image- and Signal-Based Studies. IEEE Open J Eng Med Biol 2024; 6:183-192. [PMID: 39698120] [PMCID: PMC11655107] [DOI: 10.1109/ojemb.2024.3508472]
Abstract
Generative Adversarial Networks (GANs) have emerged as a powerful tool in artificial intelligence, particularly for unsupervised learning. This systematic review analyzes GAN applications in healthcare, focusing on image and signal-based studies across various clinical domains. Following Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines, we reviewed 72 relevant journal articles. Our findings reveal that magnetic resonance imaging (MRI) and electrocardiogram (ECG) signal acquisition techniques were most utilized, with brain studies (22%), cardiology (18%), cancer (15%), ophthalmology (12%), and lung studies (10%) being the most researched areas. We discuss key GAN architectures, including cGAN (31%) and CycleGAN (18%), along with datasets, evaluation metrics, and performance outcomes. The review highlights promising data augmentation, anonymization, and multi-task learning results. We identify current limitations, such as the lack of standardized metrics and direct comparisons, and propose future directions, including the development of no-reference metrics, immersive simulation scenarios, and enhanced interpretability.
Affiliation(s)
- Muhammed Halil Akpinar
- Vocational School of Technical Sciences, Istanbul University-Cerrahpasa, 34320 Istanbul, Türkiye
- Massimo Salvi
- Department of Electronics and Telecommunications, Politecnico di Torino, 10129 Turin, Italy
- Silvia Seoni
- Department of Electronics and Telecommunications, Politecnico di Torino, 10129 Turin, Italy
- Oliver Faust
- Anglia Ruskin University, Cambridge Campus, Cambridge CB1 1PT, U.K.
- Hasan Mir
- American University of Sharjah, Sharjah 26666, UAE
- Filippo Molinari
- Department of Electronics and Telecommunications, Politecnico di Torino, 10129 Turin, Italy
13
Pozzi M, Noei S, Robbi E, Cima L, Moroni M, Munari E, Torresani E, Jurman G. Generating and evaluating synthetic data in digital pathology through diffusion models. Sci Rep 2024; 14:28435. [PMID: 39557989] [PMCID: PMC11574254] [DOI: 10.1038/s41598-024-79602-w]
Abstract
Synthetic data is becoming a valuable tool for computational pathologists, aiding in tasks like data augmentation and addressing data scarcity and privacy. However, its use necessitates careful planning and evaluation to prevent the creation of clinically irrelevant artifacts. This manuscript introduces a comprehensive pipeline for generating and evaluating synthetic pathology data using a diffusion model. The pipeline features a multifaceted evaluation strategy with an integrated explainability procedure, addressing two key aspects of synthetic data use in the medical domain. The evaluation of the generated data employs an ensemble-like approach. The first step includes assessing the similarity between real and synthetic data using established metrics. The second step involves evaluating the usability of the generated images in deep learning models accompanied with explainable AI methods. The final step entails verifying their histopathological realism through questionnaires answered by professional pathologists. We show that each of these evaluation steps is necessary, as they provide complementary information on the generated data's quality. The pipeline is demonstrated on the public GTEx dataset of 650 Whole Slide Images (WSIs), including five different tissues. An equal number of tiles from each tissue are generated and their reliability is assessed using the proposed evaluation pipeline, yielding promising results. In summary, the proposed workflow offers a comprehensive solution for generative AI in digital pathology, potentially aiding the community in their transition towards digitalization and data-driven modeling.
Affiliation(s)
- Matteo Pozzi
- Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento, 38123, Italy
- Department for Computational and Integrative Biology, Università degli Studi di Trento, Via Sommarive, 9, Povo, Trento, 38123, Italy
- Shahryar Noei
- Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento, 38123, Italy
- Erich Robbi
- Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento, 38123, Italy
- Department of Information Engineering and Computer Science, Università degli Studi di Trento, Via Sommarive, 9, Povo, Trento, 38123, Italy
- Luca Cima
- Department of Diagnostic and Public Health, Section of Pathology, University and Hospital Trust of Verona, Verona, Italy
- Monica Moroni
- Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento, 38123, Italy
- Enrico Munari
- Department of Diagnostic and Public Health, Section of Pathology, University and Hospital Trust of Verona, Verona, Italy
- Evelin Torresani
- Pathology Unit, Department of Laboratory Medicine, Santa Chiara Hospital, APSS, Trento, Italy
- Giuseppe Jurman
- Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento, 38123, Italy
14
Littlefield N, Amirian S, Biehl J, Andrews EG, Kann M, Myers N, Reid L, Yates AJ, McGrory BJ, Parmanto B, Seyler TM, Plate JF, Rashidi HH, Tafti AP. Generative AI in orthopedics: an explainable deep few-shot image augmentation pipeline for plain knee radiographs and Kellgren-Lawrence grading. J Am Med Inform Assoc 2024; 31:2668-2678. [PMID: 39311859] [PMCID: PMC11491597] [DOI: 10.1093/jamia/ocae246]
Abstract
OBJECTIVES Recently, deep learning medical image analysis in orthopedics has become highly active. However, progress has been restricted by the absence of large-scale and standardized ground-truth images. To the best of our knowledge, this study is the first to propose an innovative solution, namely a deep few-shot image augmentation pipeline, that addresses this challenge by synthetically generating knee radiographs for training downstream tasks, with a specific focus on knee osteoarthritis Kellgren-Lawrence (KL) grading. MATERIALS AND METHODS This study leverages a deep few-shot image augmentation pipeline to generate synthetic knee radiographs. Despite the limited availability of training samples, we demonstrate the capability of our proposed computational strategy to produce high-fidelity plain knee radiographs and use them to successfully train a KL grade classifier. RESULTS Our experimental results showcase the effectiveness of the proposed computational pipeline. The generated synthetic radiographs exhibit remarkable fidelity, evidenced by the achieved average Frechet Inception Distance (FID) score of 26.33 for KL grading and 22.538 for bilateral knee radiographs. For KL grading classification, the classifier achieved a test Cohen's Kappa and accuracy of 0.451 and 0.727, respectively. Our computational strategy also resulted in a publicly and freely available imaging dataset of 86 000 synthetic knee radiographs. CONCLUSIONS Our approach demonstrates the capability to produce top-notch synthetic knee radiographs and use them for KL grading classification, even when working with a constrained training dataset. The results obtained emphasize the effectiveness of the pipeline in augmenting datasets for knee osteoarthritis research, opening doors for broader applications in orthopedics, medical image analysis, and AI-powered diagnosis.
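The agreement metrics reported here, Cohen's kappa and accuracy, can be reproduced with scikit-learn as in this toy example; the KL-grade labels are hypothetical, not the study's data.

```python
# Small example of the KL-grading agreement metrics reported above.
from sklearn.metrics import accuracy_score, cohen_kappa_score

y_true = [0, 1, 2, 3, 4, 2, 1, 0, 3, 2]   # reference KL grades (0-4)
y_pred = [0, 1, 2, 2, 4, 2, 0, 0, 3, 1]   # hypothetical classifier output

print(f"Accuracy: {accuracy_score(y_true, y_pred):.3f}")
print(f"Cohen's kappa: {cohen_kappa_score(y_true, y_pred):.3f}")
```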
Affiliation(s)
- Nickolas Littlefield
- Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Computational Pathology & AI Center of Excellence, University of Pittsburgh, Pittsburgh, PA 15261, United States
- Soheyla Amirian
- Seidenberg School of Computer Science and Information Systems, Pace University, New York, NY 10038, United States
- Jacob Biehl
- School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Edward G Andrews
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Michael Kann
- School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Nicole Myers
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Leah Reid
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Adolph J Yates
- Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Brian J McGrory
- Department of Orthopaedic Surgery, Tufts University, Medford, MA 02111, United States
- Bambang Parmanto
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Thorsten M Seyler
- Department of Orthopaedic Surgery, Duke University, Durham, NC 27560, United States
- Johannes F Plate
- Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Hooman H Rashidi
- Computational Pathology & AI Center of Excellence, University of Pittsburgh, Pittsburgh, PA 15261, United States
- School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Ahmad P Tafti
- Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Computational Pathology & AI Center of Excellence, University of Pittsburgh, Pittsburgh, PA 15261, United States
- School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
15
Allan-Blitz LT, Ambepitiya S, Prathapa J, Rietmeijer CA, Kularathne Y, Klausner JD. Synergistic pairing of synthetic image generation with disease classification modeling permits rapid digital classification tool development. Sci Rep 2024; 14:25632. [PMID: 39465329] [PMCID: PMC11514197] [DOI: 10.1038/s41598-024-77565-6]
Abstract
Machine-learning disease classification models have the potential to support diagnosis of various diseases. Pairing classification models with synthetic image generation may overcome barriers to developing classification models and permit their use in numerous contexts. Using 10 images of penises with human papilloma virus (HPV)-related disease, we trained a denoising diffusion probabilistic model. Combined with text-to-image generation, we produced 630 synthetic images, of which 500 were deemed plausible by expert clinicians. We used those images to train a Vision Transformer model. We assessed the model's performance on clinical images of HPV-related disease (n = 70), diseases other than HPV (n = 70), and non-diseased images (n = 70), calculating recall, precision, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUC). The model correctly classified 64 of 70 images of HPV-related disease, with a recall of 91.4% (95% CI 82.3%-96.8%). The precision of the model for HPV-related disease was 95.5% (95% CI 87.5%-99.1%), and the F1-score was 93.4%. The AUC for HPV-related disease was 0.99 (95% CI 0.98-1.0). Overall, the HPV-related disease classification model, trained exclusively on synthetic images, demonstrated excellent performance on clinical images.
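The reported evaluation metrics (recall, precision, F1-score, AUC) can be computed with scikit-learn as below; the labels and scores are toy data, not the study's results.

```python
# Example computation of the classification metrics reported above.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

rng = np.random.default_rng(7)
y_true = rng.integers(0, 2, size=200)                          # 1 = HPV-related disease
y_score = np.clip(y_true * 0.7 + rng.random(200) * 0.5, 0, 1)  # toy classifier scores
y_pred = (y_score >= 0.5).astype(int)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
auc = roc_auc_score(y_true, y_score)
print(f"Recall: {recall:.3f}, Precision: {precision:.3f}, "
      f"F1: {f1:.3f}, AUC: {auc:.3f}")
```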
Affiliation(s)
- Lao-Tzu Allan-Blitz
- Division of Global Health Equity, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA.
- Jeffrey D Klausner
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
16
Zhou Z, Guo Y, Tang R, Liang H, He J, Xu F. Privacy enhancing and generalizable deep learning with synthetic data for mediastinal neoplasm diagnosis. NPJ Digit Med 2024; 7:293. [PMID: 39427092] [PMCID: PMC11490545] [DOI: 10.1038/s41746-024-01290-7]
Abstract
The success of deep learning (DL) relies heavily on training data from which DL models encapsulate information. Consequently, the development and deployment of DL models expose data to potential privacy breaches, which are particularly critical in data-sensitive contexts like medicine. We propose a new technique named DiffGuard that generates realistic and diverse synthetic medical images with annotations, indistinguishable even to experts, to replace real data for DL model training, which severs the direct connection to patient data and enhances privacy. We demonstrate that DiffGuard enhances privacy with much less data leakage and better resistance against privacy attacks on data and model. It also improves the accuracy and generalizability of DL models for segmentation and classification of mediastinal neoplasms in multi-center evaluation. We expect that our solution will pave the way toward privacy-preserving DL for precision medicine, promote data and model sharing, and inspire more innovation in artificial-intelligence-generated-content technologies for medicine.
Affiliation(s)
- Zhanping Zhou
- School of Software, Tsinghua University, Beijing, China
- Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing, China
- Yuchen Guo
- Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing, China
- Ruijie Tang
- School of Software, Tsinghua University, Beijing, China
- Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing, China
- Hengrui Liang
- Department of Thoracic Oncology and Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jianxing He
- Department of Thoracic Oncology and Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Feng Xu
- School of Software, Tsinghua University, Beijing, China
- Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing, China
17
Guo X, Xiang Y, Yang Y, Ye C, Yu Y, Ma T. Accelerating denoising diffusion probabilistic model via truncated inverse processes for medical image segmentation. Comput Biol Med 2024; 180:108933. [PMID: 39096612] [DOI: 10.1016/j.compbiomed.2024.108933]
Abstract
Medical image segmentation demands precise accuracy and the capability to assess segmentation uncertainty for informed clinical decision-making. Denoising Diffusion Probability Models (DDPMs), with their advancements in image generation, can treat segmentation as a conditional generation task, providing accurate segmentation and uncertainty estimation. However, current DDPMs used in medical image segmentation suffer from low inference efficiency and prediction errors caused by excessive noise at the end of the forward process. To address this issue, we propose an accelerated denoising diffusion probabilistic model via truncated inverse processes (ADDPM) that is specifically designed for medical image segmentation. The inverse process of ADDPM starts from a non-Gaussian distribution and terminates early once a prediction with relatively low noise is obtained after multiple iterations of denoising. We employ a separate powerful segmentation network to obtain pre-segmentation and construct the non-Gaussian distribution of the segmentation based on the forward diffusion rule. By further adopting a separate denoising network, the final segmentation can be obtained with just one denoising step from the predictions with low noise. ADDPM greatly reduces the number of denoising steps to approximately one-tenth of that in vanilla DDPMs. Our experiments on four segmentation tasks demonstrate that ADDPM outperforms both vanilla DDPMs and existing representative accelerating DDPMs methods. Moreover, ADDPM can be easily integrated with existing advanced segmentation models to improve segmentation performance and provide uncertainty estimation. Implementation code: https://github.com/Guoxt/ADDPM.
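A hedged sketch of a truncated reverse process in this spirit appears below: a pre-segmentation is forward-diffused to a moderate timestep, and denoising starts from that non-Gaussian point rather than from pure noise. `pre_segmenter` and `denoiser` are hypothetical trained networks, and the loop is a generic truncation, not the authors' exact one-step scheme (their implementation is at the GitHub link above).

```python
# Generic truncated reverse diffusion for segmentation: start from a
# forward-diffused pre-segmentation instead of pure Gaussian noise.
import torch

@torch.no_grad()
def truncated_segment(pre_segmenter, denoiser, image, betas, t_start=100):
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x0 = pre_segmenter(image)                               # coarse segmentation
    eps = torch.randn_like(x0)
    # Forward-diffuse the pre-segmentation to timestep t_start (non-Gaussian start)
    x = alpha_bars[t_start].sqrt() * x0 + (1 - alpha_bars[t_start]).sqrt() * eps
    for t in reversed(range(t_start + 1)):                  # truncated reverse process
        pred_eps = denoiser(x, torch.tensor([t]), image)    # image-conditioned denoiser
        coef = betas[t] / (1 - alpha_bars[t]).sqrt()
        x = (x - coef * pred_eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```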
Affiliation(s)
- Xutao Guo
- School of Electronics and Information Engineering, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China; The Peng Cheng Laboratory, Shenzhen, Guangdong, China
- Yang Xiang
- The Peng Cheng Laboratory, Shenzhen, Guangdong, China
- Yanwu Yang
- School of Electronics and Information Engineering, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China; The Peng Cheng Laboratory, Shenzhen, Guangdong, China
- Chenfei Ye
- The International Research Institute for Artificial Intelligence, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
- Yue Yu
- The Peng Cheng Laboratory, Shenzhen, Guangdong, China
- Ting Ma
- School of Electronics and Information Engineering, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China; The Peng Cheng Laboratory, Shenzhen, Guangdong, China; Guangdong Provincial Key Laboratory of Aerospace Communication and Networking Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
18
Gao Y, Xie H, Chang CW, Peng J, Pan S, Qiu RL, Wang T, Ghavidel B, Roper J, Zhou J, Yang X. CT-based synthetic iodine map generation using conditional denoising diffusion probabilistic model. Med Phys 2024; 51:6246-6258. [PMID: 38889368] [PMCID: PMC11489029] [DOI: 10.1002/mp.17258]
Abstract
BACKGROUND Iodine maps, derived from image-processing of contrast-enhanced dual-energy computed tomography (DECT) scans, highlight the differences in tissue iodine intake. They find multiple applications in radiology, including vascular imaging, pulmonary evaluation, kidney assessment, and cancer diagnosis. In radiation oncology, they can contribute to designing more accurate and personalized treatment plans. However, DECT scanners are not commonly available in radiation therapy centers. Additionally, the use of iodine contrast agents is not suitable for all patients, especially those allergic to iodine agents, posing further limitations to the accessibility of this technology. PURPOSE The purpose of this work is to generate synthetic iodine map images from non-contrast single-energy CT (SECT) images using a conditional denoising diffusion probabilistic model (DDPM). METHODS One hundred twenty-six head-and-neck patients' images were retrospectively investigated in this work. Each patient underwent non-contrast SECT and contrast DECT scans. Ground truth iodine maps were generated from contrast DECT scans using the commercial software syngo.via installed in the clinic. A conditional DDPM was implemented in this work to synthesize iodine maps. Three-fold cross-validation was conducted, with each iteration selecting the data from 42 patients as the test dataset and the remainder as the training dataset. Pixel-to-pixel generative adversarial network (GAN) and CycleGAN served as reference methods for evaluating the proposed DDPM method. RESULTS The accuracy of the proposed DDPM was evaluated using three quantitative metrics: mean absolute error (MAE) (1.039 ± 0.345 mg/mL), structural similarity index measure (SSIM) (0.89 ± 0.10) and peak signal-to-noise ratio (PSNR) (25.4 ± 3.5 dB), respectively. Compared to the reference methods, the proposed technique showcased superior performance across the evaluated metrics, further validated by paired two-tailed t-tests. CONCLUSION The proposed conditional DDPM framework has demonstrated the feasibility of generating synthetic iodine map images from non-contrast SECT images. This method presents a potential clinical application: providing accurate iodine contrast maps in instances where only non-contrast SECT is accessible.
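The paired two-tailed t-test used to validate these comparisons can be run with SciPy as in this toy example on hypothetical per-patient MAE values; the numbers are illustrative, not the study's data.

```python
# Paired two-tailed t-test comparing per-patient MAE between two models.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mae_ddpm = rng.normal(1.04, 0.3, size=42)             # per-patient MAE, proposed model
mae_gan = mae_ddpm + rng.normal(0.25, 0.15, size=42)  # reference model, higher error

t_stat, p_value = stats.ttest_rel(mae_gan, mae_ddpm)  # paired, two-tailed by default
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```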
Collapse
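The core recipe here, a DDPM whose denoiser also sees the conditioning SECT image, is illustrated below in a minimal sketch. The tiny convolutional "denoiser", the noise schedule, and the 64 × 64 shapes are placeholder assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a conditional DDPM training step: the SECT image is
# concatenated to the noisy iodine map as an extra input channel.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(                      # stand-in for a full U-Net
    nn.Conv2d(2, 32, 3, padding=1), nn.SiLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

def training_step(iodine, sect):
    """Predict the noise added to the iodine map, given the SECT condition."""
    t = torch.randint(0, T, (iodine.shape[0],))
    eps = torch.randn_like(iodine)
    ab = alphas_bar[t].view(-1, 1, 1, 1)
    noisy = ab.sqrt() * iodine + (1 - ab).sqrt() * eps
    eps_hat = denoiser(torch.cat([noisy, sect], dim=1))  # condition by concat
    return nn.functional.mse_loss(eps_hat, eps)

loss = training_step(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
loss.backward()
```

At sampling time, the same SECT image would be concatenated at every reverse step, keeping the generated iodine map anatomically tied to its non-contrast scan.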
Affiliation(s)
- Yuan Gao
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
| | - Huiqiao Xie
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY
| | - Chih-Wei Chang
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
| | - Junbo Peng
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
| | - Shaoyan Pan
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
| | - Richard L.J. Qiu
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
| | - Tonghe Wang
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY
| | - Beth Ghavidel
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
| | - Justin Roper
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
| | - Jun Zhou
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
| | - Xiaofeng Yang
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
| |
Collapse
|
19
|
Gao Y, Qiu RLJ, Xie H, Chang CW, Wang T, Ghavidel B, Roper J, Zhou J, Yang X. CT-based synthetic contrast-enhanced dual-energy CT generation using conditional denoising diffusion probabilistic model. Phys Med Biol 2024; 69:165015. [PMID: 39053511 PMCID: PMC11294926 DOI: 10.1088/1361-6560/ad67a1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2024] [Revised: 06/26/2024] [Accepted: 07/25/2024] [Indexed: 07/27/2024]
Abstract
Objective. The study aimed to generate synthetic contrast-enhanced dual-energy CT (CE-DECT) images from non-contrast single-energy CT (SECT) scans, addressing the limitations posed by the scarcity of DECT scanners and the health risks associated with iodinated contrast agents, particularly for high-risk patients. Approach. A conditional denoising diffusion probabilistic model (C-DDPM) was utilized to create synthetic images. Imaging data were collected from 130 head-and-neck (HN) cancer patients who had undergone both non-contrast SECT and CE-DECT scans. Main Results. The performance of the C-DDPM was evaluated using mean absolute error (MAE), structural similarity index (SSIM), and peak signal-to-noise ratio (PSNR). The results showed MAE values of 27.37 ± 3.35 Hounsfield units (HU) for high-energy CT (H-CT) and 24.57 ± 3.35 HU for low-energy CT (L-CT), SSIM values of 0.74 ± 0.22 for H-CT and 0.78 ± 0.22 for L-CT, and PSNR values of 18.51 ± 4.55 decibels (dB) for H-CT and 18.91 ± 4.55 dB for L-CT. Significance. The study demonstrates the efficacy of the deep learning model in producing high-quality synthetic CE-DECT images, which significantly benefits radiation therapy planning. This approach provides a valuable alternative imaging solution for facilities lacking DECT scanners and for patients who are unsuitable for iodine contrast imaging, thereby enhancing the reach and effectiveness of advanced imaging in cancer treatment planning.
Collapse
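The MAE/SSIM/PSNR triplet reported here (and in the previous entry) is straightforward to reproduce. The sketch below shows one common way to compute it with scikit-image, using random stand-in arrays rather than the study's HU images.

```python
# Hedged sketch of the usual MAE/SSIM/PSNR computation on paired images.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def report(real: np.ndarray, synth: np.ndarray, data_range: float):
    mae = float(np.abs(real - synth).mean())
    ssim = structural_similarity(real, synth, data_range=data_range)
    psnr = peak_signal_noise_ratio(real, synth, data_range=data_range)
    return mae, ssim, psnr

rng = np.random.default_rng(0)
real = rng.normal(size=(128, 128))            # stand-in for a real CT slice
synth = real + rng.normal(scale=0.1, size=real.shape)  # stand-in synthesis
print(report(real, synth, data_range=float(real.max() - real.min())))
```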
Affiliation(s)
- Yuan Gao
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, United States of America
| | - Richard L J Qiu
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, United States of America
| | - Huiqiao Xie
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States of America
| | - Chih-Wei Chang
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, United States of America
| | - Tonghe Wang
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States of America
| | - Beth Ghavidel
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, United States of America
| | - Justin Roper
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, United States of America
| | - Jun Zhou
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, United States of America
| | - Xiaofeng Yang
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, United States of America
| |
Collapse
|
20
|
Osorio P, Jimenez-Perez G, Montalt-Tordera J, Hooge J, Duran-Ballester G, Singh S, Radbruch M, Bach U, Schroeder S, Siudak K, Vienenkoetter J, Lawrenz B, Mohammadi S. Latent Diffusion Models with Image-Derived Annotations for Enhanced AI-Assisted Cancer Diagnosis in Histopathology. Diagnostics (Basel) 2024; 14:1442. [PMID: 39001331 PMCID: PMC11241396 DOI: 10.3390/diagnostics14131442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2024] [Revised: 06/19/2024] [Accepted: 06/26/2024] [Indexed: 07/16/2024] Open
Abstract
Artificial Intelligence (AI)-based image analysis has immense potential to support diagnostic histopathology, including cancer diagnostics. However, developing supervised AI methods requires large-scale annotated datasets. A potentially powerful solution is to augment training data with synthetic data. Latent diffusion models, which can generate high-quality, diverse synthetic images, are promising. However, the most common implementations rely on detailed textual descriptions, which are not generally available in this domain. This work proposes a method that constructs structured textual prompts from automatically extracted image features. We experiment with the PCam dataset, composed of tissue patches only loosely annotated as healthy or cancerous. We show that including image-derived features in the prompt, as opposed to only healthy and cancerous labels, improves the Fréchet Inception Distance (FID) by 88.6. We also show that pathologists find it challenging to detect synthetic images, with a median sensitivity/specificity of 0.55/0.55. Finally, we show that synthetic data effectively train AI models.
Collapse
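The paper's key move is replacing missing textual descriptions with prompts assembled from automatically extracted image features. A minimal sketch of that idea follows; the specific features (mean intensity, texture spread) and the prompt phrasing are illustrative assumptions, not the authors' feature set.

```python
# Hedged sketch: build a structured text prompt from simple image statistics.
import numpy as np

def image_derived_prompt(patch: np.ndarray, label: str) -> str:
    feats = []
    if patch.mean() < 0.45:       # crude darkness proxy, e.g. for cellularity
        feats.append("high cellularity")
    if patch.std() > 0.25:        # crude texture-heterogeneity proxy
        feats.append("heterogeneous texture")
    detail = ", ".join(feats) if feats else "uniform appearance"
    return f"histopathology tissue patch, {label}, {detail}"

patch = np.random.default_rng(1).random((96, 96))
print(image_derived_prompt(patch, "cancerous"))
```

Prompts of this form would then condition the latent diffusion model in place of the detailed captions that text-to-image systems normally rely on.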
Affiliation(s)
- Pedro Osorio
- Decision Science & Advanced Analytics, Bayer AG, 13353 Berlin, Germany
| | | | | | - Jens Hooge
- Decision Science & Advanced Analytics, Bayer AG, 13353 Berlin, Germany
| | | | - Shivam Singh
- Decision Science & Advanced Analytics, Bayer AG, 13353 Berlin, Germany
| | - Moritz Radbruch
- Pathology and Clinical Pathology, Bayer AG, 13353 Berlin, Germany
| | - Ute Bach
- Pathology and Clinical Pathology, Bayer AG, 13353 Berlin, Germany
| | | | - Krystyna Siudak
- Pathology and Clinical Pathology, Bayer AG, 13353 Berlin, Germany
| | | | - Bettina Lawrenz
- Pathology and Clinical Pathology, Bayer AG, 13353 Berlin, Germany
| | - Sadegh Mohammadi
- Decision Science & Advanced Analytics, Bayer AG, 13353 Berlin, Germany
| |
Collapse
|
21
|
Li S, Jiang X, Tivnan M, Gang GJ, Shen Y, Stayman JW. CT reconstruction using diffusion posterior sampling conditioned on a nonlinear measurement model. J Med Imaging (Bellingham) 2024; 11:043504. [PMID: 39220597 PMCID: PMC11362816 DOI: 10.1117/1.jmi.11.4.043504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2023] [Revised: 07/21/2024] [Accepted: 07/31/2024] [Indexed: 09/04/2024] Open
Abstract
Purpose Recently, diffusion posterior sampling (DPS), where score-based diffusion priors are combined with likelihood models, has been used to produce high-quality computed tomography (CT) images given low-quality measurements. This technique permits one-time, unsupervised training of a CT prior, which can then be incorporated with an arbitrary data model. However, current methods rely on a linear model of X-ray CT physics for reconstruction. Although it is common to linearize the transmission tomography reconstruction problem, this is an approximation to the true and inherently nonlinear forward model. We propose a DPS method that integrates a general nonlinear measurement model. Approach We implement a traditional unconditional diffusion model by training a prior score function estimator and apply Bayes' rule to combine this prior with a measurement likelihood score function derived from the nonlinear physical model, arriving at a posterior score function that can be used to sample the reverse-time diffusion process. We develop computational enhancements for the approach and evaluate it in several simulation studies. Results The proposed nonlinear DPS provides improved performance over traditional reconstruction methods and over DPS with a linear model. Moreover, compared with a conditionally trained deep learning approach, the nonlinear DPS approach shows a better ability to provide high-quality images across different acquisition protocols. Conclusion This plug-and-play method allows the incorporation of a diffusion-based prior with a general nonlinear CT measurement model. This permits the application of the approach to different systems, protocols, etc., without the need for any additional training.
Collapse
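In score form, Bayes' rule gives ∇ₓ log p(x|y) = ∇ₓ log p(x) + ∇ₓ log p(y|x). The sketch below combines a stand-in prior score with the autograd gradient of a Gaussian log-likelihood under a toy nonlinear (exponential-attenuation) forward model; everything here is an illustrative assumption, not the paper's system model.

```python
# Hedged sketch of the DPS posterior score with a nonlinear forward model.
import torch

def posterior_score(x, sigma, prior_score_fn, y, forward_fn, noise_var):
    x = x.detach().requires_grad_(True)
    # Gaussian measurement log-likelihood under the nonlinear forward model
    loglik = -((y - forward_fn(x)) ** 2).sum() / (2 * noise_var)
    lik_score = torch.autograd.grad(loglik, x)[0]
    return prior_score_fn(x, sigma) + lik_score       # Bayes' rule in scores

# toy nonlinear CT-like model: exponential attenuation summed along rows
forward = lambda x: torch.exp(-x).sum(dim=-1)
prior = lambda x, s: -x / s**2          # stand-in for a trained score network
x_true = torch.rand(8, 8)
y = forward(x_true) + 0.01 * torch.randn(8)
print(posterior_score(torch.randn(8, 8), 1.0, prior, y, forward, 1e-2).shape)
```

Because the likelihood term is obtained by automatic differentiation through the forward model, swapping in a different (still differentiable) physics model requires no retraining, which is the "plug-and-play" property the abstract emphasizes.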
Affiliation(s)
- Shudong Li
- Johns Hopkins University, Department of Biomedical Engineering, Baltimore, Maryland, United States
- Tsinghua University, Department of Electronic Engineering, Beijing, China
| | - Xiao Jiang
- Johns Hopkins University, Department of Biomedical Engineering, Baltimore, Maryland, United States
| | - Matthew Tivnan
- Massachusetts General Hospital, Harvard Medical School, Department of Radiology, Boston, Massachusetts, United States
| | - Grace J. Gang
- University of Pennsylvania, Department of Radiology, Philadelphia, Pennsylvania, United States
| | - Yuan Shen
- Tsinghua University, Department of Electronic Engineering, Beijing, China
| | - J. Webster Stayman
- Johns Hopkins University, Department of Biomedical Engineering, Baltimore, Maryland, United States
| |
Collapse
|
22
|
Luo Y, Yang Q, Liu Z, Shi Z, Huang W, Zheng G, Cheng J. Target-Guided Diffusion Models for Unpaired Cross-Modality Medical Image Translation. IEEE J Biomed Health Inform 2024; 28:4062-4071. [PMID: 38662561 DOI: 10.1109/jbhi.2024.3393870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/03/2024]
Abstract
In clinical settings, the acquisition of certain medical image modalities is often unavailable due to various considerations such as cost, radiation, etc. Therefore, unpaired cross-modality translation techniques, which train on unpaired data and synthesize the target modality under the guidance of the acquired source modality, are of great interest. Previous methods for synthesizing target medical images establish a one-shot mapping through generative adversarial networks (GANs). As promising alternatives to GANs, diffusion models have recently received wide interest in generative tasks. In this paper, we propose a target-guided diffusion model (TGDM) for unpaired cross-modality medical image translation. For training, to encourage the diffusion model to learn more visual concepts, we adopt a perception-prioritized weighting scheme (P2W) in the training objective. For sampling, a pre-trained classifier is adopted in the reverse process to suppress modality-specific remnants from the source data. Experiments on both brain MRI-CT and prostate MRI-US datasets demonstrate that the proposed method achieves visually realistic results that mimic vivid anatomical sections of the target organ. In addition, we conducted a subjective assessment of the synthesized samples to further validate the clinical value of TGDM.
Collapse
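The sampling-time ingredient, a pre-trained classifier steering the reverse process toward the target modality, boils down to shifting each denoising mean by the gradient of the classifier's log-probability. Below is a minimal sketch; the linear classifier, guidance scale, and shapes are placeholders, not the TGDM implementation.

```python
# Hedged sketch of classifier guidance in the reverse diffusion process.
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))  # stand-in

def guided_mean(x_t, model_mean, target_class: int, scale: float = 1.0):
    x = x_t.detach().requires_grad_(True)
    logp = torch.log_softmax(classifier(x), dim=-1)[:, target_class].sum()
    grad = torch.autograd.grad(logp, x)[0]
    return model_mean + scale * grad       # nudge samples toward the class

x_t = torch.randn(2, 1, 64, 64)
print(guided_mean(x_t, torch.zeros_like(x_t), target_class=1).shape)
```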
|
23
|
Graikos A, Yellapragada S, Le MQ, Kapse S, Prasanna P, Saltz J, Samaras D. Learned representation-guided diffusion models for large-image generation. PROCEEDINGS. IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2024; 2024:8532-8542. [PMID: 39606708 PMCID: PMC11601131 DOI: 10.1109/cvpr52733.2024.00815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2024]
Abstract
To synthesize high-fidelity samples, diffusion models typically require auxiliary data to guide the generation process. However, it is impractical to procure the painstaking patch-level annotation required in specialized domains like histopathology and satellite imagery; such annotation is typically performed by domain experts and can involve hundreds of millions of patches. Modern self-supervised learning (SSL) representations encode rich semantic and visual information. In this paper, we posit that such representations are expressive enough to act as proxies for fine-grained human labels. We introduce a novel approach that trains diffusion models conditioned on embeddings from SSL. Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. In addition, we construct larger images by assembling spatially consistent patches inferred from SSL embeddings, preserving long-range dependencies. Augmenting real data by generating variations of real images improves downstream classifier accuracy for patch-level and larger, image-scale classification tasks. Our models are effective even on datasets not encountered during training, demonstrating their robustness and generalizability. Generating images from learned embeddings is agnostic to the source of the embeddings: the SSL embeddings used to generate a large image can either be extracted from a reference image or sampled from an auxiliary model conditioned on any related modality (e.g., class labels, text, genomic data). As proof of concept, we introduce the text-to-large-image synthesis paradigm, successfully synthesizing large pathology and satellite images from text descriptions.
Collapse
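Conditioning a denoiser on a fixed SSL embedding can be as simple as projecting the embedding into the feature space and adding it as a per-channel bias, which the sketch below illustrates. The embedding dimension (384, as in small ViT-style SSL models) and the two-layer "denoiser" are assumptions, not the paper's architecture.

```python
# Hedged sketch: a denoiser conditioned on an SSL embedding via projection.
import torch
import torch.nn as nn

class EmbedConditionedDenoiser(nn.Module):
    def __init__(self, emb_dim: int = 384, ch: int = 32):
        super().__init__()
        self.inp = nn.Conv2d(1, ch, 3, padding=1)
        self.proj = nn.Linear(emb_dim, ch)    # embedding -> per-channel bias
        self.out = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, x_noisy, ssl_emb):
        h = self.inp(x_noisy)
        h = h + self.proj(ssl_emb)[:, :, None, None]  # broadcast over H, W
        return self.out(torch.relu(h))

net = EmbedConditionedDenoiser()
print(net(torch.randn(2, 1, 64, 64), torch.randn(2, 384)).shape)
```

Assembling a large image then amounts to generating one patch per spatially adjacent embedding and tiling the results, which is what preserves the long-range dependencies the abstract describes.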
|
24
|
Kikuchi T, Hanaoka S, Nakao T, Takenaga T, Nomura Y, Mori H, Yoshikawa T. Synthesis of Hybrid Data Consisting of Chest Radiographs and Tabular Clinical Records Using Dual Generative Models for COVID-19 Positive Cases. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:1217-1227. [PMID: 38351224 PMCID: PMC11169290 DOI: 10.1007/s10278-024-01015-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 12/21/2023] [Accepted: 12/22/2023] [Indexed: 06/13/2024]
Abstract
This study aimed to generate synthetic medical data comprising image-tabular hybrid records by merging an image encoding/decoding model with a table-compatible generative model, and to assess the data's utility. We used 1342 cases from the Stony Brook University COVID-19-positive cohort, comprising chest X-ray radiographs (CXRs) and tabular clinical data, as a private dataset (pDS). We generated a synthetic dataset (sDS) through the following steps: (I) dimensionally reducing CXRs in the pDS using a pretrained encoder of the auto-encoding generative adversarial networks (αGAN) and integrating them with the corresponding tabular clinical data; (II) training the conditional tabular GAN (CTGAN) on this combined data to generate synthetic records, encompassing encoded image features and clinical data; and (III) reconstructing synthetic images from these encoded image features in the sDS using a pretrained decoder of the αGAN. The utility of the sDS was assessed by the performance of prediction models for patient outcomes (deceased or discharged). For the pDS test set, the area under the receiver operating characteristic curve (AUC) was calculated to compare the performance of prediction models trained separately with the pDS, the sDS, or a combination of both. We created an sDS comprising CXRs with a resolution of 256 × 256 pixels and tabular data containing 13 variables. The AUC for the outcome was 0.83 when the model was trained with the pDS, 0.74 with the sDS, and 0.87 when combining pDS and sDS for training. Our method is effective for generating synthetic records consisting of both images and tabular clinical data.
Collapse
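The three-step pipeline (encode images to latents, fit a tabular generator on latents plus clinical columns, decode sampled latents back to images) is sketched below with toy stand-ins: the "encoder", "generator", and "decoder" are placeholders for the study's αGAN and CTGAN, and all shapes and column names are assumptions.

```python
# Hedged sketch of an image+tabular hybrid synthesis pipeline.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
latents = rng.normal(size=(100, 8))           # (I) stand-in for αGAN codes
clinical = pd.DataFrame({"age": rng.integers(20, 90, 100),
                         "outcome": rng.integers(0, 2, 100)})
table = pd.concat([pd.DataFrame(latents), clinical], axis=1)

# (II) stand-in generator: resample rows with jitter (a CTGAN would be
# fit on `table` here and sampled instead)
synth = table.sample(50, replace=True, random_state=0).reset_index(drop=True)
synth.iloc[:, :8] = synth.iloc[:, :8].to_numpy() + rng.normal(scale=0.05,
                                                              size=(50, 8))

# (III) stand-in decoder mapping latent codes back to 8x8 "images"
decode = lambda z: np.tanh(z @ rng.normal(size=(8, 64))).reshape(-1, 8, 8)
images = decode(synth.iloc[:, :8].to_numpy())
print(images.shape, synth["outcome"].value_counts().to_dict())
```

The design point is that a single tabular generator models image latents and clinical variables jointly, so the synthesized CXR and its clinical record stay mutually consistent.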
Affiliation(s)
- Tomohiro Kikuchi
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
- Department of Radiology, School of Medicine, Jichi Medical University, 3311-1 Yakushiji, Shimotsuke, Tochigi, 329-0498, Japan.
| | - Shouhei Hanaoka
- Department of Radiology, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
| | - Takahiro Nakao
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
| | - Tomomi Takenaga
- Department of Radiology, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
| | - Yukihiro Nomura
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Center for Frontier Medical Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba, 263-8522, Japan
| | - Harushi Mori
- Department of Radiology, School of Medicine, Jichi Medical University, 3311-1 Yakushiji, Shimotsuke, Tochigi, 329-0498, Japan
| | - Takeharu Yoshikawa
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
| |
Collapse
|
25
|
Niehues JM, Müller-Franzes G, Schirris Y, Wagner SJ, Jendrusch M, Kloor M, Pearson AT, Muti HS, Hewitt KJ, Veldhuizen GP, Zigutyte L, Truhn D, Kather JN. Using histopathology latent diffusion models as privacy-preserving dataset augmenters improves downstream classification performance. Comput Biol Med 2024; 175:108410. [PMID: 38678938 DOI: 10.1016/j.compbiomed.2024.108410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Revised: 03/23/2024] [Accepted: 04/02/2024] [Indexed: 05/01/2024]
Abstract
Latent diffusion models (LDMs) have emerged as a state-of-the-art image generation method, outperforming previous generative adversarial networks (GANs) in terms of training stability and image quality. In computational pathology, generative models are valuable for data sharing and data augmentation. However, the impact of LDM-generated images on histopathology tasks, compared to traditional GANs, has not been systematically studied. We trained three LDMs and a StyleGAN2 model on histology tiles from nine colorectal cancer (CRC) tissue classes. The LDMs include (1) a fine-tuned version of Stable Diffusion v1.4, (2) a Kullback-Leibler (KL)-autoencoder-based LDM (KLF8-DM), and (3) a vector-quantized (VQ)-autoencoder-based LDM (VQF8-DM). We assessed image quality through expert ratings, dimensionality reduction methods, distribution similarity measures, and their impact on training a multiclass tissue classifier. Additionally, we investigated image memorization in the KLF8-DM and StyleGAN2 models. All models provided high image quality, with the KLF8-DM achieving the best Fréchet Inception Distance (FID) and expert rating scores for complex tissue classes. For simpler classes, the VQF8-DM and StyleGAN2 models performed better. Image memorization was negligible for both the StyleGAN2 and KLF8-DM models. Classifiers trained on a mix of KLF8-DM-generated and real images achieved a 4% improvement in overall classification accuracy, highlighting the usefulness of these images for dataset augmentation. Our systematic study of generative methods showed that the KLF8-DM produces the highest-quality images with negligible image memorization. The higher classifier performance on the generatively augmented dataset suggests that this augmentation technique can be employed to enhance histopathology classifiers for various tasks.
Collapse
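The augmentation protocol evaluated here, training a tissue classifier on a pool of real plus generated tiles, is sketched below with random tensors standing in for real and LDM-generated images. The tiny linear classifier and the nine-class setup only loosely mirror the CRC task and are assumptions.

```python
# Hedged sketch: train a classifier on a mix of real and synthetic tiles.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

real = TensorDataset(torch.randn(200, 3, 32, 32), torch.randint(0, 9, (200,)))
synth = TensorDataset(torch.randn(200, 3, 32, 32), torch.randint(0, 9, (200,)))
loader = DataLoader(ConcatDataset([real, synth]), batch_size=32, shuffle=True)

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 9))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for x, y in loader:                   # one epoch over the real+synthetic mix
    opt.zero_grad()
    torch.nn.functional.cross_entropy(model(x), y).backward()
    opt.step()
```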
Affiliation(s)
- Jan M Niehues
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
| | - Gustav Müller-Franzes
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
| | - Yoni Schirris
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; Netherlands Cancer Institute, 1066 CX, Amsterdam, the Netherlands; University of Amsterdam, 1012 WP, Amsterdam, the Netherlands
| | - Sophia Janine Wagner
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; Helmholtz Munich - German Research Center for Environment and Health, Munich, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
| | - Michael Jendrusch
- Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
| | - Matthias Kloor
- Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
| | | | - Hannah Sophie Muti
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
| | - Katherine J Hewitt
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
| | - Gregory P Veldhuizen
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
| | - Laura Zigutyte
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
| | - Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
| | - Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; Pathology & Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom; Department of Medicine I, University Hospital Dresden, Dresden, Germany; Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany.
| |
Collapse
|
26
|
Fan L, Bang A, Bonomi L. Evaluating Generative Models in Medical Imaging. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2024; 2024:553-555. [PMID: 39464171 PMCID: PMC11508590 DOI: 10.1109/ichi61247.2024.00084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/29/2024]
Abstract
Data synthesis can address important data availability challenges in biomedical informatics. Quantitative evaluation of generative models may help in understanding their applications to synthesizing biomedical data. This poster paper examines state-of-the-art generative models used in medical imaging, such as StyleGAN and DDPM models, and evaluates their performance in learning data manifolds and in the visible features of generated samples. Results show that existing generative models leave considerable room for improvement on the studied measures.
Collapse
Affiliation(s)
- Liyue Fan
- Department of Computer Science, UNC Charlotte, Charlotte, NC
| | | | - Luca Bonomi
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
| |
Collapse
|
27
|
Deshpande R, Kelkar VA, Gotsis D, Kc P, Zeng R, Myers KJ, Brooks FJ, Anastasio MA. Report on the AAPM Grand Challenge on deep generative modeling for learning medical image statistics. ARXIV 2024:arXiv:2405.01822v1. [PMID: 38745699 PMCID: PMC11092676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]
Abstract
Background The findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics are reported in this Special Report. Purpose The goal of this challenge was to promote the development of deep generative models (DGMs) for medical imaging and to emphasize the need for their domain-relevant assessment via the analysis of relevant image statistics. Methods As part of this Grand Challenge, a common training dataset and an evaluation procedure were developed for benchmarking deep generative models for medical image synthesis. To create the training dataset, an established 3D virtual breast phantom was adapted. The resulting dataset comprised about 108,000 images of size 512 × 512. For the evaluation of submissions to the Challenge, an ensemble of 10,000 DGM-generated images from each submission was employed. The evaluation procedure consisted of two stages. In the first stage, a preliminary check for memorization and image quality (via the Fréchet Inception Distance (FID)) was performed. Submissions that passed the first stage were then evaluated for the reproducibility of image statistics corresponding to several feature families, including texture, morphology, image moments, fractal statistics, and skeleton statistics. A summary measure in this feature space was employed to rank the submissions. Additional analyses of submissions were performed to assess DGM performance specific to individual feature families and to the four classes in the training data, and to identify various artifacts. Results Fifty-eight submissions from 12 unique users were received for this Challenge. Of these, 9 submissions passed the first stage of evaluation and were eligible for ranking. The top-ranked submission employed a conditional latent diffusion model, whereas the joint runners-up employed a generative adversarial network, followed by another network for image super-resolution. In general, we observed that the overall ranking of the top 9 submissions according to our evaluation method (i) did not match the FID-based ranking, and (ii) differed with respect to individual feature families. Another important finding from our additional analyses was that different DGMs demonstrated similar kinds of artifacts. Conclusions This Grand Challenge highlighted the need for domain-specific evaluation to further DGM design as well as deployment. It also demonstrated that the specification of a DGM may differ depending on its intended use.
Collapse
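The first-stage image-quality check relies on the Fréchet Inception Distance, which reduces to the Fréchet distance between Gaussian fits of feature embeddings. The sketch below computes that distance on random stand-in features; in real use, the inputs would be Inception (or other domain-relevant) embeddings of the real and generated ensembles.

```python
# Hedged sketch of the Fréchet distance between two sets of feature vectors.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b).real   # drop negligible imaginary parts
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
print(frechet_distance(rng.normal(size=(500, 16)),
                       rng.normal(loc=0.3, size=(500, 16))))
```

The Challenge's second stage goes beyond this single summary number by comparing distributions of texture, morphology, moment, fractal, and skeleton statistics, which is precisely why its ranking diverged from the FID-based one.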
Affiliation(s)
- Rucha Deshpande
- Dept. of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri, USA
| | - Varun A. Kelkar
- Dept. of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
| | - Dimitrios Gotsis
- Dept. of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
| | - Prabhat Kc
- Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland, USA
| | - Rongping Zeng
- Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland, USA
| | | | - Frank J. Brooks
- Center for Label-free Imaging and Multiscale Biophotonics (CLIMB), University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Dept. of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Mark A. Anastasio
- Dept. of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Center for Label-free Imaging and Multiscale Biophotonics (CLIMB), University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Dept. of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| |
Collapse
|
28
|
Wang T, Yang X. Take CT, get PET free: AI-powered breakthrough in lung cancer diagnosis and prognosis. Cell Rep Med 2024; 5:101486. [PMID: 38631288 PMCID: PMC11031371 DOI: 10.1016/j.xcrm.2024.101486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2024] [Revised: 02/21/2024] [Accepted: 03/04/2024] [Indexed: 04/19/2024]
Abstract
PET scans provide additional clinical value but are costly and not universally accessible. Salehjahromi et al.1 developed an AI-based pipeline to synthesize PET images from diagnostic CT scans, demonstrating its potential clinical utility across various clinical tasks for lung cancer.
Collapse
Affiliation(s)
- Tonghe Wang
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
| | - Xiaofeng Yang
- Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA, USA.
| |
Collapse
|
29
|
Zhang B, Xu P, Chen X, Zhuang Q. Generative Quantum Machine Learning via Denoising Diffusion Probabilistic Models. PHYSICAL REVIEW LETTERS 2024; 132:100602. [PMID: 38518310 DOI: 10.1103/physrevlett.132.100602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/18/2023] [Accepted: 01/31/2024] [Indexed: 03/24/2024]
Abstract
Deep generative models are a key enabling technology for computer vision, text generation, and large language models. Denoising diffusion probabilistic models (DDPMs) have recently gained much attention due to their ability to generate diverse, high-quality samples in many computer vision tasks, as well as their flexible model architectures and relatively simple training scheme. Quantum generative models, empowered by entanglement and superposition, have brought new insight to learning classical and quantum data. Inspired by the classical counterpart, we propose the quantum denoising diffusion probabilistic model (QuDDPM) to enable efficiently trainable generative learning of quantum data. QuDDPM adopts sufficient layers of circuits to guarantee expressivity, while introducing multiple intermediate training tasks as interpolation between the target distribution and noise to avoid barren plateaus and guarantee efficient training. We provide bounds on the learning error and demonstrate QuDDPM's capability in learning a correlated quantum noise model, quantum many-body phases, and the topological structure of quantum data. The results provide a paradigm for versatile and efficient quantum generative learning.
Collapse
Affiliation(s)
- Bingzhi Zhang
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, USA
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
| | - Peng Xu
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois 61820, USA
| | - Xiaohui Chen
- Department of Mathematics, University of Southern California, Los Angeles, California 90089, USA
| | - Quntao Zhuang
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, USA
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
| |
Collapse
|
30
|
Posselt C, Avci MY, Yigitsoy M, Schuenke P, Kolbitsch C, Schaeffter T, Remmele S. Simulation of acquisition shifts in T2 weighted fluid-attenuated inversion recovery magnetic resonance images to stress test artificial intelligence segmentation networks. J Med Imaging (Bellingham) 2024; 11:024013. [PMID: 38666039 PMCID: PMC11042016 DOI: 10.1117/1.jmi.11.2.024013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2023] [Revised: 03/01/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024] Open
Abstract
Purpose To provide a simulation framework for routine neuroimaging test data, which allows for "stress testing" of deep segmentation networks against acquisition shifts that commonly occur in clinical practice for T2-weighted (T2w) fluid-attenuated inversion recovery (FLAIR) magnetic resonance imaging protocols. Approach The approach simulates "acquisition shift derivatives" of MR images based on MR signal equations. Experiments comprise the validation of the simulated images against real MR scans and example stress tests on state-of-the-art multiple sclerosis lesion segmentation networks, exploring a generic model function to describe the F1 score as a function of the contrast-affecting sequence parameters echo time (TE) and inversion time (TI). Results The differences between real and simulated images range up to 19% in gray and white matter for extreme parameter settings. For the segmentation networks under test, the dependency of the F1 score on TE and TI can be well described by quadratic model functions (R² > 0.9). The coefficients of the model functions indicate that changes in TE have more influence on model performance than changes in TI. Conclusions We show that these deviations are within the range of values that may be caused by erroneous or individual differences in relaxation times, as described in the literature. The coefficients of the F1 model function allow for a quantitative comparison of the influences of TE and TI. Limitations arise mainly from tissues with a low baseline signal (such as cerebrospinal fluid) and from protocols containing contrast-affecting measures that cannot be modeled due to missing information in the DICOM header.
Collapse
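The "MR signal equations" such a simulator builds on are closed-form expressions like the simplified inversion-recovery spin-echo magnitude signal sketched below. The tissue parameters are approximate 3T literature values, and both the equation's simplifications and the chosen TR/TE/TI are assumptions for illustration, not the paper's exact model.

```python
# Hedged sketch of a simplified T2w-FLAIR signal as a function of TE and TI.
import numpy as np

def flair_signal(pd_, t1, t2, te, ti, tr=9000.0):
    """Simplified IR spin-echo magnitude signal (a.u.); times in ms."""
    return pd_ * abs(1 - 2 * np.exp(-ti / t1)
                     + np.exp(-tr / t1)) * np.exp(-te / t2)

# approximate 3T-like tissue parameters: (proton density, T1 ms, T2 ms)
tissues = {"white matter": (0.7, 1100.0, 80.0),
           "gray matter": (0.8, 1600.0, 100.0)}
for name, (pd_, t1, t2) in tissues.items():
    print(name, round(flair_signal(pd_, t1, t2, te=100.0, ti=2500.0), 4))
```

Sweeping te and ti through such a function, and re-rendering image contrast accordingly, is what makes a quadratic F1-versus-(TE, TI) fit like the one reported here possible.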
Affiliation(s)
- Christiane Posselt
- University of Applied Sciences, Faculty of Electrical and Industrial Engineering, Landshut, Germany
| | | | | | - Patrick Schuenke
- Physikalisch‐Technische Bundesanstalt (PTB), Braunschweig and Berlin, Germany
| | - Christoph Kolbitsch
- Physikalisch‐Technische Bundesanstalt (PTB), Braunschweig and Berlin, Germany
| | - Tobias Schaeffter
- Physikalisch‐Technische Bundesanstalt (PTB), Braunschweig and Berlin, Germany
- Technical University of Berlin, Department of Medical Engineering, Berlin, Germany
| | - Stefanie Remmele
- University of Applied Sciences, Faculty of Electrical and Industrial Engineering, Landshut, Germany
| |
Collapse
|
31
|
Choi JY, Ryu IH, Kim JK, Lee IS, Yoo TK. Development of a generative deep learning model to improve epiretinal membrane detection in fundus photography. BMC Med Inform Decis Mak 2024; 24:25. [PMID: 38273286 PMCID: PMC10811871 DOI: 10.1186/s12911-024-02431-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2023] [Accepted: 01/17/2024] [Indexed: 01/27/2024] Open
Abstract
BACKGROUND The epiretinal membrane (ERM) is a common retinal disorder characterized by abnormal fibrocellular tissue at the vitreomacular interface. Most patients with ERM are asymptomatic at early stages; therefore, screening for ERM will become increasingly important. Despite the high prevalence of ERM, few deep learning studies have investigated ERM detection in the color fundus photography (CFP) domain. In this study, we built a generative model to enhance ERM detection performance in CFP. METHODS This deep learning study retrospectively collected 302 ERM and 1,250 healthy CFP data points from a healthcare center. A generative model using StyleGAN2 was trained on the single-center data. EfficientNetB0 with StyleGAN2-based augmentation was validated using independent internal single-center data and external datasets. We randomly assigned the healthcare center data to development (80%) and internal validation (20%) datasets. Data from two publicly accessible sources were used as external validation datasets. RESULTS StyleGAN2 facilitated realistic CFP synthesis with the characteristic cellophane reflex features of ERM. The proposed method with StyleGAN2-based augmentation outperformed typical transfer learning without a generative adversarial network. The proposed model achieved an area under the receiver operating characteristic curve (AUC) of 0.926 for internal validation. AUCs of 0.951 and 0.914 were obtained for the two external validation datasets. Compared with the deep learning model without augmentation, StyleGAN2-based augmentation improved detection performance and helped the model focus on the location of the ERM. CONCLUSIONS We propose an ERM detection model that synthesizes realistic CFP images with the pathological features of ERM through generative deep learning. We believe that our deep learning framework will help achieve more accurate detection of ERM in limited data settings.
Collapse
Affiliation(s)
- Joon Yul Choi
- Department of Biomedical Engineering, Yonsei University, Wonju, South Korea
| | - Ik Hee Ryu
- Department of Refractive Surgery, B&VIIT Eye Center, B2 GT Tower, 1317-23 Seocho-Dong, Seocho-Gu, Seoul, South Korea
- Research and development department, VISUWORKS, Seoul, South Korea
| | - Jin Kuk Kim
- Department of Refractive Surgery, B&VIIT Eye Center, B2 GT Tower, 1317-23 Seocho-Dong, Seocho-Gu, Seoul, South Korea
- Research and development department, VISUWORKS, Seoul, South Korea
| | - In Sik Lee
- Department of Refractive Surgery, B&VIIT Eye Center, B2 GT Tower, 1317-23 Seocho-Dong, Seocho-Gu, Seoul, South Korea
| | - Tae Keun Yoo
- Department of Refractive Surgery, B&VIIT Eye Center, B2 GT Tower, 1317-23 Seocho-Dong, Seocho-Gu, Seoul, South Korea.
- Research and development department, VISUWORKS, Seoul, South Korea.
| |
Collapse
|
32
|
Yellapragada S, Graikos A, Prasanna P, Kurc T, Saltz J, Samaras D. PathLDM: Text conditioned Latent Diffusion Model for Histopathology. IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION. IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION 2024; 2024:5170-5179. [PMID: 38808304 PMCID: PMC11131586 DOI: 10.1109/wacv57701.2024.00510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2024]
Abstract
To achieve high-quality results, diffusion models must be trained on large datasets. This can be particularly prohibitive for models in specialized domains such as computational pathology. Conditioning on labeled data is known to help in data-efficient model training. Therefore, histopathology reports, which are rich in valuable clinical information, are an ideal choice as guidance for a histopathology generative model. In this paper, we introduce PathLDM, the first text-conditioned latent diffusion model tailored for generating high-quality histopathology images. Leveraging the rich contextual information provided by pathology text reports, our approach fuses image and textual data to enhance the generation process. By utilizing GPT's capabilities to distill and summarize complex text reports, we establish an effective conditioning mechanism. Through strategic conditioning and necessary architectural enhancements, we achieved a state-of-the-art (SoTA) FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor, which scored an FID of 30.1.
Collapse
|
33
|
Schaudt D, Späte C, von Schwerin R, Reichert M, von Schwerin M, Beer M, Kloth C. A Critical Assessment of Generative Models for Synthetic Data Augmentation on Limited Pneumonia X-ray Data. Bioengineering (Basel) 2023; 10:1421. [PMID: 38136012 PMCID: PMC10741143 DOI: 10.3390/bioengineering10121421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 11/28/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023] Open
Abstract
In medical imaging, deep learning models serve as invaluable tools for expediting diagnoses and aiding specialized medical professionals in making clinical decisions. However, effectively training deep learning models typically necessitates substantial quantities of high-quality data, a resource often lacking in numerous medical imaging scenarios. One way to overcome this deficiency is to generate such images artificially. Therefore, in this comparative study we train five generative models to artificially increase the amount of available data in such a scenario. This synthetic-data approach is evaluated on a downstream classification task, predicting four causes of pneumonia as well as healthy cases on 1082 chest X-ray images. Quantitative and medical assessments show that a generative adversarial network (GAN)-based approach significantly outperforms more recent diffusion-based approaches on this limited dataset, with better image quality and pathological plausibility. By evaluating five different classification models and varying the amount of additional training data, we show that better image quality surprisingly does not translate into improved classification performance. Class-specific metrics like precision, recall, and F1-score show a substantial improvement from using synthetic images, emphasizing the data-rebalancing effect for less frequent classes. However, overall performance does not improve for most models and configurations, except for a DreamBooth approach, which shows a +0.52 improvement in overall accuracy. The large variance of performance impact in this study suggests that generative models should be employed with careful consideration in limited-data scenarios, especially given the unexpected negative correlation between image quality and downstream classification improvement.
Collapse
Affiliation(s)
- Daniel Schaudt
- Institute of Databases and Information Systems, Ulm University, James-Franck-Ring, 89081 Ulm, Germany
| | - Christian Späte
- DASU Transferzentrum für Digitalisierung, Analytics und Data Science Ulm, Olgastraße 94, 89073 Ulm, Germany
| | - Reinhold von Schwerin
- Department of Computer Science, Ulm University of Applied Science, Albert–Einstein–Allee 55, 89081 Ulm, Germany
| | - Manfred Reichert
- Institute of Databases and Information Systems, Ulm University, James-Franck-Ring, 89081 Ulm, Germany
| | - Marianne von Schwerin
- Department of Computer Science, Ulm University of Applied Science, Albert–Einstein–Allee 55, 89081 Ulm, Germany
| | - Meinrad Beer
- Department of Radiology, University Hospital of Ulm, Albert–Einstein–Allee 23, 89081 Ulm, Germany
| | - Christopher Kloth
- Department of Radiology, University Hospital of Ulm, Albert–Einstein–Allee 23, 89081 Ulm, Germany
| |
Collapse
|
34
|
Mennella C, Maniscalco U, De Pietro G, Esposito M. Generating a novel synthetic dataset for rehabilitation exercises using pose-guided conditioned diffusion models: A quantitative and qualitative evaluation. Comput Biol Med 2023; 167:107665. [PMID: 37925908 DOI: 10.1016/j.compbiomed.2023.107665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2023] [Revised: 10/11/2023] [Accepted: 10/31/2023] [Indexed: 11/07/2023]
Abstract
Machine learning has emerged as a promising approach to enhance rehabilitation therapy monitoring and evaluation, providing personalized insights. However, the scarcity of data remains a significant challenge in developing robust machine learning models for rehabilitation. This paper introduces a novel synthetic dataset for rehabilitation exercises, leveraging pose-guided person image generation using conditioned diffusion models. By processing a pre-labeled dataset of class movements for six rehabilitation exercises, the described method generates realistic images of elderly subjects performing home-based exercise movements. A total of 22,352 images were generated to accurately capture the spatial consistency of human joint relationships for the predefined exercise movements. This novel dataset significantly amplified variability in the physical and demographic attributes of the main subject and the background environment. Quantitative metrics used for image assessment revealed highly favorable results. The generated images successfully maintained intra-class and inter-class consistency in motion data, producing outstanding outcomes with distance correlation values exceeding 0.90. This innovative approach empowers researchers to enhance the value of existing limited datasets by generating high-fidelity synthetic images that precisely augment the anthropometric and biomechanical attributes of individuals engaged in rehabilitation exercises.
Collapse
Affiliation(s)
- Ciro Mennella
- Institute for High-Performance Computing and Networking (ICAR) - Research National Council of Italy (CNR), Italy.
| | - Umberto Maniscalco
- Institute for High-Performance Computing and Networking (ICAR) - Research National Council of Italy (CNR), Italy.
| | - Giuseppe De Pietro
- Institute for High-Performance Computing and Networking (ICAR) - Research National Council of Italy (CNR), Italy
| | - Massimo Esposito
- Institute for High-Performance Computing and Networking (ICAR) - Research National Council of Italy (CNR), Italy
| |
Collapse
|
35
|
Nouri H, Nasri R, Abtahi SH. Addressing inter-device variations in optical coherence tomography angiography: will image-to-image translation systems help? Int J Retina Vitreous 2023; 9:51. [PMID: 37644613 PMCID: PMC10466880 DOI: 10.1186/s40942-023-00491-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2023] [Accepted: 08/17/2023] [Indexed: 08/31/2023] Open
Abstract
BACKGROUND Optical coherence tomography angiography (OCTA) is an innovative technology providing visual and quantitative data on retinal microvasculature in a non-invasive manner. MAIN BODY Due to variations in the technical specifications of different OCTA devices, there are significant inter-device differences in OCTA data, which can limit their comparability and generalizability. These variations can also result in a domain-shift problem that may interfere with the applicability of machine learning models to data obtained from different OCTA machines. One possible approach to address this issue is unsupervised deep image-to-image translation, leveraging systems such as Cycle-Consistent Generative Adversarial Networks (Cycle-GANs) and Denoising Diffusion Probabilistic Models (DDPMs). Through training on unpaired images from different device domains, Cycle-GANs and DDPMs may enable cross-domain translation of images. They have been successfully applied to various medical imaging tasks, including segmentation, denoising, and cross-modality image-to-image translation. In this commentary, we briefly describe how Cycle-GANs and DDPMs operate and review recent experiments with these models on medical and ocular imaging data. We then discuss the benefits of applying such techniques for inter-device translation of OCTA data and the potential challenges ahead. CONCLUSION Retinal imaging technologies and deep learning-based domain adaptation techniques are rapidly evolving. We suggest exploring the potential of image-to-image translation methods in improving the comparability of OCTA data from different centers or devices. This may facilitate more efficient analysis of heterogeneous data and broader applicability of machine learning models trained on limited datasets in this field.
Collapse
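The Cycle-GAN idea invoked here trains two mappings between device domains so that translating A→B→A reproduces the input, enforced by a cycle-consistency loss, sketched below. The single-convolution "generators" and the loss weight are placeholder assumptions; a full model would add an adversarial discriminator per domain.

```python
# Hedged sketch of the cycle-consistency loss at the heart of Cycle-GAN.
import torch
import torch.nn as nn

G_ab = nn.Conv2d(1, 1, 3, padding=1)   # device A -> device B (stand-in)
G_ba = nn.Conv2d(1, 1, 3, padding=1)   # device B -> device A (stand-in)

def cycle_loss(real_a, real_b, lam: float = 10.0):
    l1 = nn.functional.l1_loss
    return lam * (l1(G_ba(G_ab(real_a)), real_a) +   # A -> B -> A
                  l1(G_ab(G_ba(real_b)), real_b))    # B -> A -> B

loss = cycle_loss(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))
loss.backward()     # gradients flow into both generators
```

It is exactly this unpaired formulation that makes the approach attractive for inter-device OCTA translation, where pixel-aligned scans of the same eye on two devices are rarely available.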
Affiliation(s)
- Hosein Nouri
- Ophthalmic Research Center, Research Institute for Ophthalmology and Vision Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
| | - Reza Nasri
- School of Engineering, University of Isfahan, Isfahan, Iran
| | - Seyed-Hossein Abtahi
- Ophthalmic Research Center, Research Institute for Ophthalmology and Vision Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- Department of Ophthalmology, Torfe Medical Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
| |
Collapse
|