1. Carrillo-Perez F, Pizurica M, Zheng Y, Nandi TN, Madduri R, Shen J, Gevaert O. Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models. Nat Biomed Eng 2024. [PMID: 38514775] [DOI: 10.1038/s41551-024-01193-8]
Abstract
Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.
Affiliation(s)
- Francisco Carrillo-Perez
- Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA
- Marija Pizurica
- Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA
- Internet technology and Data science Lab (IDLab), Ghent University, Ghent, Belgium
- Yuanning Zheng
- Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA
- Tarak Nath Nandi
- Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
- Ravi Madduri
- Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
- Jeanne Shen
- Department of Pathology, Stanford University, School of Medicine, Palo Alto, CA, USA
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, School of Medicine, Stanford, CA, USA.
2. Vollmer A, Hartmann S, Vollmer M, Shavlokhova V, Brands RC, Kübler A, Wollborn J, Hassel F, Couillard-Despres S, Lang G, Saravi B. Multimodal artificial intelligence-based pathogenomics improves survival prediction in oral squamous cell carcinoma. Sci Rep 2024; 14:5687. [PMID: 38453964] [PMCID: PMC10920832] [DOI: 10.1038/s41598-024-56172-5]
Abstract
In this study, we aimed to develop a novel prognostic algorithm for oral squamous cell carcinoma (OSCC) using a combination of pathogenomics and AI-based techniques. We collected comprehensive clinical, genomic, and pathology data from a cohort of OSCC patients in the TCGA dataset and used machine learning and deep learning algorithms to identify relevant features that are predictive of survival outcomes. Our analyses included 406 OSCC patients. Initial analyses involved gene expression analyses, principal component analyses, gene enrichment analyses, and feature importance analyses. These insights were foundational for subsequent model development. Furthermore, we applied five machine learning/deep learning algorithms (Random Survival Forest, Gradient Boosting Survival Analysis, Cox PH, Fast Survival SVM, and DeepSurv) for survival prediction. Our initial analyses revealed relevant gene expression variations and biological pathways, laying the groundwork for robust feature selection in model building. The results showed that the multimodal model outperformed the unimodal models across all methods, with c-index values of 0.722 for RSF, 0.633 for GBSA, 0.625 for FastSVM, 0.633 for CoxPH, and 0.515 for DeepSurv. When considering only important features, the multimodal model continued to outperform the unimodal models, with c-index values of 0.834 for RSF, 0.747 for GBSA, 0.718 for FastSVM, 0.742 for CoxPH, and 0.635 for DeepSurv. Our results demonstrate the potential of pathogenomics and AI-based techniques in improving the accuracy of prognostic prediction in OSCC, which may ultimately aid in the development of personalized treatment strategies for patients with this devastating disease.
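The c-index values above are concordance indices: the fraction of comparable patient pairs in which the model ranks the patient who fails earlier as higher risk (0.5 is chance, 1.0 is perfect). A minimal sketch of Harrell's c-index in plain Python, with illustrative toy data (this is a generic definition, not the paper's own code; libraries such as lifelines or scikit-survival provide tested implementations):

```python
def concordance_index(times, events, risks):
    """Harrell's c-index: among comparable patient pairs, the fraction
    where the patient with the higher predicted risk fails earlier.
    times: follow-up times; events: 1 = event observed, 0 = censored;
    risks: predicted risk scores (higher = worse prognosis)."""
    concordant = tied = comparable = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # i failed before j's follow-up ended
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    tied += 1  # ties in risk count as half-concordant
    return (concordant + 0.5 * tied) / comparable

times = [2, 4, 6, 8]          # illustrative follow-up times (months)
events = [1, 1, 1, 0]         # last patient is censored
risks = [0.9, 0.7, 0.5, 0.1]  # earlier failures ranked as higher risk
print(concordance_index(times, events, risks))  # → 1.0
```

With the risk ordering reversed, the same data would yield a c-index of 0.0.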
Affiliation(s)
- Andreas Vollmer
- Department of Oral and Maxillofacial Plastic Surgery, University Hospital of Würzburg, 97070, Würzburg, Franconia, Germany.
- Stefan Hartmann
- Department of Oral and Maxillofacial Plastic Surgery, University Hospital of Würzburg, 97070, Würzburg, Franconia, Germany
- Michael Vollmer
- Department of Oral and Maxillofacial Surgery, Tuebingen University Hospital, Osianderstrasse 2-8, 72076, Tuebingen, Germany
- Veronika Shavlokhova
- Maxillofacial Surgery University Hospital Ruppin-Brandenburg, Fehrbelliner Straße 38, 16816, Neuruppin, Germany
- Roman C Brands
- Department of Oral and Maxillofacial Plastic Surgery, University Hospital of Würzburg, 97070, Würzburg, Franconia, Germany
- Alexander Kübler
- Department of Oral and Maxillofacial Plastic Surgery, University Hospital of Würzburg, 97070, Würzburg, Franconia, Germany
- Jakob Wollborn
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- Frank Hassel
- Department of Spine Surgery, Loretto Hospital, Freiburg, Germany
- Sebastien Couillard-Despres
- Institute of Experimental Neuroregeneration, Paracelsus Medical University, 5020, Salzburg, Austria
- Austrian Cluster for Tissue Regeneration, Vienna, Austria
- Gernot Lang
- Department of Orthopedics and Trauma Surgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Babak Saravi
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- Department of Spine Surgery, Loretto Hospital, Freiburg, Germany
- Institute of Experimental Neuroregeneration, Paracelsus Medical University, 5020, Salzburg, Austria
- Department of Orthopedics and Trauma Surgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
3. Boubnovski Martell M, Linton-Reid K, Hindocha S, Chen M, Moreno P, Álvarez-Benito M, Salvatierra Á, Lee R, Posma JM, Calzado MA, Aboagye EO. Deep representation learning of tissue metabolome and computed tomography annotates NSCLC classification and prognosis. NPJ Precis Oncol 2024; 8:28. [PMID: 38310164] [PMCID: PMC10838282] [DOI: 10.1038/s41698-024-00502-3]
Abstract
The rich chemical information from tissue metabolomics provides a powerful means to elaborate tissue physiology or tumor characteristics at the cellular and tumor microenvironment levels. However, obtaining such information requires invasive biopsies, is costly, and can delay clinical patient management. Conversely, computed tomography (CT) is a clinical standard of care but does not intuitively harbor histological or prognostic information. Furthermore, the ability to embed metabolome information into CT and subsequently use the learned representation for classification or prognosis has yet to be described. This study develops a deep learning-based framework, tissue-metabolomic-radiomic-CT (TMR-CT), by combining 48 paired CT images and tumor/normal tissue metabolite intensities to generate ten image embeddings that infer metabolite-derived representations from CT alone. In clinical NSCLC settings, we ascertain whether TMR-CT yields an enhanced feature-generation model for histology classification and prognosis tasks in an unseen international CT dataset of 742 patients. TMR-CT non-invasively determines histological classes (adenocarcinoma versus squamous cell carcinoma) with an F1-score of 0.78 and predicts patients' prognosis with a c-index of 0.72, surpassing the performance of radiomics models and deep learning on single-modality CT feature extraction. Additionally, our work shows the potential to generate informative biology-inspired CT-led features to explore connections between hard-to-obtain tissue metabolic profiles and routine lesion-derived image data.
Affiliation(s)
- Sumeet Hindocha
- Early Diagnosis and Detection Centre, National Institute for Health and Care Research Biomedical Research Centre at the Royal Marsden and Institute of Cancer Research, London, SW3 6JJ, UK
- Mitchell Chen
- Imperial College London Hammersmith Campus, London, SW7 2AZ, UK
- Paula Moreno
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Departamento de Cirugía Toráxica y Trasplante de Pulmón, Hospital Universitario Reina Sofía, Córdoba, 14014, Spain
- Marina Álvarez-Benito
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Unidad de Radiodiagnóstico y Cáncer de Mama, Hospital Universitario Reina Sofía, Córdoba, 14004, Spain
- Ángel Salvatierra
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Unidad de Radiodiagnóstico y Cáncer de Mama, Hospital Universitario Reina Sofía, Córdoba, 14004, Spain
- Richard Lee
- Early Diagnosis and Detection Centre, National Institute for Health and Care Research Biomedical Research Centre at the Royal Marsden and Institute of Cancer Research, London, SW3 6JJ, UK
- National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse Street, London, SW3 6LY, UK
- Joram M Posma
- Imperial College London Hammersmith Campus, London, SW7 2AZ, UK
- Marco A Calzado
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain.
- Departamento de Biología Celular, Fisiología e Inmunología, Universidad de Córdoba, Córdoba, 14014, Spain.
- Eric O Aboagye
- Imperial College London Hammersmith Campus, London, SW7 2AZ, UK.
4. Mondello A, Dal Bo M, Toffoli G, Polano M. Machine learning in onco-pharmacogenomics: a path to precision medicine with many challenges. Front Pharmacol 2024; 14:1260276. [PMID: 38264526] [PMCID: PMC10803549] [DOI: 10.3389/fphar.2023.1260276]
Abstract
Over the past two decades, Next-Generation Sequencing (NGS) has revolutionized the approach to cancer research. Applications of NGS include the identification of tumor-specific alterations that can influence tumor pathobiology and impact diagnosis, prognosis and therapeutic options. Pharmacogenomics (PGx) studies the role of inherited individual genetic patterns in drug response and has taken advantage of NGS technology, as it provides access to high-throughput data that can, however, be difficult to manage. Machine learning (ML) has recently been used in the life sciences to discover hidden patterns in complex NGS data and to solve various PGx problems. In this review, we provide a comprehensive overview of the NGS approaches that can be employed and the different PGx studies that make use of NGS data. We also provide an overview of the ML algorithms that can serve as fundamental strategies in the PGx field to improve personalized medicine in cancer.
Affiliation(s)
- Maurizio Polano
- Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano (CRO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Aviano, Italy
5. Dwivedi K, Rajpal A, Rajpal S, Kumar V, Agarwal M, Kumar N. Enlightening the path to NSCLC biomarkers: Utilizing the power of XAI-guided deep learning. Comput Methods Programs Biomed 2024; 243:107864. [PMID: 37866126] [DOI: 10.1016/j.cmpb.2023.107864]
Abstract
BACKGROUND AND OBJECTIVE: The early diagnosis of non-small cell lung cancer (NSCLC) is of prime importance for improving patients' survival and quality of life. NSCLC being a heterogeneous disease at the molecular and cellular level, the biomarkers responsible for this heterogeneity aid in distinguishing its prominent subtypes, adenocarcinoma and squamous cell carcinoma. Moreover, if identified, these biomarkers could pave the path to targeted therapy. In this work, a novel explainable AI (XAI)-guided deep learning framework is proposed that assists in discovering a set of significant NSCLC-relevant biomarkers using methylation data. METHODS: The proposed framework is divided into two blocks. The first block combines an autoencoder and a neural network to classify NSCLC instances. The second block utilizes various XAI methods, namely IntegratedGradients, GradientSHAP, and DeepLIFT, to discover a set of seven significant biomarkers. RESULTS: The classification performance of the biomarkers discovered using the proposed framework is evaluated by employing multiple machine learning algorithms, among which a Multilayer Perceptron (MLP)-based model performs best, yielding a 10-fold cross-validation accuracy of 91.53%. An improved accuracy of 96.37% is achieved by integrating RNA-Seq, CNV, and methylation data. Statistical analysis using the Friedman and Nemenyi tests shows the MLP model to be significantly better than the other machine learning-based models. Further, the clinical efficacy of the resultant biomarkers is established based on their potential druggability, their ability to predict NSCLC patients' survival, gene-disease associations, and the biological pathways they target. While the biomarkers C18orf18, CCNT2, THOP1, and TNPO2 are found to be potentially druggable, the biomarkers CCDC15, SNORA9, THOP1, and TNPO2 are found to be prognostically relevant. On further analysis, some of the discovered biomarkers are found to be associated with around 104 diseases, and five KEGG, ten Reactome, and three Wiki pathways are found to be triggered by them. CONCLUSION: In summary, the proposed framework uncovers a set of clinically effective biomarkers that accurately classify NSCLC. As future work, efforts will be made to combine a variety of omics data with histopathological data to unveil more precise biomarkers for devising personalized therapy.
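Attribution methods such as IntegratedGradients distribute a model's prediction across its input features by averaging gradients along a straight-line path from a baseline input to the actual input. A minimal numerical sketch in plain Python (the toy linear model, function names, and midpoint-rule approximation are illustrative assumptions, not the paper's implementation; in practice this is done with an autograd library such as Captum):

```python
def numeric_grad(f, x, i, eps=1e-6):
    # Central-difference estimate of df/dx_i at point x.
    xp, xm = list(x), list(x)
    xp[i] += eps
    xm[i] -= eps
    return (f(xp) - f(xm)) / (2 * eps)

def integrated_gradients(f, x, baseline, steps=50):
    """attribution_i = (x_i - b_i) * path-average of df/dx_i,
    approximating the path integral with a midpoint Riemann sum."""
    n = len(x)
    grad_sum = [0.0] * n
    for k in range(1, steps + 1):
        alpha = (k - 0.5) / steps  # midpoint of the k-th path segment
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            grad_sum[i] += numeric_grad(f, point, i)
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sum)]

# Toy "model": for a linear function, the attribution of feature i is
# exactly w_i * (x_i - baseline_i), so the result is easy to check.
f = lambda v: 2.0 * v[0] + 3.0 * v[1]
attr = integrated_gradients(f, x=[1.0, 1.0], baseline=[0.0, 0.0])
print(attr)  # ≈ [2.0, 3.0]; attributions sum to f(x) - f(baseline)
```

The completeness property (attributions summing to the prediction difference from the baseline) is what makes such scores usable for ranking candidate biomarkers by their contribution to the classifier's output.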
Affiliation(s)
- Kountay Dwivedi
- Department of Computer Science, University of Delhi, Delhi, India.
- Ankit Rajpal
- Department of Computer Science, University of Delhi, Delhi, India.
- Sheetal Rajpal
- Department of Computer Science, Dyal Singh College, Delhi, India.
- Virendra Kumar
- Department of Nuclear Magnetic Resonance, All India Institute of Medical Sciences, New Delhi, India.
- Manoj Agarwal
- Department of Computer Science, Hans Raj College, University of Delhi, Delhi, India.
- Naveen Kumar
- Department of Computer Science, University of Delhi, Delhi, India.
6. Wang H, Han X, Ren J, Cheng H, Li H, Li Y, Li X. A prognostic prediction model for ovarian cancer using a cross-modal view correlation discovery network. Math Biosci Eng 2024; 21:736-764. [PMID: 38303441] [DOI: 10.3934/mbe.2024031]
Abstract
Ovarian cancer is a tumor with different clinicopathological and molecular features, and the vast majority of patients have local or extensive spread at the time of diagnosis. Early diagnosis and prognostic prediction of patients can contribute to the understanding of the underlying pathogenesis of ovarian cancer and the improvement of therapeutic outcomes. The occurrence of ovarian cancer is influenced by multiple complex mechanisms, including the genome, transcriptome and proteome. Different types of omics analysis help predict the survival rate of ovarian cancer patients. Multi-omics data of ovarian cancer exhibit high-dimensional heterogeneity, and existing methods for integrating multi-omics data have not taken into account the variability and inter-correlation between different omics data. In this paper, we propose a deep learning model, MDCADON, which utilizes multi-omics data and cross-modal view correlation discovery network. We introduce random forest into LASSO regression for feature selection on mRNA expression, DNA methylation, miRNA expression and copy number variation (CNV), aiming to select important features highly correlated with ovarian cancer prognosis. A multi-modal deep neural network is used to comprehensively learn feature representations of each omics data and clinical data, and cross-modal view correlation discovery network is employed to construct the multi-omics discovery tensor, exploring the inter-relationships between different omics data. The experimental results demonstrate that MDCADON is superior to the existing methods in predicting ovarian cancer prognosis, which enables survival analysis for patients and facilitates the determination of follow-up treatment plans. Finally, we perform Gene Ontology (GO) term analysis and biological pathway analysis on the genes identified by MDCADON, revealing the underlying mechanisms of ovarian cancer and providing certain support for guiding ovarian cancer treatments.
Affiliation(s)
- Huiqing Wang
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Xiao Han
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Jianxue Ren
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Hao Cheng
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Haolin Li
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Ying Li
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Xue Li
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
7. Carrillo-Perez F, Pizurica M, Ozawa MG, Vogel H, West RB, Kong CS, Herrera LJ, Shen J, Gevaert O. Synthetic whole-slide image tile generation with gene expression profile-infused deep generative models. Cell Rep Methods 2023; 3:100534. [PMID: 37671024] [PMCID: PMC10475789] [DOI: 10.1016/j.crmeth.2023.100534]
Abstract
In this work, we propose an approach to generate whole-slide image (WSI) tiles by using deep generative models infused with matched gene expression profiles. First, we train a variational autoencoder (VAE) that learns a latent, lower-dimensional representation of multi-tissue gene expression profiles. Then, we use this representation to infuse generative adversarial networks (GANs) that generate lung and brain cortex tissue tiles, resulting in a new model that we call RNA-GAN. Tiles generated by RNA-GAN were preferred by expert pathologists compared with tiles generated using traditional GANs, and in addition, RNA-GAN needs fewer training epochs to generate high-quality tiles. Finally, RNA-GAN was able to generalize to gene expression profiles outside of the training set, showing imputation capabilities. A web-based quiz is available for users to play a game distinguishing real and synthetic tiles: https://rna-gan.stanford.edu/, and the code for RNA-GAN is available here: https://github.com/gevaertlab/RNA-GAN.
Affiliation(s)
- Francisco Carrillo-Perez
- Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, 1265 Welch Road, Stanford, CA 94305-547, USA
- Computer Engineering, Automatics and Robotics Department, University of Granada, C. Periodista Daniel Saucedo Aranda, s/n, Granada, 18014 Granada, Spain
- Marija Pizurica
- Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, 1265 Welch Road, Stanford, CA 94305-547, USA
- Internet Technology and Data Science Lab (IDLab), Ghent University, Technologiepark-Zwijnaarde 126, Gent, 9052 Gent, Belgium
- Michael G. Ozawa
- Department of Pathology, Stanford University School of Medicine, 300 Pasteur Dr, Palo Alto, CA 94304, USA
- Hannes Vogel
- Department of Pathology, Stanford University School of Medicine, 300 Pasteur Dr, Palo Alto, CA 94304, USA
- Robert B. West
- Department of Pathology, Stanford University School of Medicine, 300 Pasteur Dr, Palo Alto, CA 94304, USA
- Christina S. Kong
- Department of Pathology, Stanford University School of Medicine, 300 Pasteur Dr, Palo Alto, CA 94304, USA
- Luis Javier Herrera
- Computer Engineering, Automatics and Robotics Department, University of Granada, C. Periodista Daniel Saucedo Aranda, s/n, Granada, 18014 Granada, Spain
- Jeanne Shen
- Department of Pathology, Stanford University School of Medicine, 300 Pasteur Dr, Palo Alto, CA 94304, USA
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, 1265 Welch Road, Stanford, CA 94305-547, USA
- Department of Biomedical Data Science, Stanford University, School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, CA 94305-547, USA
8. Carrillo-Perez F, Pizurica M, Zheng Y, Nandi TN, Madduri R, Shen J, Gevaert O. RNA-to-image multi-cancer synthesis using cascaded diffusion models. bioRxiv 2023:2023.01.13.523899. [PMID: 36711711] [PMCID: PMC9882105] [DOI: 10.1101/2023.01.13.523899]
Abstract
Data scarcity presents a significant obstacle in the field of biomedicine, where acquiring diverse and sufficient datasets can be costly and challenging. Synthetic data generation offers a potential solution to this problem by expanding dataset sizes, thereby enabling the training of more robust and generalizable machine learning models. Although previous studies have explored synthetic data generation for cancer diagnosis, they have predominantly focused on single modality settings, such as whole-slide image tiles or RNA-Seq data. To bridge this gap, we propose a novel approach, RNA-Cascaded-Diffusion-Model or RNA-CDM, for performing RNA-to-image synthesis in a multi-cancer context, drawing inspiration from successful text-to-image synthesis models used in natural images. In our approach, we employ a variational auto-encoder to reduce the dimensionality of a patient's gene expression profile, effectively distinguishing between different types of cancer. Subsequently, we employ a cascaded diffusion model to synthesize realistic whole-slide image tiles using the latent representation derived from the patient's RNA-Seq data. Our results demonstrate that the generated tiles accurately preserve the distribution of cell types observed in real-world data, with state-of-the-art cell identification models successfully detecting important cell types in the synthetic samples. Furthermore, we illustrate that the synthetic tiles maintain the cell fraction observed in bulk RNA-Seq data and that modifications in gene expression affect the composition of cell types in the synthetic tiles. Next, we utilize the synthetic data generated by RNA-CDM to pretrain machine learning models and observe improved performance compared to training from scratch. 
Our study emphasizes the potential usefulness of synthetic data in developing machine learning models in scarce-data settings, while also highlighting the possibility of imputing missing data modalities by leveraging the available information. In conclusion, our proposed RNA-CDM approach for synthetic data generation in biomedicine, particularly in the context of cancer diagnosis, offers a novel and promising solution to address data scarcity. By generating synthetic data that aligns with real-world distributions and leveraging it to pretrain machine learning models, we contribute to the development of robust clinical decision support systems and potential advancements in precision medicine.
9. Steyaert S, Pizurica M, Nagaraj D, Khandelwal P, Hernandez-Boussard T, Gentles AJ, Gevaert O. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell 2023; 5:351-362. [PMID: 37693852] [PMCID: PMC10484010] [DOI: 10.1038/s42256-023-00633-5]
Abstract
Technological advances now make it possible to study a patient from multiple angles with high-dimensional, high-throughput, multi-scale biomedical data. In oncology, massive amounts of data are being generated, ranging from molecular and histopathology data to radiology and clinical records. The introduction of deep learning has significantly advanced the analysis of biomedical data. However, most approaches focus on single data modalities, leading to slow progress in methods to integrate complementary data types. The development of effective multimodal fusion approaches is becoming increasingly important, as a single modality might not be consistent and sufficient to capture the heterogeneity of complex diseases, tailor medical care and improve personalised medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including the lack of usable data as well as methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on opportunities through deep learning to tackle data sparsity and scarcity, multimodal interpretability, and standardisation of datasets.
Affiliation(s)
- Sandra Steyaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Marija Pizurica
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Tina Hernandez-Boussard
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
- Andrew J Gentles
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
10. Wang S, Wang S, Wang Z. A survey on multi-omics-based cancer diagnosis using machine learning with the potential application in gastrointestinal cancer. Front Med (Lausanne) 2023; 9:1109365. [PMID: 36703893] [PMCID: PMC9871466] [DOI: 10.3389/fmed.2022.1109365]
Abstract
Gastrointestinal cancer is becoming increasingly common and causes over 3 million deaths every year. No typical symptoms appear in the early stage of gastrointestinal cancer, posing a significant challenge to diagnosis and treatment. By the time symptoms appear, many patients are already in the middle or late stages of the disease, and unfortunately most of them will die of it. Recently, various artificial intelligence techniques, such as machine learning based on multi-omics data, have been presented for cancer diagnosis and treatment in the era of precision medicine. This paper provides a survey of multi-omics-based cancer diagnosis using machine learning, with potential application to gastrointestinal cancer. In particular, we make a comprehensive summary and analysis from the perspectives of multi-omics datasets, task types, and multi-omics-based integration methods. Furthermore, this paper points out the remaining challenges of multi-omics-based cancer diagnosis using machine learning and discusses future topics.
Affiliation(s)
- Suixue Wang
- School of Information and Communication Engineering, Hainan University, Haikou, China
- Shuling Wang
- Department of Neurology, Affiliated Haikou Hospital of Xiangya School of Medicine, Central South University, Haikou, China
- Zhengxia Wang
- School of Computer Science and Technology, Hainan University, Haikou, China
11. Stahlberg EA, Abdel-Rahman M, Aguilar B, Asadpoure A, Beckman RA, Borkon LL, Bryan JN, Cebulla CM, Chang YH, Chatterjee A, Deng J, Dolatshahi S, Gevaert O, Greenspan EJ, Hao W, Hernandez-Boussard T, Jackson PR, Kuijjer M, Lee A, Macklin P, Madhavan S, McCoy MD, Mohammad Mirzaei N, Razzaghi T, Rocha HL, Shahriyari L, Shmulevich I, Stover DG, Sun Y, Syeda-Mahmood T, Wang J, Wang Q, Zervantonakis I. Exploring approaches for predictive cancer patient digital twins: Opportunities for collaboration and innovation. Front Digit Health 2022; 4:1007784. [PMID: 36274654] [PMCID: PMC9586248] [DOI: 10.3389/fdgth.2022.1007784]
Abstract
We are rapidly approaching a future in which cancer patient digital twins (CPDTs) will reach their potential to inform cancer prevention, diagnosis, and treatment in individual patients. This will be realized through advances in high-performance computing, computational modeling, and an expanding repertoire of observational data across multiple scales and modalities. In 2020, the US National Cancer Institute and the US Department of Energy, through a trans-disciplinary research community at the intersection of advanced computing and cancer research, initiated team-science collaborative projects to explore the development and implementation of predictive cancer patient digital twins. Several diverse pilot projects were launched to provide key insights into important features of this emerging landscape and to determine the requirements for the development and adoption of CPDTs. Projects included exploring approaches to using a large cohort of digital twins to perform deep phenotyping and plan treatments at the individual level, prototyping self-learning digital twin platforms, using adaptive digital twin approaches to monitor treatment response and resistance, developing methods to integrate and fuse data and observations across multiple scales, and personalizing treatment based on cancer type. Collectively, these efforts have yielded increased insight into the opportunities and challenges facing CPDT approaches and have helped define a path forward. Given the rapidly growing interest in patient digital twins, this manuscript provides a valuable early progress report on several CPDT pilot projects, covering their overall aims, early progress, lessons learned and future directions that will increasingly involve the broader research community.
Affiliation(s)
- Eric A. Stahlberg: Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD, United States
- Mohamed Abdel-Rahman: Department of Ophthalmology and Visual Sciences, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH, United States
- Boris Aguilar: Institute for Systems Biology, Seattle, WA, United States
- Alireza Asadpoure: Department of Civil and Environmental Engineering, University of Massachusetts Amherst, Amherst, MA, United States
- Robert A. Beckman: Innovation Center for Biomedical Informatics, Georgetown University, Washington, DC, United States
- Lynn L. Borkon: Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD, United States
- Jeffrey N. Bryan: Department of Veterinary Medicine and Surgery, University of Missouri, Columbia, MO, United States
- Colleen M. Cebulla: Department of Ophthalmology and Visual Sciences, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH, United States
- Young Hwan Chang: Department of Biomedical Engineering and OHSU Center for Spatial Systems Biomedicine (OCSSB), Oregon Health and Science University, Portland, OR, United States
- Ansu Chatterjee: School of Statistics, University of Minnesota, Minneapolis, MN, United States
- Jun Deng: Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT, United States
- Sepideh Dolatshahi: Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, United States
- Olivier Gevaert: Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine and Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Emily J. Greenspan: Center for Biomedical Informatics and Information Technology, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Wenrui Hao: Department of Mathematics, The Pennsylvania State University, University Park, PA, United States
- Tina Hernandez-Boussard: Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine and Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Pamela R. Jackson: Mathematical NeuroOncology Lab, Precision Neurotherapeutics Innovation Program, Mayo Clinic Arizona, Phoenix, AZ, United States
- Marieke Kuijjer: Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
- Adrian Lee: Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, United States
- Paul Macklin: Department of Intelligent Systems Engineering, Indiana University, Bloomington, IN, United States
- Subha Madhavan: Innovation Center for Biomedical Informatics, Georgetown University, Washington, DC, United States
- Matthew D. McCoy: Innovation Center for Biomedical Informatics, Georgetown University, Washington, DC, United States
- Navid Mohammad Mirzaei: Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, MA, United States
- Talayeh Razzaghi: School of Industrial and Systems Engineering, The University of Oklahoma, Norman, OK, United States
- Heber L. Rocha: Department of Intelligent Systems Engineering, Indiana University, Bloomington, IN, United States
- Leili Shahriyari: Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, MA, United States
- Daniel G. Stover: Division of Medical Oncology and Department of Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, United States
- Yi Sun: Department of Mathematics, University of South Carolina, Columbia, SC, United States
- Jinhua Wang: Institute for Health Informatics and the Masonic Cancer Center, University of Minnesota, Minneapolis, MN, United States
- Qi Wang: Department of Mathematics, University of South Carolina, Columbia, SC, United States
- Ioannis Zervantonakis: Department of Bioengineering, UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, United States
12
Jeong SH, Woo MW, Shin DS, Yeom HG, Lim HJ, Kim BC, Yun JP. Three-Dimensional Postoperative Results Prediction for Orthognathic Surgery through Deep Learning-Based Alignment Network. J Pers Med 2022; 12:998. [PMID: 35743782] [PMCID: PMC9225553] [DOI: 10.3390/jpm12060998] [Received: 04/29/2022] [Revised: 06/16/2022] [Accepted: 06/17/2022] Open Access
Abstract
To date, the diagnosis of dentofacial dysmorphosis has relied almost entirely on reference points, planes, and angles. This is time-consuming and strongly dependent on the skill of the practitioner. To address this problem, we investigated whether deep neural networks could predict the postoperative results of orthognathic surgery without relying on reference points, planes, and angles. We used three-dimensional point-cloud data of the skulls of 269 patients. The proposed method predicts results in two steps. In step 1, the skull is divided into six parts by a segmentation network. In step 2, three-dimensional transformation parameters are predicted by an alignment network. The ground-truth transformation parameters are computed with the iterative closest point (ICP) algorithm, which aligns each preoperative part of the skull to the corresponding postoperative part. We compared PointNet, PointNet++, and PointConv as feature extractors for the alignment network. Moreover, we designed a new loss function that accounts for the distance error of the transformed points, improving accuracy. The accuracy, mean intersection over union (mIoU), and Dice coefficient (DC) of the first segmentation network, which separates the upper and lower parts of the skull, were 0.9998, 0.9994, and 0.9998, respectively. For the second segmentation network, which divides the lower part of the skull into five parts, they were 0.9949, 0.9900, and 0.9949, respectively. The mean absolute errors in the transverse, anterior-posterior, and vertical distances were 0.765 mm, 1.455 mm, and 1.392 mm for part 2 (maxilla); 1.069 mm, 1.831 mm, and 1.375 mm for part 3 (mandible); and 1.913 mm, 2.340 mm, and 1.257 mm for part 4 (chin). With this approach, postoperative results can be predicted simply from the point-cloud data of a computed tomography scan.
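The ground-truth alignment targets in the abstract above come from ICP registration of pre- and postoperative skull parts. The core of every ICP iteration is a closed-form best-fit rigid transform for the currently matched point pairs. The sketch below (plain NumPy, the standard Kabsch/SVD solution, with hypothetical variable names; not the authors' code) shows that step under the simplifying assumption that correspondences are already known, i.e. row i of `src` matches row i of `dst`.

```python
import numpy as np

def kabsch_align(src, dst):
    """Closed-form best-fit rotation R and translation t mapping src onto dst.

    src, dst: (N, 3) arrays of matched points (row i of src corresponds
    to row i of dst). This is the inner step of each ICP iteration.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # guard against a reflection (det = -1) solution
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# toy check: recover a known rotation and translation exactly
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
theta = np.deg2rad(10.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
moved = pts @ R_true.T + t_true
R, t = kabsch_align(pts, moved)
```

In full ICP the correspondences are unknown, so one alternates nearest-neighbor matching with this closed-form update until convergence; the resulting (R, t) per skull part, converted to the paper's transformation-parameter encoding, would then serve as regression targets for the alignment network.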
Affiliation(s)
- Seung Hyun Jeong: Advanced Mechatronics R&D Group, Korea Institute of Industrial Technology (KITECH), Gyeongsan 38408, Korea
- Min Woo Woo: Advanced Mechatronics R&D Group, Korea Institute of Industrial Technology (KITECH), Gyeongsan 38408, Korea; School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
- Dong Sun Shin: Department of Oral and Maxillofacial Surgery, Daejeon Dental Hospital, College of Dentistry, Wonkwang University, Daejeon 35233, Korea
- Han Gyeol Yeom: Department of Oral and Maxillofacial Radiology, Daejeon Dental Hospital, College of Dentistry, Wonkwang University, Daejeon 35233, Korea
- Hun Jun Lim: Department of Oral and Maxillofacial Surgery, Daejeon Dental Hospital, College of Dentistry, Wonkwang University, Daejeon 35233, Korea
- Bong Chul Kim: Department of Oral and Maxillofacial Surgery, Daejeon Dental Hospital, College of Dentistry, Wonkwang University, Daejeon 35233, Korea
- Jong Pil Yun: Advanced Mechatronics R&D Group, Korea Institute of Industrial Technology (KITECH), Gyeongsan 38408, Korea; KITECH School, University of Science and Technology, Daejeon 34113, Korea