1
Vieira FG, Bispo R, Lopes MB. Integration of Multi-Omics Data for the Classification of Glioma Types and Identification of Novel Biomarkers. Bioinform Biol Insights 2024; 18:11779322241249563. [PMID: 38812741 PMCID: PMC11135104 DOI: 10.1177/11779322241249563] [Received: 07/29/2023] [Accepted: 04/09/2024] [Indexed: 05/31/2024]
Abstract
Glioma is currently one of the most prevalent types of primary brain cancer. Given its high level of heterogeneity along with the complex biological molecular markers, many efforts have been made to accurately classify the type of glioma in each patient, which, in turn, is critical to improve early diagnosis and increase survival. Nonetheless, as a result of the fast-growing technological advances in high-throughput sequencing and evolving molecular understanding of glioma biology, its classification has been recently subject to significant alterations. In this study, we integrate multiple glioma omics modalities (including mRNA, DNA methylation, and miRNA) from The Cancer Genome Atlas (TCGA), while using the revised glioma reclassified labels, with a supervised method based on sparse canonical correlation analysis (DIABLO) to discriminate between glioma types. We were able to find a set of highly correlated features distinguishing glioblastoma from lower-grade gliomas (LGGs) that were mainly associated with the disruption of receptor tyrosine kinases signaling pathways and extracellular matrix organization and remodeling. Concurrently, the discrimination of the LGG types was characterized primarily by features involved in ubiquitination and DNA transcription processes. Furthermore, we could identify several novel glioma biomarkers likely helpful in both diagnosis and prognosis of the patients, including the genes PPP1R8, GPBP1L1, KIAA1614, C14orf23, CCDC77, BVES, EXD3, CD300A, and HEPN1. Collectively, this comprehensive approach not only allowed a highly accurate discrimination of the different TCGA glioma patients but also presented a step forward in advancing our comprehension of the underlying molecular mechanisms driving glioma heterogeneity. 
Ultimately, our study also revealed novel candidate biomarkers that might constitute potential therapeutic targets, marking a significant stride toward personalized and more effective treatment strategies for patients with glioma.
Affiliation(s)
- Francisca G Vieira
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
- Regina Bispo
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Department of Mathematics, NOVA School of Science and Technology, Caparica, Portugal
- Marta B Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Department of Mathematics, NOVA School of Science and Technology, Caparica, Portugal
  - UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
2
Gygi JP, Konstorum A, Pawar S, Aron E, Kleinstein SH, Guan L. A supervised Bayesian factor model for the identification of multi-omics signatures. Bioinformatics 2024; 40:btae202. [PMID: 38603606 PMCID: PMC11078774 DOI: 10.1093/bioinformatics/btae202] [Received: 09/18/2023] [Revised: 02/29/2024] [Accepted: 04/10/2024] [Indexed: 04/13/2024]
Abstract
MOTIVATION Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful.
RESULTS We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of coronavirus disease 2019 severity and breast cancer tumor subtypes.
AVAILABILITY AND IMPLEMENTATION SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.
Affiliation(s)
- Jeremy P Gygi
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Anna Konstorum
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, United States
- Shrikant Pawar
  - Department of Genetics, Yale Center for Genomic Analysis (YCGA), Yale School of Medicine, New Haven, CT 06520, United States
- Edel Aron
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Steven H Kleinstein
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, United States
  - Department of Immunobiology, Yale School of Medicine, New Haven, CT 06520, United States
- Leying Guan
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06520, United States
3
Acharya D, Mukhopadhyay A. A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology. Brief Funct Genomics 2024:elae013. [PMID: 38600757 DOI: 10.1093/bfgp/elae013] [Received: 10/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024]
Abstract
Multi-omics data play a crucial role in precision medicine, mainly to understand the diverse biological interaction between different omics. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence this review offers a thorough overview of current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.
Affiliation(s)
- Debabrata Acharya
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
- Anirban Mukhopadhyay
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
4
Eteleeb AM, Novotny BC, Tarraga CS, Sohn C, Dhungel E, Brase L, Nallapu A, Buss J, Farias F, Bergmann K, Bradley J, Norton J, Gentsch J, Wang F, Davis AA, Morris JC, Karch CM, Perrin RJ, Benitez BA, Harari O. Brain high-throughput multi-omics data reveal molecular heterogeneity in Alzheimer's disease. PLoS Biol 2024; 22:e3002607. [PMID: 38687811 PMCID: PMC11086901 DOI: 10.1371/journal.pbio.3002607] [Received: 09/06/2023] [Revised: 05/10/2024] [Accepted: 03/28/2024] [Indexed: 05/02/2024]
Abstract
Unbiased data-driven omic approaches are revealing the molecular heterogeneity of Alzheimer disease. Here, we used machine learning approaches to integrate high-throughput transcriptomic, proteomic, metabolomic, and lipidomic profiles with clinical and neuropathological data from multiple human AD cohorts. We discovered 4 unique multimodal molecular profiles, one of them showing signs of poor cognitive function, a faster pace of disease progression, shorter survival with the disease, severe neurodegeneration and astrogliosis, and reduced levels of metabolomic profiles. We found this molecular profile to be present in multiple affected cortical regions associated with higher Braak tau scores and significant dysregulation of synapse-related genes, endocytosis, phagosome, and mTOR signaling pathways altered in AD early and late stages. AD cross-omics data integration with transcriptomic data from an SNCA mouse model revealed an overlapping signature. Furthermore, we leveraged single-nuclei RNA-seq data to identify distinct cell-types that most likely mediate molecular profiles. Lastly, we identified that the multimodal clusters uncovered cerebrospinal fluid biomarkers poised to monitor AD progression and possibly cognition. Our cross-omics analyses provide novel critical molecular insights into AD.
Affiliation(s)
- Abdallah M. Eteleeb
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
  - The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
- Brenna C. Novotny
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Carolina Soriano Tarraga
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Christopher Sohn
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Eliza Dhungel
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Logan Brase
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Aasritha Nallapu
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Jared Buss
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Fabiana Farias
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
  - NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Kristy Bergmann
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
  - NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Joseph Bradley
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
  - NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Joanne Norton
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
  - NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Jen Gentsch
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
  - NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Fengxian Wang
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
  - NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Albert A. Davis
  - Department of Neurology, Washington University, St. Louis, Missouri, United States of America
  - Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
- John C. Morris
  - The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
  - Department of Neurology, Washington University, St. Louis, Missouri, United States of America
  - Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
- Celeste M. Karch
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
  - The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
  - NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
  - Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
- Richard J. Perrin
  - The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
  - Department of Neurology, Washington University, St. Louis, Missouri, United States of America
  - Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
  - Department of Pathology and Immunology, Washington University, St. Louis, Missouri, United States of America
- Bruno A. Benitez
  - Department of Neurology and Neuroscience, Harvard Medical School and Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Oscar Harari
  - Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
  - The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
  - Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
5
Lan W, Liao H, Chen Q, Zhu L, Pan Y, Chen YPP. DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Brief Bioinform 2024; 25:bbae185. [PMID: 38678587 PMCID: PMC11056029 DOI: 10.1093/bib/bbae185] [Received: 01/13/2024] [Revised: 03/07/2024] [Accepted: 04/09/2024] [Indexed: 05/01/2024]
Abstract
Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
Affiliation(s)
- Wei Lan
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Haibo Liao
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Qingfeng Chen
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Lingzhi Zhu
  - School of Computer and Information Science, Hunan Institute of Technology, No. 18 Henghua Road, Zhuhui District, Hengyang 421002, China
- Yi Pan
  - School of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen 518055, China
- Yi-Ping Phoebe Chen
  - Department of Computer Science and Information Technology, La Trobe University, Plenty Rd, Bundoora, Melbourne, Victoria 3086, Australia
6
Shapiro BD, Battle A. Bayesian Multi-View Clustering given complex inter-view structure. F1000Res 2024; 11:1460. [PMID: 38495778 PMCID: PMC10940850 DOI: 10.12688/f1000research.126215.2] [Accepted: 02/02/2024] [Indexed: 03/19/2024]
Abstract
Multi-view datasets are becoming increasingly prevalent. These datasets consist of different modalities that provide complementary characterizations of the same underlying system. They can include heterogeneous types of information with complex relationships, varying degrees of missingness, and assorted sample sizes, as is often the case in multi-omic biological studies. Clustering multi-view data allows us to leverage different modalities to infer underlying systematic structure, but most existing approaches are limited to contexts in which entities are the same across views or have clear one-to-one relationships across data types with a common sample size. Many methods also make strong assumptions about the similarities of clusterings across views. We propose a Bayesian multi-view clustering approach (BMVC) which can handle the realities of multi-view datasets that often have complex relationships and diverse structure. BMVC incorporates known and complex many-to-many relationships between entities via a probabilistic graphical model that enables the joint inference of clusterings specific to each view, but where each view informs the others. Additionally, BMVC estimates the strength of the relationships between each pair of views, thus moderating the degree to which it imposes dependence constraints. We benchmarked BMVC on simulated data to show that it accurately estimates varying degrees of inter-view dependence when inter-view relationships are not limited to one-to-one correspondence. Next, we demonstrated its ability to capture visually interpretable inter-view structure in a public health survey of individuals and households in Puerto Rico following Hurricane Maria. Finally, we showed that BMVC clusters integrate the complex relationships between multi-omic profiles of breast cancer patient data, improving the biological homogeneity of clusters and elucidating hypotheses for functional biological mechanisms. 
We found that BMVC leverages complex inter-view structure to produce higher quality clusters than those generated by standard approaches. We also showed that BMVC is a valuable tool for real-world discovery and hypothesis generation.
Affiliation(s)
- Benjamin D. Shapiro
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Alexis Battle
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
7
Zhang Q, Chang C, Shen L, Long Q. Incorporating graph information in Bayesian factor analysis with robust and adaptive shrinkage priors. Biometrics 2024; 80:ujad014. [PMID: 38281768 PMCID: PMC10826885 DOI: 10.1093/biomtc/ujad014] [Received: 01/21/2022] [Revised: 10/20/2023] [Accepted: 11/16/2023] [Indexed: 01/30/2024]
Abstract
There has been an increasing interest in decomposing high-dimensional multi-omics data into a product of low-rank and sparse matrices for the purpose of dimension reduction and feature engineering. Bayesian factor models achieve such low-dimensional representation of the original data through different sparsity-inducing priors. However, few of these models can efficiently incorporate the information encoded by the biological graphs, which has already been proven to be useful in many analysis tasks. In this work, we propose a Bayesian factor model with novel hierarchical priors, which incorporate the biological graph knowledge as a tool for identifying a group of genes functioning collaboratively. The proposed model therefore enables sparsity within networks by allowing each factor loading to be shrunk adaptively and by considering additional layers to relate individual shrinkage parameters to the underlying graph information, both of which yield a more accurate structure recovery of factor loadings. Further, these new priors overcome the phase transition phenomenon, in contrast to existing graph-incorporated approaches, so that the model is robust to noisy edges that are inconsistent with the actual sparsity structure of the factor loadings. Finally, our model can handle both continuous and discrete data types. The proposed method is shown to outperform several existing factor analysis methods through simulation experiments and real data analyses.
Affiliation(s)
- Qiyiwen Zhang
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Changgee Chang
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 47405, United States
- Li Shen
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Qi Long
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
8
Cai Y, Wang S. Deeply integrating latent consistent representations in high-noise multi-omics data for cancer subtyping. Brief Bioinform 2024; 25:bbae061. [PMID: 38426322 PMCID: PMC10939425 DOI: 10.1093/bib/bbae061] [Received: 08/24/2023] [Revised: 01/13/2024] [Accepted: 01/29/2024] [Indexed: 03/02/2024]
Abstract
Cancer is a complex and high-mortality disease regulated by multiple factors. Accurate cancer subtyping is crucial for formulating personalized treatment plans and improving patient survival rates. The underlying mechanisms that drive cancer progression can be comprehensively understood by analyzing multi-omics data. However, the high noise levels in omics data often pose challenges in capturing consistent representations and adequately integrating their information. This paper proposed a novel variational autoencoder-based deep learning model, named Deeply Integrating Latent Consistent Representations (DILCR). Firstly, multiple independent variational autoencoders and contrastive loss functions were designed to separate noise from omics data and capture latent consistent representations. Subsequently, an Attention Deep Integration Network was proposed to integrate consistent representations across different omics levels effectively. Additionally, we introduced the Improved Deep Embedded Clustering algorithm to make the integrated variables clustering-friendly. The effectiveness of DILCR was evaluated using 10 typical cancer datasets from The Cancer Genome Atlas and compared with 14 state-of-the-art integration methods. The results demonstrated that DILCR effectively captures the consistent representations in omics data and outperforms other integration methods in cancer subtyping. In the Kidney Renal Clear Cell Carcinoma case study, DILCR identified cancer subtypes with clear biological significance and interpretability.
Affiliation(s)
- Yueyi Cai
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
9
Zengin T, Masud BA, Önal-Süzek T. TCGAnalyzeR: An Online Pan-Cancer Tool for Integrative Visualization of Molecular and Clinical Data of Cancer Patients for Cohort and Associated Gene Discovery. Cancers (Basel) 2024; 16:345. [PMID: 38254834 PMCID: PMC10814871 DOI: 10.3390/cancers16020345] [Received: 11/06/2023] [Revised: 01/08/2024] [Accepted: 01/11/2024] [Indexed: 01/24/2024]
Abstract
For humans, the parallel processing capability of visual recognition allows for faster comprehension of complex scenes and patterns. This is essential, especially for clinicians interpreting big data for whom the visualization tools play an even more vital role in transforming raw big data into clinical decision making by managing the inherent complexity and monitoring patterns interactively in real time. The Cancer Genome Atlas (TCGA) database's size and data variety challenge the effective utilization of this valuable resource by clinicians and biologists. We re-analyzed the five molecular data types, i.e., mutation, transcriptome profile, copy number variation, miRNA, and methylation data, of ~11,000 cancer patients with all 33 cancer types and integrated the existing TCGA patient cohorts from the literature into a free and efficient web application: TCGAnalyzeR. TCGAnalyzeR provides an integrative visualization of pre-analyzed TCGA data with several novel modules: (i) simple nucleotide variations with driver prediction; (ii) recurrent copy number alterations; (iii) differential expression in tumor versus normal, with pathway and the survival analysis; (iv) TCGA clinical data including metastasis and survival analysis; (v) external subcohorts from the literature, curatedTCGAData, and BiocOncoTK R packages; (vi) internal patient clusters determined using an iClusterPlus R package or signature-based expression analysis of five molecular data types. TCGAnalyzeR integrated the multi-omics, pan-cancer TCGA with ~120 subcohorts from the literature along with clipboard panels, thus allowing users to create their own subcohorts, compare against existing external subcohorts (MSI, Immune, PAM50, Triple Negative, IDH1, miRNA, metastasis, etc.) along with our internal patient clusters, and visualize cohort-centric or gene-centric results interactively using TCGAnalyzeR.
Affiliation(s)
- Talip Zengin
  - Department of Molecular Biology and Genetics, Mugla Sitki Kocman University, Mugla 48000, Türkiye
- Başak Abak Masud
  - Department of Bioinformatics, Mugla Sitki Kocman University, Mugla 48000, Türkiye
- Tuğba Önal-Süzek
  - Department of Bioinformatics, Mugla Sitki Kocman University, Mugla 48000, Türkiye
10
Li M, Noordam R, Winter EM, van Meurs M, Bouma HR, Arbous MS, Rensen PCN, Kooijman S. Hydrocortisone-associated death and hospital length of stay in patients with sepsis: A retrospective cohort of large-scale clinical care data. Biomed Pharmacother 2024; 170:115961. [PMID: 38039761 DOI: 10.1016/j.biopha.2023.115961] [Received: 07/31/2023] [Revised: 11/14/2023] [Accepted: 11/27/2023] [Indexed: 12/03/2023]
Abstract
PURPOSE Sepsis is one of the leading causes of morbidity and mortality worldwide with approximately 50 million annual cases. There is ongoing debate on the clinical benefit of hydrocortisone in the prevention of death in septic patients. Here we evaluated the association between hydrocortisone treatment and mortality in patients diagnosed with sepsis in a large-scale clinical dataset.
METHODS Data from patients between 2008 and 2019 were extracted from the retrospective Medical Information Mart for Intensive Care IV (MIMIC-IV) database. Patients who received hydrocortisone after diagnosis were matched using propensity-score matching with patients who did not, to balance confounding (by indication and contraindication) factors between the groups. 90-day mortality and survivors' length of hospital stay were compared between patients who did or did not receive hydrocortisone.
RESULTS A total of 31,749 septic patients were included in the study (mean age: 67, men: 57.3%, in-hospital mortality: 15.6%). 90-day mortality was higher among the 1802 patients receiving hydrocortisone when compared with the 6348 matched non-users (hazard ratio: 1.35, 95% CI: 1.24-1.47). Hydrocortisone treatment was also associated with increased in-hospital mortality (40.9% vs. 27.6%, p < 0.0001) and prolonged hospital stay in those who survived until discharge (median 12.6 days vs. 10.8 days, p < 0.0001). Stratification for age, gender, ethnicity, occurrence of septic shock, and the need for vasopressor drug administration such as (nor)epinephrine did not reveal sub-population(s) benefiting from hydrocortisone use.
CONCLUSION Hydrocortisone treatment is associated with increased risk of death as well as prolonged hospital stay in septic patients. Although residual confounding (by indication) cannot be ruled out completely due to the observational nature of the study, the present study suggests clinical implication of hydrocortisone use in patients with sepsis.
Affiliation(s)
- Mohan Li
- Department of Internal Medicine, Division of Endocrinology, Leiden University Medical Center, Leiden, the Netherlands
| | - Raymond Noordam
- Department of Internal Medicine, Section of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, the Netherlands
| | - Elizabeth M Winter
- Department of Internal Medicine, Division of Endocrinology, Leiden University Medical Center, Leiden, the Netherlands
| | - Matijs van Meurs
- Department of Critical Care, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Hjalmar R Bouma
- Department of Clinical Pharmacy and Pharmacology and Department of Internal Medicine, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - M Sesmu Arbous
- Department of Intensive Care, Leiden University Medical Center, Leiden, the Netherlands
| | - Patrick C N Rensen
- Department of Internal Medicine, Division of Endocrinology, Leiden University Medical Center, Leiden, the Netherlands
| | - Sander Kooijman
- Department of Internal Medicine, Division of Endocrinology, Leiden University Medical Center, Leiden, the Netherlands.
|
11
|
Lu Z, Ahmadiankalati M, Tan Z. Joint clustering multiple longitudinal features: A comparison of methods and software packages with practical guidance. Stat Med 2023; 42:5513-5540. [PMID: 37789706 DOI: 10.1002/sim.9917] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/15/2022] [Revised: 06/07/2023] [Accepted: 09/13/2023] [Indexed: 10/05/2023]
Abstract
Clustering longitudinal features is a common goal in medical studies to identify distinct disease developmental trajectories. Compared to clustering a single longitudinal feature, integrating multiple longitudinal features allows additional information to be incorporated into the clustering process, which may reveal co-existing longitudinal patterns and generate deeper biological insight. Despite its increasing importance and popularity, there is limited practical guidance for implementing cluster analysis approaches for multiple longitudinal features and evaluating their comparative performance in medical datasets. In this paper, we provide an overview of several commonly used approaches to clustering multiple longitudinal features, with an emphasis on application and implementation through R software. These methods can be broadly grouped into two categories, namely model-based (including frequentist and Bayesian) approaches and algorithm-based approaches. To evaluate their performance, we compare these approaches using real-life and simulated datasets. These results provide practical guidance to applied researchers who are interested in applying these approaches for clustering multiple longitudinal features. Recommendations for applied researchers and suggestions for future research in this area are also discussed.
Affiliation(s)
- Zihang Lu
- Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
- Department of Mathematics and Statistics, Queen's University, Kingston, Ontario, Canada
| | | | - Zhiwen Tan
- Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
|
12
|
Chamoso-Sanchez D, Rabadán Pérez F, Argente J, Barbas C, Martos-Moreno GA, Rupérez FJ. Identifying subgroups of childhood obesity by using multiplatform metabotyping. Front Mol Biosci 2023; 10:1301996. [PMID: 38174068 PMCID: PMC10761426 DOI: 10.3389/fmolb.2023.1301996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024] Open
Abstract
Introduction: Obesity results from an interplay between genetic predisposition and environmental factors such as diet, physical activity, culture, and socioeconomic status. Personalized treatments for obesity would be optimal, thus necessitating the identification of individual characteristics to improve the effectiveness of therapies. For example, genetic impairment of the leptin-melanocortin pathway can result in rare cases of severe early-onset obesity. Metabolomics has the potential to distinguish between a healthy and obese status; however, differentiating subsets of individuals within the obesity spectrum remains challenging. Factor analysis can integrate patient features from diverse sources, allowing an accurate subclassification of individuals. Methods: This study presents a workflow to identify metabotypes, particularly when routine clinical studies fail in patient categorization. 110 children with obesity (BMI > +2 SDS) genotyped for nine genes involved in the leptin-melanocortin pathway (CPE, MC3R, MC4R, MRAP2, NCOA1, PCSK1, POMC, SH2B1, and SIM1) and two glutamate receptor genes (GRM7 and GRIK1) were studied; 55 harboring heterozygous rare sequence variants and 55 with no variants. Anthropometric and routine clinical laboratory data were collected, and serum samples processed for untargeted metabolomic analysis using GC-q-MS and CE-TOF-MS and reversed-phase U(H)PLC-QTOF-MS/MS in positive and negative ionization modes. Following signal processing and multialignment, multivariate and univariate statistical analyses were applied to evaluate the genetic trait association with metabolomics data and clinical and routine laboratory features. Results and Discussion: Neither the presence of a heterozygous rare sequence variant nor clinical/routine laboratory features determined subgroups in the metabolomics data. To identify metabolomic subtypes, we applied Factor Analysis, by constructing a composite matrix from the five analytical platforms. 
Six factors and three different metabotypes were discovered. Subtle but clear differences in circulating lipids, as well as in insulin sensitivity, could be established, which opens the possibility of personalizing treatment according to the patients' categorization into such obesity subtypes. Metabotyping in clinical contexts poses challenges due to the influence of various uncontrolled variables on metabolic phenotypes. However, this strategy reveals the potential to identify subsets of patients with similar clinical diagnoses but different metabolic conditions. This approach underscores the broader applicability of Factor Analysis in metabotyping across diverse clinical scenarios.
Affiliation(s)
- David Chamoso-Sanchez
- Centro de Metabolómica y Bioanálisis (CEMBIO), Facultad de Farmacia, Universidad San Pablo-CEU, CEU Universities, Boadilla del Monte, Spain
| | | | - Jesús Argente
- Department of Pediatrics and Pediatric Endocrinology, Hospital Infantil Universitario Niño Jesús, Instituto de Investigación Sanitaria La Princesa, Universidad Autónoma de Madrid, Madrid, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- IMDEA Food Institute, Madrid, Spain
| | - Coral Barbas
- Centro de Metabolómica y Bioanálisis (CEMBIO), Facultad de Farmacia, Universidad San Pablo-CEU, CEU Universities, Boadilla del Monte, Spain
| | - Gabriel A. Martos-Moreno
- Department of Pediatrics and Pediatric Endocrinology, Hospital Infantil Universitario Niño Jesús, Instituto de Investigación Sanitaria La Princesa, Universidad Autónoma de Madrid, Madrid, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
| | - Francisco J. Rupérez
- Centro de Metabolómica y Bioanálisis (CEMBIO), Facultad de Farmacia, Universidad San Pablo-CEU, CEU Universities, Boadilla del Monte, Spain
|
13
|
Sun Z, Chung D, Neelon B, Millar-Wilson A, Ethier SP, Xiao F, Zheng Y, Wallace K, Hardiman G. A Bayesian framework for pathway-guided identification of cancer subgroups by integrating multiple types of genomic data. Stat Med 2023; 42:5266-5284. [PMID: 37715500 DOI: 10.1002/sim.9911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 07/15/2023] [Accepted: 09/05/2023] [Indexed: 09/17/2023]
Abstract
In recent years, comprehensive cancer genomics platforms, such as The Cancer Genome Atlas (TCGA), have provided access to an enormous amount of high-throughput genomic data for each patient, including gene expression, DNA copy number alterations, DNA methylation, and somatic mutation. While the integration of these multi-omics datasets has the potential to provide novel insights that can lead to personalized medicine, most existing approaches focus only on gene-level analysis and lack the ability to facilitate biological findings at the pathway level. In this article, we propose Bayes-InGRiD (Bayesian Integrative Genomics Robust iDentification of cancer subgroups), a novel pathway-guided Bayesian sparse latent factor model for the simultaneous identification of cancer patient subgroups (clustering) and key molecular features (variable selection) within a unified framework, based on the joint analysis of continuous, binary, and count data. By utilizing pathway (gene set) information, Bayes-InGRiD not only enhances the accuracy and robustness of cancer patient subgroup and key molecular feature identification, but also promotes biological understanding and interpretation. Finally, to facilitate efficient posterior sampling, an alternative Gibbs sampler for logistic and negative binomial models is proposed, using Pólya-Gamma mixtures of normals to represent latent variables for binary and count data, which yields a conditionally Gaussian representation of the posterior. The R package "INGRID" implementing the proposed approach is currently available on our research group's GitHub page (https://dongjunchung.github.io/INGRID/).
Affiliation(s)
- Zequn Sun
- Department of Preventive Medicine, Northwestern University, Chicago, Illinois
| | - Dongjun Chung
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio
- The Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Brian Neelon
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina
| | | | - Stephen P Ethier
- Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, South Carolina
| | - Feifei Xiao
- Department of Biostatistics, University of Florida, Gainesville, Florida
| | - Yinan Zheng
- Department of Preventive Medicine, Northwestern University, Chicago, Illinois
| | - Kristin Wallace
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina
| | - Gary Hardiman
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina
- Faculty of Medicine, Health and Life Sciences, School of Biological Sciences and Institute for Global Food Security, Queen's University Belfast, Belfast, UK
|
14
|
Li Z, Melograna F, Hoskens H, Duroux D, Marazita ML, Walsh S, Weinberg SM, Shriver MD, Müller-Myhsok B, Claes P, Van Steen K. netMUG: a novel network-guided multi-view clustering workflow for dissecting genetic and facial heterogeneity. Front Genet 2023; 14:1286800. [PMID: 38125750 PMCID: PMC10731261 DOI: 10.3389/fgene.2023.1286800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 11/14/2023] [Indexed: 12/23/2023] Open
Abstract
Introduction: Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Methods: Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. Results: We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. The clustering derived from netMUG achieved an adjusted Rand index of 1 with respect to the synthesized true labels. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these subgroups. Discussion: netMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
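For readers unfamiliar with the adjusted Rand index (ARI) quoted above, the sketch below computes it from two label vectors using the standard pair-counting formula. This is an illustrative implementation, not the netMUG code; the label vectors are made up, and the function assumes at least two samples.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_true)  # assumed to be >= 2
    # Pairs of samples placed together in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each labeling separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Made-up labels: the first prediction is the truth up to renaming of clusters.
truth = [0, 0, 1, 1, 2, 2]
perfect = adjusted_rand_index(truth, ["b", "b", "a", "a", "c", "c"])
partial = adjusted_rand_index(truth, [0, 0, 0, 1, 1, 1])
```

An ARI of 1 therefore means that the recovered partition is identical to the reference partition up to a renaming of the clusters, while values near 0 indicate agreement no better than chance.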
Affiliation(s)
- Zuqi Li
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
| | - Federico Melograna
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Hanne Hoskens
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
| | - Diane Duroux
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
| | - Mary L. Marazita
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Susan Walsh
- Department of Biology, Indiana University Indianapolis, Indianapolis, IN, United States
| | - Seth M. Weinberg
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Mark D. Shriver
- Department of Anthropology, Pennsylvania State University, State College, PA, United States
| | | | - Peter Claes
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Department of Electrical Engineering, KU Leuven, Leuven, Belgium
- Murdoch Children’s Research Institute, Melbourne, VIC, Australia
| | - Kristel Van Steen
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
|
15
|
Song Y, Xiang Z, Lu Z, Su R, Shu W, Sui M, Wei X, Xu X. Identification of a brand intratumor microbiome signature for predicting prognosis of hepatocellular carcinoma. J Cancer Res Clin Oncol 2023; 149:11319-11332. [PMID: 37380815 DOI: 10.1007/s00432-023-04962-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/29/2023] [Accepted: 06/01/2023] [Indexed: 06/30/2023]
Abstract
PURPOSE Given that the prognosis of hepatocellular carcinoma (HCC) differs dramatically between patients, it is imperative to uncover effective and available prognostic biomarker(s). Because the intratumor microbiome plays a significant role in the response to the tumor microenvironment, we aimed to identify an intratumor microbiome signature that accurately predicts the prognosis of HCC patients and then to investigate its possible mechanisms. METHODS The TCGA HCC microbiome data (TCGA-LIHC-microbiome) were downloaded from cBioPortal. To create an intratumor microbiome-related prognostic signature, univariate and multivariate Cox regression analyses were used to quantify the association of microbial abundance with patients' overall survival (OS), as well as their disease-specific survival (DSS). The performance of the scoring model was evaluated by the area under the ROC curve (AUC). Based on the microbiome-related signature, clinical factors, and multi-omics molecular subtypes derived from the "icluster" algorithm, nomograms were established to predict OS and DSS. Patients were further clustered into three subtypes based on their microbiome-related characteristics by consensus clustering. Moreover, a deconvolution algorithm, weighted correlation network analysis (WGCNA) and gene set variation analysis (GSVA) were used to investigate the potential mechanisms. RESULTS In the TCGA LIHC microbiome data, the abundances of 166 of the 1406 genera were considerably associated with HCC patients' OS. From that filtered dataset we identified a 27-microbe prognostic signature and developed a microbiome-related score (MRS) model. Compared with those in the relatively low-risk group, patients in the higher-risk group had a much worse OS (P < 0.0001). In addition, the time-dependent ROC curves with MRS showed excellent predictive efficacy for both OS and DSS. Moreover, MRS is an independent prognostic factor for OS and DSS over clinical factors and multi-omics-based molecular subtypes. The integration of MRS into nomograms significantly improved the efficacy of prognosis prediction (1-year AUC: 0.849, 3-year AUC: 0.825, 5-year AUC: 0.822). The analysis of the immune characteristics and specific gene modules of the microbiome-based subtypes suggested that the intratumor microbiome may affect HCC patients' prognosis by modulating cancer stemness and the immune response. CONCLUSION MRS, a prognostic model based on 27 intratumor microbes, was successfully established as an independent predictor of HCC patients' overall survival. The possible underlying mechanisms were also investigated to provide a potential intervention strategy.
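The abstract does not give the formula behind the MRS. A common construction for Cox-based signatures, assumed here purely for illustration, is a linear score (the sum of each feature's Cox coefficient times its abundance) followed by a split into risk groups at the median. The genus names, coefficients, and abundances below are hypothetical.

```python
from statistics import median


def risk_scores(abundances, coefficients):
    """Linear risk score per patient: sum of coefficient * abundance."""
    return {
        patient: sum(coefficients[g] * values.get(g, 0.0) for g in coefficients)
        for patient, values in abundances.items()
    }


def split_by_median(scores):
    """Label patients strictly above the median score as high risk."""
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


# Hypothetical Cox coefficients and abundances, not taken from the study.
coefficients = {"genus_a": 0.5, "genus_b": -0.25}
abundances = {
    "p1": {"genus_a": 2.0, "genus_b": 4.0},
    "p2": {"genus_a": 4.0, "genus_b": 0.0},
    "p3": {"genus_a": 0.0, "genus_b": 4.0},
}
scores = risk_scores(abundances, coefficients)
groups = split_by_median(scores)
```

The median is only one possible cutoff. Whatever threshold is chosen, it should be fixed on the training data before the groups are compared on held-out survival outcomes.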
Affiliation(s)
- Yisu Song
- Department of Hepatobiliary and Pancreatic Surgery, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Integrated Oncology and Intelligent Medicine of Zhejiang Province, Hangzhou, 310006, China
| | - Ze Xiang
- Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Integrated Oncology and Intelligent Medicine of Zhejiang Province, Hangzhou, 310006, China
| | - Zhengyang Lu
- Department of Hepatobiliary and Pancreatic Surgery, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Integrated Oncology and Intelligent Medicine of Zhejiang Province, Hangzhou, 310006, China
- Zhejiang Chinese Medical University, Hangzhou, 310053, People's Republic of China
| | - Renyi Su
- Department of Hepatobiliary and Pancreatic Surgery, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Integrated Oncology and Intelligent Medicine of Zhejiang Province, Hangzhou, 310006, China
| | - Wenzhi Shu
- Department of Hepatobiliary and Pancreatic Surgery, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Integrated Oncology and Intelligent Medicine of Zhejiang Province, Hangzhou, 310006, China
| | - Meihua Sui
- School of Basic Medical Sciences and Women's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Cancer Center, Zhejiang University, Hangzhou, China
| | - Xuyong Wei
- Department of Hepatobiliary and Pancreatic Surgery, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China.
- Key Laboratory of Integrated Oncology and Intelligent Medicine of Zhejiang Province, Hangzhou, 310006, China.
| | - Xiao Xu
- Zhejiang University School of Medicine, Hangzhou, China.
- Key Laboratory of Integrated Oncology and Intelligent Medicine of Zhejiang Province, Hangzhou, 310006, China.
- Institute of Organ Transplantation, Zhejiang University, Hangzhou, China.
|
16
|
Cembrowska-Lech D, Krzemińska A, Miller T, Nowakowska A, Adamski C, Radaczyńska M, Mikiciuk G, Mikiciuk M. An Integrated Multi-Omics and Artificial Intelligence Framework for Advance Plant Phenotyping in Horticulture. BIOLOGY 2023; 12:1298. [PMID: 37887008 PMCID: PMC10603917 DOI: 10.3390/biology12101298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 09/27/2023] [Accepted: 09/28/2023] [Indexed: 10/28/2023]
Abstract
This review discusses the transformative potential of integrating multi-omics data and artificial intelligence (AI) in advancing horticultural research, specifically plant phenotyping. The traditional methods of plant phenotyping, while valuable, are limited in their ability to capture the complexity of plant biology. The advent of (meta-)genomics, (meta-)transcriptomics, proteomics, and metabolomics has provided an opportunity for a more comprehensive analysis. AI and machine learning (ML) techniques can effectively handle the complexity and volume of multi-omics data, providing meaningful interpretations and predictions. Reflecting the multidisciplinary nature of this area of research, in this review, readers will find a collection of state-of-the-art solutions that are key to the integration of multi-omics data and AI for phenotyping experiments in horticulture, including experimental design considerations with several technical and non-technical challenges, which are discussed along with potential solutions. The future prospects of this integration include precision horticulture, predictive breeding, improved disease and stress response management, sustainable crop management, and exploration of plant biodiversity. The integration of multi-omics and AI holds immense promise for revolutionizing horticultural research and applications, heralding a new era in plant phenotyping.
Affiliation(s)
- Danuta Cembrowska-Lech
- Department of Physiology and Biochemistry, Institute of Biology, University of Szczecin, Felczaka 3c, 71-412 Szczecin, Poland;
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland; (A.K.); (T.M.)
| | - Adrianna Krzemińska
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland; (A.K.); (T.M.)
- Institute of Biology, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland;
| | - Tymoteusz Miller
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland; (A.K.); (T.M.)
- Institute of Marine and Environmental Sciences, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland
| | - Anna Nowakowska
- Department of Physiology and Biochemistry, Institute of Biology, University of Szczecin, Felczaka 3c, 71-412 Szczecin, Poland;
| | - Cezary Adamski
- Institute of Biology, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland;
| | | | - Grzegorz Mikiciuk
- Department of Horticulture, Faculty of Environmental Management and Agriculture, West Pomeranian University of Technology in Szczecin, Słowackiego 17, 71-434 Szczecin, Poland;
| | - Małgorzata Mikiciuk
- Department of Bioengineering, Faculty of Environmental Management and Agriculture, West Pomeranian University of Technology in Szczecin, Słowackiego 17, 71-434 Szczecin, Poland;
|
17
|
Chen W, Wang H, Liang C. Deep multi-view contrastive learning for cancer subtype identification. Brief Bioinform 2023; 24:bbad282. [PMID: 37539822 DOI: 10.1093/bib/bbad282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 05/29/2023] [Accepted: 07/19/2023] [Indexed: 08/05/2023] Open
Abstract
Cancer heterogeneity has posed great challenges in exploring precise therapeutic strategies for cancer treatment. The identification of cancer subtypes aims to detect patients with distinct molecular profiles and thus could provide new clues on effective clinical therapies. While great efforts have been made, it remains challenging to develop powerful computational methods that can efficiently integrate multi-omics datasets for the task. In this paper, we propose a novel self-supervised learning model called Deep Multi-view Contrastive Learning (DMCL) for cancer subtype identification. Specifically, by incorporating the reconstruction loss, contrastive loss and clustering loss into a unified framework, our model simultaneously encodes the sample discriminative information into the extracted feature representations and well preserves the sample cluster structures in the embedded space. Moreover, DMCL is an end-to-end framework where the cancer subtypes could be directly obtained from the model outputs. We compare DMCL with eight alternatives ranging from classic cancer subtype identification methods to recently developed state-of-the-art systems on 10 widely used cancer multi-omics datasets as well as an integrated dataset, and the experimental results validate the superior performance of our method. We further conduct a case study on liver cancer and the analysis results indicate that different subtypes might have different responses to the selected chemotherapeutic drugs.
Affiliation(s)
- Wenlan Chen
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Hong Wang
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
|
18
|
Zhang Y, Zhang N, Chai X, Sun T. Machine learning for image-based multi-omics analysis of leaf veins. JOURNAL OF EXPERIMENTAL BOTANY 2023; 74:4928-4941. [PMID: 37410807 DOI: 10.1093/jxb/erad251] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/01/2023] [Accepted: 06/29/2023] [Indexed: 07/08/2023]
Abstract
Veins are a critical component of the plant growth and development system, playing an integral role in supporting and protecting leaves, as well as transporting water, nutrients, and photosynthetic products. A comprehensive understanding of the form and function of veins requires a dual approach that combines plant physiology with cutting-edge image recognition technology. The latest advancements in computer vision and machine learning have facilitated the creation of algorithms that can identify vein networks and explore their developmental progression. Here, we review the functional, environmental, and genetic factors associated with vein networks, along with the current status of research on image analysis. In addition, we discuss the methods of venous phenotype extraction and multi-omics association analysis using machine learning technology, which could provide a theoretical basis for improving crop productivity by optimizing the vein network architecture.
Affiliation(s)
- Yubin Zhang
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
| | - Ning Zhang
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
| | - Xiujuan Chai
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
| | - Tan Sun
- Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing, China
- Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
|
19
|
Zheng Y, Liu Y, Yang J, Dong L, Zhang R, Tian S, Yu Y, Ren L, Hou W, Zhu F, Mai Y, Han J, Zhang L, Jiang H, Lin L, Lou J, Li R, Lin J, Liu H, Kong Z, Wang D, Dai F, Bao D, Cao Z, Chen Q, Chen Q, Chen X, Gao Y, Jiang H, Li B, Li B, Li J, Liu R, Qing T, Shang E, Shang J, Sun S, Wang H, Wang X, Zhang N, Zhang P, Zhang R, Zhu S, Scherer A, Wang J, Wang J, Huo Y, Liu G, Cao C, Shao L, Xu J, Hong H, Xiao W, Liang X, Lu D, Jin L, Tong W, Ding C, Li J, Fang X, Shi L. Multi-omics data integration using ratio-based quantitative profiling with Quartet reference materials. Nat Biotechnol 2023:10.1038/s41587-023-01934-1. [PMID: 37679543 DOI: 10.1038/s41587-023-01934-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/25/2022] [Accepted: 07/31/2023] [Indexed: 09/09/2023]
Abstract
Characterization and integration of the genome, epigenome, transcriptome, proteome and metabolome of different datasets is difficult owing to a lack of ground truth. Here we develop and characterize suites of publicly available multi-omics reference materials of matched DNA, RNA, protein and metabolites derived from immortalized cell lines from a family quartet of parents and monozygotic twin daughters. These references provide built-in truth defined by relationships among the family members and the information flow from DNA to RNA to protein. We demonstrate how using a ratio-based profiling approach that scales the absolute feature values of a study sample relative to those of a concurrently measured common reference sample produces reproducible and comparable data suitable for integration across batches, labs, platforms and omics types. Our study identifies reference-free 'absolute' feature quantification as the root cause of irreproducibility in multi-omics measurement and data integration and establishes the advantages of ratio-based multi-omics profiling with common reference materials.
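The ratio-based idea described above can be illustrated with a toy calculation: each feature of a study sample is divided by the same feature in a reference sample measured in the same batch, so a multiplicative batch effect cancels out. The sketch below assumes a purely multiplicative batch effect and uses a log2 scale; the feature names, intensities, and the function `ratio_profile` are hypothetical and are not part of the Quartet tooling.

```python
import math


def ratio_profile(sample, reference):
    """Return log2(sample / reference) for each positive feature shared by both."""
    return {
        feature: math.log2(sample[feature] / reference[feature])
        for feature in sample
        if feature in reference and sample[feature] > 0 and reference[feature] > 0
    }


# Hypothetical absolute intensities measured in batch 1.
batch1_sample = {"geneA": 200.0, "geneB": 50.0}
batch1_reference = {"geneA": 100.0, "geneB": 100.0}

# Assumed batch effect: batch 2 reads every feature three times higher.
batch_effect = 3.0
batch2_sample = {k: v * batch_effect for k, v in batch1_sample.items()}
batch2_reference = {k: v * batch_effect for k, v in batch1_reference.items()}

profile1 = ratio_profile(batch1_sample, batch1_reference)
profile2 = ratio_profile(batch2_sample, batch2_reference)
```

The absolute values differ threefold between the two batches, yet both ratio profiles are identical, which is the property that makes ratio-scaled data comparable across batches. Additive or feature-specific batch effects would not cancel this cleanly.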
Affiliation(s)
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Sha Tian
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Feng Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ling Lin
- Zhangjiang Center for Translational Medicine, Shanghai Biotecan Medical Diagnostics Co. Ltd., Shanghai, China
- Jingwei Lou
- Zhangjiang Center for Translational Medicine, Shanghai Biotecan Medical Diagnostics Co. Ltd., Shanghai, China
- Ruiqiang Li
- Novogene Bioinformatics Institute, Beijing, China
- Jingchao Lin
- Metabo-Profile Biotechnology (Shanghai) Co. Ltd., Shanghai, China
- Depeng Wang
- Nextomics Biosciences Institute, Wuhan, China
- Ding Bao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qiaochu Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Xingdong Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- He Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Bin Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Bingying Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, China
- Ruimei Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Tao Qing
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Erfei Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Shanyue Sun
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Haiyan Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Xiaolin Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Peipei Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ruolan Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Sibo Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Jiucun Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jing Wang
- National Institute of Metrology, Beijing, China
- Yinbo Huo
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Gang Liu
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Chengming Cao
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Li Shao
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Xiaozhen Liang
- Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China
- Daru Lu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Weida Tong
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Chen Ding
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
- Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China.
- Xiang Fang
- National Institute of Metrology, Beijing, China.
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
- International Human Phenome Institutes (Shanghai), Shanghai, China.
20
Yu Y, Zhang N, Mai Y, Ren L, Chen Q, Cao Z, Chen Q, Liu Y, Hou W, Yang J, Hong H, Xu J, Tong W, Dong L, Shi L, Fang X, Zheng Y. Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method. Genome Biol 2023; 24:201. [PMID: 37674217 PMCID: PMC10483871 DOI: 10.1186/s13059-023-03047-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/08/2022] [Accepted: 05/18/2023] [Indexed: 09/08/2023]
Abstract
BACKGROUND Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations have not been adequately assessed in terms of omics types, performance metrics, and application scenarios. RESULTS As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples into their own donors. The ratio-based method, i.e., scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than others, especially when batch effects are completely confounded with biological factors of study interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies. CONCLUSIONS Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
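The ratio-based scaling described in this abstract can be illustrated with a minimal sketch. The function name `ratio_scale`, the data layout, the use of the mean over reference replicates, and the log2 transform are all assumptions made for illustration; the abstract states only that study-sample values are scaled relative to a reference material profiled in the same batch.

```python
import math
from statistics import mean

def ratio_scale(batch_samples, batch_references):
    """Convert absolute feature values to log2 ratios against the batch's reference.

    batch_samples: dict of sample id -> dict of feature -> positive absolute value
    batch_references: list of dicts (feature -> positive absolute value), one per
        replicate of the common reference material profiled in the same batch
    """
    features = batch_references[0].keys()
    # Per-feature reference level for this batch (mean of replicates: an assumption)
    ref_mean = {f: mean(rep[f] for rep in batch_references) for f in features}
    return {
        sample_id: {f: math.log2(values[f] / ref_mean[f]) for f in features}
        for sample_id, values in batch_samples.items()
    }

# Two hypothetical batches: batch B reports every value 4x higher than batch A
batch_a = ratio_scale({"s1": {"geneX": 20.0}}, [{"geneX": 10.0}, {"geneX": 10.0}])
batch_b = ratio_scale({"s1": {"geneX": 80.0}}, [{"geneX": 40.0}, {"geneX": 40.0}])
print(batch_a, batch_b)
```

In this toy case the multiplicative batch offset cancels because it affects the study sample and the reference equally, which is the intuition behind profiling the reference concurrently with the study samples.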
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Qiaochu Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China.
- International Human Phenome Institutes, Shanghai, China.
- Xiang Fang
- National Institute of Metrology, Beijing, China.
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China.
21
Gygi JP, Kleinstein SH, Guan L. Predictive overfitting in immunological applications: Pitfalls and solutions. Hum Vaccin Immunother 2023; 19:2251830. [PMID: 37697867 PMCID: PMC10498807 DOI: 10.1080/21645515.2023.2251830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 07/27/2023] [Accepted: 08/21/2023] [Indexed: 09/13/2023]
Abstract
Overfitting describes the phenomenon where a highly predictive model on the training data generalizes poorly to future observations. It is a common concern when applying machine learning techniques to contemporary medical applications, such as predicting vaccination response and disease status in infectious disease or cancer studies. This review examines the causes of overfitting and offers strategies to counteract it, focusing on model complexity reduction, reliable model evaluation, and harnessing data diversity. Through discussion of the underlying mathematical models and illustrative examples using both synthetic data and published real datasets, our objective is to equip analysts and bioinformaticians with the knowledge and tools necessary to detect and mitigate overfitting in their research.
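The gap between training and held-out performance that defines overfitting can be made concrete with a small synthetic sketch. The 1-nearest-neighbour classifier, the pure-noise data, and all names below are assumptions chosen for illustration and are not taken from the review.

```python
import random

def nearest_label(x, train):
    # Label of the closest training point (squared Euclidean distance)
    return min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(x, t[0])))[1]

def accuracy(data, train):
    # Fraction of points in `data` whose nearest training neighbour shares their label
    return sum(nearest_label(x, train) == y for x, y in data) / len(data)

rng = random.Random(0)

def noise(n, p=20):
    # n samples with p random features and labels unrelated to the features
    return [([rng.gauss(0, 1) for _ in range(p)], rng.randint(0, 1)) for _ in range(n)]

train, held_out = noise(50), noise(100)
train_acc = accuracy(train, train)        # each training point is its own nearest neighbour
held_out_acc = accuracy(held_out, train)  # labels are noise, so this should sit near chance
print(train_acc, held_out_acc)
```

Because the labels carry no signal, the perfect training accuracy reflects memorization only, which is why evaluation on data not used for fitting is needed to detect the problem.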
Affiliation(s)
- Jeremy P. Gygi
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Steven H. Kleinstein
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Department of Pathology, Yale School of Medicine, New Haven, CT, USA
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Leying Guan
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
22
Park J, Lee JW, Park M. Comparison of cancer subtype identification methods combined with feature selection methods in omics data analysis. BioData Min 2023; 16:18. [PMID: 37420304 DOI: 10.1186/s13040-023-00334-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 06/30/2023] [Indexed: 07/09/2023]
Abstract
BACKGROUND Cancer subtype identification is important for the early diagnosis of cancer and the provision of adequate treatment. Prior to identifying the subtype of cancer in a patient, feature selection is also crucial for reducing the dimensionality of the data by detecting genes that contain important information about the cancer subtype. Numerous cancer subtyping methods have been developed, and their performance has been compared. However, combinations of feature selection and subtype identification methods have rarely been considered. This study aimed to identify the best combination of variable selection and subtype identification methods in single omics data analysis. RESULTS Combinations of six filter-based methods and six unsupervised subtype identification methods were investigated using The Cancer Genome Atlas (TCGA) datasets for four cancers. The number of features selected varied, and several evaluation metrics were used. Although no single combination was found to have a distinctively good performance, Consensus Clustering (CC) and Neighborhood-Based Multi-omics Clustering (NEMO) used with variance-based feature selection had a tendency to show lower p-values, and nonnegative matrix factorization (NMF) stably showed good performance in many cases unless the Dip test was used for feature selection. In terms of accuracy, the combination of NMF and similarity network fusion (SNF) with Monte Carlo Feature Selection (MCFS) and Minimum-Redundancy Maximum Relevance (mRMR) showed good overall performance. NMF always showed among the worst performances without feature selection in all datasets, but performed much better when used with various feature selection methods. iClusterBayes (ICB) had decent performance when used without feature selection. CONCLUSIONS Rather than a single method clearly emerging as optimal, the best methodology was different depending on the data used, the number of features selected, and the evaluation method. 
A guideline for choosing the best combination method under various situations is provided.
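As an illustration of the variance-based filter mentioned in this abstract, the sketch below ranks features by variance across samples and keeps the top k. The function name, the toy matrix, and the choice of population variance are assumptions; the abstract does not specify how the variance filter was implemented.

```python
from statistics import pvariance

def top_variance_features(matrix, feature_names, k):
    """Return the names of the k features with the highest variance.

    matrix: list of rows, one row per sample and one column per feature
    """
    columns = list(zip(*matrix))  # transpose so each tuple holds one feature
    ranked = sorted(
        zip(feature_names, columns),
        key=lambda name_and_values: pvariance(name_and_values[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical expression matrix: 3 samples x 3 genes
expr = [
    [1.0, 5.0, 0.2],
    [1.1, 9.0, 0.9],
    [0.9, 1.0, 0.1],
]
selected = top_variance_features(expr, ["g1", "g2", "g3"], 2)
print(selected)
```

The selected features would then be passed to a clustering method of choice; the filter itself uses no subtype labels, which keeps it compatible with unsupervised subtype identification.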
Affiliation(s)
- JiYoon Park
- Department of Statistics, Korea University, 145 Anam-Ro, Seongbuk-Gu, Seoul, 02841, South Korea
- Jae Won Lee
- Department of Statistics, Korea University, 145 Anam-Ro, Seongbuk-Gu, Seoul, 02841, South Korea
- Mira Park
- Department of Preventive Medicine, Eulji University, 77 Gyeryong-Ro, Jung-Gu, Daejeon, 34824, South Korea.
23
Erdem C, Gross SM, Heiser LM, Birtwistle MR. MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms. Nat Commun 2023; 14:3991. [PMID: 37414767 PMCID: PMC10326020 DOI: 10.1038/s41467-023-39729-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 06/27/2023] [Indexed: 07/08/2023]
Abstract
Robust identification of context-specific network features that control cellular phenotypes remains a challenge. We here introduce MOBILE (Multi-Omics Binary Integration via Lasso Ensembles) to nominate molecular features associated with cellular phenotypes and pathways. First, we use MOBILE to nominate mechanisms of interferon-γ (IFNγ) regulated PD-L1 expression. Our analyses suggest that IFNγ-controlled PD-L1 expression involves BST2, CLIC2, FAM83D, ACSL5, and HIST2H2AA3 genes, which were supported by prior literature. We also compare networks activated by related family members transforming growth factor-beta 1 (TGFβ1) and bone morphogenetic protein 2 (BMP2) and find that differences in ligand-induced changes in cell size and clustering properties are related to differences in laminin/collagen pathway activity. Finally, we demonstrate the broad applicability and adaptability of MOBILE by analyzing publicly available molecular datasets to investigate breast cancer subtype specific networks. Given the ever-growing availability of multi-omics datasets, we envision that MOBILE will be broadly useful for identification of context-specific molecular features and pathways.
Affiliation(s)
- Cemal Erdem
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Sean M Gross
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Laura M Heiser
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA.
- Marc R Birtwistle
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA.
- Department of Bioengineering, Clemson University, Clemson, SC, USA.
24
Craddock J, Jiang J, Patrick SM, Mutambirwa SBA, Stricker PD, Bornman MSR, Jaratlerdsiri W, Hayes VM. Alterations in the Epigenetic Machinery Associated with Prostate Cancer Health Disparities. Cancers (Basel) 2023; 15:3462. [PMID: 37444571 DOI: 10.3390/cancers15133462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 06/27/2023] [Accepted: 06/29/2023] [Indexed: 07/15/2023]
Abstract
Prostate cancer is driven by acquired genetic alterations, including those impacting the epigenetic machinery. With African ancestry as a significant risk factor for aggressive disease, we hypothesize that dysregulation among the roughly 656 epigenetic genes may contribute to prostate cancer health disparities. Investigating prostate tumor genomic data from 109 men of southern African and 56 men of European Australian ancestry, we found that African-derived tumors present with a longer tail of epigenetic driver gene candidates (72 versus 10). Biased towards African-specific drivers (63 versus 9 shared), many are novel to prostate cancer (18/63), including several putative therapeutic targets (CHD7, DPF3, POLR1B, SETD1B, UBTF, and VPS72). Through clustering of all variant types and copy number alterations, we describe two epigenetic PCa taxonomies capable of differentiating patients by ancestry and predicted clinical outcomes. We identified the top genes in African- and European-derived tumors representing a multifunctional "generic machinery", the alteration of which may be instrumental in epigenetic dysregulation and prostate tumorigenesis. In conclusion, numerous somatic alterations in the epigenetic machinery drive prostate carcinogenesis, but African-derived tumors appear to achieve this state with greater diversity among such alterations. The greater novelty observed in African-derived tumors illustrates the significant clinical benefit to be derived from a much needed African-tailored approach to prostate cancer healthcare aimed at reducing prostate cancer health disparities.
Affiliation(s)
- Jenna Craddock
- School of Health Systems and Public Health, Faculty of Health Sciences, University of Pretoria, Pretoria 0084, South Africa
- Jue Jiang
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW 2006, Australia
- Sean M Patrick
- School of Health Systems and Public Health, Faculty of Health Sciences, University of Pretoria, Pretoria 0084, South Africa
- Shingai B A Mutambirwa
- Department of Urology, Sefako Makgatho Health Science University, Dr George Mukhari Academic Hospital, Medunsa 0208, South Africa
- Phillip D Stricker
- Department of Urology, St Vincent's Hospital, Darlinghurst, NSW 2010, Australia
- M S Riana Bornman
- School of Health Systems and Public Health, Faculty of Health Sciences, University of Pretoria, Pretoria 0084, South Africa
- Weerachai Jaratlerdsiri
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW 2006, Australia
- Vanessa M Hayes
- School of Health Systems and Public Health, Faculty of Health Sciences, University of Pretoria, Pretoria 0084, South Africa
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW 2006, Australia
- Manchester Cancer Research Centre, University of Manchester, Manchester M20 4GJ, UK
25
Ji Y, Dutta P, Davuluri R. Deep multi-omics integration by learning correlation-maximizing representation identifies prognostically stratified cancer subtypes. Bioinform Adv 2023; 3:vbad075. [PMID: 37424943 PMCID: PMC10328436 DOI: 10.1093/bioadv/vbad075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 04/08/2023] [Indexed: 07/11/2023]
Abstract
Motivation Molecular subtyping by integrative modeling of multi-omics and clinical data can help the identification of robust and clinically actionable disease subgroups, an essential step in developing precision medicine approaches. Results We developed a novel outcome-guided molecular subgrouping framework, called Deep Multi-Omics Integrative Subtyping by Maximizing Correlation (DeepMOIS-MC), for integrative learning from multi-omics data by maximizing correlation between all input -omics views. DeepMOIS-MC consists of two parts: clustering and classification. In the clustering part, the preprocessed high-dimensional multi-omics views are input into two-layer fully connected neural networks. The outputs of individual networks are subjected to Generalized Canonical Correlation Analysis loss to learn the shared representation. Next, the learned representation is filtered by a regression model to select features that are related to a covariate clinical variable, for example, a survival outcome. The filtered features are used for clustering to determine the optimal cluster assignments. In the classification stage, the original feature matrix of one of the -omics views is scaled and discretized based on equal frequency binning, and then subjected to feature selection using RandomForest. Using these selected features, classification models (for example, XGBoost model) are built to predict the molecular subgroups that were identified at the clustering stage. We applied DeepMOIS-MC to lung and liver cancers, using TCGA datasets. In comparative analysis, we found that DeepMOIS-MC outperformed traditional approaches in patient stratification. Finally, we validated the robustness and generalizability of the classification models on independent datasets. We anticipate that DeepMOIS-MC can be adapted to many multi-omics integrative analysis tasks.
Availability and implementation Source code for the PyTorch implementation of DGCCA and other DeepMOIS-MC modules is available at GitHub (https://github.com/duttaprat/DeepMOIS-MC). Supplementary information Supplementary data are available at Bioinformatics Advances online.
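The equal frequency binning step named in the classification stage can be sketched as follows. This is a rank-based illustration under assumed names and an assumed tie-handling rule; it is not the DeepMOIS-MC implementation, which is available in the linked repository.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 so that bins hold (nearly) equal counts.

    Values are ranked, and consecutive ranks are split into n_bins groups.
    Tied values may land in different bins under this simple rank-based rule.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# One hypothetical feature measured in six samples, discretized into three bins
bins = equal_frequency_bins([0.1, 7.5, 0.3, 2.2, 9.9, 0.2], 3)
print(bins)  # two samples fall into each of the three bins
```

Unlike equal-width binning, this keeps the bin counts balanced even when the feature is heavily skewed, as in the toy values above.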
Affiliation(s)
- Yanrong Ji
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Pratik Dutta
- Department of Biomedical Informatics, Stony Brook Cancer Center, Stony Brook Medicine, Stony Brook University, Stony Brook, NY 11794, USA
- Ramana Davuluri
- Department of Biomedical Informatics, Stony Brook Cancer Center, Stony Brook Medicine, Stony Brook University, Stony Brook, NY 11794, USA
26
Chen Z, Yang Z, Zhu L, Gao P, Matsubara T, Kanaya S, Altaf-Ul-Amin M. Learning vector quantized representation for cancer subtypes identification. Comput Methods Programs Biomed 2023; 236:107543. [PMID: 37100024 DOI: 10.1016/j.cmpb.2023.107543] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/27/2022] [Revised: 02/13/2023] [Accepted: 04/07/2023] [Indexed: 05/21/2023]
Abstract
BACKGROUND AND OBJECTIVE Defining and separating cancer subtypes is essential for facilitating personalized therapy modality and prognosis of patients. The definition of subtypes has been constantly recalibrated as a result of our deepened understanding. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that could reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data such as transcriptomics that have strong correlations to the underlying biological mechanism. However, while existing studies have shown promising results, they suffer from issues associated with omics data, namely sample scarcity and high dimensionality, and they impose unrealistic assumptions in order to extract useful features from the data while avoiding overfitting to spurious correlations. METHODS This paper proposes to leverage a recent strong generative model, the Vector-Quantized Variational AutoEncoder, to tackle the data issues and extract discrete representations that are crucial to the quality of subsequent clustering by retaining only information relevant to reconstructing the input. RESULTS Extensive experiments and medical analysis on multiple datasets comprising 10 distinct cancers demonstrate that the proposed clustering results can significantly and robustly improve prognosis over prevalent subtyping systems. CONCLUSION Our proposal does not impose strict assumptions on data distribution; moreover, its latent features are better representations of the transcriptomic data in different cancer subtypes, capable of yielding superior clustering performance with any mainstream clustering method.
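The discrete representation in a Vector-Quantized Variational AutoEncoder comes from snapping each continuous latent vector to its nearest codebook entry. The sketch below shows only that lookup step, with a hand-written hypothetical codebook; the encoder, the decoder, and the training of the codebook are omitted, and none of the names are taken from the paper.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`."""
    def squared_distance(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))

    index = min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))
    return index, codebook[index]

# Hypothetical 2-dimensional codebook with three entries
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
index, code = quantize([0.9, 1.2], codebook)
print(index, code)
```

The resulting code indices form the discrete features that a downstream clustering method could operate on.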
Affiliation(s)
- Zheng Chen
- Graduate School of Engineering Science, Osaka University, Japan.
- Ziwei Yang
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan
- Lingwei Zhu
- Department of Computing Science, University of Alberta, Canada
- Peng Gao
- Institute for Quantitative Biosciences, University of Tokyo, Japan
- Shigehiko Kanaya
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan; Data Science Center, Nara Institute of Science and Technology, Japan
- Md Altaf-Ul-Amin
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan
27
Ruan P, Todd JL, Zhao H, Liu Y, Vinisko R, Soellner JF, Schmid R, Kaner RJ, Luckhardt TR, Neely ML, Noth I, Porteous M, Raj R, Safdar Z, Strek ME, Hesslinger C, Palmer SM, Leonard TB, Salisbury ML. Integrative multi-omics analysis reveals novel idiopathic pulmonary fibrosis endotypes associated with disease progression. Respir Res 2023; 24:141. [PMID: 37344825 PMCID: PMC10283254 DOI: 10.1186/s12931-023-02435-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/27/2022] [Accepted: 04/26/2023] [Indexed: 06/23/2023]
Abstract
BACKGROUND Idiopathic pulmonary fibrosis (IPF) is characterized by the accumulation of extracellular matrix in the pulmonary interstitium and progressive functional decline. We hypothesized that integration of multi-omics data would identify clinically meaningful molecular endotypes of IPF. METHODS The IPF-PRO Registry is a prospective registry of patients with IPF. Proteomic and transcriptomic (including total RNA [toRNA] and microRNA [miRNA]) analyses were performed using blood collected at enrollment. Molecular data were integrated using Similarity Network Fusion, followed by unsupervised spectral clustering to identify molecular subtypes. Cox proportional hazards models tested the relationship between these subtypes and progression-free and transplant-free survival. The molecular subtypes were compared to risk groups based on a previously described 52-gene (toRNA expression) signature. Biological characteristics of the molecular subtypes were evaluated via linear regression differential expression and canonical pathways (Ingenuity Pathway Analysis [IPA]) over-representation analyses. RESULTS Among 232 subjects, two molecular subtypes were identified. Subtype 1 (n = 105, 45.3%) and Subtype 2 (n = 127, 54.7%) had similar distributions of age (70.1 +/- 8.1 vs. 69.3 +/- 7.6 years; p = 0.31) and sex (79.1% vs. 70.1% males, p = 0.16). Subtype 1 had more severe disease based on composite physiologic index (CPI) (55.8 vs. 51.2; p = 0.002). After adjusting for CPI and antifibrotic treatment at enrollment, subtype 1 experienced shorter progression-free survival (HR 1.79, 95% CI 1.28,2.56; p = 0.0008) and similar transplant-free survival (HR 1.30, 95% CI 0.87,1.96; p = 0.20) as subtype 2. 
There was little agreement in the distribution of subjects to the molecular subtypes and the risk groups based on the 52-gene signature (kappa = 0.04, 95% CI -0.08, 0.17), and the 52-gene signature risk groups were associated with differences in transplant-free but not progression-free survival. Based on heatmaps and differential expression analyses, proteins and miRNAs (but not toRNA) contributed to classification of subjects to the molecular subtypes. The IPA showed enrichment in pulmonary fibrosis-relevant pathways, including mTOR, VEGF, PDGF, and B-cell receptor signaling. CONCLUSIONS Integration of transcriptomic and proteomic data from blood enabled identification of clinically meaningful molecular endotypes of IPF. If validated, these endotypes could facilitate identification of individuals likely to experience disease progression and enrichment of clinical trials. TRIAL REGISTRATION NCT01915511.
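The kappa value in this abstract measures how far two categorical assignments of the same subjects agree beyond chance. The following is a minimal sketch of such a statistic, assuming it is Cohen's kappa (the abstract does not say which variant was used). The function name `cohens_kappa` and the example labels are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two categorical labelings of the same subjects."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of subjects given the same label by both schemes.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both schemes use one identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: two groupings of four subjects (illustrative only).
subtype = ["A", "A", "B", "B"]
risk_group = ["A", "B", "B", "B"]
print(cohens_kappa(subtype, risk_group))
```

A value near 0, such as the one reported above, indicates agreement no better than expected by chance, whereas 1 indicates perfect agreement.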
Affiliation(s)
- Peifeng Ruan
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Jamie L Todd
- Duke Clinical Research Institute, Durham, NC, USA
- Duke University Medical Center, Durham, NC, USA
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Yi Liu
- Boehringer Ingelheim Pharmaceuticals, Inc, Ridgefield, CT, USA
- Richard Vinisko
- Boehringer Ingelheim Pharmaceuticals, Inc, Ridgefield, CT, USA
- Ramona Schmid
- Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach, Germany
- Tracy R Luckhardt
- Department of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Megan L Neely
- Duke Clinical Research Institute, Durham, NC, USA
- Duke University Medical Center, Durham, NC, USA
- Imre Noth
- Division of Pulmonary and Critical Care Medicine, University of Virginia, Charlottesville, VA, USA
- Mary Porteous
- Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Rishi Raj
- Stanford University School of Medicine, Stanford, CA, USA
- Mary E Strek
- Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, IL, USA
- Scott M Palmer
- Duke Clinical Research Institute, Durham, NC, USA
- Duke University Medical Center, Durham, NC, USA
- Margaret L Salisbury
- Department of Medicine, Vanderbilt University Medical Center, 1211 Medical Center Drive, 37232, Nashville, TN, USA.
28
Manganaro L, Bianco S, Bironzo P, Cipollini F, Colombi D, Corà D, Corti G, Doronzo G, Errico L, Falco P, Gandolfi L, Guerrera F, Monica V, Novello S, Papotti M, Parab S, Pittaro A, Primo L, Righi L, Sabbatini G, Sandri A, Vattakunnel S, Bussolino F, Scagliotti GV. Consensus clustering methodology to improve molecular stratification of non-small cell lung cancer. Sci Rep 2023; 13:7759. [PMID: 37173325 PMCID: PMC10182023 DOI: 10.1038/s41598-023-33954-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/14/2022] [Accepted: 04/21/2023] [Indexed: 05/15/2023]
Abstract
Recent advances in machine learning research, combined with the reduced sequencing costs enabled by modern next-generation sequencing, have paved the way for the implementation of precision medicine through routine multi-omics molecular profiling of tumours. Thus, there is an emerging need for reliable models that exploit such data to retrieve clinically useful information. Here, we introduce an original consensus clustering approach that overcomes the intrinsic instability of common clustering methods based on molecular data. This approach is applied to non-small cell lung cancer (NSCLC), integrating data from an ongoing clinical study (PROMOLE) with those made available by The Cancer Genome Atlas, to define a molecular-based stratification of the patients beyond, but still preserving, histological subtyping. The resulting subgroups are biologically characterized by well-defined mutational and gene-expression profiles and are significantly related to disease-free survival (DFS). Interestingly, it was observed that (1) cluster B, characterized by a short DFS, is enriched in KEAP1 and SKP2 mutations, which makes it an ideal candidate for further studies with inhibitors, and (2) over- and under-representation of inflammation and immune system pathways in squamous-cell carcinoma subgroups could potentially be exploited to stratify patients treated with immunotherapy.
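The abstract does not detail the authors' consensus procedure. As a general illustration of how consensus clustering can stabilise unstable partitions, the sketch below builds a co-association matrix from several clustering runs and groups samples that co-cluster in most of them. This is a generic evidence-accumulation scheme, not the method of the paper; the function names, the 0.5 threshold, and the toy partitions are all hypothetical.

```python
def co_association(partitions):
    """Fraction of clustering runs in which each pair of samples shares a cluster."""
    if not partitions:
        raise ValueError("at least one partition is required")
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all partitions must label the same samples")
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    runs = len(partitions)
    return [[value / runs for value in row] for row in matrix]


def consensus_clusters(matrix, threshold=0.5):
    """Connected components of the graph linking pairs with co-association > threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical clustering runs over four samples; label names differ
# between runs, which the co-association matrix is insensitive to.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_clusters(co_association(runs)))
```

Because the matrix records only whether two samples share a cluster, arbitrary relabelling between runs does not affect the result, which is why this representation is a common starting point for consensus methods.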
Affiliation(s)
- L Manganaro
- aizoOn Technology Consulting S.R.L, Torino, Italy
- S Bianco
- aizoOn Technology Consulting S.R.L, Torino, Italy
- P Bironzo
- Medical Oncology Division at San Luigi Hospital, Department of Oncology, University of Torino, Orbassano (TO), Italy
- F Cipollini
- aizoOn Technology Consulting S.R.L, Torino, Italy
- D Colombi
- aizoOn Technology Consulting S.R.L, Torino, Italy
- D Corà
- Department of Translational Medicine, Piemonte Orientale University, Novara, Italy
- Center for Translational Research on Autoimmune and Allergic Diseases-CAAD, Novara, Italy
- G Corti
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- G Doronzo
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- L Errico
- Division of Thoracic Surgery at AOU San Luigi, Department of Oncology, University of Torino, Orbassano (TO), Italy
- P Falco
- aizoOn Technology Consulting S.R.L, Torino, Italy
- L Gandolfi
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- F Guerrera
- Division of Thoracic Surgery at AOU Città della Salute e della Scienza, Department of Surgical Sciences, University of Torino, Torino, Italy
- V Monica
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- S Novello
- Medical Oncology Division at San Luigi Hospital, Department of Oncology, University of Torino, Orbassano (TO), Italy
- M Papotti
- Pathology Division at AOU Città della Salute e della Scienza, Department of Oncology, University of Torino, Torino, Italy
- S Parab
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- A Pittaro
- Pathology Division at AOU Città della Salute e della Scienza, Department of Oncology, University of Torino, Torino, Italy
- L Primo
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- L Righi
- Pathology Division at AOU San Luigi, Department of Oncology, University of Torino, Orbassano (TO), Italy
- G Sabbatini
- aizoOn Technology Consulting S.R.L, Torino, Italy
- A Sandri
- Division of Thoracic Surgery at AOU San Luigi, Department of Oncology, University of Torino, Orbassano (TO), Italy
- F Bussolino
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- G V Scagliotti
- Medical Oncology Division at San Luigi Hospital, Department of Oncology, University of Torino, Orbassano (TO), Italy.
29
Zou H, Xiao L, Zeng D, Luo S. Multivariate functional mixed model with MRI data: An application to Alzheimer's disease. Stat Med 2023; 42:1492-1511. [PMID: 36805635 PMCID: PMC10133011 DOI: 10.1002/sim.9683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 11/09/2022] [Accepted: 01/26/2023] [Indexed: 02/22/2023]
Abstract
Alzheimer's disease (AD) is the leading cause of dementia and impairment in various domains. Recent AD studies (ie, the Alzheimer's Disease Neuroimaging Initiative [ADNI] study) collect multimodal data, including longitudinal neurological assessments and magnetic resonance imaging (MRI) data, to better study disease progression. Adopting early interventions is essential to slow AD progression for subjects with mild cognitive impairment (MCI). It is of particular interest to develop an AD predictive model that leverages multimodal data and provides accurate personalized predictions. In this article, we propose a multivariate functional mixed model with MRI data (MFMM-MRI) that simultaneously models longitudinal neurological assessments, baseline MRI data, and the survival outcome (ie, dementia onset) for subjects with MCI at baseline. Two functional forms (the random-effects model and the instantaneous model) linking the longitudinal and survival processes are investigated. We use a Markov chain Monte Carlo (MCMC) method based on the No-U-Turn Sampling (NUTS) algorithm to obtain posterior samples. We develop a dynamic prediction framework that provides accurate personalized predictions of longitudinal trajectories and survival probability. We apply MFMM-MRI to the ADNI study and identify significant associations among longitudinal outcomes, MRI data, and the risk of dementia onset. The instantaneous model with voxels from the whole brain has the best prediction performance among all candidate models. The simulation study supports the validity of the estimation and dynamic prediction method.
Affiliation(s)
- Haotian Zou
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States
- Luo Xiao
- Department of Statistics, North Carolina State University, North Carolina, United States
- Donglin Zeng
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States
- Sheng Luo
- Department of Biostatistics and Bioinformatics, Duke University, North Carolina, United States
30
Mo Q, Yun S, Sallman DA, Vincelette ND, Peng G, Zhang L, Lancet JE, Padron E. Integrative molecular subtypes of acute myeloid leukemia. Blood Cancer J 2023; 13:71. [PMID: 37156780 PMCID: PMC10167212 DOI: 10.1038/s41408-023-00836-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/04/2023] [Accepted: 04/11/2023] [Indexed: 05/10/2023]
Affiliation(s)
- Qianxing Mo
- Department of Biostatistics & Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, 33612, USA.
- Seongseok Yun
- Department of Malignant Hematology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, 33612, USA
- David A Sallman
- Department of Malignant Hematology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, 33612, USA
- Nicole D Vincelette
- Department of Malignant Hematology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, 33612, USA
- Guang Peng
- Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Ling Zhang
- Department of Hematopathology and Laboratory Medicine, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, 33612, USA
- Jeffrey E Lancet
- Department of Malignant Hematology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, 33612, USA
- Eric Padron
- Department of Malignant Hematology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, 33612, USA
31
Li Z, Melograna F, Hoskens H, Duroux D, Marazita ML, Walsh S, Weinberg SM, Shriver MD, Müller-Myhsok B, Claes P, Van Steen K. netMUG: a novel network-guided multi-view clustering workflow for dissecting genetic and facial heterogeneity. bioRxiv 2023:2023.05.04.539350. [PMID: 37205363 PMCID: PMC10187283 DOI: 10.1101/2023.05.04.539350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these classes. NetMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
Affiliation(s)
- Zuqi Li
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Hanne Hoskens
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Diane Duroux
- GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Mary L. Marazita
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA 15219, USA
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Susan Walsh
- Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Seth M. Weinberg
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA 15219, USA
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Mark D. Shriver
- Department of Anthropology, Pennsylvania State University, State College, PA 16801, USA
- Peter Claes
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
- Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- Kristel Van Steen
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- GIGA-R Medical Genomics, University of Liège, Liège, Belgium
32
Zhao J, Zhao B, Song X, Lyu C, Chen W, Xiong Y, Wei DQ. Subtype-DCC: decoupled contrastive clustering method for cancer subtype identification based on multi-omics data. Brief Bioinform 2023; 24:7005165. [PMID: 36702755 DOI: 10.1093/bib/bbad025] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/21/2022] [Revised: 12/21/2022] [Accepted: 01/08/2023] [Indexed: 01/28/2023]
Abstract
Due to the high heterogeneity and complexity of cancers, patients with different cancer subtypes often have distinct groups of genomic and clinical characteristics. Therefore, the discovery and identification of cancer subtypes are crucial to cancer diagnosis, prognosis and treatment. Recent technological advances have accelerated the increasing availability of multi-omics data for cancer subtyping. To take advantage of the complementary information from multi-omics data, it is necessary to develop computational models that can represent and integrate different layers of data into a single framework. Here, we propose a decoupled contrastive clustering method (Subtype-DCC) based on multi-omics data integration for clustering to identify cancer subtypes. The idea of contrastive learning is introduced into deep clustering based on deep neural networks to learn clustering-friendly representations. Experimental results demonstrate the superior performance of the proposed Subtype-DCC model in identifying cancer subtypes over the currently available state-of-the-art clustering methods. The strength of Subtype-DCC is also supported by the survival and clinical analysis.
Affiliation(s)
- Jing Zhao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Bowen Zhao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xiaotong Song
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- Chujun Lyu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Weizhi Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nayang, Henan, 473006, China
33
Hernández-Verdin I, Kirasic E, Wienand K, Mokhtari K, Eimer S, Loiseau H, Rousseau A, Paillassa J, Ahle G, Lerintiu F, Uro-Coste E, Oberic L, Figarella-Branger D, Chinot O, Gauchotte G, Taillandier L, Marolleau JP, Polivka M, Adam C, Ursu R, Schmitt A, Barillot N, Nichelli L, Lozano-Sánchez F, Ibañez-Juliá MJ, Peyre M, Mathon B, Abada Y, Charlotte F, Davi F, Stewart C, de Reyniès A, Choquet S, Soussain C, Houillier C, Chapuy B, Hoang-Xuan K, Alentorn A. Molecular and clinical diversity in primary central nervous system lymphoma. Ann Oncol 2023; 34:186-199. [PMID: 36402300 DOI: 10.1016/j.annonc.2022.11.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 08/31/2022] [Accepted: 11/08/2022] [Indexed: 11/19/2022]
Abstract
BACKGROUND Primary central nervous system lymphoma (PCNSL) is a rare and distinct entity within diffuse large B-cell lymphoma presenting with variable response rates, probably due to underlying molecular heterogeneity. PATIENTS AND METHODS To identify and characterize PCNSL heterogeneity and facilitate clinical translation, we carried out a comprehensive multi-omic analysis [whole-exome sequencing, RNA sequencing (RNA-seq), methylation sequencing, and clinical features] in a discovery cohort of 147 fresh-frozen (FF) immunocompetent PCNSLs and a validation cohort of 93 formalin-fixed, paraffin-embedded (FFPE) PCNSLs with RNA-seq and clinico-radiological data. RESULTS Consensus clustering of multi-omic data uncovered concordant classification of four robust, non-overlapping, prognostically significant clusters (CS). The CS1 and CS2 groups presented an immune-cold hypermethylated profile but a distinct clinical behavior. The 'immune-hot' CS4 group, enriched with mutations increasing the Janus kinase (JAK)-signal transducer and activator of transcription (STAT) and nuclear factor-κB activity, had the most favorable clinical outcome, while the heterogeneous-immune CS3 group had the worst prognosis, probably due to its association with meningeal infiltration and enriched HIST1H1E mutations. CS1 was characterized by high Polycomb repressive complex 2 activity and CDKN2A/B loss leading to higher proliferation activity. Integrated analysis on proposed targets suggests potential use of immune checkpoint inhibitors/JAK1 inhibitors for CS4, cyclin D-Cdk4,6 plus phosphoinositide 3-kinase (PI3K) inhibitors for CS1, lenalidomide/demethylating drugs for CS2, and enhancer of zeste 2 polycomb repressive complex 2 subunit (EZH2) inhibitors for CS3. We developed an algorithm to identify the PCNSL subtypes using RNA-seq data from either FFPE or FF tissue.
CONCLUSIONS The integration of genome-wide multi-omic data revealed four molecular patterns in PCNSL with a distinctive prognostic impact that provides a basis for future clinical stratification and subtype-based targeted interventions.
Affiliation(s)
- I Hernández-Verdin
- Institut du Cerveau-Paris Brain Institute-ICM, Inserm, Sorbonne Université, CNRS, Paris, France
- E Kirasic
- Institut du Cerveau-Paris Brain Institute-ICM, Inserm, Sorbonne Université, CNRS, Paris, France
- K Wienand
- Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany; Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité-Universitätsmedizin Berlin, Berlin, Germany; Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- K Mokhtari
- Institut du Cerveau-Paris Brain Institute-ICM, Inserm, Sorbonne Université, CNRS, Paris, France; Department of Neuropathology, Groupe Hospitalier Pitié Salpêtrière, APHP, Paris, France
- S Eimer
- Department of Pathology, CHU de Bordeaux, Hôpital Pellegrin, Bordeaux, France
- H Loiseau
- Department of Neurosurgery, Bordeaux University Hospital Center, Pellegrin Hospital, Bordeaux, France; EA 7435-IMOTION, University of Bordeaux, Bordeaux, France
- A Rousseau
- Department of Pathology, PBH, CHU Angers, Angers, France; CRCINA, Université de Nantes-université d'Angers, Angers, France
- J Paillassa
- Department of Hematology, CHU Angers, Angers, France
- G Ahle
- Department of Neurology, Hôpitaux Civils de Colmar, Colmar, France
- F Lerintiu
- Department of Neuropathology, Hôpitaux Civils de Colmar, Strasbourg, France
- E Uro-Coste
- Department of Pathology, CHU de Toulouse, IUC-Oncopole, Toulouse, France; INSERM U1037, Cancer Research Center of Toulouse (CRCT), Toulouse, France; Université Toulouse III Paul Sabatier, Toulouse, France
- L Oberic
- Department of Hematology, IUC Toulouse Oncopole, Toulouse, France
- D Figarella-Branger
- Neuropathology Department, University Hospital Timone, Aix Marseille University, Marseille, France; Inst Neurophysiopathol, CNRS, INP, Aix-Marseille University, Marseille, France
- O Chinot
- Department of Neuro-oncology, CHU Timone, APHM, Marseille, France; Institute of NeuroPhysiopathology, CNRS, INP, Aix-Marseille University, Marseille, France
- G Gauchotte
- Department of Biopathology, CHRU Nancy, CHRU/ICL, Bâtiment BBB, Vandoeuvre-lès-Nancy, France; Department of Legal Medicine, CHRU Nancy, Vandoeuvre-lès-Nancy, France; INSERM U1256, University of Lorraine, Vandoeuvre-lès-Nancy, France; Centre de Ressources Biologiques, BB-0033-00035, CHRU, Nancy, France
- L Taillandier
- Department of Neuro-oncology, CHRU-Nancy, Université de Lorraine, Nancy, France
- J-P Marolleau
- Department of Hematology, CHU Amiens-Picardie, Amiens, France
- M Polivka
- Department of Anatomopathology, Lariboisière Hospital, Assistance Publique-Hopitaux de Paris, University of Paris, Paris, France
- C Adam
- Pathology Department, Bicêtre University Hospital, Public Hospital Network of Paris, Le Kremlin Bicêtre, France
- R Ursu
- Department of Neurology, Université de Paris, AP-HP, Hôpital Saint Louis, Paris, France
- A Schmitt
- Department of Hematology, Institut Bergonié Hospital, Bordeaux, France
- N Barillot
- Institut du Cerveau-Paris Brain Institute-ICM, Inserm, Sorbonne Université, CNRS, Paris, France
- L Nichelli
- Department of Neuroradiology, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière-Charles Foix, Paris, France
- F Lozano-Sánchez
- Department of Neurology-2, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière-Charles Foix, Paris, France
- M Peyre
- Institut du Cerveau-Paris Brain Institute-ICM, Inserm, Sorbonne Université, CNRS, Paris, France; Department of Neurosurgery, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière-Charles Foix, Paris, France
- B Mathon
- Institut du Cerveau-Paris Brain Institute-ICM, Inserm, Sorbonne Université, CNRS, Paris, France; Department of Neurosurgery, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière-Charles Foix, Paris, France
- Y Abada
- Department of Neurology-2, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière-Charles Foix, Paris, France
- F Charlotte
- Department Pathology, Hôpital Pitié-Salpêtrière and Sorbonne University, Paris, France
- F Davi
- Department Hematology, APHP, Hôpital Pitié-Salpêtrière and Sorbonne University, Paris, France
- C Stewart
- Department Broad Institute of MIT and Harvard, Cambridge, USA
- A de Reyniès
- Department INSERM UMR_S1138-Centre de Recherche des Cordeliers-Université Pierre et Marie Curie et Université Paris Descartes, Paris, France
- S Choquet
- Department Pathology, Hôpital Pitié-Salpêtrière and Sorbonne University, Paris, France
- C Soussain
- Department Hematology Unit, Institut Curie, Saint-Cloud, France
- C Houillier
- Department of Neurology-2, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière-Charles Foix, Paris, France
- B Chapuy
- Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany; Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité-Universitätsmedizin Berlin, Berlin, Germany
- K Hoang-Xuan
- Institut du Cerveau-Paris Brain Institute-ICM, Inserm, Sorbonne Université, CNRS, Paris, France; Department of Neurology-2, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière-Charles Foix, Paris, France
- A Alentorn
- Institut du Cerveau-Paris Brain Institute-ICM, Inserm, Sorbonne Université, CNRS, Paris, France; Department of Neurology-2, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière-Charles Foix, Paris, France.
34
Sathyanarayanan A, Mueller TT, Ali Moni M, Schueler K, Baune BT, Lio P, Mehta D, Baune BT, Dierssen M, Ebert B, Fabbri C, Fusar-Poli P, Gennarelli M, Harmer C, Howes OD, Janzing JGE, Lio P, Maron E, Mehta D, Minelli A, Nonell L, Pisanu C, Potier MC, Rybakowski F, Serretti A, Squassina A, Stacey D, van Westrhenen R, Xicota L. Multi-omics data integration methods and their applications in psychiatric disorders. Eur Neuropsychopharmacol 2023; 69:26-46. [PMID: 36706689 DOI: 10.1016/j.euroneuro.2023.01.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 08/03/2022] [Revised: 11/22/2022] [Accepted: 01/02/2023] [Indexed: 01/27/2023]
Abstract
To study mental illness and health, researchers have in the past often broken down their complexity into individual subsystems (e.g., genomics, transcriptomics, proteomics, clinical data) and explored the components independently. Technological advancements and decreasing costs of high-throughput sequencing have led to an unprecedented increase in data generation. Furthermore, over the years it has become increasingly clear that these subsystems do not act in isolation but instead interact with each other to drive mental illness and health. Consequently, individual subsystems are now analysed jointly to promote a holistic understanding of the underlying biological complexity of health and disease. Complementing the increasing data availability, current research is geared towards developing novel methods that can efficiently combine the information-rich multi-omics data to discover biologically meaningful biomarkers for diagnosis, treatment, and prognosis. However, clinical translation of the research is still challenging. In this review, we summarise conventional and state-of-the-art statistical and machine learning approaches for biomarker discovery, diagnosis, and outcome and treatment response prediction through integrating multi-omics and clinical data. In addition, we describe the role of biological model systems and in silico multi-omics model designs in clinical translation of psychiatric research from bench to bedside. Finally, we discuss the current challenges and explore the application of multi-omics integration in future psychiatric research. The review provides a structured overview and latest updates in the field of multi-omics in psychiatry.
Affiliation(s)
- Anita Sathyanarayanan
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Kelvin Grove, Queensland 4059, Australia
- Tamara T Mueller
- Institute for Artificial Intelligence and Informatics in Medicine, TU Munich, 80333 Munich, Germany
- Mohammad Ali Moni
- Artificial Intelligence and Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Katja Schueler
- Clinic for Psychosomatics, Hospital zum Heiligen Geist, Frankfurt am Main, Germany; Frankfurt Psychoanalytic Institute, Frankfurt am Main, Germany
- Bernhard T Baune
- Department of Psychiatry and Psychotherapy, University of Münster, Germany; Department of Psychiatry, Melbourne Medical School, University of Melbourne, Australia; The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Australia
- Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Divya Mehta
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Kelvin Grove, Queensland 4059, Australia.
- Bernhard T Baune
- Department of Psychiatry and Psychotherapy, University of Münster, Germany; Department of Psychiatry, Melbourne Medical School, University of Melbourne, Australia; The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Australia
- Mara Dierssen
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology; Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Bjarke Ebert
- Medical Strategy & Communication, H. Lundbeck A/S, Valby, Denmark
| | - Chiara Fabbri
- Department of Biomedical and NeuroMotor Sciences, University of Bologna, Bologna, Italy; Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Paolo Fusar-Poli
- Early Psychosis: Intervention and Clinical-detection (EPIC) Lab, Department of Psychosis Studies, King's College London, United Kingdom; Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
| | - Massimo Gennarelli
- Department of Molecular and Translational Medicine, University of Brescia; Genetics Unit, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
| | | | - Oliver D Howes
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom; Psychiatric Imaging, Medical Research Council Clinical Sciences Centre, Imperial College London, Hammersmith Hospital Campus, London, United Kingdom
| | | | - Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
| | - Eduard Maron
- Department of Psychiatry, University of Tartu, Tartu, Estonia; Centre for Neuropsychopharmacology, Division of Brain Sciences, Imperial College London, London, United Kingdom; Documental Ltd, Tallin, Estonia; West Tallinn Central Hospital, Tallinn, Estonia
| | - Divya Mehta
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Kelvin Grove, Queensland 4059, Australia
| | - Alessandra Minelli
- Department of Molecular and Translational Medicine, University of Brescia; Genetics Unit, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
| | - Lara Nonell
- MARGenomics, IMIM (Hospital del Mar Research Institute), Barcelona, Spain
| | - Claudia Pisanu
- Department of Biomedical Sciences, Section of Neuroscience and Clinical Pharmacology, University of Cagliari, Cagliari, Italy
| | | | - Filip Rybakowski
- Department of Psychiatry, Poznan University of Medical Sciences, Poznan, Poland
| | - Alessandro Serretti
- Department of Biomedical and NeuroMotor Sciences, University of Bologna, Bologna, Italy
| | - Alessio Squassina
- Department of Biomedical Sciences, Section of Neuroscience and Clinical Pharmacology, University of Cagliari, Cagliari, Italy
| | - David Stacey
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
| | - Roos van Westrhenen
- Parnassia Psychiatric Institute, Amsterdam, the Netherlands; Department of Psychiatry and Neuropsychology, Faculty of Health and Sciences, Maastricht University, Maastricht, the Netherlands; Institute of Psychiatry, Psychology & Neuroscience (IoPPN) King's College London, United Kingdom
| | - Laura Xicota
- Paris Brain Institute ICM, Salpetriere Hospital, Paris, France
| |
Collapse
|
35
|
Wei Y, Li L, Zhao X, Yang H, Sa J, Cao H, Cui Y. Cancer subtyping with heterogeneous multi-omics data via hierarchical multi-kernel learning. Brief Bioinform 2023; 24:6847203. [PMID: 36433785 DOI: 10.1093/bib/bbac488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Revised: 09/14/2022] [Accepted: 10/15/2022] [Indexed: 11/27/2022] Open
Abstract
Differentiating cancer subtypes is crucial to guide personalized treatment and improve the prognosis for patients. Integrating multi-omics data can offer a comprehensive landscape of cancer biological processes and provide promising ways for cancer diagnosis and treatment. Taking the heterogeneity of different omics data types into account, we propose a hierarchical multi-kernel learning (hMKL) approach, a novel cancer molecular subtyping method that identifies cancer subtypes by adopting a two-stage kernel learning strategy. In stage 1, we obtain a composite kernel for each individual omics data type by optimizing its kernel parameters, borrowing the idea of cancer integration via multi-kernel learning (CIMLR). In stage 2, we obtain a final fused kernel through a weighted linear combination of the individual kernels learned in stage 1, using an unsupervised multiple kernel learning method. Based on the final fused kernel, k-means clustering is applied to identify cancer subtypes. Simulation studies show that hMKL outperforms the one-stage CIMLR method when there is data heterogeneity. hMKL can estimate the number of clusters correctly, which is the key challenge in subtyping. Application to two real data sets shows that hMKL identified meaningful subtypes and key cancer-associated biomarkers. The proposed method provides a novel toolkit for heterogeneous multi-omics data integration and cancer subtype identification.
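The fusion step described in stage 2 can be illustrated with a minimal Python sketch. This is not the hMKL implementation: the RBF kernel, the `gamma` values, the fixed weights, and the toy matrices `mrna` and `methylation` are all assumptions chosen for illustration, whereas the abstract states that the weights are learned by an unsupervised multiple kernel learning method.

```python
import math


def rbf_kernel(rows, gamma):
    """Return the RBF (Gaussian) kernel matrix for a list of feature vectors."""
    n = len(rows)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


def fuse_kernels(kernels, weights):
    """Weighted linear combination of same-sized kernel matrices.

    Weights are normalised to sum to 1. Here they are fixed by hand
    (an assumption); a real method would learn them from the data.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(norm, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): 4 patients measured on two omics layers.
mrna = [[0.0, 0.1], [0.1, 0.0], [2.0, 2.1], [2.1, 2.0]]
methylation = [[1.0], [1.1], [3.0], [3.2]]

fused = fuse_kernels(
    [rbf_kernel(mrna, gamma=0.5), rbf_kernel(methylation, gamma=1.0)],
    weights=[0.7, 0.3],
)
```

The fused matrix stays symmetric with a unit diagonal, so it can be passed to any kernel-based clustering routine; patients 0 and 1 end up far more similar to each other than to patients 2 and 3.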
Affiliation(s)
- Yifang Wei
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Lingmei Li
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Xin Zhao
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Haitao Yang
- Division of Health Statistics, School of Public Health, Hebei Medical University, Shijiazhuang, Hebei 050017, PR China
- Jian Sa
- Department of Science and Technology, Shanxi Provincial Key Laboratory of Major Disease Risk Assessment, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Hongyan Cao
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China; Department of Mathematics, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
|
36
|
Ge S, Liu J, Cheng Y, Meng X, Wang X. Multi-view spectral clustering with latent representation learning for applications on multi-omics cancer subtyping. Brief Bioinform 2023; 24:6850565. [PMID: 36445207 DOI: 10.1093/bib/bbac500] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Revised: 09/19/2022] [Accepted: 10/22/2022] [Indexed: 11/30/2022] Open
Abstract
Driven by multi-omics data, some multi-view clustering algorithms have been successfully applied to cancer subtype prediction, aiming to identify subtypes with biometric differences in the same cancer, thereby improving the clinical prognosis of patients and supporting the design of personalized treatment plans. Because the number of patients in omics data is much smaller than the number of genes, multi-view spectral clustering based on similarity learning has been widely developed. However, these algorithms still suffer from several problems, such as over-reliance of the clustering results on the quality of pre-defined similarity matrices, an inability to reasonably handle noise and redundant information in high-dimensional omics data, and neglect of complementary information between omics data. This paper proposes a multi-view spectral clustering with latent representation learning (MSCLRL) method to alleviate the above problems. First, MSCLRL generates a corresponding low-dimensional latent representation for each omics data type, which can effectively retain the unique information of each omics layer and improve the robustness and accuracy of the similarity matrix. Second, the obtained latent representations are assigned appropriate weights by MSCLRL, and global similarity learning is performed to generate an integrated similarity matrix. Third, the integrated similarity matrix is used to feed back and update the low-dimensional representation of each omics layer. Finally, the final integrated similarity matrix is used for clustering. In 10 benchmark multi-omics datasets and 2 separate cancer case studies, the experiments confirmed that the proposed method obtained statistically and biologically meaningful cancer subtypes.
Affiliation(s)
- Shuguang Ge
- School of Information and Control Engineering, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China; Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
- Jian Liu
- School of Information and Control Engineering, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China; Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
- Yuhu Cheng
- School of Information and Control Engineering, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China; Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
- Xiaojing Meng
- School of Medical Information and Engineering, Xuzhou Medical University, No. 209, Tongshan Road, 221116 Xuzhou, Jiangsu, China
- Xuesong Wang
- School of Information and Control Engineering, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China; Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
|
37
|
Chalise P, Kwon D, Fridley BL, Mo Q. Statistical Methods for Integrative Clustering of Multi-omics Data. Methods Mol Biol 2023; 2629:73-93. [PMID: 36929074 PMCID: PMC10950392 DOI: 10.1007/978-1-0716-2986-4_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Cancers are heterogeneous diseases caused by accumulated mutations or abnormal alterations at multiple levels of biological processes, including genomics, epigenomics, transcriptomics, and proteomics. There is great clinical interest in identifying cancer molecular subtypes for disease prognosis and personalized medicine. Integrative clustering is a powerful unsupervised learning method that has been increasingly used to identify cancer molecular subtypes using multi-omics data including somatic mutations, DNA copy numbers, DNA methylation, and gene expression. Integrative clustering methods are generally classified into model-based or nonparametric approaches. In this chapter, we will give an overview of the frequently used model-based methods, including iCluster, iClusterPlus, and iClusterBayes, and the nonparametric method, integrative nonnegative matrix factorization (intNMF). We will use the integrative analyses of uveal melanoma and lower-grade glioma to illustrate these representative methods. Finally, we will discuss the strengths and limitations of these representative methods and give suggestions for performing integrative analyses of cancer multi-omics data in practice.
Affiliation(s)
- Prabhakar Chalise
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS, USA
- Deukwoo Kwon
- Department of Population Health Science & Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Brooke L Fridley
- Department of Biostatistics & Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Qianxing Mo
- Department of Biostatistics & Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
|
38
|
Wang X, Fridley BL. Multi-omics Data Deconvolution and Integration: New Methods, Insights, and Translational Implications. Methods Mol Biol 2023; 2629:1-9. [PMID: 36929070 DOI: 10.1007/978-1-0716-2986-4_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
In the current era of multi-omics, new sequencing and molecular profiling technologies have facilitated our quest for a deeper and broader understanding of the variations and dynamic regulations in human genomes. However, analyzing and integrating data generated from diverse platforms, modalities, and large-scale heterogeneous samples to extract functional and clinically valuable information remains a significant challenge. Here, we first discuss recent advances in methods and algorithms for analyzing data at the genome, transcriptome, proteome, metabolome, and microbiome levels, followed by emerging methods for leveraging single-cell sequencing and spatial transcriptomic data. We also highlight the mechanistic insights that these advances can bring to the field, as well as the current challenges and outlooks relating to their translational and reproducible adoption at the population level. It is evident that novel statistical methods, which were inspired by new assays, will enable the associated molecular profiling pipelines and experimental designs to continuously improve our understanding of the human genome and the downstream consequences in the transcriptome, epigenome, proteome, metabolome, regulome, and microbiome.
Affiliation(s)
- Xuefeng Wang
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Brooke L Fridley
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
|
39
|
Shi X, Liang C, Wang H. Multiview Robust Graph-Based Clustering for Cancer Subtype Identification. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:544-556. [PMID: 35044919 DOI: 10.1109/tcbb.2022.3143897] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Cancer subtype identification classifies cancers into groups according to their molecular characteristics and clinical manifestations and is the basis for more personalized diagnosis and therapy. Public datasets such as The Cancer Genome Atlas (TCGA) have collected a massive amount of multi-omics data. The accumulation of these datasets provides unprecedented opportunities to study the mechanisms of cancers and further identify cancer subtypes at a comprehensive level. In this paper, we propose a multi-view robust graph-based clustering (MRGC) method to effectively identify cancer subtypes. Our method first learns robust latent representations from the raw omics data to alleviate the influence of noise, and a set of similarity matrices is then adaptively learned based on these new representations. Finally, a global similarity graph is obtained by exploiting the consensus structure of the graphs. As a result, the three parts of our method can reinforce each other in a mutually iterative manner. We conduct extensive experiments on both generic machine learning datasets and cancer datasets. The experimental results confirm that our model can achieve satisfactory clustering performance compared to several state-of-the-art approaches. Moreover, we demonstrate the practicability of MRGC by carrying out a case study on hepatocellular carcinoma.
|
40
|
Li B, Zhang F, Niu Q, Liu J, Yu Y, Wang P, Zhang S, Zhang H, Wang Z. A molecular classification of gastric cancer associated with distinct clinical outcomes and validated by an XGBoost-based prediction model. Mol Ther Nucleic Acids 2022; 31:224-240. [PMID: 36700042 PMCID: PMC9843270 DOI: 10.1016/j.omtn.2022.12.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/10/2022] [Accepted: 12/22/2022] [Indexed: 12/28/2022]
Abstract
Gastric cancer (GC) is a heterogeneous disease and a leading cause of cancer-related deaths. Discovering robust, clinically relevant molecular classifications is critical for guiding personalized therapies for GC. Here, we propose a refined molecular classification scheme for GC using integrated optimal algorithms and multi-omics data. Based on the important features of mRNA, microRNA, and DNA methylation data selected by the multivariate Cox regression model, three subtypes linked to distinct clinical outcomes were identified by combining similarity network fusion and consensus clustering methods. The three subtypes were validated in multiple independent cohorts by an extreme gradient boosting machine learning prediction model with 125 differentially expressed genes. The molecular characteristics of mutation signatures, characteristic gene sets, driver genes, and chemotherapy sensitivity for each subtype were also identified: subtype 1 was associated with a favorable prognosis and characterized by high ARID1A and PIK3CA mutations, subtype 2 was associated with a poor prognosis and harbored high recurrent TP53 mutations, and subtype 3 was associated with high CHD1 and APOA1 mutations and a poor prognosis. The proposed three-subtype scheme achieved a better clinical prediction performance (area under the curve value = 0.71) than The Cancer Genome Atlas classification, which may provide a practical subtyping framework to improve the treatment of GC.
Affiliation(s)
- Bing Li
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Fengbin Zhang
- Department of Gastroenterology and Hepatology, The Fourth Hospital of Hebei Medical University, Shijiazhuang 050011, China
- Qikai Niu
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Jun Liu
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Yanan Yu
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Pengqian Wang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Siqi Zhang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Huamin Zhang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China (corresponding author)
- Zhong Wang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China (corresponding author)
|
41
|
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022] Open
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.
|
42
|
Rong Z, Liu Z, Song J, Cao L, Yu Y, Qiu M, Hou Y. MCluster-VAEs: An end-to-end variational deep learning-based clustering method for subtype discovery using multi-omics data. Comput Biol Med 2022; 150:106085. [PMID: 36162197 DOI: 10.1016/j.compbiomed.2022.106085] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/01/2022] [Revised: 07/30/2022] [Accepted: 09/03/2022] [Indexed: 11/03/2022]
Abstract
The discovery of cancer subtypes based on unsupervised clustering helps provide a precise diagnosis, guide treatment, and improve patients' prognoses. Compared with single-omics data, multi-omics data can improve clustering performance because they offer a comprehensive landscape for understanding biological systems and mechanisms. However, heterogeneous data from multiple sources introduce high complexity and different kinds of noise, which are detrimental to the extraction of clustering information. We propose an end-to-end deep learning-based method, called Multi-omics Clustering Variational Autoencoders (MCluster-VAEs), that can extract cluster-friendly representations from multi-omics data. First, a unified network architecture with an attention mechanism was developed for accurately modeling multi-omics data. Then, using a novel objective function built from the Variational Bayes technique, the model was trained to effectively obtain the posterior estimation of the clustering assignments. Compared with 12 other state-of-the-art multi-omics clustering methods, MCluster-VAEs achieved an outstanding performance on benchmark datasets from the TCGA database. On the Pan Cancer dataset, MCluster-VAEs achieved an adjusted Rand index of approximately 0.78 for cancer category recognition, an increase of more than 18% compared with other methods. Furthermore, a survival analysis and clinical parameter enrichment tests conducted on 10 cancer datasets demonstrated that MCluster-VAEs provides comparable and even better results than many common integrative approaches. These results demonstrate that MCluster-VAEs is a powerful new tool for dissecting complex multi-omics relationships and providing new insights for cancer subtype discovery.
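For readers unfamiliar with the adjusted Rand index reported above, the following Python sketch computes it from the standard pair-counting definition. It is an illustrative implementation, not code from the paper, and the label vectors at the end are made-up toy inputs.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples.

    1.0 means identical partitions (up to label renaming); values near 0
    mean agreement no better than chance; negative values are possible.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no sample pairs to compare

    # Count pairs of samples grouped together in both, or in each, labeling.
    sum_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)  # chance-level agreement
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_both - expected) / (maximum - expected)


# Toy labelings (hypothetical): same grouping under renamed cluster labels.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index depends only on which samples share a cluster, renaming cluster labels leaves it unchanged, which is why it suits comparisons between unsupervised clusters and known cancer categories.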
Affiliation(s)
- Zhiwei Rong
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Zhilin Liu
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Jiali Song
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Lei Cao
- Department of Epidemiology and Biostatistics Harbin, Harbin Medical University School of Public Health, Harbin, 150000, Heilongjiang, China
- Yipe Yu
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Mantang Qiu
- Department of Thoracic Surgery Beijing, Peking University People's Hospital, Beijing, 100000, China.
- Yan Hou
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China; Peking University Clinical Research Center, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China.
|
43
|
Li W, Shao C, Zhou H, Du H, Chen H, Wan H, He Y. Multi-omics research strategies in ischemic stroke: A multidimensional perspective. Ageing Res Rev 2022; 81:101730. [PMID: 36087702 DOI: 10.1016/j.arr.2022.101730] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/19/2022] [Revised: 08/23/2022] [Accepted: 09/03/2022] [Indexed: 01/31/2023]
Abstract
Ischemic stroke (IS) is a multifactorial and heterogeneous neurological disorder with a high rate of death and long-term impairment. Despite years of study, there are still no stroke biomarkers for clinical practice, and the molecular mechanisms of stroke remain largely unclear. The high-throughput omics approach provides new avenues for discovering biomarkers of IS and explaining its pathological mechanisms. However, single-omics approaches only provide a limited understanding of the biological pathways of diseases. The integration of multiple omics data means the simultaneous analysis of thousands of genes, RNAs, proteins and metabolites, revealing networks of interactions between multiple molecular levels. Integrated analysis of multi-omics data will provide helpful insights into stroke pathogenesis, therapeutic target identification and biomarker discovery. Here, we consider advances in genomics, transcriptomics, proteomics and metabolomics and outline their use in discovering the biomarkers and pathological mechanisms of IS. We then delineate strategies for achieving integration at the multi-omics level and discuss how integrative omics and systems biology can contribute to our understanding and management of IS.
Affiliation(s)
- Wentao Li
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China.
- Chongyu Shao
- School of Life Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China.
- Huifen Zhou
- School of Life Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China.
- Haixia Du
- School of Life Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China.
- Haiyang Chen
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China.
- Haitong Wan
- School of Life Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China.
- Yu He
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China.
|
44
|
Zhanpeng H, Jiekang W. A Multiview Clustering Method With Low-Rank and Sparsity Constraints for Cancer Subtyping. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3213-3223. [PMID: 34705654 DOI: 10.1109/tcbb.2021.3122917] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Multiomics data clustering is one of the major challenges in the field of precision medicine. Integration of multiomics data for cancer subtyping can improve the understanding of cancer and reveal systems-level insights. How to integrate multiomics data for accurate cancer subtyping is an interesting and challenging research problem. To capture the global and the local structure of omics data, a novel framework for integrating multiomics data is proposed for cancer subtyping. Multiview clustering with low-rank and sparsity constraints (MVCLRS) can measure the local similarities of samples in each omics data type and obtain global consensus structures by integrating the multiomics data. The main insight provided by MVCLRS is that low-rank sparse subspace clustering for the construction of an affinity matrix can best capture the local similarities in omics data. Extensive testing is conducted on 10 real-world multiomics cancer datasets from The Cancer Genome Atlas. Compared with 10 state-of-the-art multiomics clustering algorithms, MVCLRS performs better on the 10 cancer datasets, providing clustering results with at least one enriched clinical label in nine of ten cancer subtypes, the most of any method.
|
45
|
Guan S, He Y, Su Y, Zhou L. A Risk Signature Consisting of Eight m6A Methylation Regulators Predicts the Prognosis of Glioma. Cell Mol Neurobiol 2022; 42:2733-2743. [PMID: 34432221 DOI: 10.1007/s10571-021-01135-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/03/2021] [Accepted: 07/27/2021] [Indexed: 01/05/2023]
Abstract
Glioma progression correlates strongly with the epigenetic context. This study aims to identify glioma subtypes by clustering analysis of patients using the multi-omics data of N6-methyladenosine (m6A) methylation regulators and to construct a risk signature for investigating the role of m6A methylation regulators in the prognosis of glioma. Multi-omics data of glioma and normal control tissues were obtained from The Cancer Genome Atlas (TCGA) database. The clustering analysis of multi-omics data of patients was conducted using the R package iClusterPlus. The risk model was constructed by univariate and multivariate Cox analysis, and glioma expression data and related clinical data were obtained from Chinese Glioma Genome Atlas (CGGA) datasets to verify the risk model. By analyzing the glioma data in TCGA, we found that the risk signature could be constructed from eight genes with m6A methylation modification function: ALKBH5, HNRNPA2B1, IGF2BP2, IGF2BP3, RBM15, WTAP, YTHDF1, and YTHDF2. Meanwhile, we found that IGF2BP2 and IGF2BP3 were highly expressed in glioma subtypes with high risk scores and closely related to the prognosis of glioma patients. m6A methylation regulators, especially IGF2BP2 and IGF2BP3, play important roles in the malignant progression of glioma. The risk signature constructed from eight m6A methylation regulators can predict the prognosis of glioma. IGF2BP2 and IGF2BP3 may be the key regulatory factors among the m6A methylation regulators involved in the occurrence and development of glioma, and can serve as molecular markers for the prognosis of glioma.
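A Cox-based risk signature of this kind is commonly evaluated as a weighted sum of gene expression values, with patients then split into risk groups. The Python sketch below illustrates that general pattern only. The coefficients, the three-gene subset, the patient values, and the median cutoff are all hypothetical placeholders, since the abstract reports neither the fitted coefficients nor the stratification rule.

```python
from statistics import median

# Hypothetical placeholder coefficients for illustration only;
# these are NOT the values fitted in the study.
COEFFICIENTS = {"IGF2BP2": 0.30, "IGF2BP3": 0.25, "YTHDF2": -0.10}


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(patients, coefficients):
    """Score every patient and split at the median score (an assumed cutoff)."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: "high" if score > cutoff else "low" for pid, score in scores.items()}
    return scores, groups


# Made-up expression values for three hypothetical patients.
patients = {
    "P1": {"IGF2BP2": 1.0, "IGF2BP3": 1.0, "YTHDF2": 1.0},
    "P2": {"IGF2BP2": 3.0, "IGF2BP3": 2.0, "YTHDF2": 1.0},
    "P3": {"IGF2BP2": 0.0, "IGF2BP3": 0.0, "YTHDF2": 2.0},
}
scores, groups = stratify(patients, COEFFICIENTS)
```

In practice the coefficients would come from a multivariate Cox model fitted on a training cohort, and the same coefficients and cutoff would be reused unchanged on a validation cohort.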
Affiliation(s)
- Sizhong Guan
- Department of Neurosurgery, The First Hospital of China Medical University, Shenyang, 110001, People's Republic of China
- Ye He
- Department of Laboratory Medicine, The First Hospital of China Medical University, Shenyang, 110001, People's Republic of China
- Yanna Su
- Department of Laboratory Medicine, The First Hospital of China Medical University, Shenyang, 110001, People's Republic of China
- Liping Zhou
- Post Graduation Training Department, The First Hospital of China Medical University, No. 155, Northern Nanjing Road, Heping District, Shenyang, 110001, Liaoning Province, People's Republic of China.
|
46
|
Raufaste-Cazavieille V, Santiago R, Droit A. Multi-omics analysis: Paving the path toward achieving precision medicine in cancer treatment and immuno-oncology. Front Mol Biosci 2022; 9:962743. [PMID: 36304921 PMCID: PMC9595279 DOI: 10.3389/fmolb.2022.962743] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/06/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022] Open
Abstract
The acceleration of large-scale sequencing and the progress in high-throughput computational analyses, collectively defined as omics, have been a milestone in the comprehension of the biological processes in human health and disease. In oncology, the omics approach, initiated by genomics and transcriptomics studies, has revealed an incredible complexity, with unsuspected molecular diversity within the same tumor type as well as spatial and temporal heterogeneity of tumors. The integration of multiple biological layers of omics studies brought oncology to a new paradigm, from tumor site classification to pan-cancer molecular classification, offering new therapeutic opportunities for precision medicine. In this review, we will provide a comprehensive overview of the latest innovations for multi-omics integration in oncology and summarize the largest multi-omics datasets available for adult and pediatric cancers. We will present multi-omics techniques for characterizing cancer biology and show how multi-omics data can be combined with clinical data for the identification of prognostic and treatment-specific biomarkers, opening the way to personalized therapy. To conclude, we will detail the newest strategies for dissecting the tumor immune environment and host–tumor interaction. We will explore the advances in immunomics and microbiomics for biomarker identification to guide therapeutic decisions in immuno-oncology.
Affiliation(s)
- Raoul Santiago
  - CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  - Division of Pediatric Hematology-Oncology, Centre Hospitalier Universitaire de L’Université Laval, Charles Bruneau Cancer Center, Québec, QC, Canada
- Arnaud Droit
  - CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- *Correspondence: Raoul Santiago; Arnaud Droit
|
47
|
Zhong C, Xie T, Chen L, Zhong X, Li X, Cai X, Chen K, Lan S. Immune depletion of the methylated phenotype of colon cancer is closely related to resistance to immune checkpoint inhibitors. Front Immunol 2022; 13:983636. [PMID: 36159794 PMCID: PMC9492852 DOI: 10.3389/fimmu.2022.983636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 08/02/2022] [Indexed: 11/27/2022] Open
Abstract
Background: Molecular typing based on single-omics data has its limitations, and tumor typing of colorectal cancer (CRC) requires effective integration of multiple omics data. Methods: Transcriptome expression, DNA methylation, somatic mutation, clinicopathological information, and copy number variation were retrieved from TCGA, UCSC Xena, cBioPortal, FireBrowse, or GEO. After pre-processing and calculating the clustering prediction index (CPI) with gap statistics, integrative clustering analysis was conducted via MOVICS. The tumor microenvironment (TME) was deconvolved using several algorithms such as GSVA, MCPcounter, ESTIMATE, and PCA. The metabolism-relevant pathways were extracted through ssGSEA. Differential analysis was based on limma, and enrichment analysis was carried out with Enrichr. DNA methylation and transcriptome expression were integrated via ELMER. Finally, nearest template prediction and chemotherapeutic sensitivity prediction were conducted using NTP and pRRophetic, respectively. Results: Three molecular subtypes (CS1, CS2, and CS3) were recognized by integrating transcriptome, DNA methylation, and driver mutations. CRC patients in CS3 had the most favorable prognosis. A total of 90 differentially mutated genes among the three CSs were obtained, and CS3 displayed the highest tumor mutation burden (TMB), while significant instability across the entire chromosome was observed in the CS2 group. A total of 30 upregulated mRNAs serving as classifiers were identified, and a similar diversity in clinical outcomes of CS3 was validated in four external datasets. Heterogeneity in the TME and in metabolism-related pathways was also observed among the three CSs. Furthermore, we found that CS2 tended to lose methylation while CS3 tended to gain methylation. Univariate and multivariate Cox regression revealed that the subtypes were independent prognostic factors. In the drug sensitivity analysis, patients in CS2 were more sensitive to ABT.263, NSC.87877, BIRB.0796, and PAC.1. By integrating DNA mutation and RNA expression in CS3, we identified SOX9, a specific marker of CS3, as being higher in tumor than in tumor-adjacent tissue by IHC in both the in-house cohort and the public cohort. Conclusion: The molecular subtypes based on integrated multi-omics uncovered new insights into the prognosis, mechanisms, and clinical therapeutic targets for CRC.
Affiliation(s)
- Chengqian Zhong
  - Department of Digestive Endoscopy center, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, China
- Tingjiang Xie
  - Department of Gastrointestinal Surgery, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, China
- Long Chen
  - Department of Gastrointestinal Surgery, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, China
- Xuejing Zhong
  - Department of Science and Education, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, China
- Xinjing Li
  - Department of Pathology, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, China
- Xiumei Cai
  - Department of Digestive Endoscopy center, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, China
- Kaihong Chen
  - Department of Cardiology, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, China
- Shiqian Lan
  - Department of Digestive Endoscopy center, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, China
- *Correspondence: Shiqian Lan; Kaihong Chen
|
48
|
HSSG: Identification of Cancer Subtypes Based on Heterogeneity Score of A Single Gene. Cells 2022; 11:cells11152456. [PMID: 35954300 PMCID: PMC9368717 DOI: 10.3390/cells11152456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 08/02/2022] [Accepted: 08/04/2022] [Indexed: 11/17/2022] Open
Abstract
Cancer is a highly heterogeneous disease, so even the same cancer can be further classified into different subtypes according to its pathology. With multi-omics data widely used in cancer subtype identification, effective feature selection is essential for identifying subtypes accurately. However, feature selection in existing subtype identification methods cannot select the most helpful features from a biomolecular perspective and does not reflect the relationships between the selected features. To solve this problem, we propose HSSG, a feature selection method for cancer subtype identification based on the heterogeneity score of a single gene. In the proposed method, a sample-similarity network is constructed for each single gene, and a pseudo-F statistic is used to calculate each gene's heterogeneity score for cancer subtype identification. Finally, we construct gene-gene networks using genes with higher heterogeneity scores and mine essential genes from the networks. Across three experiments on seven TCGA data sets, covering cancer subtype identification in single-omics data, feature selection performance on multi-omics data, and the effectiveness and stability of the selected features, HSSG achieves good performance throughout. This indicates that HSSG can effectively select features for subtype identification.
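To make the idea of a per-gene heterogeneity score more concrete, the sketch below computes the standard pseudo-F (Calinski-Harabasz) statistic for one gene's values under a given grouping of samples. It assumes the sample grouping is already available, does not build the sample-similarity network described in the abstract, and may differ from HSSG's exact formulation; the function name and toy data are hypothetical.

```python
def pseudo_f(values, labels):
    """Standard pseudo-F statistic for one gene under a given sample grouping.

    F = (between-group sum of squares / (k - 1)) / (within-group sum of squares / (n - k)),
    where n is the number of samples and k the number of groups.
    """
    n = len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")

    grand_mean = sum(values) / n
    # Between-group dispersion: group size times squared distance of group mean to grand mean.
    ssb = sum(len(m) * (sum(m) / len(m) - grand_mean) ** 2 for m in groups.values())
    # Within-group dispersion: squared distance of each value to its own group mean.
    ssw = sum((v - sum(m) / len(m)) ** 2 for m in groups.values() for v in m)
    if ssw == 0:
        return float("inf")  # groups are perfectly tight
    return (ssb / (k - 1)) / (ssw / (n - k))


# Toy data: six samples in two assumed groups.
labels = ["A", "A", "A", "B", "B", "B"]
separating_gene = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]  # differs clearly between groups
flat_gene = [1.0, 5.0, 3.0, 1.0, 5.0, 3.0]        # same distribution in both groups

scores = {
    "separating_gene": pseudo_f(separating_gene, labels),
    "flat_gene": pseudo_f(flat_gene, labels),
}
```

Under this scoring, a gene whose values separate the groups receives a much larger score than a gene whose values look the same in every group, which is the property a feature-selection step would rank on.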
|
49
|
Bard JE, Nowak NJ, Buck MJ, Sinha S. Multimodal Dimension Reduction and Subtype Classification of Head and Neck Squamous Cell Tumors. Front Oncol 2022; 12:892207. [PMID: 35912202 PMCID: PMC9326399 DOI: 10.3389/fonc.2022.892207] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/08/2022] [Accepted: 06/09/2022] [Indexed: 01/18/2023] Open
Abstract
Traditional analysis of genomic data from bulk sequencing experiments seeks to sort sample cohorts into biologically meaningful groups and compare them. To accomplish this task, large-scale databases of patient-derived samples, such as TCGA, have been established, making it possible to interrogate multiple data modalities per tumor. We have developed a computational strategy employing multimodal integration paired with spectral clustering and modern dimension reduction techniques such as PHATE to provide a more robust method for cancer subtype classification. Using this integrated approach, we have examined 514 Head and Neck Squamous Carcinoma (HNSC) tumor samples from TCGA across gene-expression, DNA-methylation, and microbiome data modalities. We show that these approaches, primarily developed for single-cell sequencing, can be efficiently applied to bulk tumor sequencing data. Our multimodal analysis captures the dynamic heterogeneity, identifies new HNSC subtypes and refines existing ones, and orders tumor samples along well-defined cellular trajectories. Collectively, these results showcase the inherent molecular complexity of tumors and offer insights into carcinogenesis and the importance of targeted therapy. Computational techniques such as those highlighted in our study provide an organic and powerful approach to identify granular patterns in large and noisy datasets that may otherwise be overlooked.
Affiliation(s)
- Jonathan E. Bard
  - Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, United States
  - Genomics and Bioinformatics Core, Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, United States
- Norma J. Nowak
  - Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, United States
  - Genomics and Bioinformatics Core, Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, United States
- Michael J. Buck
  - Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, United States
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, United States
- Satrajit Sinha
  - Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, United States
- *Correspondence: Michael J. Buck; Satrajit Sinha
|
50
|
Mokhtari A, Porte B, Belzeaux R, Etain B, Ibrahim EC, Marie-Claire C, Lutz PE, Delahaye-Duriez A. The molecular pathophysiology of mood disorders: From the analysis of single molecular layers to multi-omic integration. Prog Neuropsychopharmacol Biol Psychiatry 2022; 116:110520. [PMID: 35104608 DOI: 10.1016/j.pnpbp.2022.110520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/07/2021] [Revised: 01/22/2022] [Accepted: 01/22/2022] [Indexed: 12/14/2022]
Abstract
Next-generation sequencing now enables the rapid and affordable production of reliable biological data at multiple molecular levels, collectively referred to as "omics". To maximize the potential for discovery, computational biologists have created and adapted integrative multi-omic analytical methods. When applied to diseases with traceable pathophysiology such as cancer, these new algorithms and statistical approaches have enabled the discovery of clinically relevant molecular mechanisms and biomarkers. In contrast, these methods have been much less applied to the field of molecular psychiatry, although diagnostic and prognostic biomarkers are similarly needed. In the present review, we first briefly summarize main findings from two decades of studies that investigated single molecular processes in relation to mood disorders. Then, we conduct a systematic review of multi-omic strategies that have been proposed and used more recently. We also list databases and types of data available to researchers for future work. Finally, we present the newest methodologies that have been employed for multi-omics integration in other medical fields, and discuss their potential for molecular psychiatry studies.
Affiliation(s)
- Amazigh Mokhtari
  - NeuroDiderot, Inserm U1141, Université de Paris, F-75019 Paris, France
- Baptiste Porte
  - NeuroDiderot, Inserm U1141, Université de Paris, F-75019 Paris, France
- Raoul Belzeaux
  - Aix Marseille Université CNRS, Institut de Neurosciences de la Timone, F-13005 Marseille, France
  - Fondation FondaMental, F-94000 Créteil, France
  - Assistance Publique Hôpitaux de Marseille, Pôle de psychiatrie, pédopsychiatrie et addictologie, F-13005 Marseille, France
- Bruno Etain
  - Assistance Publique des Hôpitaux de Paris, GHU Lariboisière-Saint Louis-Fernand Widal, DMU Neurosciences, Département de psychiatrie et de Médecine Addictologique, F-75010 Paris, France
  - Université de Paris, INSERM UMR-S 1144, Optimisation thérapeutique en neuropsychopharmacologie, OTeN, F-75006 Paris, France
- El Cherif Ibrahim
  - Aix Marseille Université CNRS, Institut de Neurosciences de la Timone, F-13005 Marseille, France
- Cynthia Marie-Claire
  - Université de Paris, INSERM UMR-S 1144, Optimisation thérapeutique en neuropsychopharmacologie, OTeN, F-75006 Paris, France
- Pierre-Eric Lutz
  - Centre National de la Recherche Scientifique, Université de Strasbourg, Fédération de Médecine Translationnelle de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR3212, F-67000 Strasbourg, France
  - Douglas Mental Health University Institute, McGill University, QC H4H 1R3 Montréal, Canada
- Andrée Delahaye-Duriez
  - NeuroDiderot, Inserm U1141, Université de Paris, F-75019 Paris, France
  - Assistance Publique des Hôpitaux de Paris, Unité de médecine génomique, Département BioPhaReS, Hôpital Jean Verdier, Hôpitaux Universitaires de Paris Seine Saint Denis, F-93140 Bondy, France
  - Université Sorbonne Paris Nord, F-93000 Bobigny, France
|