1
Saha A, Chakraborty T, Rahimikollu J, Xiao H, de Oliveira LBP, Hand TW, Handali S, Secor WE, Fraga LAO, Fairley JK, Das J, Sarkar A. Deep humoral profiling coupled to interpretable machine learning unveils diagnostic markers and pathophysiology of schistosomiasis. Sci Transl Med 2024; 16:eadk7832. [PMID: 39292803 PMCID: PMC12033386 DOI: 10.1126/scitranslmed.adk7832] [Received: 09/12/2023] [Revised: 02/27/2024] [Accepted: 08/27/2024] [Indexed: 09/20/2024]
Abstract
Schistosomiasis, a highly prevalent parasitic disease, affects more than 200 million people worldwide. Current diagnostics based on parasite egg detection in stool detect infection only at a late stage, and current antibody-based tests cannot distinguish past from current infection. Here, we developed and used a multiplexed antibody profiling platform to obtain a comprehensive repertoire of antihelminth humoral profiles including isotype, subclass, Fc receptor (FcR) binding, and glycosylation profiles of antigen-specific antibodies. Using Essential Regression (ER) and SLIDE, interpretable machine learning methods, we identified latent factors (context-specific groups) that move beyond biomarkers and provide insights into the pathophysiology of different stages of schistosome infection. By comparing profiles of infected and healthy individuals, we identified modules with unique humoral signatures of active disease, including hallmark signatures of parasitic infection such as elevated immunoglobulin G4 (IgG4). However, we also captured previously uncharacterized humoral responses including elevated FcR binding and specific antibody glycoforms in patients with active infection, helping distinguish them from those without active infection but with equivalent antibody titers. This signature was validated in an independent cohort. Our approach also uncovered two distinct endotypes, nonpatent infection and prior infection, in those who were not actively infected. Higher amounts of IgG1 and FcR1/FcR3A binding were also found to be likely protective of the transition from nonpatent to active infection. Overall, we unveiled markers for antibody-based diagnostics and latent factors underlying the pathogenesis of schistosome infection. Our results suggest that selective antigen targeting could be useful in early detection, thus controlling infection severity.
Affiliation(s)
- Anushka Saha
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30309, USA
- Trirupa Chakraborty
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
  - Integrative Systems Biology Program, Pittsburgh, PA 15213, USA
- Javad Rahimikollu
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
  - Joint CMU-Pitt Ph.D. Program in Computational Biology, Pittsburgh, PA 15213, USA
- Hanxi Xiao
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
  - Joint CMU-Pitt Ph.D. Program in Computational Biology, Pittsburgh, PA 15213, USA
- Lorena B. Pereira de Oliveira
  - Programa Multicêntrico de Bioquímica e Biologia Molecular (PMBqBM), Federal University of Juiz de Fora, Campus Governador Valadares, Juiz de Fora, Minas Gerais 36036-900, Brazil
  - University Vale do Rio Doce, Governador Valadares, Minas Gerais 36036-900, Brazil
- Timothy W. Hand
  - Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Sukwan Handali
  - Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- W. Evan Secor
  - Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Lucia A. O. Fraga
  - Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais 36036-900, Brazil
- Jessica K. Fairley
  - Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine, Atlanta, GA 30307, USA
- Jishnu Das
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Aniruddh Sarkar
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30309, USA
2
Maity T, Balachandran AK, Krishnamurthy LP, Nagar KL, Upadhyayula RS, Sengupta S, Maiti PK. Data-Driven Approaches to Predict Dendrimer Cytotoxicity. ACS OMEGA 2024; 9:24899-24906. [PMID: 38882163 PMCID: PMC11173563 DOI: 10.1021/acsomega.4c01775] [Received: 02/23/2024] [Revised: 05/10/2024] [Accepted: 05/17/2024] [Indexed: 06/18/2024]
Abstract
Dendrimers are employed as functional elements in contrast agents and are proposed as nontoxic vehicles for drug delivery. Toxicity is a property that is to be evaluated for this novel class of bionanomaterials for in vivo applications. The current research is hampered due to the lack of structured data sets for toxicity studies for dendrimers. In this work, we have built a data set by curating literature for toxicity data and augmented it with structural and physicochemical features. We present a comprehensive, feature-rich database of dendrimer toxicity measured across various cell lines for prediction, design, and optimization studies. We have also explored novel computational approaches for predicting dendrimer cytotoxicity. We demonstrate superior outcomes for toxicity prediction using essential regression in the space of small data sets.
Affiliation(s)
- Tarun Maity
  - Centre for Condensed Matter Theory, Department of Physics, Indian Institute of Science, Bengaluru 560012, India
- Anandu K Balachandran
  - Accenture Labs, Technology & Innovation, Ecospace, Bellandur, Bengaluru 560087, India
- Karthik L Nagar
  - Accenture Labs, Technology & Innovation, Ecospace, Bellandur, Bengaluru 560087, India
- Shubhashis Sengupta
  - Accenture Labs, Technology & Innovation, Ecospace, Bellandur, Bengaluru 560087, India
- Prabal K Maiti
  - Centre for Condensed Matter Theory, Department of Physics, Indian Institute of Science, Bengaluru 560012, India
3
Rahimikollu J, Xiao H, Rosengart A, Rosen ABI, Tabib T, Zdinak PM, He K, Bing X, Bunea F, Wegkamp M, Poholek AC, Joglekar AV, Lafyatis RA, Das J. SLIDE: Significant Latent Factor Interaction Discovery and Exploration across biological domains. Nat Methods 2024; 21:835-845. [PMID: 38374265 PMCID: PMC11588359 DOI: 10.1038/s41592-024-02175-z] [Received: 11/23/2022] [Accepted: 01/09/2024] [Indexed: 02/21/2024]
Abstract
Modern multiomic technologies can generate deep multiscale profiles. However, differences in data modalities, multicollinearity of the data, and large numbers of irrelevant features make analyses and integration of high-dimensional omic datasets challenging. Here we present Significant Latent Factor Interaction Discovery and Exploration (SLIDE), a first-in-class interpretable machine learning technique for identifying significant interacting latent factors underlying outcomes of interest from high-dimensional omic datasets. SLIDE makes no assumptions regarding data-generating mechanisms, comes with theoretical guarantees regarding identifiability of the latent factors/corresponding inference, and has rigorous false discovery rate control. Using SLIDE on single-cell and spatial omic datasets, we uncovered significant interacting latent factors underlying a range of molecular, cellular and organismal phenotypes. SLIDE outperforms/performs at least as well as a wide range of state-of-the-art approaches, including other latent factor approaches. More importantly, it provides biological inference beyond prediction that other methods do not afford. Thus, SLIDE is a versatile engine for biological discovery from modern multiomic datasets.
Affiliation(s)
- Javad Rahimikollu
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh, PA, USA
- Hanxi Xiao
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh, PA, USA
- AnnaElaine Rosengart
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Aaron B I Rosen
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh, PA, USA
- Tracy Tabib
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Paul M Zdinak
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Kun He
  - Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
- Xin Bing
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Florentina Bunea
  - Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA
- Marten Wegkamp
  - Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA
  - Department of Mathematics, Cornell University, Ithaca, NY, USA
- Amanda C Poholek
  - Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
- Alok V Joglekar
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Robert A Lafyatis
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Jishnu Das
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
4
Mallick H, Porwal A, Saha S, Basak P, Svetnik V, Paul E. An integrated Bayesian framework for multi-omics prediction and classification. Stat Med 2024; 43:983-1002. [PMID: 38146838 DOI: 10.1002/sim.9953] [Received: 04/21/2022] [Revised: 10/06/2023] [Accepted: 10/24/2023] [Indexed: 12/27/2023]
Abstract
With the growing commonality of multi-omics datasets, there is now increasing evidence that integrated omics profiles lead to more efficient discovery of clinically actionable biomarkers that enable better disease outcome prediction and patient stratification. Several methods exist to perform host phenotype prediction from cross-sectional, single-omics data modalities but decentralized frameworks that jointly analyze multiple time-dependent omics data to highlight the integrative and dynamic impact of repeatedly measured biomarkers are currently limited. In this article, we propose a novel Bayesian ensemble method to consolidate prediction by combining information across several longitudinal and cross-sectional omics data layers. Unlike existing frequentist paradigms, our approach enables uncertainty quantification in prediction as well as interval estimation for a variety of quantities of interest based on posterior summaries. We apply our method to four published multi-omics datasets and demonstrate that it recapitulates known biology in addition to providing novel insights while also outperforming existing methods in estimation, prediction, and uncertainty quantification. Our open-source software is publicly available at https://github.com/himelmallick/IntegratedLearner.
Affiliation(s)
- Himel Mallick
  - Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, 10065, New York, USA
  - Department of Statistics and Data Science, Cornell University, Ithaca, New York, USA
- Anupreet Porwal
  - Department of Statistics, University of Washington, Seattle, Washington, USA
- Satabdi Saha
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Piyali Basak
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
- Vladimir Svetnik
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
- Erina Paul
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
5
Xiao H, Rosen A, Chhibbar P, Moise L, Das J. From bench to bedside via bytes: Multi-omic immunoprofiling and integration using machine learning and network approaches. Hum Vaccin Immunother 2023; 19:2282803. [PMID: 38100557 PMCID: PMC10730168 DOI: 10.1080/21645515.2023.2282803] [Received: 07/15/2023] [Accepted: 11/09/2023] [Indexed: 12/17/2023]
Abstract
A significant surge in research endeavors leverages the vast potential of high-throughput omic technology platforms for broad profiling of biological responses to vaccines and cutting-edge immunotherapies and stem-cell therapies under development. These profiles capture different aspects of core regulatory and functional processes at different scales of resolution from molecular and cellular to organismal. Systems approaches capture the complex and intricate interplay between these layers and scales. Here, we summarize experimental data modalities, for characterizing the genome, epigenome, transcriptome, proteome, metabolome, and antibody-ome, that enable us to generate large-scale immune profiles. We also discuss machine learning and network approaches that are commonly used to analyze and integrate these modalities, to gain insights into correlates and mechanisms of natural and vaccine-mediated immunity as well as therapy-induced immunomodulation.
Affiliation(s)
- Hanxi Xiao
  - Center for Systems Immunology, Departments of Immunology and Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Aaron Rosen
  - Center for Systems Immunology, Departments of Immunology and Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Prabal Chhibbar
  - Center for Systems Immunology, Departments of Immunology and Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Jishnu Das
  - Center for Systems Immunology, Departments of Immunology and Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
6
Farkona S, Pastrello C, Konvalinka A. Proteomics: Its Promise and Pitfalls in Shaping Precision Medicine in Solid Organ Transplantation. Transplantation 2023; 107:2126-2142. [PMID: 36808112 DOI: 10.1097/tp.0000000000004539] [Indexed: 02/22/2023]
Abstract
Solid organ transplantation is an established treatment of choice for end-stage organ failure. However, all transplant patients are at risk of developing complications, including allograft rejection and death. Histological analysis of graft biopsy is still the gold standard for evaluation of allograft injury, but it is an invasive procedure and prone to sampling errors. The past decade has seen an increased number of efforts to develop minimally invasive procedures for monitoring allograft injury. Despite the recent progress, limitations such as the complexity of proteomics-based technology, the lack of standardization, and the heterogeneity of populations that have been included in different studies have hindered proteomic tools from reaching clinical transplantation. This review focuses on the role of proteomics-based platforms in biomarker discovery and validation in solid organ transplantation. We also emphasize the value of biomarkers that provide potential mechanistic insights into the pathophysiology of allograft injury, dysfunction, or rejection. Additionally, we forecast that the growth of publicly available data sets, combined with computational methods that effectively integrate them, will facilitate a generation of more informed hypotheses for potential subsequent evaluation in preclinical and clinical studies. Finally, we illustrate the value of combining data sets through the integration of 2 independent data sets that pinpointed hub proteins in antibody-mediated rejection.
Affiliation(s)
- Sofia Farkona
  - Toronto General Hospital Research Institute, University Health Network, Toronto, ON, Canada
  - Soham and Shaila Ajmera Family Transplant Centre, University Health Network, Toronto, ON, Canada
- Chiara Pastrello
  - Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute University Health Network, Toronto, ON, Canada
  - Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto Western Hospital, University Health Network, Toronto, ON, Canada
- Ana Konvalinka
  - Toronto General Hospital Research Institute, University Health Network, Toronto, ON, Canada
  - Soham and Shaila Ajmera Family Transplant Centre, University Health Network, Toronto, ON, Canada
  - Department of Medicine, Division of Nephrology, University Health Network, Toronto, ON, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
  - Institute of Medical Science, University of Toronto, Toronto, ON, Canada
  - Canadian Donation and Transplantation Research Program, Edmonton, AB, Canada
7
Wu J, Moheimani H, Li S, Kar UK, Bonaroti J, Miller RS, Daley BJ, Harbrecht BG, Claridge JA, Gruen DS, Phelan HA, Guyette FX, Neal MD, Das J, Sperry JL, Billiar TR. High Dimensional Multiomics Reveals Unique Characteristics of Early Plasma Administration in Polytrauma Patients With TBI. Ann Surg 2022; 276:673-683. [PMID: 35861072 PMCID: PMC9463104 DOI: 10.1097/sla.0000000000005610] [Indexed: 11/26/2022]
Abstract
OBJECTIVES: The authors sought to identify causal factors that explain the selective benefit of prehospital administration of thawed plasma (TP) in traumatic brain injury (TBI) patients using mediation analysis of a multiomic database.
BACKGROUND: The Prehospital Air Medical Plasma (PAMPer) Trial showed that patients with TBI and a pronounced systemic response to injury [defined as endotype 2 (E2)], have a survival benefit from prehospital administration of TP. An interrogation of high dimensional proteomics, lipidomics and metabolomics previously demonstrated unique patterns in circulating biomarkers in patients receiving prehospital TP, suggesting that a deeper analysis could reveal causal features specific to TBI patients.
METHODS: A novel proteomic database (SomaLogic Inc., aptamer-based assay, 7K platform) was generated using admission blood samples from a subset of patients (n=149) from the PAMPer Trial. This proteomic dataset was combined with previously reported metabolomic and lipidomic datasets from these same patients. A 2-step analysis was performed to identify factors that promote survival in E2-TBI patients who had received early TP. First, features were selected using both linear and multivariate-latent-factor regression analyses. Then, the selected features were entered into the causal mediation analysis.
RESULTS: Causal mediation analysis of observable features identified 16 proteins and 41 lipids with a high proportion of mediated effect (>50%) to explain the survival benefit of early TP in E2-TBI patients. The multivariate latent-factor regression analyses also uncovered 5 latent clusters of features with a proportion effect >30%, many in common with the observable features. Among the observable and latent features were protease inhibitors known to inhibit activated protein C and block fibrinolysis (SERPINA5 and CPB2), a clotting factor (factor XI), as well as proteins involved in lipid transport and metabolism (APOE3 and sPLA(2)-XIIA).
CONCLUSIONS: These findings suggest that severely injured patients with TBI process exogenous plasma differently than those without TBI. The beneficial effects of early TP in E2-TBI patients may be the result of improved blood clotting and the effect of brain protective factors independent of coagulation.
Affiliation(s)
- Junru Wu
  - Department of Cardiology, The 3rd Xiangya Hospital, Central South University, Changsha, China
  - Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Pittsburgh Trauma Research Center, Division of Trauma and Acute Care Surgery, Pittsburgh, Pennsylvania, USA
- Hamed Moheimani
  - Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Pittsburgh Trauma Research Center, Division of Trauma and Acute Care Surgery, Pittsburgh, Pennsylvania, USA
- Shimena Li
  - Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Pittsburgh Trauma Research Center, Division of Trauma and Acute Care Surgery, Pittsburgh, Pennsylvania, USA
- Upendra K. Kar
  - Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Pittsburgh Trauma Research Center, Division of Trauma and Acute Care Surgery, Pittsburgh, Pennsylvania, USA
- Jillian Bonaroti
  - Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Pittsburgh Trauma Research Center, Division of Trauma and Acute Care Surgery, Pittsburgh, Pennsylvania, USA
- Brian J. Daley
  - Department of Surgery, University of Tennessee Health Science Center, Knoxville, TN, USA
- Jeffrey A. Claridge
  - Metro Health Medical Center, Case Western Reserve University, Cleveland, OH, USA
- Danielle S. Gruen
  - Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Pittsburgh Trauma Research Center, Division of Trauma and Acute Care Surgery, Pittsburgh, Pennsylvania, USA
- Herbert A. Phelan
  - Department of Surgery, University Medical Center-New Orleans Burn Program, New Orleans, LA, USA
- Francis X. Guyette
  - Department of Emergency Medicine, Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Matthew D. Neal
  - Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Pittsburgh Trauma Research Center, Division of Trauma and Acute Care Surgery, Pittsburgh, Pennsylvania, USA
- Jishnu Das
  - Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Jason L. Sperry
  - Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Pittsburgh Trauma Research Center, Division of Trauma and Acute Care Surgery, Pittsburgh, Pennsylvania, USA
- Timothy R. Billiar
  - Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Pittsburgh Trauma Research Center, Division of Trauma and Acute Care Surgery, Pittsburgh, Pennsylvania, USA
8
Jia M, Yuan DY, Lovelace TC, Hu M, Benos PV. Causal Discovery in High-dimensional, Multicollinear Datasets. FRONTIERS IN EPIDEMIOLOGY 2022; 2:899655. [PMID: 36778756 PMCID: PMC9910507 DOI: 10.3389/fepid.2022.899655] [Received: 03/19/2022] [Accepted: 08/05/2022] [Indexed: 11/13/2022]
Abstract
As the cost of high-throughput genomic sequencing technology declines, its application in clinical research becomes increasingly popular. The collected datasets often contain tens or hundreds of thousands of biological features that need to be mined to extract meaningful information. One area of particular interest is discovering underlying causal mechanisms of disease outcomes. Over the past few decades, causal discovery algorithms have been developed and expanded to infer such relationships. However, these algorithms suffer from the curse of dimensionality and multicollinearity. A recently introduced, non-orthogonal, general empirical Bayes approach to matrix factorization has been demonstrated to successfully infer latent factors with interpretable structures from observed variables. We hypothesize that applying this strategy to causal discovery algorithms can solve both the high dimensionality and collinearity problems, inherent to most biomedical datasets. We evaluate this strategy on simulated data and apply it to two real-world datasets. In a breast cancer dataset, we identified important survival-associated latent factors and biologically meaningful enriched pathways within factors related to important clinical features. In a SARS-CoV-2 dataset, we were able to predict whether a patient (1) had Covid-19 and (2) would enter the ICU. Furthermore, we were able to associate factors with known Covid-19 related biological pathways.
Affiliation(s)
- Minxue Jia
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Daniel Y. Yuan
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Tyler C. Lovelace
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Mengying Hu
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Panayiotis V. Benos
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
  - Department of Epidemiology, University of Florida, Gainesville, FL, United States