1
Strobl EV, Gamazon E. Discovering root causal genes with high-throughput perturbations. eLife 2025; 13:RP100949. [PMID: 40042510 PMCID: PMC11882141 DOI: 10.7554/elife.100949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2025] Open
Abstract
Root causal gene expression levels - or root causal genes for short - correspond to the initial changes to gene expression that generate patient symptoms as a downstream effect. Identifying root causal genes is critical towards developing treatments that modify disease near its onset, but no existing algorithms attempt to identify root causal genes from data. RNA-sequencing (RNA-seq) data introduces challenges such as measurement error, high dimensionality and non-linearity that compromise accurate estimation of root causal effects even with state-of-the-art approaches. We therefore instead leverage Perturb-seq, or high-throughput perturbations with single-cell RNA-seq readout, to learn the causal order between the genes. We then transfer the causal order to bulk RNA-seq and identify root causal genes specific to a given patient for the first time using a novel statistic. Experiments demonstrate large improvements in performance. Applications to macular degeneration and multiple sclerosis also reveal root causal genes that lie on known pathogenic pathways, delineate patient subgroups and implicate a newly defined omnigenic root causal model.
Affiliation(s)
- Eric Gamazon
  - Vanderbilt University Medical Center, Nashville, United States
2
Lovelace TC, Ryu MH, Jia M, Castaldi P, Sciurba FC, Hersh CP, Benos PV. Development and validation of a mortality risk prediction model for chronic obstructive pulmonary disease: a cross-sectional study using probabilistic graphical modelling. EClinicalMedicine 2024; 75:102786. [PMID: 39263674 PMCID: PMC11388367 DOI: 10.1016/j.eclinm.2024.102786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 07/22/2024] [Accepted: 07/26/2024] [Indexed: 09/13/2024] Open
Abstract
Background Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of mortality. Predicting mortality risk in patients with COPD can be important for disease management strategies. Although all-cause mortality predictors have been developed previously, limited research exists on factors directly affecting COPD-specific mortality. Methods In a retrospective study, we used probabilistic graphs to analyse clinical cross-sectional data (COPDGene cohort), including demographics, spirometry, quantitative chest imaging, and symptom features, as well as gene expression data. COPDGene recruited current and former smokers, aged 45-80 years with >10 pack-years smoking history, from across the USA (Phase 1, 11/2007-4/2011) and invited them for a follow-up visit (Phase 2, 7/2013-7/2017). ECLIPSE cohort recruited current and former smokers (COPD patients and controls from USA and Europe), aged 45-80 with smoking history >10 pack-years (12/2005-11/2007). We applied graphical models on multi-modal data from COPDGene Phase 1 participants to identify factors directly affecting all-cause and COPD-specific mortality (primary outcomes); and on Phase 2 follow-up cohort to identify additional molecular and social factors affecting mortality. We used penalized Cox regression with features selected by the causal graph to build VAPORED, a mortality risk prediction model. VAPORED was compared to existing scores (BODE: BMI, airflow obstruction, dyspnoea, exercise capacity; ADO: age, dyspnoea, airflow obstruction) on the ability to rank individuals by mortality risk, using four evaluation metrics (concordance, concordance probability estimate (CPE), cumulative/dynamic (C/D) area under the receiver operating characteristic curve (AUC), and integrated C/D AUC). The results were validated in ECLIPSE. Findings Graphical models, applied on the COPDGene Phase 1 samples (n = 8610), identified 11 and 7 variables directly linked to all-cause and COPD-specific mortality, respectively. Although many appear in both models, non-lung comorbidities appear only in the all-cause model, while forced vital capacity (FVC %predicted) appears in the COPD-specific mortality model only. Additionally, the graph model of Phase 2 data (n = 3182) identified internet access, CD4 T cells and platelets to be linked to lower mortality risk. Furthermore, using the 7 variables linked to COPD-specific mortality (forced expiratory volume in 1 s/forced vital capacity (FEV1/FVC) ratio, FVC %predicted, age, history of pneumonia, oxygen saturation, 6-min walk distance, dyspnoea) we developed the VAPORED mortality risk score, which we validated on the ECLIPSE cohort (3-yr all-cause mortality data, n = 2312). VAPORED performed significantly better than ADO, BODE, and updated BODE indices in predicting all-cause mortality in ECLIPSE in terms of concordance (VAPORED [0.719] vs ADO [0.693; FDR p-value 0.014], BODE [0.695; FDR p-value 0.020], and updated BODE [0.694; FDR p-value 0.021]); CPE (VAPORED [0.714] vs ADO [0.673; FDR p-value <0.0001], BODE [0.662; FDR p-value <0.0001], and updated BODE [0.646; FDR p-value <0.0001]); 3-year C/D AUC (VAPORED [0.728] vs ADO [0.702; FDR p-value 0.017], BODE [0.704; FDR p-value 0.021], and updated BODE [0.703; FDR p-value 0.024]); integrated C/D AUC (VAPORED [0.723] vs ADO [0.698; FDR p-value 0.047], BODE [0.695; FDR p-value 0.024], and updated BODE [0.690; FDR p-value 0.021]). Finally, we developed a web tool to help clinicians calculate VAPORED mortality risk and compare it to ADO and BODE predictions. Interpretation Our work is an important step towards improving our identification of high-risk patients and generating hypotheses of potential biological mechanisms and social factors driving mortality in patients with COPD at the population level. The main limitation of our study is the fact that the analysed datasets consist of older people with extensive smoking history and limited racial diversity. Thus, the results are relevant to high-risk individuals or those diagnosed with COPD and the VAPORED score is validated for them. Funding This research was supported by NIH [NHLBI, NLM]. The COPDGene study is supported by the COPD Foundation, through grants from AstraZeneca, Bayer Pharmaceuticals, Boehringer Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer and Sunovion.
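The concordance metric used above to compare risk scores can be illustrated with a small sketch. This is Harrell's concordance index in its textbook form, not the authors' evaluation code; the function name `harrell_c` and the toy inputs are assumptions made for illustration, and pairs with tied event times are simply skipped.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  : follow-up time per individual
    events : 1 if death was observed, 0 if censored
    risks  : predicted risk score (higher means earlier expected death)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # the earlier death had the higher predicted risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values (for example 0.719 vs 0.693) should be read.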
Affiliation(s)
- Tyler C. Lovelace
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh, PA, USA
- Min Hyung Ryu
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Minxue Jia
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh, PA, USA
- Peter Castaldi
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Frank C. Sciurba
  - Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Craig P. Hersh
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Panayiotis V. Benos
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh, PA, USA
  - Department of Epidemiology, University of Florida, Gainesville, FL, USA
3
Kernfeld E, Keener R, Cahan P, Battle A. Transcriptome data are insufficient to control false discoveries in regulatory network inference. Cell Syst 2024; 15:709-724.e13. [PMID: 39173585 PMCID: PMC11642480 DOI: 10.1016/j.cels.2024.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 05/31/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024]
Abstract
Inference of causal transcriptional regulatory networks (TRNs) from transcriptomic data suffers notoriously from false positives. Approaches to control the false discovery rate (FDR), for example, via permutation, bootstrapping, or multivariate Gaussian distributions, suffer from several complications: difficulty in distinguishing direct from indirect regulation, nonlinear effects, and causal structure inference requiring "causal sufficiency," meaning experiments that are free of any unmeasured, confounding variables. Here, we use a recently developed statistical framework, model-X knockoffs, to control the FDR while accounting for indirect effects, nonlinear dose-response, and user-provided covariates. We adjust the procedure to estimate the FDR correctly even when measured against incomplete gold standards. However, benchmarking against chromatin immunoprecipitation (ChIP) and other gold standards reveals higher observed than reported FDR. This indicates that unmeasured confounding is a major driver of FDR in TRN inference. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Eric Kernfeld
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Rebecca Keener
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Patrick Cahan
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA; Institute for Cell Engineering, Johns Hopkins Medicine, Baltimore, MD, USA; Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Genetic Medicine, Johns Hopkins Medicine, Baltimore, MD, USA; Malone Center for Engineering and Healthcare, Johns Hopkins University, Baltimore, MD, USA; Data Science and AI Institute, Johns Hopkins University, Baltimore, MD, USA
4
Zhao Y, Ansarullah, Kumar P, Mahoney JM, He H, Baker C, George J, Li S. Causal network perturbation analysis identifies known and novel type-2 diabetes driver genes. bioRxiv 2024:2024.05.22.595431. [PMID: 38826370 PMCID: PMC11142180 DOI: 10.1101/2024.05.22.595431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The molecular pathogenesis of diabetes is multifactorial, involving genetic predisposition and environmental factors that are not yet fully understood. However, pancreatic β-cell failure remains among the primary reasons underlying the progression of type-2 diabetes (T2D) making targeting β-cell dysfunction an attractive pathway for diabetes treatment. To identify genetic contributors to β-cell dysfunction, we investigated single-cell gene expression changes in β-cells from healthy (C57BL/6J) and diabetic (NZO/HlLtJ) mice fed with normal or high-fat, high-sugar diet (HFHS). Our study presents an innovative integration of the causal network perturbation assessment (ssNPA) framework with meta-cell transcriptome analysis to explore the genetic underpinnings of type-2 diabetes (T2D). By generating a reference causal network and in silico perturbation, we identified novel genes implicated in T2D and validated our candidates using the Knockout Mouse Phenotyping (KOMP) Project database.
Affiliation(s)
- Yue Zhao
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Ansarullah
  - Center for Biometric Analysis, The Jackson Laboratory, Bar Harbor, ME, USA
- Parveen Kumar
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Hao He
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Candice Baker
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Joshy George
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Sheng Li
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Department of Genetics and Genome Sciences, University of Connecticut School of Medicine, Farmington, CT, USA
5
Deschildre J, Vandemoortele B, Loers JU, De Preter K, Vermeirssen V. Evaluation of single-sample network inference methods for precision oncology. NPJ Syst Biol Appl 2024; 10:18. [PMID: 38360881 PMCID: PMC10869342 DOI: 10.1038/s41540-024-00340-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 01/17/2024] [Indexed: 02/17/2024] Open
Abstract
A major challenge in precision oncology is to detect targetable cancer vulnerabilities in individual patients. Modeling high-throughput omics data in biological networks allows identifying key molecules and processes of tumorigenesis. Traditionally, network inference methods rely on many samples to contain sufficient information for learning, resulting in aggregate networks. However, to implement patient-tailored approaches in precision oncology, we need to interpret omics data at the level of individual patients. Several single-sample network inference methods have been developed that infer biological networks for an individual sample from bulk RNA-seq data. However, only a limited comparison of these methods has been made and many methods rely on 'normal tissue' samples as reference, which are not always available. Here, we conducted an evaluation of the single-sample network inference methods SSN, LIONESS, SWEET, iENA, CSN and SSPGI using transcriptomic profiles of lung and brain cancer cell lines from the CCLE database. The methods constructed functional gene networks with distinct network characteristics. Hub gene analyses revealed different degrees of subtype-specificity across methods. Single-sample networks were able to distinguish between tumor subtypes, as exemplified by node strength clustering, enrichment of known subtype-specific driver genes among hubs and differential node strength. We also showed that single-sample networks correlated better to other omics data from the same cell line as compared to aggregate networks. We conclude that single-sample network inference methods can reflect sample-specific biology when 'normal tissue' samples are absent and we point out peculiarities of each method.
Affiliation(s)
- Joke Deschildre
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Boris Vandemoortele
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Jens Uwe Loers
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Katleen De Preter
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
  - Lab of Translational Onco-genomics and Bio-informatics, Center for Medical Biotechnology (VIB-UGent), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Vanessa Vermeirssen
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
6
Buschur KL, Riley C, Saferali A, Castaldi P, Zhang G, Aguet F, Ardlie KG, Durda P, Craig Johnson W, Kasela S, Liu Y, Manichaikul A, Rich SS, Rotter JI, Smith J, Taylor KD, Tracy RP, Lappalainen T, Graham Barr R, Sciurba F, Hersh CP, Benos PV. Distinct COPD subtypes in former smokers revealed by gene network perturbation analysis. Respir Res 2023; 24:30. [PMID: 36698131 PMCID: PMC9875487 DOI: 10.1186/s12931-023-02316-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/22/2022] [Accepted: 01/05/2023] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Chronic obstructive pulmonary disease (COPD) varies significantly in symptomatic and physiologic presentation. Identifying disease subtypes from molecular data, collected from easily accessible blood samples, can help stratify patients and guide disease management and treatment. METHODS Blood gene expression measured by RNA-sequencing in the COPDGene Study was analyzed using a network perturbation analysis method. Each COPD sample was compared against a learned reference gene network to determine the part that is deregulated. Gene deregulation values were used to cluster the disease samples. RESULTS The discovery set included 617 former smokers from COPDGene. Four distinct gene network subtypes are identified with significant differences in symptoms, exercise capacity and mortality. These clusters do not necessarily correspond with the levels of lung function impairment and are independently validated in two external cohorts: 769 former smokers from COPDGene and 431 former smokers in the Multi-Ethnic Study of Atherosclerosis (MESA). Additionally, we identify several genes that are significantly deregulated across these subtypes, including DSP and GSTM1, which have been previously associated with COPD through genome-wide association study (GWAS). CONCLUSIONS The identified subtypes differ in mortality and in their clinical and functional characteristics, underlining the need for multi-dimensional assessment potentially supplemented by selected markers of gene expression. The subtypes were consistent across cohorts and could be used for new patient stratification and disease prognosis.
Affiliation(s)
- Kristina L Buschur
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh, PA, USA
  - Division of General Medicine, Columbia University Medical Center, New York, NY, USA
  - New York Genome Center, New York, NY, USA
- Craig Riley
  - Division of Pulmonary Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Aabida Saferali
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Peter Castaldi
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Grace Zhang
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Francois Aguet
  - The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Peter Durda
  - Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT, USA
- W Craig Johnson
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Silva Kasela
  - New York Genome Center, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Yongmei Liu
  - Department of Medicine, Division of Cardiology, Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
- Ani Manichaikul
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Stephen S Rich
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Jerome I Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Josh Smith
  - Northwest Genome Center, University of Washington, Seattle, WA, USA
- Kent D Taylor
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Russell P Tracy
  - Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT, USA
  - Department of Biochemistry, Larner College of Medicine, University of Vermont, Burlington, VT, USA
- Tuuli Lappalainen
  - New York Genome Center, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- R Graham Barr
  - Division of General Medicine, Columbia University Medical Center, New York, NY, USA
- Frank Sciurba
  - Division of Pulmonary Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Craig P Hersh
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Panayiotis V Benos
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh, PA, USA
  - Department of Epidemiology, University of Florida, 2004 Mowry Rd, Gainesville, FL, 32603, USA
7
Jia M, Yuan DY, Lovelace TC, Hu M, Benos PV. Causal Discovery in High-dimensional, Multicollinear Datasets. Frontiers in Epidemiology 2022; 2:899655. [PMID: 36778756 PMCID: PMC9910507 DOI: 10.3389/fepid.2022.899655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 08/05/2022] [Indexed: 11/13/2022]
Abstract
As the cost of high-throughput genomic sequencing technology declines, its application in clinical research becomes increasingly popular. The collected datasets often contain tens or hundreds of thousands of biological features that need to be mined to extract meaningful information. One area of particular interest is discovering underlying causal mechanisms of disease outcomes. Over the past few decades, causal discovery algorithms have been developed and expanded to infer such relationships. However, these algorithms suffer from the curse of dimensionality and multicollinearity. A recently introduced, non-orthogonal, general empirical Bayes approach to matrix factorization has been demonstrated to successfully infer latent factors with interpretable structures from observed variables. We hypothesize that applying this strategy to causal discovery algorithms can solve both the high dimensionality and collinearity problems, inherent to most biomedical datasets. We evaluate this strategy on simulated data and apply it to two real-world datasets. In a breast cancer dataset, we identified important survival-associated latent factors and biologically meaningful enriched pathways within factors related to important clinical features. In a SARS-CoV-2 dataset, we were able to predict whether a patient (1) had Covid-19 and (2) would enter the ICU. Furthermore, we were able to associate factors with known Covid-19 related biological pathways.
Affiliation(s)
- Minxue Jia
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Daniel Y. Yuan
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Tyler C. Lovelace
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Mengying Hu
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Panayiotis V. Benos
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
  - Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
  - Department of Epidemiology, University of Florida, Gainesville, FL, United States
8
Qi Y, Su B, Lin X, Zhou H. A New Feature Selection Method Based on Feature Distinguishing Ability and Network Influence. J Biomed Inform 2022; 128:104048. [DOI: 10.1016/j.jbi.2022.104048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/14/2021] [Revised: 02/04/2022] [Accepted: 03/01/2022] [Indexed: 12/18/2022]
9
Maudsley S, Leysen H, van Gastel J, Martin B. Systems Pharmacology: Enabling Multidimensional Therapeutics. Comprehensive Pharmacology 2022:725-769. [DOI: 10.1016/b978-0-12-820472-6.00017-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/03/2025]
10
Ji Y, Lotfollahi M, Wolf FA, Theis FJ. Machine learning for perturbational single-cell omics. Cell Syst 2021; 12:522-537. [PMID: 34139164 DOI: 10.1016/j.cels.2021.05.016] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 01/13/2021] [Revised: 05/04/2021] [Accepted: 05/19/2021] [Indexed: 12/18/2022]
Abstract
Cell biology is fundamentally limited in its ability to collect complete data on cellular phenotypes and the wide range of responses to perturbation. Areas such as computer vision and speech recognition have addressed this problem of characterizing unseen or unlabeled conditions with the combined advances of big data, deep learning, and computing resources in the past 5 years. Similarly, recent advances in machine learning approaches enabled by single-cell data start to address prediction tasks in perturbation response modeling. We first define objectives in learning perturbation response in single-cell omics; survey existing approaches, resources, and datasets (https://github.com/theislab/sc-pert); and discuss how a perturbation atlas can enable deep learning models to construct an informative perturbation latent space. We then examine future avenues toward more powerful and explainable modeling using deep neural networks, which enable the integration of disparate information sources and an understanding of heterogeneous, complex, and unseen systems.
Affiliation(s)
- Yuge Ji
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Department of Mathematics, Technical University of Munich, Munich, Germany
- Mohammad Lotfollahi
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- F Alexander Wolf
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Cellarity, Cambridge, MA, USA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Department of Mathematics, Technical University of Munich, Munich, Germany; Cellarity, Cambridge, MA, USA
11
Detection of differentially abundant cell subpopulations in scRNA-seq data. Proc Natl Acad Sci U S A 2021; 118:2100293118. [PMID: 34001664 DOI: 10.1073/pnas.2100293118] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Indexed: 12/17/2022] Open
Abstract
Comprehensive and accurate comparisons of transcriptomic distributions of cells from samples taken from two different biological states, such as healthy versus diseased individuals, are an emerging challenge in single-cell RNA sequencing (scRNA-seq) analysis. Current methods for detecting differentially abundant (DA) subpopulations between samples rely heavily on initial clustering of all cells in both samples. Often, this clustering step is inadequate since the DA subpopulations may not align with a clear cluster structure, and important differences between the two biological states can be missed. Here, we introduce DA-seq, a targeted approach for identifying DA subpopulations not restricted to clusters. DA-seq is a multiscale method that quantifies a local DA measure for each cell, which is computed from its k nearest neighboring cells across a range of k values. Based on this measure, DA-seq delineates contiguous significant DA subpopulations in the transcriptomic space. We apply DA-seq to several scRNA-seq datasets and highlight its improved ability to detect differences between distinct phenotypes in severe versus mildly ill COVID-19 patients, melanomas subjected to immune checkpoint therapy comparing responders to nonresponders, embryonic development at two time points, and young versus aging brain tissue. DA-seq enabled us to detect differences between these phenotypes. Importantly, we find that DA-seq not only recovers the DA cell types as discovered in the original studies but also reveals additional DA subpopulations that were not described before. Analysis of these subpopulations yields biological insights that would otherwise be undetected using conventional computational approaches.
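The multiscale local measure described in this abstract can be sketched roughly as follows. The abstract does not give the exact formula, so this sketch assumes a simple signed label fraction among each cell's k nearest neighbours; the function name `multiscale_da_scores`, the Euclidean metric, and the brute-force distance computation are illustrative assumptions, not the DA-seq implementation.

```python
import numpy as np

def multiscale_da_scores(X, labels, ks):
    """Per-cell differential-abundance scores over several neighbourhood sizes.

    X      : (n_cells, n_features) embedding of the cells
    labels : 0/1 condition label per cell (e.g. healthy vs diseased)
    ks     : neighbourhood sizes, each between 1 and n_cells - 1
    Returns an (n_cells, len(ks)) array with values in [-1, 1]:
    +1 if all k neighbours come from condition 1, -1 if all come from condition 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ks = np.asarray(ks)
    # Brute-force squared Euclidean distances; fine for a small illustration only.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)           # a cell is not its own neighbour
    order = np.argsort(d2, axis=1)         # neighbours sorted by distance
    signed = np.where(labels == 1, 1.0, -1.0)
    running = np.cumsum(signed[order], axis=1)
    return running[:, ks - 1] / ks         # signed label fraction at each k
```

Cells whose scores stay near +1 or -1 across the whole range of k sit in regions dominated by one condition, which is the kind of signal a method of this type would then group into contiguous subpopulations.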
12
Jahagirdar S, Saccenti E. Evaluation of Single Sample Network Inference Methods for Metabolomics-Based Systems Medicine. J Proteome Res 2020; 20:932-949. [PMID: 33267585 PMCID: PMC7786380 DOI: 10.1021/acs.jproteome.0c00696] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 02/08/2023]
Abstract
Networks and network analyses are fundamental tools of systems biology. Networks are built by inferring pair-wise relationships among biological entities from a large number of samples such that subject-specific information is lost. The possibility of constructing these sample (individual)-specific networks from single molecular profiles might offer new insights in systems and personalized medicine and as a consequence is attracting more and more research interest. In this study, we evaluated and compared LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples) and ssPCC (single sample network based on Pearson correlation) in the metabolomics context of metabolite–metabolite association networks. We illustrated and explored the characteristics of these two methods on (i) simulated data, (ii) data generated from a dynamic metabolic model to simulate real-life observed metabolite concentration profiles, and (iii) 22 metabolomic data sets and (iv) we applied single sample network inference to a study case pertaining to the investigation of necrotizing soft tissue infections to show how these methods can be applied in metabolomics. We also proposed some adaptations of the methods that can be used for data exploration. Overall, despite some limitations, we found single sample networks to be a promising tool for the analysis of metabolomics data.
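As a rough illustration of the linear interpolation behind LIONESS: the network for sample q is estimated as N * (e_all - e_without_q) + e_without_q, where e_all is the aggregate network over all N samples and e_without_q is the network rebuilt with sample q left out. The sketch below assumes Pearson correlation as the aggregate network; the function name `lioness_networks` is hypothetical and this is not the code evaluated in the study.

```python
import numpy as np

def lioness_networks(X):
    """Single-sample networks by LIONESS-style linear interpolation.

    X : (n_samples, n_features) data matrix, n_samples >= 3
    Returns an (n_samples, n_features, n_features) array, one network per sample.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    aggregate = np.corrcoef(X, rowvar=False)        # network from all samples
    nets = np.empty((n,) + aggregate.shape)
    for q in range(n):
        # Network rebuilt with sample q left out.
        without_q = np.corrcoef(np.delete(X, q, axis=0), rowvar=False)
        # Interpolate: scale up the change caused by removing sample q.
        nets[q] = n * (aggregate - without_q) + without_q
    return nets
```

Edge weights produced this way are not bounded to [-1, 1], since the difference between the two correlation matrices is multiplied by the sample count.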
Affiliation(s)
- Sanjeevan Jahagirdar
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands
13
Li Y, Ma A, Mathé EA, Li L, Liu B, Ma Q. Elucidation of Biological Networks across Complex Diseases Using Single-Cell Omics. Trends Genet 2020; 36:951-966. [PMID: 32868128 DOI: 10.1016/j.tig.2020.08.004] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/17/2020] [Revised: 07/29/2020] [Accepted: 08/04/2020] [Indexed: 12/14/2022]
Abstract
Single-cell multimodal omics (scMulti-omics) technologies have made it possible to trace cellular lineages during differentiation and to identify new cell types in heterogeneous cell populations. The derived information is especially promising for computing cell-type-specific biological networks encoded in complex diseases and improving our understanding of the underlying gene regulatory mechanisms. The integration of these networks could, therefore, give rise to a heterogeneous regulatory landscape (HRL) in support of disease diagnosis and drug therapeutics. In this review, we provide an overview of this field and pay particular attention to how diverse biological networks can be inferred in a specific cell type based on integrative methods. Then, we discuss how HRL can advance our understanding of regulatory mechanisms underlying complex diseases and aid in the prediction of prognosis and therapeutic responses. Finally, we outline challenges and future trends that will be central to bringing the field of HRL in complex diseases forward.
Affiliation(s)
- Yang Li
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Anjun Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Ewy A Mathé
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health (NIH), Rockville, MD, 20892, USA
- Lang Li
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Qin Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA