1
Sadria M, Swaroop V. Discovering governing equations of biological systems through representation learning and sparse model discovery. NAR Genom Bioinform 2025; 7:lqaf048. [PMID: 40290314 PMCID: PMC12034105 DOI: 10.1093/nargab/lqaf048] [Received: 10/08/2024] [Revised: 03/19/2025] [Accepted: 04/11/2025] [Indexed: 04/30/2025] Open
Abstract
Understanding the governing rules of complex biological systems remains a significant challenge due to the nonlinear, high-dimensional nature of biological data. In this study, we present CLERA, a novel end-to-end computational framework designed to uncover parsimonious dynamical models and identify active gene programs from single-cell RNA sequencing data. By integrating a supervised autoencoder architecture with Sparse Identification of Nonlinear Dynamics, CLERA leverages prior knowledge to simultaneously extract a related low-dimensional representation and uncover the underlying dynamical systems that drive the processes. Through the analysis of both synthetic and biological data, CLERA demonstrates robust performance in reconstructing gene expression dynamics, identifying key regulatory genes, and capturing temporal patterns across distinct cell types. CLERA's ability to generate dynamic interaction networks, combined with network rewiring using Personalized PageRank to highlight central genes and active gene programs, offers new insights into the complex regulatory mechanisms underlying cellular processes.
Affiliation(s)
- Mehrshad Sadria
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Vasu Swaroop
- Department of Computer Science and Information Systems, BITS-Pilani, Pilani Campus, Pilani 333031, India
2
da Silva JEH, Bernardino HS, de Oliveira IL, Camata JJ. A survey of the methodological process of modeling, inference, and evaluation of gene regulatory networks using scRNA-Seq data. Biosystems 2025; 253:105464. [PMID: 40409400 DOI: 10.1016/j.biosystems.2025.105464] [Received: 03/11/2024] [Revised: 03/20/2025] [Accepted: 04/17/2025] [Indexed: 05/25/2025]
Abstract
The advent of scRNA-Seq technology has provided unprecedented resolution in the analysis of gene regulatory networks (GRNs) at the single-cell level. However, new technical and methodological challenges have also emerged. Factors such as the large number of zeros reported in expression levels, the biological variation due to the stochastic nature of gene expression, environmental niche, and effects created by the cell cycle make it difficult to correctly interpret the data obtained in the sequencing stage. On the other hand, methods developed for the inference of GRNs, specifically using scRNA-Seq technology, proved to be of similar quality to random predictors. The lack of adequate pre-processing of gene expression data, including selection steps for subsets of genes of interest, smoothing, and discretization of gene expression, in addition to the different ways of modeling networks and network motifs, are factors that affect the performance of inference approaches. Finally, the lack of knowledge about the ground-truth network and the non-standardization of appropriate metrics to measure the quality of inferred networks make the process of comparing performance between algorithms a major problem, given the unbalanced nature of the data and the interpretation bias caused by the chosen metric. This article brings these issues to light, aiming to show, through comparative computational experiments, how these factors influence both the inference process and the performance evaluation of inferred networks, and provides suggestions for a more robust methodological process for researchers dealing with inference of GRNs.
Affiliation(s)
- José Eduardo H da Silva
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil.
- Heder S Bernardino
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- Itamar L de Oliveira
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- José J Camata
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
3
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of single-cell multi-omics and spatial data guided by gene regulatory networks and cell-cell interactions. Nat Methods 2025; 22:982-993. [PMID: 40247122 DOI: 10.1038/s41592-025-02651-0] [Received: 08/27/2023] [Accepted: 03/07/2025] [Indexed: 04/19/2025]
Abstract
Simulated single-cell data are essential for designing and evaluating computational methods in the absence of experimental ground truth. Here we present scMultiSim, a comprehensive simulator that generates multimodal single-cell data encompassing gene expression, chromatin accessibility, RNA velocity and spatial cell locations while accounting for the relationships between modalities. Unlike existing tools that focus on limited biological factors, scMultiSim simultaneously models cell identity, gene regulatory networks, cell-cell interactions and chromatin accessibility while incorporating technical noise. Moreover, it allows users to adjust each factor's effect easily. Here we show that scMultiSim generates data with expected biological effects, and demonstrate its applications by benchmarking a wide range of computational tasks, including multimodal and multi-batch data integration, RNA velocity estimation, gene regulatory network inference and cell-cell interaction inference using spatially resolved gene expression data. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li
- Georgia Institute of Technology, Atlanta, GA, USA
- Ziqi Zhang
- Georgia Institute of Technology, Atlanta, GA, USA
- Xi Chen
- Southern University of Science and Technology, Shenzhen, China
- Xiuwei Zhang
- Georgia Institute of Technology, Atlanta, GA, USA.
4
Kalfon J, Samaran J, Peyré G, Cantini L. scPRINT: pre-training on 50 million cells allows robust gene network predictions. Nat Commun 2025; 16:3607. [PMID: 40240364 PMCID: PMC12003772 DOI: 10.1038/s41467-025-58699-1] [Received: 08/20/2024] [Accepted: 03/24/2025] [Indexed: 04/18/2025] Open
Abstract
A cell is governed by the interaction of myriads of macromolecules. Inferring such a network of interactions has remained an elusive milestone in cellular biology. Building on recent advances in large foundation models and their ability to learn without supervision, we present scPRINT, a large cell model for the inference of gene networks pre-trained on more than 50 million cells from the cellxgene database. Using innovative pretraining tasks and model architecture, scPRINT pushes large transformer models towards more interpretability and usability when uncovering the complex biology of the cell. Based on our atlas-level benchmarks, scPRINT demonstrates superior performance in gene network inference to the state of the art, as well as competitive zero-shot abilities in denoising, batch effect correction, and cell label prediction. On an atlas of benign prostatic hyperplasia, scPRINT highlights the profound connections between ion exchange, senescence, and chronic inflammation.
Affiliation(s)
- Jérémie Kalfon
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics group, F-75015, Paris, France
- Jules Samaran
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics group, F-75015, Paris, France
- Gabriel Peyré
- CNRS and DMA de l'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure, Université PSL, 75005, Paris, France
- Laura Cantini
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics group, F-75015, Paris, France.
5
Chevalley M, Roohani YH, Mehrjou A, Leskovec J, Schwab P. A large-scale benchmark for network inference from single-cell perturbation data. Commun Biol 2025; 8:412. [PMID: 40069299 PMCID: PMC11897147 DOI: 10.1038/s42003-025-07764-y] [Received: 07/05/2024] [Accepted: 02/18/2025] [Indexed: 03/15/2025] Open
Abstract
Mapping biological mechanisms in cellular systems is a fundamental step in early-stage drug discovery that serves to generate hypotheses on what disease-relevant molecular targets may effectively be modulated by pharmacological interventions. With the advent of high-throughput methods for measuring single-cell gene expression under genetic perturbations, we now have effective means for generating evidence for causal gene-gene interactions at scale. However, evaluating the performance of network inference methods in real-world environments is challenging due to the lack of ground-truth knowledge. Moreover, traditional evaluations conducted on synthetic datasets do not reflect the performance in real-world systems. We thus introduce CausalBench, a benchmark suite revolutionizing network inference evaluation with real-world, large-scale single-cell perturbation data. CausalBench, distinct from existing benchmarks, offers biologically-motivated metrics and distribution-based interventional measures, providing a more realistic evaluation of network inference methods. An initial systematic evaluation of state-of-the-art causal inference methods using our CausalBench suite highlights how poor scalability of existing methods limits performance. Moreover, methods that use interventional information do not outperform those that only use observational data, contrary to what is observed on synthetic benchmarks. CausalBench subsequently enables the development of numerous promising methods through a community challenge, thus demonstrating its potential as a transformative tool in the field of computational biology, bridging the gap between theoretical innovation and practical application in drug discovery and disease understanding. Thus, CausalBench opens new avenues for method developers in causal network inference research, and provides to practitioners a principled and reliable way to track progress in network methods for real-world interventional data.
Affiliation(s)
- Yusuf H Roohani
- GSK.ai, Zug, Switzerland
- Stanford University, Stanford, CA, USA
6
Xu J, Lu C, Jin S, Meng Y, Fu X, Zeng X, Nussinov R, Cheng F. Deep learning-based cell-specific gene regulatory networks inferred from single-cell multiome data. Nucleic Acids Res 2025; 53:gkaf138. [PMID: 40037709 PMCID: PMC11879466 DOI: 10.1093/nar/gkaf138] [Received: 10/10/2024] [Revised: 01/03/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025] Open
Abstract
Gene regulatory networks (GRNs) provide a global representation of how genetic/genomic information is transferred in living systems and are a key component in understanding genome regulation. Single-cell multiome data provide unprecedented opportunities to reconstruct GRNs at fine-grained resolution. However, the inference of GRNs is hindered by insufficient single omic profiles due to the characteristic high loss rate of single-cell sequencing data. In this study, we developed scMultiomeGRN, a deep learning framework to infer transcription factor (TF) regulatory networks via unique integration of single-cell genomic (single-cell RNA sequencing) and epigenomic (single-cell ATAC sequencing) data. We create scMultiomeGRN to elucidate these networks by conceptualizing TF network graph structures. Specifically, we build modality-specific neighbor aggregators and cross-modal attention modules to learn latent representations of TFs from single-cell multi-omics. We demonstrate that scMultiomeGRN outperforms state-of-the-art models on multiple benchmark datasets involved in diseases and health. Via scMultiomeGRN, we identified an Alzheimer's disease-relevant regulatory network of SPI1 and RUNX1 for microglia. In summary, scMultiomeGRN offers a deep learning framework to identify cell type-specific gene regulatory networks from single-cell multiome data.
Affiliation(s)
- Junlin Xu
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
- Changcheng Lu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Shuting Jin
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
- Yajie Meng
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, Hubei 430200, China
- Xiangzheng Fu
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, MD 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Feixiong Cheng
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, United States
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, United States
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, United States
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, United States
7
Dibaeinia P, Ojha A, Sinha S. Interpretable AI for inference of causal molecular relationships from omics data. Sci Adv 2025; 11:eadk0837. [PMID: 39951525 PMCID: PMC11827637 DOI: 10.1126/sciadv.adk0837] [Received: 08/08/2023] [Accepted: 01/14/2025] [Indexed: 02/16/2025]
Abstract
The discovery of molecular relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model, under certain assumptions, estimates an average of a causal quantity reflecting the direct influence of one variable on another. We leverage this insight to propose a precise definition of a gene regulatory relationship and implement a new tool, CIMLA (Counterfactual Inference by Machine Learning and Attribution Models), to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Last, we use CIMLA to analyze a previously published single-cell RNA sequencing dataset from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Abhishek Ojha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
8
Wang C, Liu ZP. Diffusion-based generation of gene regulatory networks from scRNA-seq data with DigNet. Genome Res 2025; 35:340-354. [PMID: 39694856 PMCID: PMC11874984 DOI: 10.1101/gr.279551.124] [Received: 05/06/2024] [Accepted: 12/10/2024] [Indexed: 12/20/2024]
Abstract
A gene regulatory network (GRN) intricately encodes the interconnectedness of identities and functionalities of genes within cells, ultimately shaping cellular specificity. Despite decades of endeavors, reverse engineering of GRNs from gene expression profiling data remains a profound challenge, particularly when it comes to reconstructing cell-specific GRNs that are tailored to precise cellular and genetic contexts. Here, we propose a discrete diffusion generation model, called DigNet, capable of generating corresponding GRNs from high-throughput single-cell RNA sequencing (scRNA-seq) data. DigNet embeds the network generation process into a multistep recovery procedure with Markov properties. Each intermediate step has a specific model to recover a portion of the gene regulatory architectures. It thus can ensure compatibility between global network structures and regulatory modules through the unique multistep diffusion procedure. Furthermore, through iMetacell integration and non-Euclidean discrete space modeling, DigNet is robust to the presence of noise in scRNA-seq data and the sparsity of GRNs. Benchmark evaluation results against more than a dozen state-of-the-art network inference methods demonstrate that DigNet achieves superior performance across various single-cell GRN reconstruction experiments. Furthermore, DigNet provides unique insights into the immune response in breast cancer, derived from differential gene regulation identified in T cells. As an open-source software, DigNet offers a powerful and effective tool for generating cell-specific GRNs from scRNA-seq data.
Affiliation(s)
- Chuanyuan Wang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
9
Khullar S, Huang X, Ramesh R, Svaren J, Wang D. NetREm: Network Regression Embeddings reveal cell-type transcription factor coordination for gene regulation. Bioinform Adv 2024; 5:vbae206. [PMID: 40260118 PMCID: PMC12011367 DOI: 10.1093/bioadv/vbae206] [Received: 06/05/2024] [Revised: 10/22/2024] [Accepted: 12/18/2024] [Indexed: 04/23/2025]
Abstract
Motivation: Transcription factor (TF) coordination plays a key role in gene regulation via direct and/or indirect protein-protein interactions (PPIs) and co-binding to regulatory elements on DNA. Single-cell technologies facilitate gene expression measurement for individual cells and cell-type identification, yet the connection between TF-TF coordination and target gene (TG) regulation of various cell types remains unclear.
Results: To address this, we introduce our innovative computational approach, Network Regression Embeddings (NetREm), to reveal cell-type TF-TF coordination activities for TG regulation. NetREm leverages network-constrained regularization, using prior knowledge of PPIs among TFs, to analyze single-cell gene expression data, uncovering cell-type coordinating TFs and identifying novel TF-TG candidate regulatory network links. NetREm's performance is validated using simulation studies and benchmarked across several datasets in humans, mice, and yeast. Further, we showcase NetREm's ability to prioritize valid novel human TF-TF coordination links in 9 peripheral blood mononuclear and 42 immune cell sub-types. We apply NetREm to examine cell-type networks in central and peripheral nervous systems (e.g. neuronal, glial, Schwann cells) and in Alzheimer's disease versus controls. Top predictions are validated with experimental data from rat, mouse, and human models. Additional functional genomics data help link genetic variants to our TF-TG regulatory and TF-TF coordination networks.
Availability and implementation: https://github.com/SaniyaKhullar/NetREm.
Affiliation(s)
- Saniya Khullar
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53076, United States
- Xiang Huang
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Raghu Ramesh
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Comparative Biomedical Sciences Training Program, University of Wisconsin-Madison, Madison, WI 53706, United States
- John Svaren
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Department of Comparative Biosciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI 53706, United States
- Daifeng Wang
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53076, United States
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, United States
10
Wang Y, Dede M, Mohanty V, Dou J, Li Z, Chen K. A statistical approach for systematic identification of transition cells from scRNA-seq data. Cell Rep Methods 2024; 4:100913. [PMID: 39644902 PMCID: PMC11704623 DOI: 10.1016/j.crmeth.2024.100913] [Received: 04/23/2024] [Revised: 09/01/2024] [Accepted: 11/13/2024] [Indexed: 12/09/2024]
Abstract
Decoding cellular state transitions is crucial for understanding complex biological processes in development and disease. While recent advancements in single-cell RNA sequencing (scRNA-seq) offer insights into cellular trajectories, existing tools primarily study expressional rather than regulatory state shifts. We present CellTran, a statistical approach utilizing paired-gene expression correlations to detect transition cells from scRNA-seq data without explicitly resolving gene regulatory networks. Applying our approach to various contexts, including tissue regeneration, embryonic development, preinvasive lesions, and humoral responses post-vaccination, reveals transition cells and their distinct gene expression profiles. Our study sheds light on the underlying molecular mechanisms driving cellular state transitions, enhancing our ability to identify therapeutic targets for disease interventions.
Affiliation(s)
- Yuanxin Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Merve Dede
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Vakul Mohanty
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jinzhuang Dou
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ziyi Li
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
11
Peng D, Cahan P. OneSC: a computational platform for recapitulating cell state transitions. Bioinformatics 2024; 40:btae703. [PMID: 39570626 PMCID: PMC11630913 DOI: 10.1093/bioinformatics/btae703] [Received: 05/31/2024] [Revised: 11/13/2024] [Accepted: 11/19/2024] [Indexed: 11/22/2024] Open
Abstract
Motivation: Computational modeling of cell state transitions has been of great interest to many in the fields of developmental biology, cancer biology, and cell fate engineering because it enables performing perturbation experiments in silico more rapidly and cheaply than could be achieved in a lab. Recent advancements in single-cell RNA-sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico "synthetic" cells that faithfully mimic the temporal trajectories.
Results: Here we present OneSC, a platform that can simulate cell state transitions using systems of stochastic differential equations governed by a regulatory network of core transcription factors (TFs). Different from many current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and terminal cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes, and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations.
Availability and implementation: OneSC is implemented as a Python package on GitHub (https://github.com/CahanLab/oneSC) and on Zenodo (https://zenodo.org/records/14052421).
Affiliation(s)
- Da Peng
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Patrick Cahan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Institute for Cell Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, United States
12
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024] Open
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay out the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
Affiliation(s)
- Karamveer
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
13
Dong J, Li J, Wang F. Deep Learning in Gene Regulatory Network Inference: A Survey. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2089-2101. [PMID: 39137088 DOI: 10.1109/tcbb.2024.3442536] [Indexed: 08/15/2024]
Abstract
Understanding the intricate regulatory relationships among genes is crucial for comprehending the development, differentiation, and cellular response in living systems. Consequently, inferring gene regulatory networks (GRNs) based on observed data has gained significant attention as a fundamental goal in biological applications. The proliferation and diversification of available data present both opportunities and challenges in accurately inferring GRNs. Deep learning, a highly successful technique in various domains, holds promise in aiding GRN inference. Several GRN inference methods employing deep learning models have been proposed; however, the selection of an appropriate method remains a challenge for life scientists. In this survey, we provide a comprehensive analysis of 12 GRN inference methods that leverage deep learning models. We trace the evolution of these major methods and categorize them based on the types of applicable data. We delve into the core concepts and specific steps of each method, offering a detailed evaluation of their effectiveness and scalability across different scenarios. These insights enable us to make informed recommendations. Moreover, we explore the challenges faced by GRN inference methods utilizing deep learning and discuss future directions, providing valuable suggestions for the advancement of data scientists in this field.
14
Aguirre M, Spence JP, Sella G, Pritchard JK. Gene regulatory network structure informs the distribution of perturbation effects. bioRxiv 2024:2024.07.04.602130. [PMID: 39005431 PMCID: PMC11245109 DOI: 10.1101/2024.07.04.602130] [Indexed: 07/16/2024]
Abstract
Gene regulatory networks (GRNs) govern many core developmental and biological processes underlying human complex traits. Even with broad-scale efforts to characterize the effects of molecular perturbations and interpret gene coexpression, it remains challenging to infer the architecture of gene regulation in a precise and efficient manner. Key properties of GRNs, like hierarchical structure, modular organization, and sparsity, provide both challenges and opportunities for this objective. Here, we seek to better understand properties of GRNs using a new approach to simulate their structure and model their function. We produce realistic network structures with a novel generating algorithm based on insights from small-world network theory, and we model gene expression regulation using stochastic differential equations formulated to accommodate modeling molecular perturbations. With these tools, we systematically describe the effects of gene knockouts within and across GRNs, finding a subset of networks that recapitulate features of a recent genome-scale perturbation study. With deeper analysis of these exemplar networks, we consider future avenues to map the architecture of gene expression regulation using data from cells in perturbed and unperturbed states, finding that while perturbation data are critical to discover specific regulatory interactions, data from unperturbed cells may be sufficient to reveal regulatory programs.
Affiliation(s)
- Matthew Aguirre
  - Department of Biomedical Data Science, Stanford University, Stanford CA
- Guy Sella
  - Department of Biological Sciences, Columbia University, New York NY
  - Program for Mathematical Genomics, Columbia University, New York NY
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford CA
  - Department of Biology, Stanford University, Stanford CA
|
15
|
Su C, Pastor WA, Emad A. Deciphering lineage-relevant gene regulatory networks during endoderm formation by InPheRNo-ChIP. Brief Bioinform 2024; 25:bbae592. [PMID: 39535258 PMCID: PMC11558691 DOI: 10.1093/bib/bbae592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 10/09/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024] Open
Abstract
Deciphering the underlying gene regulatory networks (GRNs) that govern early human embryogenesis is critical for understanding developmental mechanisms yet remains challenging due to limited sample availability and the inherent complexity of the biological processes involved. To address this, we developed InPheRNo-ChIP, a computational framework that integrates multimodal data, including RNA-seq, transcription factor (TF)-specific ChIP-seq, and phenotypic labels, to reconstruct phenotype-relevant GRNs associated with endoderm development. The core of this method is a probabilistic graphical model that models the simultaneous effect of TFs on their putative target genes to influence a particular phenotypic outcome. Unlike the majority of existing GRN inference methods that are agnostic to the phenotypic outcomes, InPheRNo-ChIP directly incorporates phenotypic information during GRN inference, enabling the distinction between lineage-specific and general regulatory interactions. We integrated data from three experimental studies and applied InPheRNo-ChIP to infer the GRN governing the differentiation of human embryonic stem cells into definitive endoderm. Benchmarking against a scRNA-seq CRISPRi study demonstrated InPheRNo-ChIP's ability to identify regulatory interactions involving endoderm markers FOXA2, SMAD2, and SOX17, outperforming other methods. This highlights the importance of incorporating the phenotypic context during network inference. Furthermore, an ablation study confirms the synergistic contribution of ChIP-seq, RNA-seq, and phenotypic data, highlighting the value of multimodal integration for accurate phenotype-relevant GRN reconstruction.
Affiliation(s)
- Chen Su
  - Department of Electrical and Computer Engineering, McGill University, 845 Sherbrooke Street West, Montreal, Quebec H3A 0G4, Canada
- William A Pastor
  - Department of Biochemistry, McGill University, 845 Sherbrooke Street West, Montreal, Quebec H3A 0G4, Canada
  - The Rosalind and Morris Goodman Cancer Institute, 1160 Pine Avenue, Montreal, Quebec H3A 1A3, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, 845 Sherbrooke Street West, Montreal, Quebec H3A 0G4, Canada
  - The Rosalind and Morris Goodman Cancer Institute, 1160 Pine Avenue, Montreal, Quebec H3A 1A3, Canada
  - Mila, Quebec AI Institute, 6666 St-Urbain Street #200, Montreal, Quebec H2S 3H1, Canada
|
16
|
Zhao W, Larschan E, Sandstede B, Singh R. Optimal transport reveals dynamic gene regulatory networks via gene velocity estimation. bioRxiv 2024:2024.09.12.612590. [PMID: 39345416 PMCID: PMC11429941 DOI: 10.1101/2024.09.12.612590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Inferring gene regulatory networks from gene expression data is an important and challenging problem in the biology community. We propose OTVelo, a methodology that takes time-stamped single-cell gene expression data as input and predicts gene regulation across two time points. It is known that the rate of change of gene expression, which we will refer to as gene velocity, provides crucial information that enhances such inference; however, this information is not always available due to the limitations in sequencing depth. Our algorithm overcomes this limitation by estimating gene velocities using optimal transport. We then infer gene regulation using time-lagged correlation and Granger causality via regularized linear regression. Instead of providing an aggregated network across all time points, our method uncovers the underlying dynamical mechanism across time points. We validate our algorithm on 13 simulated datasets with both synthetic and curated networks and demonstrate its efficacy on 4 experimental data sets.
Affiliation(s)
- Wenjun Zhao
  - Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
- Erica Larschan
  - Department of Molecular Biology, Cell Biology and Biochemistry, Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Björn Sandstede
  - Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
- Ritambhara Singh
  - Department of Computer Science, Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
|
17
|
Mizukoshi C, Kojima Y, Nomura S, Hayashi S, Abe K, Shimamura T. DeepKINET: a deep generative model for estimating single-cell RNA splicing and degradation rates. Genome Biol 2024; 25:229. [PMID: 39237934 PMCID: PMC11378460 DOI: 10.1186/s13059-024-03367-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 08/04/2024] [Indexed: 09/07/2024] Open
Abstract
Messenger RNA splicing and degradation are critical for gene expression regulation, the abnormality of which leads to diseases. Previous methods for estimating kinetic rates have limitations, assuming uniform rates across cells. DeepKINET is a deep generative model that estimates splicing and degradation rates at single-cell resolution from scRNA-seq data. DeepKINET outperforms existing methods on simulated and metabolic labeling datasets. Applied to forebrain and breast cancer data, it identifies RNA-binding proteins responsible for kinetic rate diversity. DeepKINET also analyzes the effects of splicing factor mutations on target genes in erythroid lineage cells. DeepKINET effectively reveals cellular heterogeneity in post-transcriptional regulation.
Affiliation(s)
- Chikara Mizukoshi
  - Division of Systems Biology, Graduate School of Medicine, Nagoya University, Aichi, Japan
  - Nagoya University Hospital, Aichi, Japan
- Yasuhiro Kojima
  - Laboratory of Computational Life Science, National Cancer Center Research Institute, Tokyo, Japan
  - Department of Computational and Systems Biology, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Satoshi Nomura
  - Division of Systems Biology, Graduate School of Medicine, Nagoya University, Aichi, Japan
- Shuto Hayashi
  - Department of Computational and Systems Biology, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Ko Abe
  - Department of Computational and Systems Biology, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Teppei Shimamura
  - Division of Systems Biology, Graduate School of Medicine, Nagoya University, Aichi, Japan
  - Department of Computational and Systems Biology, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
|
18
|
Sadria M, Bury TM. FateNet: an integration of dynamical systems and deep learning for cell fate prediction. Bioinformatics 2024; 40:btae525. [PMID: 39177093 PMCID: PMC11399232 DOI: 10.1093/bioinformatics/btae525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 06/28/2024] [Accepted: 08/21/2024] [Indexed: 08/24/2024]
Abstract
MOTIVATION: Understanding cellular decision-making, particularly its timing and impact on the biological system such as tissue health and function, is a fundamental challenge in biology and medicine. Existing methods for inferring fate decisions and cellular state dynamics from single-cell RNA sequencing data lack precision regarding decision points and broader tissue implications. Addressing this gap, we present FateNet, a computational approach integrating dynamical systems theory and deep learning to probe the cell decision-making process using scRNA-seq data.
RESULTS: By leveraging information about normal forms and scaling behavior near bifurcations common to many dynamical systems, FateNet predicts cell decision occurrence with higher accuracy than conventional methods and offers qualitative insights into the new state of the biological system. Also, through in-silico perturbation experiments, FateNet identifies key genes and pathways governing the differentiation process in hematopoiesis. Validated using different scRNA-seq data, FateNet emerges as a user-friendly and valuable tool for predicting critical points in biological processes, providing insights into complex trajectories.
AVAILABILITY AND IMPLEMENTATION: github.com/ThomasMBury/fatenet.
Affiliation(s)
- Mehrshad Sadria
  - Department of Applied Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Thomas M Bury
  - Department of Physiology, McGill University, Montreal, QC H3G 1Y6, Canada
|
19
|
Luo E, Hao M, Wei L, Zhang X. scDiffusion: conditional generation of high-quality single-cell data using diffusion model. Bioinformatics 2024; 40:btae518. [PMID: 39171840 PMCID: PMC11368386 DOI: 10.1093/bioinformatics/btae518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 08/10/2024] [Accepted: 08/20/2024] [Indexed: 08/23/2024] Open
Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at single-cell level. However, it is still challenging to obtain enough high-quality scRNA-seq data. To mitigate the limited availability of data, generative models have been proposed to computationally generate synthetic scRNA-seq data. Nevertheless, the data generated with current models are not yet very realistic, especially when we need to generate data with controlled conditions. In the meantime, diffusion models have shown their power in generating data with high fidelity, providing a new opportunity for scRNA-seq generation.
RESULTS: In this study, we developed scDiffusion, a generative model combining the diffusion model and foundation model to generate high-quality scRNA-seq data with controlled conditions. We designed multiple classifiers to guide the diffusion process simultaneously, enabling scDiffusion to generate data under multiple condition combinations. We also proposed a new control strategy called Gradient Interpolation. This strategy allows the model to generate continuous trajectories of cell development from a given cell state. Experiments showed that scDiffusion could generate single-cell gene expression data closely resembling real scRNA-seq data. Also, scDiffusion can conditionally produce data on specific cell types including rare cell types. Furthermore, we could use the multiple-condition generation of scDiffusion to generate cell types that were absent from the training data. Leveraging the Gradient Interpolation strategy, we generated a continuous developmental trajectory of mouse embryonic cells. These experiments demonstrate that scDiffusion is a powerful tool for augmenting the real scRNA-seq data and can provide insights into cell fate research.
AVAILABILITY AND IMPLEMENTATION: scDiffusion is openly available at the GitHub repository https://github.com/EperLuo/scDiffusion or Zenodo https://zenodo.org/doi/10.5281/zenodo.13268742.
Affiliation(s)
- Erpai Luo
  - MOE Key Lab of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Minsheng Hao
  - MOE Key Lab of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Lei Wei
  - MOE Key Lab of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
  - MOE Key Lab of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
  - School of Life Sciences and School of Medicine, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
|
20
|
Sadria M, Layton A, Goyal S, Bader GD. Fatecode enables cell fate regulator prediction using classification-supervised autoencoder perturbation. Cell Rep Methods 2024; 4:100819. [PMID: 38986613 PMCID: PMC11294839 DOI: 10.1016/j.crmeth.2024.100819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 11/20/2023] [Accepted: 06/18/2024] [Indexed: 07/12/2024]
Abstract
Cell reprogramming, which guides the conversion between cell states, is a promising technology for tissue repair and regeneration, with the ultimate goal of accelerating recovery from diseases or injuries. To accomplish this, regulators must be identified and manipulated to control cell fate. We propose Fatecode, a computational method that predicts cell fate regulators based only on single-cell RNA sequencing (scRNA-seq) data. Fatecode learns a latent representation of the scRNA-seq data using a deep learning-based classification-supervised autoencoder and then performs in silico perturbation experiments on the latent representation to predict genes that, when perturbed, would alter the original cell type distribution to increase or decrease the population size of a cell type of interest. We assessed Fatecode's performance using simulations from a mechanistic gene-regulatory network model and scRNA-seq data mapping blood and brain development of different organisms. Our results suggest that Fatecode can detect known cell fate regulators from single-cell transcriptomics datasets.
Affiliation(s)
- Mehrshad Sadria
  - Department of Applied Mathematics, University of Waterloo, Waterloo, ON, Canada
- Anita Layton
  - Department of Applied Mathematics, University of Waterloo, Waterloo, ON, Canada; Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada; Department of Biology, University of Waterloo, Waterloo, ON, Canada; School of Pharmacy, University of Waterloo, Waterloo, ON, Canada
- Sidhartha Goyal
  - Department of Physics, University of Toronto, Toronto, ON, Canada
- Gary D Bader
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; The Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada; Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Canadian Institute for Advanced Research (CIFAR), Toronto, ON, Canada
|
21
|
Magaña-López G, Calzone L, Zinovyev A, Paulevé L. scBoolSeq: Linking scRNA-seq statistics and Boolean dynamics. PLoS Comput Biol 2024; 20:e1011620. [PMID: 38976751 PMCID: PMC11257695 DOI: 10.1371/journal.pcbi.1011620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Revised: 07/18/2024] [Accepted: 06/24/2024] [Indexed: 07/10/2024] Open
Abstract
Boolean networks are widely employed to model the qualitative dynamics of cell fate processes by describing the change of binary activation states of genes and transcription factors with time. Being able to bridge such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-seq, is a cornerstone for data-driven model construction and validation. On the one hand, scRNA-seq binarisation is a key step for inferring and validating Boolean models. On the other hand, the generation of synthetic scRNA-seq data from baseline Boolean models provides an important asset to benchmark inference methods. However, linking characteristics of scRNA-seq datasets, including dropout events, with Boolean states is a challenging task. We present scBoolSeq, a method for the bidirectional linking of scRNA-seq data and the Boolean activation states of genes. Given a reference scRNA-seq dataset, scBoolSeq computes statistical criteria to classify the empirical gene pseudocount distributions as either unimodal, bimodal, or zero-inflated, and fits a probabilistic model of dropouts with gene-dependent parameters. From these learnt distributions, scBoolSeq can both binarise scRNA-seq datasets and generate synthetic scRNA-seq datasets from Boolean traces, as issued from Boolean networks, using biased sampling and dropout simulation. We present a case study demonstrating the application of scBoolSeq's binarisation scheme in data-driven model inference. Furthermore, we compare synthetic scRNA-seq data generated by scBoolSeq with data generated by BoolODE for the same Boolean network model. The comparison shows that our method better reproduces the statistics of real scRNA-seq datasets, such as the mean-variance and mean-dropout relationships, while exhibiting clearly defined trajectories in two-dimensional projections of the data.
Affiliation(s)
- Laurence Calzone
  - Institut Curie, Université PSL, Paris, France
  - INSERM, U900, Paris, France
  - Mines ParisTech, Université PSL, Paris, France
- Loïc Paulevé
  - Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Talence, France
|
22
|
Li C, Shao X, Zhang S, Wang Y, Jin K, Yang P, Lu X, Fan X, Wang Y. scRank infers drug-responsive cell types from untreated scRNA-seq data using a target-perturbed gene regulatory network. Cell Rep Med 2024; 5:101568. [PMID: 38754419 PMCID: PMC11228399 DOI: 10.1016/j.xcrm.2024.101568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 12/27/2023] [Accepted: 04/21/2024] [Indexed: 05/18/2024]
Abstract
Cells respond divergently to drugs due to the heterogeneity among cell populations. Thus, it is crucial to identify drug-responsive cell populations in order to accurately elucidate the mechanism of drug action, which is still a great challenge. Here, we address this problem with scRank, which employs a target-perturbed gene regulatory network to rank drug-responsive cell populations via in silico drug perturbations using untreated single-cell transcriptomic data. We benchmark scRank on simulated and real datasets, which shows the superior performance of scRank over existing methods. When applied to medulloblastoma and major depressive disorder datasets, scRank identifies drug-responsive cell types that are consistent with the literature. Moreover, scRank accurately uncovers the macrophage subpopulation responsive to tanshinone IIA and its potential targets in myocardial infarction, with experimental validation. In conclusion, scRank enables the inference of drug-responsive cell types using untreated single-cell data, thus providing insights into the cellular-level impacts of therapeutic interventions.
Affiliation(s)
- Chengyu Li
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314103, China
- Xin Shao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314103, China
- Shujing Zhang
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, China
- Yingchao Wang
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, China
- Kaiyu Jin
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314103, China
- Penghui Yang
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314103, China
- Xiaoyan Lu
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, China
- Xiaohui Fan
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314103, China; Jinhua Institute of Zhejiang University, Jinhua 321299, China; Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310006, China
- Yi Wang
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314103, China
|
23
|
Peng D, Cahan P. OneSC: A computational platform for recapitulating cell state transitions. bioRxiv 2024:2024.05.31.596831. [PMID: 38895453 PMCID: PMC11185539 DOI: 10.1101/2024.05.31.596831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Computational modelling of cell state transitions has attracted great interest in developmental biology, cancer biology, and cell fate engineering because it enables performing perturbation experiments in silico more rapidly and cheaply than could be achieved in a wet lab. Recent advancements in single-cell RNA sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico 'synthetic' cells that faithfully mimic the temporal trajectories. Here we present OneSC, a platform that can simulate synthetic cells across developmental trajectories using systems of stochastic differential equations governed by a core transcription factor (TF) regulatory network. Unlike current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and steady cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations.
Affiliation(s)
- Da Peng
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
- Patrick Cahan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
  - Institute for Cell Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, Maryland, 21205, USA
|
24
|
Singh R, Wu AP, Mudide A, Berger B. Causal gene regulatory analysis with RNA velocity reveals an interplay between slow and fast transcription factors. Cell Syst 2024; 15:462-474.e5. [PMID: 38754366 DOI: 10.1016/j.cels.2024.04.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/25/2023] [Accepted: 04/18/2024] [Indexed: 05/18/2024]
Abstract
Single-cell expression dynamics, from differentiation trajectories or RNA velocity, have the potential to reveal causal links between transcription factors (TFs) and their target genes in gene regulatory networks (GRNs). However, existing methods either overlook these expression dynamics or necessitate that cells be ordered along a linear pseudotemporal axis, which is incompatible with branching trajectories. We introduce Velorama, an approach to causal GRN inference that represents single-cell differentiation dynamics as a directed acyclic graph of cells, constructed from pseudotime or RNA velocity measurements. Additionally, Velorama enables the estimation of the speed at which TFs influence target genes. Applying Velorama, we uncover evidence that the speed of a TF's interactions is tied to its regulatory function. For human corticogenesis, we find that slow TFs are linked to gliomas, while fast TFs are associated with neuropsychiatric diseases. We expect Velorama to become a critical part of the RNA velocity toolkit for investigating the causal drivers of differentiation and disease.
Affiliation(s)
- Rohit Singh
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Alexander P Wu
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Anish Mudide
  - Phillips Exeter Academy, Exeter, NH 03883, USA; Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA
- Bonnie Berger
  - Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA
|
25
|
Zinati Y, Takiddeen A, Emad A. GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks. Nat Commun 2024; 15:4055. [PMID: 38744843 PMCID: PMC11525796 DOI: 10.1038/s41467-024-48516-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 05/01/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.
Affiliation(s)
- Yazdan Zinati
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Abdulrahman Takiddeen
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
  - Mila, Quebec AI Institute, Montreal, QC, Canada
  - The Rosalind and Morris Goodman Cancer Institute, Montreal, QC, Canada
|
26
|
Guo C, Huang Z, Chen J, Yu G, Wang Y, Wang X. Identification of Novel Regulators of Leaf Senescence Using a Deep Learning Model. Plants (Basel) 2024; 13:1276. [PMID: 38732491 PMCID: PMC11085074 DOI: 10.3390/plants13091276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 04/26/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]
Abstract
Deep learning has emerged as a powerful tool for investigating intricate biological processes in plants by harnessing the potential of large-scale data. Gene regulation is a complex process in which transcription factors (TFs), cooperating with their target genes, take part across various biological processes. Despite its significance, the study of gene regulation has primarily focused on a limited number of notable instances, leaving numerous aspects and interactions yet to be explored comprehensively. Here, we developed DEGRN (Deep learning on Expression for Gene Regulatory Network), an innovative deep learning model designed to decipher gene interactions by leveraging high-dimensional expression data obtained from bulk RNA-Seq and scRNA-Seq data in the model plant Arabidopsis. DEGRN exhibited a comparable level of predictive power when applied to various datasets. Through the utilization of DEGRN, we successfully identified an extensive set of 3,053,363 high-quality interactions, encompassing 1430 TFs and 13,739 non-TF genes. Notably, DEGRN's predictive capabilities allowed us to uncover novel regulators involved in a range of complex biological processes, including development, metabolism, and stress responses. Using leaf senescence as an example, we revealed a complex network underpinning this process composed of diverse TF families, including bHLH, ERF, and MYB. We also identified a novel TF, named MAF5, whose expression showed a strong linear relationship with the progression of senescence. The mutant maf5 showed early leaf decay compared to the wild type, indicating a potential role in the regulation of leaf senescence. This hypothesis was further supported by the expression patterns observed across four stages of leaf development, as well as transcriptomics analysis.
Overall, the comprehensive coverage provided by DEGRN expands our understanding of gene regulatory networks and paves the way for further investigations into their functional implications.
Affiliation(s)
- Xu Wang
- Shanghai Collaborative Innovation Center of Agri-Seeds, Joint Center for Single Cell Biology, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China; (C.G.); (Z.H.); (J.C.); (G.Y.); (Y.W.)
|
27
|
Koshkin A, Herbach U, Martínez MR, Gandrillon O, Crauste F. Stochastic modeling of a gene regulatory network driving B cell development in germinal centers. PLoS One 2024; 19:e0301022. [PMID: 38547073 PMCID: PMC10977792 DOI: 10.1371/journal.pone.0301022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 03/08/2024] [Indexed: 04/02/2024]
Abstract
Germinal centers (GCs) are the key histological structures of the adaptive immune system, responsible for the development and selection of B cells producing high-affinity antibodies against antigens. Due to their level of complexity, unexpected malfunctioning may lead to a range of pathologies, including various malignant formations. One promising way to improve the understanding of malignant transformation is to study the underlying gene regulatory networks (GRNs) associated with cell development and differentiation. Evaluation and inference of the GRN structure from gene expression data is a challenging task in systems biology: recent achievements in single-cell (SC) transcriptomics allow the generation of SC gene expression data, which can be used to sharpen the knowledge on GRN structure. In order to understand whether a particular network of three key gene regulators (BCL6, IRF4, BLIMP1), influenced by two external stimuli signals (surface receptors BCR and CD40), is able to describe GC B cell differentiation, we used a stochastic model to fit SC transcriptomic data from a human lymphoid organ dataset. The model is defined mathematically as a piecewise-deterministic Markov process. We showed that after parameter tuning, the model qualitatively recapitulates mRNA distributions corresponding to GC and plasmablast stages of B cell differentiation. Thus, the model can assist in validating the GRN structure and, in the future, could lead to better understanding of the different types of dysfunction of the regulatory mechanisms.
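To make the model class concrete, here is a minimal sketch of a piecewise-deterministic Markov process for a single gene. It is not the authors' three-gene network: it assumes a two-state promoter that switches at constant rates `k_on` and `k_off`, with mRNA following the ODE dm/dt = s·E − d·m between switches. All function names, parameters, and values are illustrative assumptions.

```python
import math
import random


def simulate_pdmp(k_on, k_off, s, d, t_end, m0=0.0, seed=0):
    """Simulate a two-state promoter PDMP and return (time, promoter, mRNA) tuples.

    Promoter switching is the random (jump) part; between jumps the mRNA level
    follows dm/dt = s * promoter - d * m, which is solved in closed form.
    """
    rng = random.Random(seed)
    t, m, promoter = 0.0, m0, 0
    path = [(t, promoter, m)]
    while True:
        rate = k_off if promoter else k_on
        wait = rng.expovariate(rate)          # time until the next promoter switch
        dt = min(wait, t_end - t)
        target = s / d if promoter else 0.0   # steady state of the current ODE
        m = target + (m - target) * math.exp(-d * dt)
        if wait >= t_end - t:                 # no further switch before t_end
            path.append((t_end, promoter, m))
            return path
        t += wait
        promoter = 1 - promoter
        path.append((t, promoter, m))


# Illustrative parameter values only.
path = simulate_pdmp(k_on=1.0, k_off=1.0, s=10.0, d=1.0, t_end=50.0)
```

Sampling the mRNA value at the final time over many seeds would give an empirical distribution of the kind that can be compared with single-cell data, under these simplifying assumptions.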
Affiliation(s)
- Alexey Koshkin
- Inria Dracula, Villeurbanne, France
- Laboratory of Biology and Modelling of the Cell, Universite de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Lyon, France
- Ulysse Herbach
- Université de Lorraine, CNRS, Inria, IECL, Nancy, France
- Olivier Gandrillon
- Inria Dracula, Villeurbanne, France
- Laboratory of Biology and Modelling of the Cell, Universite de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Lyon, France
|
28
|
Pan X, Zhang X. Studying temporal dynamics of single cells: expression, lineage and regulatory networks. Biophys Rev 2024; 16:57-67. [PMID: 38495440 PMCID: PMC10937865 DOI: 10.1007/s12551-023-01090-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/17/2023] [Accepted: 06/27/2023] [Indexed: 03/19/2024]
Abstract
Learning how multicellular organs are developed from single cells to different cell types is a fundamental problem in biology. With the high-throughput scRNA-seq technology, computational methods have been developed to reveal the temporal dynamics of single cells from transcriptomic data, from phenomena on cell trajectories to the underlying mechanism that formed the trajectory. There are several distinct families of computational methods including Trajectory Inference (TI), Lineage Tracing (LT), and Gene Regulatory Network (GRN) Inference which are involved in such studies. This review summarizes these computational approaches which use scRNA-seq data to study cell differentiation and cell fate specification as well as the advantages and limitations of different methods. We further discuss how GRNs can potentially affect cell fate decisions and trajectory structures. Supplementary Information The online version contains supplementary material available at 10.1007/s12551-023-01090-5.
Affiliation(s)
- Xinhai Pan
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
- Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
|
29
|
Song D, Wang Q, Yan G, Liu T, Sun T, Li JJ. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat Biotechnol 2024; 42:247-252. [PMID: 37169966 PMCID: PMC11182337 DOI: 10.1038/s41587-023-01772-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 09/20/2022] [Accepted: 03/30/2023] [Indexed: 05/13/2023]
Abstract
We present a statistical simulator, scDesign3, to generate realistic single-cell and spatial omics data, including various cell states, experimental designs and feature modalities, by learning interpretable parameters from real data. Using a unified probabilistic model for single-cell and spatial omics data, scDesign3 infers biologically meaningful parameters; assesses the goodness-of-fit of inferred cell clusters, trajectories and spatial locations; and generates in silico negative and positive controls for benchmarking computational tools.
Affiliation(s)
- Dongyuan Song
- Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, USA
- Qingyang Wang
- Department of Statistics, University of California, Los Angeles, CA, USA
- Guanao Yan
- Department of Statistics, University of California, Los Angeles, CA, USA
- Tianyang Liu
- Department of Statistics, University of California, Los Angeles, CA, USA
- Tianyi Sun
- Department of Statistics, University of California, Los Angeles, CA, USA
- Jingyi Jessica Li
- Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, USA
- Department of Statistics, University of California, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, USA
- Department of Biostatistics, University of California, Los Angeles, CA, USA
- Radcliffe Institute for Advanced Study, Harvard University, Cambridge, MA, USA
|
30
|
Tyler SR, Lozano-Ojalvo D, Guccione E, Schadt EE. Anti-correlated feature selection prevents false discovery of subpopulations in scRNAseq. Nat Commun 2024; 15:699. [PMID: 38267438 PMCID: PMC10808220 DOI: 10.1038/s41467-023-43406-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 11/07/2023] [Indexed: 01/26/2024]
Abstract
While sub-clustering cell-populations has become popular in single cell-omics, negative controls for this process are lacking. Popular feature-selection/clustering algorithms fail the null-dataset problem, allowing erroneous subdivisions of homogenous clusters until nearly each cell is called its own cluster. Using real and synthetic datasets, we find that anti-correlated gene selection reduces or eliminates erroneous subdivisions, increases marker-gene selection efficacy, and efficiently scales to millions of cells.
Affiliation(s)
- Scott R Tyler
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniel Lozano-Ojalvo
- Department of Dermatology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ernesto Guccione
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Therapeutics Discovery, Department of Oncological Sciences and Pharmacological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bioinformatics for Next Generation Sequencing (BiNGS) Shared Resource Facility, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eric E Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
31
|
Wu Z, Sinha S. SPREd: a simulation-supervised neural network tool for gene regulatory network reconstruction. Bioinformatics Advances 2024; 4:vbae011. [PMID: 38444538 PMCID: PMC10913396 DOI: 10.1093/bioadv/vbae011] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/21/2023] [Revised: 11/08/2023] [Accepted: 01/18/2024] [Indexed: 03/07/2024]
Abstract
Summary Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd," is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g. correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show SPREd to outperform state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA, and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold-standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step toward incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction. Availability and implementation Data and code are available from https://github.com/iiiime/SPREd.
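The input representation described above (expression relationships between the target gene and each TF, and between pairs of TFs) can be sketched as follows. This is only an illustration using Pearson correlation; it is not SPREd's actual feature pipeline, and the function names and toy expression values are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def relationship_features(target, tfs):
    """Return (TF-target correlations, TF-TF correlations) for one target gene.

    `target` is the target gene's expression across conditions and `tfs` maps
    each TF name to its expression across the same conditions.
    """
    names = sorted(tfs)
    tf_target = {tf: pearson(tfs[tf], target) for tf in names}
    tf_tf = {
        (a, b): pearson(tfs[a], tfs[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    return tf_target, tf_tf


# Toy expression values across four conditions (hypothetical).
target_expr = [1.0, 2.0, 3.0, 4.0]
tf_expr = {"TF_A": [2.0, 4.0, 6.0, 8.0], "TF_B": [4.0, 3.0, 2.0, 1.0]}
tf_target, tf_tf = relationship_features(target_expr, tf_expr)
```

In a supervised setting such as the one the abstract describes, features like these would be paired with binary regulator labels from simulated networks to train a classifier.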
Affiliation(s)
- Zijun Wu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
- Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
- H. Milton Steward School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
|
32
|
Liang Q, Huang Y, He S, Chen K. Pathway centric analysis for single-cell RNA-seq and spatial transcriptomics data with GSDensity. Nat Commun 2023; 14:8416. [PMID: 38110427 PMCID: PMC10728201 DOI: 10.1038/s41467-023-44206-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 12/04/2023] [Indexed: 12/20/2023]
Abstract
Advances in single-cell technology have enabled molecular dissection of heterogeneous biospecimens at unprecedented scales and resolutions. Cluster-centric approaches are widely applied in analyzing single-cell data; however, they have limited power in dissecting and interpreting highly heterogeneous, dynamically evolving data. Here, we present GSDensity, a graph-modeling approach that allows users to obtain pathway-centric interpretation and dissection of single-cell and spatial transcriptomics (ST) data without performing clustering. Using pathway gene sets, we show that GSDensity can accurately detect biologically distinct cells and reveal novel cell-pathway associations ignored by existing methods. Moreover, GSDensity, combined with trajectory analysis, can identify curated pathways that are active at various stages of mouse brain development. Finally, GSDensity can identify spatially relevant pathways in mouse brains and human tumors, including those following high-order organizational patterns in the ST data. In particular, we create a pan-cancer ST map revealing spatially relevant and recurrently active pathways across six different tumor types.
Affiliation(s)
- Qingnan Liang
- Department of Bioinformatics and Computational Biology, UT MD Anderson Cancer Center, Houston, TX, USA
- Yuefan Huang
- Department of Bioinformatics and Computational Biology, UT MD Anderson Cancer Center, Houston, TX, USA
- Shan He
- Department of Bioinformatics and Computational Biology, UT MD Anderson Cancer Center, Houston, TX, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, UT MD Anderson Cancer Center, Houston, TX, USA
|
33
|
Sadria M, Layton A, Bader GD. Adversarial training improves model interpretability in single-cell RNA-seq analysis. Bioinformatics Advances 2023; 3:vbad166. [PMID: 38099262 PMCID: PMC10719216 DOI: 10.1093/bioadv/vbad166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/28/2023] [Accepted: 11/22/2023] [Indexed: 12/17/2023]
Abstract
Motivation Predictive computational models must be accurate, robust, and interpretable to be considered reliable in important areas such as biology and medicine. A sufficiently robust model should not have its output affected significantly by a slight change in the input. Also, these models should be able to explain how a decision is made to support user trust in the results. Efforts have been made to improve the robustness and interpretability of predictive computational models independently; however, the interaction of robustness and interpretability is poorly understood. Results As an example task, we explore the computational prediction of cell type based on single-cell RNA-seq data and show that it can be made more robust by adversarially training a deep learning model. Surprisingly, we find this also leads to improved model interpretability, as measured by identifying genes important for classification using a range of standard interpretability methods. Our results suggest that adversarial training may be generally useful to improve deep learning robustness and interpretability and that it should be evaluated on a range of tasks. Availability and implementation Our Python implementation of all analysis in this publication can be found at: https://github.com/MehrshadSD/robustness-interpretability. The analysis was conducted using numPy 0.2.5, pandas 2.0.3, scanpy 1.9.3, tensorflow 2.10.0, matplotlib 3.7.1, seaborn 0.12.2, sklearn 1.1.1, shap 0.42.0, lime 0.2.0.1, matplotlib_venn 0.11.9.
Affiliation(s)
- Mehrshad Sadria
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Anita Layton
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Department of Biology, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- School of Pharmacy, University of Waterloo, Waterloo, Ontario N2G 1C5, Canada
- Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- The Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario M5G 1X5, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada
|
34
|
Wu Z, Sinha S. SPREd: A simulation-supervised neural network tool for gene regulatory network reconstruction. bioRxiv 2023:2023.11.09.566399. [PMID: 38014297 PMCID: PMC10680606 DOI: 10.1101/2023.11.09.566399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd" is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g., correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show SPREd to outperform state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step towards incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction.
Affiliation(s)
- Zijun Wu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- H. Milton Steward School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30318, USA
|
35
|
Shojaee A, Huang SSC. Robust discovery of gene regulatory networks from single-cell gene expression data by Causal Inference Using Composition of Transactions. Brief Bioinform 2023; 24:bbad370. [PMID: 37897702 PMCID: PMC10612495 DOI: 10.1093/bib/bbad370] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2023] [Revised: 09/06/2023] [Accepted: 09/29/2023] [Indexed: 10/30/2023]
Abstract
Gene regulatory networks (GRNs) drive organism structure and functions, so the discovery and characterization of GRNs is a major goal in biological research. However, accurate identification of causal regulatory connections and inference of GRNs using gene expression datasets, more recently from single-cell RNA-seq (scRNA-seq), has been challenging. Here we employ the innovative method of Causal Inference Using Composition of Transactions (CICT) to uncover GRNs from scRNA-seq data. The basis of CICT is that if all gene expressions were random, a non-random regulatory gene should induce its targets at levels different from the background random process, resulting in distinct patterns in the whole relevance network of gene-gene associations. CICT proposes novel network features derived from a relevance network, which enable any machine learning algorithm to predict causal regulatory edges and infer GRNs. We evaluated CICT using simulated and experimental scRNA-seq data in a well-established benchmarking pipeline and showed that CICT outperformed existing network inference methods representing diverse approaches with many-fold higher accuracy. Furthermore, we demonstrated that GRN inference with CICT was robust to different levels of sparsity in scRNA-seq data, the characteristics of data and ground truth, the choice of association measure and the complexity of the supervised machine learning algorithm. Our results suggest aiming at directly predicting causality to recover regulatory relationships in complex biological networks substantially improves accuracy in GRN inference.
Affiliation(s)
- Abbas Shojaee
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
- Shao-shan Carol Huang
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
|
36
|
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of single cell multi-omics and spatial data guided by gene regulatory networks and cell-cell interactions. Research Square 2023:rs.3.rs-3301625. [PMID: 37790516 PMCID: PMC10543280 DOI: 10.21203/rs.3.rs-3301625/v1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/05/2023]
Abstract
Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing simulators typically focus on modeling one or two specific biological factors or mechanisms that affect the output data, which limits their capacity to simulate the complexity and multi-modality in real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations, while accounting for the relationships between modalities. scMultiSim jointly models various biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, while also incorporating technical noise. Moreover, it allows users to adjust each factor's effect easily. We validated scMultiSim's simulated biological effects and demonstrated its applications by benchmarking a wide range of computational tasks, including multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference, and CCI inference using spatially resolved gene expression data, many of which were not benchmarked before due to the lack of proper tools. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li
- Georgia Institute of Technology, Atlanta, USA
- Ziqi Zhang
- Georgia Institute of Technology, Atlanta, USA
- Xi Chen
- Southern University of Science and Technology, Shenzhen, China
|
37
|
Yang Y, Li G, Zhong Y, Xu Q, Chen BJ, Lin YT, Chapkin R, Cai JJ. Gene knockout inference with variational graph autoencoder learning single-cell gene regulatory networks. Nucleic Acids Res 2023; 51:6578-6592. [PMID: 37246643 PMCID: PMC10359630 DOI: 10.1093/nar/gkad450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Revised: 05/02/2023] [Accepted: 05/11/2023] [Indexed: 05/30/2023]
Abstract
In this paper, we introduce Gene Knockout Inference (GenKI), a virtual knockout (KO) tool for gene function prediction using single-cell RNA sequencing (scRNA-seq) data in the absence of KO samples when only wild-type (WT) samples are available. Without using any information from real KO samples, GenKI is designed to capture shifting patterns in gene regulation caused by the KO perturbation in an unsupervised manner and provide a robust and scalable framework for gene function studies. To achieve this goal, GenKI adapts a variational graph autoencoder (VGAE) model to learn latent representations of genes and interactions between genes from the input WT scRNA-seq data and a derived single-cell gene regulatory network (scGRN). The virtual KO data is then generated by computationally removing all edges of the KO gene (the gene to be knocked out for functional study) from the scGRN. The differences between WT and virtual KO data are discerned by using their corresponding latent parameters derived from the trained VGAE model. Our simulations show that GenKI accurately approximates the perturbation profiles upon gene KO and outperforms the state-of-the-art under a series of evaluation conditions. Using publicly available scRNA-seq data sets, we demonstrate that GenKI recapitulates discoveries of real-animal KO experiments and accurately predicts cell type-specific functions of KO genes. Thus, GenKI provides an in-silico alternative to KO experiments that may partially replace the need for genetically modified animals or other genetically perturbed systems.
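The edge-removal step described above can be illustrated with a small sketch. It assumes the scGRN is stored as a dense weighted adjacency matrix (a list of lists) indexed by a gene list; the function name, the toy gene names, and the weights are hypothetical, and the VGAE training and latent-space comparison are not shown.

```python
def virtual_knockout(adjacency, genes, ko_gene):
    """Return a copy of the network with every edge into or out of ko_gene removed."""
    k = genes.index(ko_gene)
    return [
        [0.0 if i == k or j == k else weight for j, weight in enumerate(row)]
        for i, row in enumerate(adjacency)
    ]


# Toy three-gene network; adjacency[i][j] is the weight of the edge i -> j.
genes = ["GeneA", "GeneB", "GeneC"]
wild_type = [
    [0.0, 0.8, 0.0],
    [0.3, 0.0, 0.5],
    [0.0, 0.6, 0.0],
]
knocked_out = virtual_knockout(wild_type, genes, "GeneB")
```

Here `knocked_out` keeps the same shape as `wild_type`, so both networks could be passed through the same trained model for comparison, which is the role the abstract assigns to the latent parameters.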
Affiliation(s)
- Yongjian Yang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Guanxun Li
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Yan Zhong
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, 3663 North Zhongshan Road, Shanghai 200062, China
- Qian Xu
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Bo-Jia Chen
- Graduate Institute of Microbiology and Public Health, College of Veterinary Medicine, National Chung Hsing University, Taichung 402, Taiwan
- Yu-Te Lin
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Robert S Chapkin
- Program in Integrative & Complex Diseases, Department of Nutrition, Texas A&M University, College Station, TX 77843, USA
- James J Cai
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Interdisciplinary Program of Genetics, Texas A&M University, College Station, TX 77843, USA
|
38
|
Li L, Sun L, Chen G, Wong CW, Ching WK, Liu ZP. LogBTF: gene regulatory network inference using Boolean threshold network model from single-cell gene expression data. Bioinformatics 2023; 39:btad256. [PMID: 37079737 PMCID: PMC10172039 DOI: 10.1093/bioinformatics/btad256] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/28/2022] [Revised: 02/25/2023] [Accepted: 04/13/2023] [Indexed: 04/22/2023]
Abstract
Motivation: From a systematic perspective, it is crucial to infer and analyze gene regulatory networks (GRNs) from high-throughput single-cell RNA sequencing data. However, most existing GRN inference methods focus mainly on network topology; only a few consider how to explicitly describe the logical update rules of regulation in GRNs to obtain their dynamics. Moreover, some inference methods fail to deal with the over-fitting problem caused by noise in time series data. Results: In this article, we propose a novel embedded Boolean threshold network method called LogBTF, which effectively infers GRNs by integrating regularized logistic regression and Boolean threshold functions. First, the continuous gene expression values are converted into Boolean values and an elastic net regression model is adopted to fit the binarized time series data. Then, the estimated regression coefficients are used to represent the unknown Boolean threshold function of the candidate Boolean threshold network as the dynamical equations. To overcome the multi-collinearity and over-fitting problems, a new and effective approach is designed to optimize the network topology by adding a perturbation design matrix to the input data and thereafter setting sufficiently small elements of the output coefficient vector to zero. In addition, a cross-validation procedure is incorporated into the Boolean threshold network model framework to strengthen the inference capability. Finally, extensive experiments on one simulated Boolean value dataset, dozens of simulation datasets, and three real single-cell RNA sequencing datasets demonstrate that the LogBTF method can infer GRNs from time series data more accurately than some alternative methods for GRN inference. Availability and implementation: The source data and code are available at https://github.com/zpliulab/LogBTF.
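The pieces of this pipeline that do not involve model fitting can be sketched in a few lines. The sketch below is not the LogBTF implementation: it assumes binarization around each gene's mean, a fixed pruning cutoff `eps`, and hand-picked coefficients standing in for the output of the regularized logistic regression. All names and values are illustrative.

```python
def binarize(series):
    """Map one gene's expression time series to 0/1 around its mean (assumed rule)."""
    mean = sum(series) / len(series)
    return [1 if value > mean else 0 for value in series]


def prune(weights, eps=1e-2):
    """Set coefficients with magnitude below eps to zero to sparsify the network."""
    return [[w if abs(w) >= eps else 0.0 for w in row] for row in weights]


def threshold_step(state, weights, biases):
    """One synchronous update of a Boolean threshold network.

    Gene i is on at the next step when
    biases[i] + sum_j weights[i][j] * state[j] > 0.
    """
    return [
        1 if biases[i] + sum(w * s for w, s in zip(weights[i], state)) > 0 else 0
        for i in range(len(state))
    ]


# Hypothetical coefficients for a two-gene network (illustrative values only).
weights = prune([[0.0, -1.2], [0.9, 0.004]])
biases = [0.5, -0.5]

state = [1, 0]
trajectory = [state]
for _ in range(4):
    state = threshold_step(state, weights, biases)
    trajectory.append(state)
```

With these toy values the network cycles through four states and returns to its start, which shows how fitted coefficients double as dynamical equations once they are read as a threshold function.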
Affiliation(s)
- Lingyu Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Liangjie Sun
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Guangyi Chen
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Chi-Wing Wong
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Wai-Ki Ching
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
|
39
|
Rommelfanger MK, Behrends M, Chen Y, Martinez J, Bens M, Xiong L, Rudolph KL, MacLean AL. Gene regulatory network inference with popInfer reveals dynamic regulation of hematopoietic stem cell quiescence upon diet restriction and aging. bioRxiv 2023:2023.04.18.537360. [PMID: 37131596 PMCID: PMC10153203 DOI: 10.1101/2023.04.18.537360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Inference of gene regulatory networks (GRNs) can reveal cell state transitions from single-cell genomics data. However, obstacles to temporal inference from snapshot data are difficult to overcome. Single-nuclei multiomics data offer means to bridge this gap and derive temporal information from snapshot data using joint measurements of gene expression and chromatin accessibility in the same single cells. We developed popInfer to infer networks that characterize lineage-specific dynamic cell state transitions from joint gene expression and chromatin accessibility data. Benchmarking against alternative methods for GRN inference, we showed that popInfer achieves higher accuracy in the GRNs inferred. popInfer was applied to study single-cell multiomics data characterizing hematopoietic stem cells (HSCs) and the transition from HSC to a multipotent progenitor cell state during murine hematopoiesis across age and dietary conditions. From networks predicted by popInfer, we discovered gene interactions controlling entry to/exit from HSC quiescence that are perturbed in response to diet or aging.
Affiliation(s)
- Megan K. Rommelfanger
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Marthe Behrends
- Research Group on Stem Cell and Metabolism Aging, Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Jena, Germany
| | - Yulin Chen
- Research Group on Stem Cell and Metabolism Aging, Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Jena, Germany
| | - Jonathan Martinez
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Martin Bens
- Core Facility Next Generation Sequencing, Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Jena, Germany
| | - Lingyun Xiong
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Stem Cell Biology and Regenerative Medicine, Broad-CIRM Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
| | - K. Lenhard Rudolph
- Research Group on Stem Cell and Metabolism Aging, Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Jena, Germany
- Medical Faculty, Jena University Hospital, Friedrich Schiller University, Jena, Germany
| | - Adam L. MacLean
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| |
|
40
|
Crowell HL, Morillo Leonardo SX, Soneson C, Robinson MD. The shaky foundations of simulating single-cell RNA sequencing data. Genome Biol 2023; 24:62. [PMID: 36991470 PMCID: PMC10061781 DOI: 10.1186/s13059-023-02904-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/23/2021] [Accepted: 03/20/2023] [Indexed: 03/31/2023] Open
Abstract
BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, on their own as well as in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data. RESULTS: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigate the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity. CONCLUSIONS: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.
Affiliation(s)
- Helena L Crowell
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | | | - Charlotte Soneson
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Current address: Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Mark D Robinson
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| |
|
41
|
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of multi-modality single cell data guided by cell-cell interactions and gene regulatory networks. Research Square 2023:rs.3.rs-2675530. [PMID: 36993284 PMCID: PMC10055660 DOI: 10.21203/rs.3.rs-2675530/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing simulators typically focus on modeling one or two specific biological factors or mechanisms that affect the output data, which limits their capacity to simulate the complexity and multi-modality in real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations, while accounting for the relationships between modalities. scMultiSim jointly models various biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, while also incorporating technical noise. Moreover, it allows users to adjust each factor's effect easily. We validated scMultiSim's simulated biological effects and demonstrated its applications by benchmarking a wide range of computational tasks, including cell clustering and trajectory inference, multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference, and CCI inference using spatially resolved gene expression data. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li
- Georgia Institute of Technology, Atlanta, USA
| | - Ziqi Zhang
- Georgia Institute of Technology, Atlanta, USA
| | | | - Xi Chen
- Southern University of Science and Technology, China
| | | |
|
42
|
Ventre E, Herbach U, Espinasse T, Benoit G, Gandrillon O. One model fits all: Combining inference and simulation of gene regulatory networks. PLoS Comput Biol 2023; 19:e1010962. [PMID: 36972296 PMCID: PMC10079230 DOI: 10.1371/journal.pcbi.1010962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 04/06/2023] [Accepted: 02/17/2023] [Indexed: 03/29/2023] Open
Abstract
The rise of single-cell data highlights the need for a nondeterministic view of gene expression, while offering new opportunities regarding gene regulatory network inference. We recently introduced two strategies that specifically exploit time-course data, where single-cell profiling is performed after a stimulus: HARISSA, a mechanistic network model with a highly efficient simulation procedure, and CARDAMOM, a scalable inference method seen as model calibration. Here, we combine the two approaches and show that the same model driven by transcriptional bursting can be used simultaneously as an inference tool, to reconstruct biologically relevant networks, and as a simulation tool, to generate realistic transcriptional profiles emerging from gene interactions. We verify that CARDAMOM quantitatively reconstructs causal links when the data is simulated from HARISSA, and demonstrate its performance on experimental data collected on in vitro differentiating mouse embryonic stem cells. Overall, this integrated strategy largely overcomes the limitations of disconnected inference and simulation.
Affiliation(s)
- Elias Ventre
- Laboratoire de Biologie et Modélisation de la Cellule, École Normale Supérieure de Lyon, CNRS, UMR 5239, Inserm, U1293, Université Claude Bernard Lyon 1, Lyon, France
- Inria Center Grenoble Rhône-Alpes, Équipe Dracula, Villeurbanne, France
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, Villeurbanne, France
| | - Ulysse Herbach
- Université de Lorraine, CNRS, Inria, IECL, Nancy, France
| | - Thibault Espinasse
- Inria Center Grenoble Rhône-Alpes, Équipe Dracula, Villeurbanne, France
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, Villeurbanne, France
| | - Gérard Benoit
- Laboratoire de Biologie et Modélisation de la Cellule, École Normale Supérieure de Lyon, CNRS, UMR 5239, Inserm, U1293, Université Claude Bernard Lyon 1, Lyon, France
| | - Olivier Gandrillon
- Laboratoire de Biologie et Modélisation de la Cellule, École Normale Supérieure de Lyon, CNRS, UMR 5239, Inserm, U1293, Université Claude Bernard Lyon 1, Lyon, France
- Inria Center Grenoble Rhône-Alpes, Équipe Dracula, Villeurbanne, France
| |
|
43
|
Oubounyt M, Elkjaer ML, Laske T, Grønning AB, Moeller M, Baumbach J. De-novo reconstruction and identification of transcriptional gene regulatory network modules differentiating single-cell clusters. NAR Genom Bioinform 2023; 5:lqad018. [PMID: 36879901 PMCID: PMC9985332 DOI: 10.1093/nargab/lqad018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 01/16/2023] [Accepted: 02/09/2023] [Indexed: 03/07/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technology provides an unprecedented opportunity to understand gene functions and interactions at single-cell resolution. While computational tools for scRNA-seq data analysis to decipher differential gene expression profiles and differential pathway expression exist, we still lack methods to learn differential regulatory disease mechanisms directly from the single-cell data. Here, we provide a new methodology, named DiNiro, to unravel such mechanisms de novo and report them as small, easily interpretable transcriptional regulatory network modules. We demonstrate that DiNiro is able to uncover novel, relevant, and deep mechanistic models that not just predict but explain differential cellular gene expression programs. DiNiro is available at https://exbio.wzw.tum.de/diniro/.
Affiliation(s)
- Mhaned Oubounyt
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Maria L Elkjaer
- Department of Neurology, Odense University Hospital, Odense, Denmark
- Institute of Clinical Research, University of Southern Denmark, Odense, Denmark
- Institute of Molecular Medicine, University of Southern Denmark, Odense, Denmark
| | - Tanja Laske
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Alexander G B Grønning
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Marcus J Moeller
- Heisenberg Chair of Preventive and Translational Nephrology, Department of Nephrology, Rheumatology and Clinical Immunology, RWTH Aachen University, Aachen, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| |
|
44
|
van der Sande M, Frölich S, van Heeringen SJ. Computational approaches to understand transcription regulation in development. Biochem Soc Trans 2023; 51:1-12. [PMID: 36695505 PMCID: PMC9988001 DOI: 10.1042/bst20210145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/07/2023] [Accepted: 01/13/2023] [Indexed: 01/26/2023]
Abstract
Gene regulatory networks (GRNs) serve as useful abstractions to understand transcriptional dynamics in developmental systems. Computational prediction of GRNs has been successfully applied to genome-wide gene expression measurements with the advent of microarrays and RNA-sequencing. However, these inferred networks are inaccurate and mostly based on correlative rather than causative interactions. In this review, we highlight three approaches that significantly impact GRN inference: (1) moving from one genome-wide functional modality, gene expression, to multi-omics, (2) single cell sequencing, to measure cell type-specific signals and predict context-specific GRNs, and (3) neural networks as flexible models. Together, these experimental and computational developments have the potential to significantly impact the quality of inferred GRNs. Ultimately, accurately modeling the regulatory interactions between transcription factors and their target genes will be essential to understand the role of transcription factors in driving developmental gene expression programs and to derive testable hypotheses for validation.
Affiliation(s)
| | | | - Simon J. van Heeringen
- Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
| |
|
45
|
Zhang J, Singh R. Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data. bioRxiv 2023:2023.01.24.525447. [PMID: 36747724 PMCID: PMC9900775 DOI: 10.1101/2023.01.24.525447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
With the rapid advance of single-cell RNA sequencing (scRNA-seq) technology, understanding biological processes at a more refined single-cell level is becoming possible. Gene co-expression estimation is an essential step in this direction. It can annotate functionalities of unknown genes or construct the basis of gene regulatory network inference. This study thoroughly tests the existing gene co-expression estimation methods on simulation datasets with known ground truth co-expression networks. We generate these novel datasets using two simulation processes that use the parameters learned from the experimental data. We demonstrate that these simulations better capture the underlying properties of the real-world single-cell datasets than previously tested simulations for the task. Our performance results on tens of simulated and eight experimental datasets show that all methods produce estimations with a high false discovery rate potentially caused by high-sparsity levels in the data. Finally, we find that commonly used pre-processing approaches, such as normalization and imputation, do not improve the co-expression estimation. Overall, our benchmark setup contributes to the co-expression estimator development, and our study provides valuable insights for the community of single-cell data analyses.
Affiliation(s)
- Jiaqi Zhang
- Department of Computer Science, Brown University
| | - Ritambhara Singh
- Department of Computer Science, Center for Computational Molecular Biology, Brown University
| |
|
46
|
Sun L, Wang G, Zhang Z. SimCH: simulation of single-cell RNA sequencing data by modeling cellular heterogeneity at gene expression level. Brief Bioinform 2023; 24:6961608. [PMID: 36575569 DOI: 10.1093/bib/bbac590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 11/08/2022] [Accepted: 12/02/2022] [Indexed: 12/29/2022] Open
Abstract
Single-cell ribonucleic acid (RNA) sequencing (scRNA-seq) has been a powerful technology for transcriptome analysis. However, the systematic validation of diverse computational tools used in scRNA-seq analysis remains challenging. Here, we propose a novel simulation tool, termed Simulation of Cellular Heterogeneity (SimCH), for the flexible and comprehensive assessment of scRNA-seq computational methods. The Gaussian copula framework is used to retain the gene coexpression of experimental data, which has been shown to be associated with cellular heterogeneity. The synthetic count matrices generated by suitable SimCH modes closely match experimental data originating from either homogeneous or heterogeneous cell populations and either unique molecular identifier (UMI)-based or non-UMI-based techniques. We demonstrate how SimCH can benchmark several types of computational methods, including cell clustering, discovery of differentially expressed genes, trajectory inference, batch correction and imputation. Moreover, we show how SimCH can be used to conduct power evaluation of cell clustering methods. Given these merits, we believe that SimCH can accelerate single-cell research.
Affiliation(s)
- Lei Sun
- School of Information Engineering, Yangzhou University, Yangzhou, P.R. China; School of Artificial Intelligence, Yangzhou University, Yangzhou, P.R. China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing, P.R. China
| | - Gongming Wang
- School of Information Engineering, Yangzhou University, Yangzhou, P.R. China; School of Artificial Intelligence, Yangzhou University, Yangzhou, P.R. China; China Unicom Software Research Institute Jinan Branch, Jinan, P.R. China
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing, P.R. China; School of Life Science, University of Chinese Academy of Sciences, Beijing, P.R. China
| |
|
47
|
Shu H, Ding F, Zhou J, Xue Y, Zhao D, Zeng J, Ma J. Boosting single-cell gene regulatory network reconstruction via bulk-cell transcriptomic data. Brief Bioinform 2022; 23:6693602. [PMID: 36070863 DOI: 10.1093/bib/bbac389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/30/2022] [Revised: 08/09/2022] [Accepted: 08/11/2022] [Indexed: 11/12/2022] Open
Abstract
Computational recovery of gene regulatory networks (GRNs) has recently undergone a great shift from bulk-cell data towards algorithms that target single-cell data. In this work, we investigate whether the widely available bulk-cell data could be leveraged to assist GRN prediction for single cells. We infer cell-type-specific GRNs from both the single-cell RNA sequencing data and the generic GRN derived from the bulk cells by constructing a weakly supervised learning framework based on the axial transformer. Through extensive experiments, we verify our assumption that bulk-cell transcriptomic data are a valuable resource that could improve the prediction of single-cell GRNs. Our GRN-transformer achieves state-of-the-art prediction accuracy in comparison to existing supervised and unsupervised approaches. In addition, we show that our method can identify important transcription factors and potential regulations for Alzheimer's disease risk genes by using the predicted GRN. Availability: The implementation of GRN-transformer is available at https://github.com/HantaoShu/GRN-Transformer.
Affiliation(s)
- Hantao Shu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
| | - Fan Ding
- Department of Computer Science, Purdue University, IN 47907, United States
| | - Jingtian Zhou
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States; Bioinformatics Program, University of California, San Diego, La Jolla, CA 92093, United States
| | - Yexiang Xue
- Department of Computer Science, Purdue University, IN 47907, United States
| | - Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
| | - Jianzhu Ma
- Institute for Artificial Intelligence, Peking University, Beijing 100091, China
| |
|
48
|
Nagaharu K, Kojima Y, Hirose H, Minoura K, Hinohara K, Minami H, Kageyama Y, Sugimoto Y, Masuya M, Nii S, Seki M, Suzuki Y, Tawara I, Shimamura T, Katayama N, Nishikawa H, Ohishi K. A bifurcation concept for B-lymphoid/plasmacytoid dendritic cells with largely fluctuating transcriptome dynamics. Cell Rep 2022; 40:111260. [PMID: 36044861 DOI: 10.1016/j.celrep.2022.111260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2021] [Revised: 06/02/2022] [Accepted: 08/04/2022] [Indexed: 11/24/2022] Open
Abstract
Hematopoiesis was considered a hierarchical stepwise process but was revised to a continuous process following single-cell RNA sequencing. However, the uncertainty or fluctuation of single-cell transcriptome dynamics during differentiation was not considered, and the dendritic cell (DC) pathway in the lymphoid context remains unclear. Here, we identify human B-plasmacytoid DC (pDC) bifurcation as large fluctuating transcriptome dynamics in the putative B/NK progenitor region by dry and wet methods. By converting splicing kinetics into diffusion dynamics in a deep generative model, our original computational methodology reveals strong fluctuation at B/pDC bifurcation in IL-7Rα+ regions, and LFA-1 fluctuates positively in the pDC direction at the bifurcation. These expectancies are validated by the presence of B/pDC progenitors in the IL-7Rα+ fraction and preferential expression of LFA-1 in pDC-biased progenitors with a niche-like culture system. We provide a model of fluctuation-based differentiation, which reconciles continuous and discrete models and is applicable to other developmental systems.
Affiliation(s)
- Keiki Nagaharu
- Department of Hematology and Oncology, Mie University Graduate School of Medicine, Tsu 514-8507, Japan
| | - Yasuhiro Kojima
- Division of Systems Biology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
| | - Haruka Hirose
- Division of Systems Biology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
| | - Kodai Minoura
- Division of Systems Biology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
| | - Kunihiko Hinohara
- Department of Immunology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan; Institute for Advanced Research, Nagoya University, Nagoya, Japan
| | - Hirohito Minami
- Department of Hematology and Oncology, Mie University Graduate School of Medicine, Tsu 514-8507, Japan
| | - Yuki Kageyama
- Department of Hematology and Oncology, Mie University Graduate School of Medicine, Tsu 514-8507, Japan
| | - Yuka Sugimoto
- Department of Hematology and Oncology, Mie University Graduate School of Medicine, Tsu 514-8507, Japan
| | - Masahiro Masuya
- Department of Hematology and Oncology, Mie University Graduate School of Medicine, Tsu 514-8507, Japan
| | - Shigeru Nii
- Shiroko Women's Hospital, Suzuka 510-0235, Japan
| | - Masahide Seki
- Department of Computational Biology and Medical Sciences, The University of Tokyo, Kashiwa 277-8561, Japan
| | - Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, The University of Tokyo, Kashiwa 277-8561, Japan
| | - Isao Tawara
- Department of Hematology and Oncology, Mie University Graduate School of Medicine, Tsu 514-8507, Japan
| | - Teppei Shimamura
- Division of Systems Biology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan; Institute for Advanced Research, Nagoya University, Nagoya, Japan
| | - Naoyuki Katayama
- Department of Hematology and Oncology, Mie University Graduate School of Medicine, Tsu 514-8507, Japan
| | - Hiroyoshi Nishikawa
- Department of Immunology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan; Institute for Advanced Research, Nagoya University, Nagoya, Japan; Division of Cancer Immunology, Research Institute, National Cancer Center, Tokyo 104-0045, Japan; Division of Cancer Immunology, Exploratory Oncology Research and Clinical Trial Center (EPOC), National Cancer Center, Chiba 277-8577, Japan
| | - Kohshi Ohishi
- Department of Transfusion Medicine and Cell Therapy, Mie University Hospital, Tsu 514-8507, Japan
| |
|
49
|
Pan X, Li H, Zhang X. TedSim: temporal dynamics simulation of single-cell RNA sequencing data and cell division history. Nucleic Acids Res 2022; 50:4272-4288. [PMID: 35412632 PMCID: PMC9071466 DOI: 10.1093/nar/gkac235] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/08/2021] [Revised: 03/23/2022] [Accepted: 03/31/2022] [Indexed: 11/18/2022] Open
Abstract
Recently, lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expression and lineage barcodes, which allows for the reconstruction of the cell division tree and makes it possible to reconstruct ancestral cell types and trace the origin of each cell type. Meanwhile, trajectory inference methods are widely used to infer cell trajectories and pseudotime in a dynamic process using gene expression data of present-day cells. Here, we present TedSim (single-cell temporal dynamics simulator), which simulates the cell division events from the root cell to present-day cells, simultaneously generating two data modalities for each single cell: the lineage barcode and gene expression data. TedSim is a framework that connects the two problems: lineage tracing and trajectory inference. Using TedSim, we conducted analyses to show that (i) TedSim generates realistic gene expression and barcode data, as well as realistic relationships between these two data modalities; (ii) trajectory inference methods can recover the underlying cell state transition mechanism with balanced cell type compositions; and (iii) integrating gene expression and barcode data can provide more insights into the temporal dynamics in cell differentiation compared to using only one type of data, but better integration methods need to be developed.
Affiliation(s)
- Xinhai Pan
- School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Hechen Li
- School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Xiuwei Zhang
- School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
| |
|
50
|
Osorio D, Zhong Y, Li G, Xu Q, Yang Y, Tian Y, Chapkin RS, Huang JZ, Cai JJ. scTenifoldKnk: An efficient virtual knockout tool for gene function predictions via single-cell gene regulatory network perturbation. Patterns (N Y) 2022; 3:100434. [PMID: 35510185 PMCID: PMC9058914 DOI: 10.1016/j.patter.2022.100434] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/21/2021] [Revised: 11/13/2021] [Accepted: 01/04/2022] [Indexed: 11/20/2022]
Abstract
Gene knockout (KO) experiments are a proven, powerful approach for studying gene function. However, systematic KO experiments targeting a large number of genes are usually prohibitive due to the limit of experimental and animal resources. Here, we present scTenifoldKnk, an efficient virtual KO tool that enables systematic KO investigation of gene function using data from single-cell RNA sequencing (scRNA-seq). In scTenifoldKnk analysis, a gene regulatory network (GRN) is first constructed from scRNA-seq data of wild-type samples, and a target gene is then virtually deleted from the constructed GRN. Manifold alignment is used to align the resulting reduced GRN to the original GRN to identify differentially regulated genes, which are used to infer target gene functions in analyzed cells. We demonstrate that the scTenifoldKnk-based virtual KO analysis recapitulates the main findings of real-animal KO experiments and recovers the expected functions of genes in relevant cell types.
Affiliation(s)
- Daniel Osorio
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
| | - Yan Zhong
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai 200062, China
| | - Guanxun Li
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
| | - Qian Xu
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
| | - Yongjian Yang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Yanan Tian
- Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, TX 77843, USA
| | - Robert S. Chapkin
- Department of Nutrition, Texas A&M University, College Station, TX 77843, USA
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843, USA
| | - Jianhua Z. Huang
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong 518172, China
| | - James J. Cai
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Interdisciplinary Program of Genetics, Texas A&M University, College Station, TX 77843, USA
| |
|