1
da Silva JEH, Bernardino HS, de Oliveira IL, Camata JJ. A survey of the methodological process of modeling, inference, and evaluation of gene regulatory networks using scRNA-Seq data. Biosystems 2025; 253:105464. [PMID: 40409400 DOI: 10.1016/j.biosystems.2025.105464] [Received: 03/11/2024] [Revised: 03/20/2025] [Accepted: 04/17/2025] [Indexed: 05/25/2025]
Abstract
The advent of scRNA-Seq technology has provided unprecedented resolution in the analysis of gene regulatory networks (GRNs) at the single-cell level. However, new technical and methodological challenges have also emerged. Factors such as the large number of zeros reported in expression levels, the biological variation due to the stochastic nature of gene expression, environmental niche, and effects created by the cell cycle make it difficult to correctly interpret the data obtained in the sequencing stage. On the other hand, methods developed specifically for GRN inference from scRNA-Seq data have proved to be of similar quality to random predictors. The lack of adequate pre-processing of gene expression data, including selection steps for subsets of genes of interest, smoothing, and discretization of gene expression, in addition to the different ways of modeling networks and network motifs, are factors that affect the performance of inference approaches. Finally, the lack of knowledge about the ground-truth network and the non-standardization of appropriate metrics to measure the quality of inferred networks make the process of comparing performance between algorithms a major problem, given the unbalanced nature of the data and the interpretation bias caused by the chosen metric. This article brings these issues to light, aiming to show, through comparative computational experiments, how these factors influence both the inference process and the performance evaluation of inferred networks, and provides suggestions for a more robust methodological process for researchers dealing with the inference of GRNs.
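The point about unbalanced data and metric-induced bias can be illustrated with a minimal sketch. It assumes a hypothetical ranked list of candidate edges in which only a few are true; the function names `auroc` and `average_precision` and the toy scores are illustrative and are not taken from the surveyed paper.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of true and false edges."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true edge.

    Assumes at least one true edge and no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy example: 10 candidate edges, 2 of them true (20% density).
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

roc = auroc(scores, labels)
ap = average_precision(scores, labels)
density = sum(labels) / len(labels)  # roughly the AP expected from a random ranking
```

Under these assumptions, the AUROC baseline for a random ranking is 0.5 regardless of sparsity, whereas the average-precision baseline is roughly the edge density. This is one reason the same inferred network can look strong or weak depending on the metric reported.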
Affiliation(s)
- José Eduardo H da Silva
  - Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- Heder S Bernardino
  - Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- Itamar L de Oliveira
  - Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- José J Camata
  - Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
2
Zhao W, Larschan E, Sandstede B, Singh R. Optimal transport reveals dynamic gene regulatory networks via gene velocity estimation. PLoS Comput Biol 2025; 21:e1012476. [PMID: 40341271 DOI: 10.1371/journal.pcbi.1012476] [Received: 09/10/2024] [Accepted: 04/10/2025] [Indexed: 05/10/2025] Open
Abstract
Inferring gene regulatory networks from gene expression data is an important and challenging problem in the biology community. We propose OTVelo, a methodology that takes time-stamped single-cell gene expression data as input and predicts gene regulation across two time points. It is known that the rate of change of gene expression, which we will refer to as gene velocity, provides crucial information that enhances such inference; however, this information is not always available due to the limitations in sequencing depth. Our algorithm overcomes this limitation by estimating gene velocities using optimal transport. We then infer gene regulation using time-lagged correlation and Granger causality via regularized linear regression. Instead of providing an aggregated network across all time points, our method uncovers the underlying dynamical mechanism across time points. We validate our algorithm on 13 simulated datasets with both synthetic and curated networks and demonstrate its efficacy on 9 experimental data sets.
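The time-lagged correlation step can be sketched as follows. This is a minimal illustration, assuming expression has already been summarised as one value per time point; the optimal-transport velocity estimation is not reproduced, and the finite difference below is only a stand-in for it. All function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def finite_difference(series):
    """Crude velocity proxy: change between consecutive time points."""
    return [b - a for a, b in zip(series, series[1:])]


def lagged_correlation(regulator, target, lag=1):
    """Correlate regulator velocity at time t with target velocity at t + lag."""
    v_reg = finite_difference(regulator)
    v_tgt = finite_difference(target)
    return pearson(v_reg[:len(v_reg) - lag], v_tgt[lag:])


# Hypothetical series: the target repeats the regulator one step later.
regulator = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 4.5]
target = [0.0, 0.0, 1.0, 3.0, 2.0, 5.0, 4.0]

score_forward = lagged_correlation(regulator, target, lag=1)
score_reverse = lagged_correlation(target, regulator, lag=1)
```

In this toy setting the forward score is higher than the reverse score, which is the asymmetry a lagged analysis relies on to suggest a direction of regulation.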
Affiliation(s)
- Wenjun Zhao
  - Division of Applied Mathematics, Brown University, Providence, Rhode Island, United States of America
  - Department of Mathematics, University of British Columbia, Vancouver, Canada
- Erica Larschan
  - Department of Molecular Biology, Cell Biology and Biochemistry, Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Björn Sandstede
  - Division of Applied Mathematics, Brown University, Providence, Rhode Island, United States of America
- Ritambhara Singh
  - Department of Computer Science, Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
3
Pan Q, Ding L, Hladyshau S, Yao X, Zhou J, Yan L, Dhungana Y, Shi H, Qian C, Dong X, Burdyshaw C, Veloso JP, Khatamian A, Xie Z, Risch I, Yang X, Yang J, Huang X, Fang J, Jain A, Jain A, Rusch M, Brewer M, Peng J, Yan KK, Chi H, Yu J. scMINER: a mutual information-based framework for clustering and hidden driver inference from single-cell transcriptomics data. Nat Commun 2025; 16:4305. [PMID: 40341143 PMCID: PMC12062461 DOI: 10.1038/s41467-025-59620-6] [Received: 01/17/2025] [Accepted: 04/28/2025] [Indexed: 05/10/2025] Open
Abstract
Single-cell transcriptomics data present challenges due to their inherent stochasticity and sparsity, complicating both cell clustering and cell type-specific network inference. To address these challenges, we introduce scMINER (single-cell Mutual Information-based Network Engineering Ranger), an integrative framework for unsupervised cell clustering, transcription factor and signaling protein network inference, and identification of hidden drivers from single-cell transcriptomic data. scMINER demonstrates superior accuracy in cell clustering, outperforming five state-of-the-art algorithms and excelling in distinguishing closely related cell populations. For network inference, scMINER outperforms three established methods, as validated by ATAC-seq and CROP-seq. In particular, it surpasses SCENIC in revealing key transcription factor drivers involved in T cell exhaustion and Treg tissue specification. Moreover, scMINER enables the inference of signaling protein networks and drivers with high accuracy, which presents an advantage in multimodal single cell data analysis. In addition, we establish scMINER Portal, an interactive visualization tool to facilitate exploration of scMINER results.
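As a rough illustration of why mutual information is a natural dependence measure for sparse single-cell data, the sketch below computes a plug-in estimate on binarised expression. This is an assumption made for illustration only: the abstract does not specify which estimator scMINER uses, and the function name and data here are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a, b) * log2(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * log2(c * n / (px[a] * py[b]))
    return mi


# Hypothetical binarised expression (1 = detected) across eight cells.
tf = [0, 0, 0, 0, 1, 1, 1, 1]
target_dependent = [0, 0, 0, 0, 1, 1, 1, 1]    # follows the TF exactly
target_independent = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to the TF

mi_dep = mutual_information(tf, target_dependent)
mi_ind = mutual_information(tf, target_independent)
```

Unlike a linear correlation, this score depends only on the joint distribution of the two variables, so it can also pick up non-linear dependence.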
Affiliation(s)
- Qingfei Pan
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Liang Ding
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Siarhei Hladyshau
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xiangyu Yao
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Jiayu Zhou
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Lei Yan
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Yogesh Dhungana
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
  - Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Hao Shi
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Chenxi Qian
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xinran Dong
  - Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai, 201102, P.R. China
- Chad Burdyshaw
  - Department of Information Services, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Joao Pedro Veloso
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Alireza Khatamian
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Zhen Xie
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
  - Department of Physiology, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Isabel Risch
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xu Yang
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Jiyuan Yang
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xin Huang
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
  - Precision Research Center for Refractory Diseases, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201620, China
- Jason Fang
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Anuj Jain
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Arihant Jain
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Michael Rusch
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Michael Brewer
  - Department of Information Services, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Junmin Peng
  - Department of Structural Biology and Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Koon-Kiu Yan
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Hongbo Chi
  - Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Jiyang Yu
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
  - Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
4
Wang JC, Chen YJ, Zou Q. GRACE: Unveiling Gene Regulatory Networks With Causal Mechanistic Graph Neural Networks in Single-Cell RNA-Sequencing Data. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:9005-9017. [PMID: 38896510 DOI: 10.1109/tnnls.2024.3412753] [Indexed: 06/21/2024]
Abstract
Reconstructing gene regulatory networks (GRNs) using single-cell RNA sequencing (scRNA-seq) data holds great promise for unraveling cellular fate development and heterogeneity. While numerous machine-learning methods have been proposed to infer GRNs from scRNA-seq gene expression data, many of them operate solely in a statistical or black-box manner, limiting their capacity for making causal inferences between genes. In this study, we introduce GRN inference with Accuracy and Causal Explanation (GRACE), a novel graph-based causal autoencoder framework that combines a structural causal model (SCM) with graph neural networks (GNNs) to enable GRN inference and gene causal reasoning from scRNA-seq data. By explicitly modeling causal relationships between genes, GRACE facilitates the learning of regulatory context and gene embeddings. With the learned gene signals, our model successfully decodes the causal structures and facilitates the accurate determination of multiple attributes of gene regulation that are important for determining the regulatory levels. Through extensive evaluations on seven benchmarks, we demonstrate that GRACE outperforms 14 state-of-the-art GRN inference methods, with the incorporation of causal mechanisms significantly enhancing the accuracy of GRN and gene causality inference. Furthermore, the application to human peripheral blood mononuclear cell (PBMC) samples reveals cell type-specific regulators in monocyte phagocytosis and immune regulation, validated through network analysis and functional enrichment analysis.
5
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of single-cell multi-omics and spatial data guided by gene regulatory networks and cell-cell interactions. Nat Methods 2025; 22:982-993. [PMID: 40247122 DOI: 10.1038/s41592-025-02651-0] [Received: 08/27/2023] [Accepted: 03/07/2025] [Indexed: 04/19/2025]
Abstract
Simulated single-cell data are essential for designing and evaluating computational methods in the absence of experimental ground truth. Here we present scMultiSim, a comprehensive simulator that generates multimodal single-cell data encompassing gene expression, chromatin accessibility, RNA velocity and spatial cell locations while accounting for the relationships between modalities. Unlike existing tools that focus on limited biological factors, scMultiSim simultaneously models cell identity, gene regulatory networks, cell-cell interactions and chromatin accessibility while incorporating technical noise. Moreover, it allows users to adjust each factor's effect easily. Here we show that scMultiSim generates data with expected biological effects, and demonstrate its applications by benchmarking a wide range of computational tasks, including multimodal and multi-batch data integration, RNA velocity estimation, gene regulatory network inference and cell-cell interaction inference using spatially resolved gene expression data. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li
  - Georgia Institute of Technology, Atlanta, GA, USA
- Ziqi Zhang
  - Georgia Institute of Technology, Atlanta, GA, USA
- Xi Chen
  - Southern University of Science and Technology, Shenzhen, China
- Xiuwei Zhang
  - Georgia Institute of Technology, Atlanta, GA, USA
6
Su G, Wang H, Zhang Y, Wilkins MR, Canete PF, Yu D, Yang Y, Zhang W. Inferring gene regulatory networks by hypergraph generative model. Cell Reports Methods 2025; 5:101026. [PMID: 40220759 DOI: 10.1016/j.crmeth.2025.101026] [Received: 06/23/2024] [Revised: 01/16/2025] [Accepted: 03/20/2025] [Indexed: 04/14/2025]
Abstract
We present hypergraph variational autoencoder (HyperG-VAE), a Bayesian deep generative model that leverages hypergraph representation to model single-cell RNA sequencing (scRNA-seq) data. The model features a cell encoder with a structural equation model to account for cellular heterogeneity and construct gene regulatory networks (GRNs) alongside a gene encoder using hypergraph self-attention to identify gene modules. The synergistic optimization of encoders via a decoder improves GRN inference, single-cell clustering, and data visualization, as validated by benchmarks. HyperG-VAE effectively uncovers gene regulation patterns and demonstrates robustness in downstream analyses, as shown in B cell development data from bone marrow. Gene set enrichment analysis of overlapping genes in predicted GRNs confirms the gene encoder's role in refining GRN inference. Offering an efficient solution for scRNA-seq analysis and GRN construction, HyperG-VAE also holds the potential for extending GRN modeling to temporal and multimodal single-cell omics.
Affiliation(s)
- Guangxin Su
  - School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW, Australia; ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS), Melbourne, VIC, Australia
- Hanchen Wang
  - ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS), Melbourne, VIC, Australia; Australian Artificial Intelligence Institute, The University of Technology Sydney, Sydney, NSW, Australia
- Ying Zhang
  - School of Computer Science and Technology, Zhejiang Gongshang University, Zhejiang, China
- Marc R Wilkins
  - ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS), Melbourne, VIC, Australia; Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, Australia
- Pablo F Canete
  - Frazer Institute, Faculty of Health, Medicine and Behaviour Sciences, The University of Queensland, Brisbane, QLD, Australia
- Di Yu
  - Frazer Institute, Faculty of Health, Medicine and Behaviour Sciences, The University of Queensland, Brisbane, QLD, Australia; Ian Frazer Centre for Children's Immunotherapy Research, Child Health Research Centre, Faculty of Health, Medicine and Behaviour Sciences, The University of Queensland, Brisbane, QLD, Australia
- Yang Yang
  - Frazer Institute, Faculty of Health, Medicine and Behaviour Sciences, The University of Queensland, Brisbane, QLD, Australia
- Wenjie Zhang
  - School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW, Australia; ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS), Melbourne, VIC, Australia
7
Abe H, Lin P, Zhou D, Ruderfer DM, Gamazon ER. Mapping dynamic regulation of gene expression using single-cell transcriptomics and application to complex disease genetics. HGG Advances 2025; 6:100397. [PMID: 39741416 PMCID: PMC11830375 DOI: 10.1016/j.xhgg.2024.100397] [Received: 04/20/2024] [Revised: 12/24/2024] [Accepted: 12/24/2024] [Indexed: 01/03/2025] Open
Abstract
Single-cell transcriptome data can provide insights into how genetic variation influences biological processes involved in human physiology and disease. However, the identification of gene-level associations in distinct cell types faces several challenges, including the limited reference resources from population-scale studies, data sparsity in single-cell RNA sequencing, and the complex cell state pattern of expression within individual cell types. Here, we develop genetic models of cell-type-specific and cell-state-adjusted gene expression in mid-brain neurons undergoing differentiation from induced pluripotent stem cells. The resulting framework quantifies the dynamics of the genetic regulation of gene expression and estimates its cell-type specificity. As an application, we show that the approach detects known and new genes associated with schizophrenia and enables insights into context-dependent disease mechanisms. We provide a genomic resource from a phenome-wide application of our models to more than 1,500 phenotypes from the UK Biobank. Using longitudinal, genetically determined expression, we implement a predictive causality framework, evaluating the prediction of future values of a target gene expression using prior values of a putative regulatory gene. Collectively, the results of this work demonstrate the insights that can be gained into the molecular underpinnings of disease by quantifying the genetic control of gene expression at single-cell resolution.
Affiliation(s)
- Hanna Abe
  - Vanderbilt University, Nashville, TN, USA
- Phillip Lin
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Dan Zhou
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Douglas M Ruderfer
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biomedical Informatics and Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Eric R Gamazon
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Clare Hall, University of Cambridge, Cambridge, UK
8
Chen L, Dautle M, Gao R, Zhang S, Chen Y. Inferring gene regulatory networks from time-series scRNA-seq data via GRANGER causal recurrent autoencoders. Brief Bioinform 2025; 26:bbaf089. [PMID: 40062616 PMCID: PMC11891664 DOI: 10.1093/bib/bbaf089] [Received: 10/31/2024] [Revised: 01/26/2025] [Accepted: 02/18/2025] [Indexed: 05/13/2025] Open
Abstract
The development of single-cell RNA sequencing (scRNA-seq) technology provides valuable data resources for inferring gene regulatory networks (GRNs), enabling deeper insights into cellular mechanisms and diseases. While many methods exist for inferring GRNs from static scRNA-seq data, current approaches face challenges in accurately handling time-series scRNA-seq data due to high noise levels and data sparsity. The temporal dimension introduces additional complexity by requiring models to capture dynamic changes, increasing sensitivity to noise, and exacerbating data sparsity across time points. In this study, we introduce GRANGER, an unsupervised deep learning-based method that integrates multiple advanced techniques, including a recurrent variational autoencoder, GRANGER causality, sparsity-inducing penalties, and negative binomial (NB)-based loss functions, to infer GRNs. GRANGER was evaluated using multiple popular benchmarking datasets, where it demonstrated superior performance compared to eight well-known GRN inference methods. The integration of a NB-based loss function and sparsity-inducing penalties in GRANGER significantly enhanced its capacity to address dropout noise and sparsity in scRNA-seq data. Additionally, GRANGER exhibited robustness against high levels of dropout noise. We applied GRANGER to scRNA-seq data from the whole mouse brain obtained through the BRAIN Initiative project and identified GRNs for five transcription regulators: E2f7, Gbx1, Sox10, Prox1, and Onecut2, which play crucial roles in diverse brain cell types. The inferred GRNs not only recalled many known regulatory relationships but also revealed sets of novel regulatory interactions with functional potential. These findings demonstrate that GRANGER is a highly effective tool for real-world applications in discovering novel gene regulatory relationships.
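To make the role of a negative binomial (NB) loss on sparse counts concrete, here is a minimal sketch of an NB negative log-likelihood. The mean and inverse-dispersion parameterisation (`mu`, `theta`) is a common convention assumed here; the abstract does not state the exact form used in GRANGER, and the function names and counts are hypothetical.

```python
from math import exp, lgamma, log


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under an NB with mean mu and inverse dispersion theta."""
    return (
        lgamma(x + theta) - lgamma(theta) - lgamma(x + 1)
        + theta * log(theta / (theta + mu))
        + x * log(mu / (theta + mu))
    )


def nb_loss(counts, means, theta):
    """Mean negative log-likelihood of observed counts given predicted means."""
    total = sum(nb_log_pmf(x, m, theta) for x, m in zip(counts, means))
    return -total / len(counts)


# Hypothetical sparse counts for one gene across six cells.
counts = [0, 0, 3, 0, 1, 0]
good_fit = nb_loss(counts, [0.7] * 6, theta=0.5)   # predicted mean near the data
poor_fit = nb_loss(counts, [20.0] * 6, theta=0.5)  # predicted mean far too high
```

With a small `theta` the distribution is overdispersed and assigns substantial probability to zero, so a reconstruction loss of this form is penalised less by dropout-like zeros than a squared-error loss on raw counts would be.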
Affiliation(s)
- Liang Chen
  - College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Madison Dautle
  - Department of Biological and Biomedical Sciences, Rowan University, 201 Mullica Hill Road, Glassboro, NJ 08028, United States
- Ruoying Gao
  - College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Shaoqiang Zhang
  - College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Yong Chen
  - Department of Biological and Biomedical Sciences, Rowan University, 201 Mullica Hill Road, Glassboro, NJ 08028, United States
9
Wang C, Liu ZP. Diffusion-based generation of gene regulatory networks from scRNA-seq data with DigNet. Genome Res 2025; 35:340-354. [PMID: 39694856 PMCID: PMC11874984 DOI: 10.1101/gr.279551.124] [Received: 05/06/2024] [Accepted: 12/10/2024] [Indexed: 12/20/2024]
Abstract
A gene regulatory network (GRN) intricately encodes the interconnectedness of identities and functionalities of genes within cells, ultimately shaping cellular specificity. Despite decades of endeavors, reverse engineering of GRNs from gene expression profiling data remains a profound challenge, particularly when it comes to reconstructing cell-specific GRNs that are tailored to precise cellular and genetic contexts. Here, we propose a discrete diffusion generation model, called DigNet, capable of generating corresponding GRNs from high-throughput single-cell RNA sequencing (scRNA-seq) data. DigNet embeds the network generation process into a multistep recovery procedure with Markov properties. Each intermediate step has a specific model to recover a portion of the gene regulatory architectures. It thus can ensure compatibility between global network structures and regulatory modules through the unique multistep diffusion procedure. Furthermore, through iMetacell integration and non-Euclidean discrete space modeling, DigNet is robust to the presence of noise in scRNA-seq data and the sparsity of GRNs. Benchmark evaluation results against more than a dozen state-of-the-art network inference methods demonstrate that DigNet achieves superior performance across various single-cell GRN reconstruction experiments. Furthermore, DigNet provides unique insights into the immune response in breast cancer, derived from differential gene regulation identified in T cells. As an open-source software, DigNet offers a powerful and effective tool for generating cell-specific GRNs from scRNA-seq data.
Affiliation(s)
- Chuanyuan Wang
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
10
Yuan Q, Duren Z. Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data. Nat Biotechnol 2025; 43:247-257. [PMID: 38609714 PMCID: PMC11825371 DOI: 10.1038/s41587-024-02182-7] [Received: 08/04/2023] [Accepted: 02/26/2024] [Indexed: 04/14/2024]
Abstract
Existing methods for gene regulatory network (GRN) inference rely on gene expression data alone or on lower resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points still presents a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following the GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.
Affiliation(s)
- Qiuyue Yuan
  - Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA
- Zhana Duren
  - Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA
11
Jung S. Advances in modeling cellular state dynamics: integrating omics data and predictive techniques. Anim Cells Syst (Seoul) 2025; 29:72-83. [PMID: 39807350 PMCID: PMC11727055 DOI: 10.1080/19768354.2024.2449518] [Received: 10/15/2024] [Revised: 12/19/2024] [Accepted: 12/29/2024] [Indexed: 01/16/2025] Open
Abstract
Dynamic modeling of cellular states has emerged as a pivotal approach for understanding complex biological processes such as cell differentiation, disease progression, and tissue development. This review provides a comprehensive overview of current approaches for modeling cellular state dynamics, focusing on techniques ranging from dynamic or static biomolecular network models to deep learning models. We highlight how these approaches, integrated with various omics data such as transcriptomics and single-cell RNA sequencing, could be used to capture and predict cellular behavior and transitions. We also discuss applications of these modeling approaches in predicting gene knockout effects, designing targeted interventions, and simulating organ development. This review emphasizes the importance of selecting appropriate modeling strategies based on scalability and resolution requirements, which vary according to the complexity and size of the biological systems under study. By evaluating the strengths, limitations, and recent advancements of these methodologies, we aim to guide future research in developing more robust and interpretable models for understanding and manipulating cellular state dynamics in various biological contexts, ultimately advancing therapeutic strategies and precision medicine.
Affiliation(s)
- Sungwon Jung
  - Department of Genome Medicine and Science, Gachon University College of Medicine, Incheon, Republic of Korea
  - Gachon Institute of Genome Medicine and Science, Gachon University Gil Medical Center, Incheon, Republic of Korea
12
Cao G, Chen D. Unveiling Long Non-coding RNA Networks from Single-Cell Omics Data Through Artificial Intelligence. Methods Mol Biol 2025; 2883:257-279. [PMID: 39702712 DOI: 10.1007/978-1-0716-4290-0_11] [Indexed: 12/21/2024]
Abstract
Single-cell omics technologies have revolutionized the study of long non-coding RNAs (lncRNAs), offering unprecedented resolution in elucidating their expression dynamics, cell-type specificity, and associated gene regulatory networks (GRNs). Concurrently, the integration of artificial intelligence (AI) methodologies has significantly advanced our understanding of lncRNA functions and their implications in disease pathogenesis. This chapter discusses the progress in single-cell omics data analysis, emphasizing its pivotal role in unraveling the molecular mechanisms underlying cellular heterogeneity and the associated regulatory networks involving lncRNAs. Additionally, we provide a summary of single-cell omics resources and AI models for constructing single-cell gene regulatory networks (scGRNs). Finally, we discuss the challenges and prospects of exploring scGRNs in the context of lncRNA biology.
Affiliation(s)
- Guangshuo Cao
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Dijun Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
13
|
Weng G, Martin P, Kim H, Won KJ. Integrating Prior Knowledge Using Transformer for Gene Regulatory Network Inference. Adv Sci (Weinh) 2025; 12:e2409990. [PMID: 39605181 PMCID: PMC11744656 DOI: 10.1002/advs.202409990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Revised: 10/23/2024] [Indexed: 11/29/2024]
Abstract
Gene regulatory network (GRN) inference, a process of reconstructing gene regulatory rules from experimental data, has the potential to discover new regulatory rules. However, existing methods often struggle to generalize across diverse cell types and account for unseen regulators. This work presents GRNPT, a novel Transformer-based framework that integrates large language model (LLM) embeddings from publicly accessible biological data and a temporal convolutional network (TCN) autoencoder to capture regulatory patterns from single-cell RNA sequencing (scRNA-seq) trajectories. GRNPT significantly outperforms both supervised and unsupervised methods in inferring GRNs, particularly when training data is limited. Notably, GRNPT exhibits exceptional generalizability, accurately predicting regulatory relationships in previously unseen cell types and even for unseen regulators. By combining LLMs' ability to distill biological knowledge from text with deep learning methodologies that capture complex patterns in gene expression data, GRNPT overcomes the limitations of traditional GRN inference methods and enables a more accurate and comprehensive understanding of gene regulatory dynamics.
Affiliation(s)
- Guangzheng Weng
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Ole Maaløes Vej 5, Copenhagen 2200, Denmark
- Patrick Martin
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90069, USA
- Hyobin Kim
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90069, USA
- Kyoung Jae Won
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90069, USA
14
|
Peng D, Cahan P. OneSC: a computational platform for recapitulating cell state transitions. Bioinformatics 2024; 40:btae703. [PMID: 39570626 PMCID: PMC11630913 DOI: 10.1093/bioinformatics/btae703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 11/13/2024] [Accepted: 11/19/2024] [Indexed: 11/22/2024] Open
Abstract
MOTIVATION Computational modeling of cell state transitions has been of great interest in developmental biology, cancer biology, and cell fate engineering because it enables perturbation experiments to be performed in silico more rapidly and cheaply than could be achieved in a lab. Recent advancements in single-cell RNA-sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico "synthetic" cells that faithfully mimic the temporal trajectories. RESULTS Here we present OneSC, a platform that can simulate cell state transitions using systems of stochastic differential equations governed by a regulatory network of core transcription factors (TFs). Unlike many current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and terminal cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes, and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations. AVAILABILITY AND IMPLEMENTATION OneSC is implemented as a Python package on GitHub (https://github.com/CahanLab/oneSC) and on Zenodo (https://zenodo.org/records/14052421).
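The idea of simulating cell state transitions with stochastic differential equations driven by a small regulatory network can be illustrated with a toy example. The sketch below is not OneSC's model or API: the two-gene mutual-repression network, the Hill-type repression function, and all parameter values are assumptions chosen only to show an Euler-Maruyama simulation settling into one of two terminal states.

```python
import math
import random


def simulate_toggle(steps=2000, dt=0.01, sigma=0.05, x0=0.6, y0=0.4, seed=0):
    """Euler-Maruyama simulation of two mutually repressing genes (toy model)."""
    rng = random.Random(seed)

    def repress(level, k=0.5, n=4):
        # Hill-type repression: near 1 when the repressor is low, near 0 when high.
        return k**n / (k**n + level**n)

    x, y = x0, y0
    for _ in range(steps):
        # Drift: production set by the other gene's repression, minus first-order decay.
        dx = (repress(y) - x) * dt
        dy = (repress(x) - y) * dt
        # Diffusion: Gaussian noise scaled by sqrt(dt).
        x += dx + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        y += dy + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Expression levels cannot be negative.
        x, y = max(x, 0.0), max(y, 0.0)
    return x, y


# Without noise, a small initial bias toward gene x is amplified into an
# "x high, y low" terminal state; swapping the bias gives the opposite state.
print(simulate_toggle(sigma=0.0))
print(simulate_toggle(sigma=0.0, x0=0.4, y0=0.6))
```

An in silico perturbation in this toy setting would amount to clamping one gene's level (for example, holding `x` at zero) and observing which terminal state the simulation reaches.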
Affiliation(s)
- Da Peng
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Patrick Cahan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Institute for Cell Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, United States
15
|
Li R, Wu J, Li G, Liu J, Liu J, Xuan J, Deng Z. SIGRN: Inferring Gene Regulatory Network with Soft Introspective Variational Autoencoders. Int J Mol Sci 2024; 25:12741. [PMID: 39684451 DOI: 10.3390/ijms252312741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 11/21/2024] [Accepted: 11/25/2024] [Indexed: 12/18/2024] Open
Abstract
Gene regulatory networks (GRNs) represent the complex regulatory relationships among genes, which are essential for understanding developmental biology and uncovering the fundamental aspects of various biological phenomena. Inferring GRNs from single-cell RNA sequencing (scRNA-seq) data with computational methods is effective and economical. Recent research has addressed this problem using variational autoencoders (VAEs) and structural equation models (SEMs). Because VAEs tend to generate poor-quality data, this paper proposes SIGRN, a soft introspective adversarial model for unsupervised GRN inference that introduces an adversarial mechanism into a variational autoencoder. SIGRN applies a "soft" introspective adversarial mode to avoid training additional neural networks and adding additional training parameters. It demonstrates superior inference accuracy across most benchmark datasets when compared to nine leading-edge methods. In addition, SIGRN achieves better performance on representing cells and generating scRNA-seq data in most datasets. These results have been verified through extensive experiments. The SIGRN method shows promise for generating scRNA-seq data and inferring GRNs.
Affiliation(s)
- Rongyuan Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Jingli Wu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Gaoshi Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Jiafei Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Jinlu Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Junbo Xuan
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Zheng Deng
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
16
|
Huang W, Hu Y, Wang L, Wu G, Zhang C, Shi Q. Spatially aligned graph transfer learning for characterizing spatial regulatory heterogeneity. Brief Bioinform 2024; 26:bbaf021. [PMID: 39841593 PMCID: PMC11752617 DOI: 10.1093/bib/bbaf021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Revised: 12/26/2024] [Accepted: 01/08/2025] [Indexed: 01/24/2025] Open
Abstract
Spatially resolved transcriptomics (SRT) technologies facilitate the exploration of cell fates or states within tissue microenvironments. Despite these advances, the field has not adequately addressed the regulatory heterogeneity influenced by microenvironmental factors. Here, we propose a novel Spatially Aligned Graph Transfer Learning (SpaGTL) method, pretrained on large-scale multi-modal SRT data of about 100 million cells/spots to enable inference of context-specific spatial gene regulatory networks across multiple scales in data-limited settings. As a novel cross-dimensional transfer learning architecture, SpaGTL aligns spatial graph representations across gene-level graph transformers and a cell/spot-level manifold-dominated variational autoencoder. This alignment facilitates the exploration of microenvironmental variations in cell types and functional domains from a molecular regulatory perspective, all within a self-supervised framework. We verified SpaGTL's precision, robustness, and speed against existing state-of-the-art algorithms and show SpaGTL's potential to facilitate the discovery of novel regulatory programs that exhibit strong associations with tissue functional regions and cell types. Importantly, SpaGTL could be extended to process multi-slice SRT data and map the molecular regulatory landscape associated with three-dimensional spatial-temporal changes during development.
Affiliation(s)
- Wendong Huang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yaofeng Hu
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Lequn Wang
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- Guangsheng Wu
- School of Mathematics and Computer Science, Xinyu University, Xinyu 338004, Jiangxi, China
- Chuanchao Zhang
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Qianqian Shi
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
17
|
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024] Open
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay out the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
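The abstract does not name the specific metrics the article defines, so the following is only a sketch of two measures commonly applied to ranked edge predictions in GRN benchmarking: average precision (a summary of the precision-recall curve) and early precision (precision among the top k edges, where k is the number of reference edges). The function names and the toy edge lists are illustrative assumptions.

```python
def average_precision(ranked_edges, true_edges):
    """Average precision of a ranked edge list against a reference edge set."""
    hits = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            # Precision measured at each rank where a true edge is recovered.
            precision_sum += hits / rank
    return precision_sum / len(true_edges) if true_edges else 0.0


def early_precision(ranked_edges, true_edges):
    """Fraction of true edges among the top k predictions, k = len(true_edges)."""
    k = len(true_edges)
    if k == 0:
        return 0.0
    return sum(edge in true_edges for edge in ranked_edges[:k]) / k


# Toy example: four predicted edges, most confident first, and two reference edges.
ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
truth = {("A", "B"), ("B", "C")}
print(average_precision(ranked, truth))  # (1/1 + 2/3) / 2
print(early_precision(ranked, truth))    # 1 of the top 2 is a true edge
```

Because reference networks are sparse, such scores are usually read against the network density (true edges divided by possible edges), which is the precision expected from a random predictor.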
Affiliation(s)
- Karamveer
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
18
|
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
Affiliation(s)
- Yue Wang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA
- Peng Zheng
- Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
- Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
- Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
19
|
Dong J, Li J, Wang F. Deep Learning in Gene Regulatory Network Inference: A Survey. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2089-2101. [PMID: 39137088 DOI: 10.1109/tcbb.2024.3442536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Understanding the intricate regulatory relationships among genes is crucial for comprehending the development, differentiation, and cellular response in living systems. Consequently, inferring gene regulatory networks (GRNs) based on observed data has gained significant attention as a fundamental goal in biological applications. The proliferation and diversification of available data present both opportunities and challenges in accurately inferring GRNs. Deep learning, a highly successful technique in various domains, holds promise in aiding GRN inference. Several GRN inference methods employing deep learning models have been proposed; however, the selection of an appropriate method remains a challenge for life scientists. In this survey, we provide a comprehensive analysis of 12 GRN inference methods that leverage deep learning models. We trace the evolution of these major methods and categorize them based on the types of applicable data. We delve into the core concepts and specific steps of each method, offering a detailed evaluation of their effectiveness and scalability across different scenarios. These insights enable us to make informed recommendations. Moreover, we explore the challenges faced by GRN inference methods utilizing deep learning and discuss future directions, providing valuable suggestions for the advancement of data scientists in this field.
20
|
Graham J, Zhang Y, He L, Gonzalez-Fernandez T. CRISPR-GEM: A Novel Machine Learning Model for CRISPR Genetic Target Discovery and Evaluation. ACS Synth Biol 2024; 13:3413-3429. [PMID: 39375864 PMCID: PMC11494708 DOI: 10.1021/acssynbio.4c00473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 09/17/2024] [Accepted: 09/27/2024] [Indexed: 10/09/2024]
Abstract
CRISPR gene editing strategies are shaping cell therapies through precise and tunable control over gene expression. However, limitations in safely delivering high quantities of CRISPR machinery demand careful target gene selection to achieve reliable therapeutic effects. Informed target gene selection requires a thorough understanding of the involvement of target genes in gene regulatory networks (GRNs) and thus their impact on cell phenotype. Effective decoding of these complex networks has been achieved using machine learning models, but current techniques are limited to single cell types and focus mainly on transcription factors, limiting their applicability to CRISPR strategies. To address this, we present CRISPR-GEM, a multilayer perceptron (MLP) based synthetic GRN constructed to accurately predict the downstream effects of CRISPR gene editing. First, input and output nodes are identified as differentially expressed genes between defined experimental and target cell/tissue types, respectively. Then, MLP training learns regulatory relationships in a black-box approach allowing accurate prediction of output gene expression using only input gene expression. Finally, CRISPR-mimetic perturbations are made to each input gene individually, and the resulting model predictions are compared to those for the target group to score and assess each input gene as a CRISPR candidate. The top scoring genes provided by CRISPR-GEM therefore best modulate experimental group GRNs to motivate transcriptomic shifts toward a target group phenotype. This machine learning model is the first of its kind for predicting optimal CRISPR target genes and serves as a powerful tool for enhanced CRISPR strategies across a range of cell therapies.
Affiliation(s)
- Joshua P. Graham
- Department of Bioengineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Yu Zhang
- Department of Bioengineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Lifang He
- Department of Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
21
|
Lodi MK, Chernikov A, Ghosh P. COFFEE: consensus single cell-type specific inference for gene regulatory networks. Brief Bioinform 2024; 25:bbae457. [PMID: 39311699 PMCID: PMC11418232 DOI: 10.1093/bib/bbae457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/22/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024] Open
Abstract
The inference of gene regulatory networks (GRNs) is crucial to understanding the regulatory mechanisms that govern biological processes. GRNs may be represented as edges in a graph, and hence, they have been inferred computationally from scRNA-seq data. A wisdom of crowds approach to integrate edges from several GRNs to create one composite GRN has demonstrated improved performance when compared with individual algorithm implementations on bulk RNA-seq and microarray data. In an effort to extend this approach to scRNA-seq data, we present COFFEE (COnsensus single cell-type speciFic inFerence for gEnE regulatory networks), a Borda voting-based consensus algorithm that integrates information from 10 established GRN inference methods. We conclude that COFFEE has improved performance across synthetic, curated, and experimental datasets when compared with baseline methods. Additionally, we show that a modified version of COFFEE can be leveraged to improve performance on newer cell-type specific GRN inference methods. Overall, our results demonstrate that consensus-based methods with pertinent modifications continue to be valuable for GRN inference at the single cell level. While COFFEE is benchmarked on 10 algorithms, it is a flexible strategy that can incorporate any set of GRN inference algorithms according to user preference. A Python implementation of COFFEE may be found on GitHub: https://github.com/lodimk2/coffee.
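Borda voting over ranked edge lists can be sketched in a few lines. This is a generic Borda count, not the COFFEE implementation: the function name, the scoring of unreported edges as zero points, and the toy rankings are assumptions made for illustration.

```python
from collections import defaultdict


def borda_consensus(rankings):
    """Combine ranked edge lists from several methods with a Borda count.

    rankings: list of lists; each inner list holds (regulator, target) edges
    ordered from most to least confident by one inference method.
    Returns all candidate edges sorted by total Borda points, highest first.
    """
    # Every edge proposed by at least one method is a candidate.
    candidates = {edge for ranking in rankings for edge in ranking}
    n = len(candidates)
    points = defaultdict(int)
    for ranking in rankings:
        for position, edge in enumerate(ranking):
            # The top edge earns n - 1 points, the next n - 2, and so on.
            # Edges a method did not report earn nothing from that method.
            points[edge] += n - 1 - position
    # Sort by points (descending); ties are broken alphabetically.
    return sorted(candidates, key=lambda edge: (-points[edge], edge))


method_a = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G1")]
method_b = [("TF1", "G2"), ("TF1", "G1"), ("TF3", "G2")]
method_c = [("TF1", "G1"), ("TF3", "G2")]
print(borda_consensus([method_a, method_b, method_c]))
# [('TF1', 'G1'), ('TF1', 'G2'), ('TF3', 'G2'), ('TF2', 'G1')]
```

The edge ranked first by two of the three methods comes out on top, while an edge reported by a single method at low confidence falls to the bottom of the consensus list.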
Affiliation(s)
- Musaddiq K Lodi
- Integrative Life Sciences, Virginia Commonwealth University, 1000 W Cary St, Richmond, VA 23284, United States
- Anna Chernikov
- Center for Biological Data Science, Virginia Commonwealth University, 1015 Floyd Ave, Richmond, VA 23284, United States
- Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, 401 W Main St, Richmond, VA 23284, United States
22
|
Loers JU, Vermeirssen V. A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data. Brief Bioinform 2024; 25:bbae382. [PMID: 39207727 PMCID: PMC11359808 DOI: 10.1093/bib/bbae382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024] Open
Abstract
Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples and even on the same cells has lifted the field of GRN inference to the next stage. Combinations of (single-cell) transcriptomics and chromatin accessibility allow the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data exemplified by state-of-the-art methods as well as open challenges and future developments. Moreover, we address preprocessing strategies, metacell generation and computational omics pairing, transcription factor binding site detection, and linear and three-dimensional approaches to identify chromatin interactions as well as dynamic and causal eGRN inference. We believe that the integration of transcriptomics together with epigenomics data at a single-cell level is the new standard for mechanistic network inference, and that it can be further advanced with integrating additional omics layers and spatiotemporal data, as well as with shifting the focus towards more quantitative and causal modeling strategies.
Affiliation(s)
- Jens Uwe Loers
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Vanessa Vermeirssen
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
23
|
Chang Z, Xu Y, Dong X, Gao Y, Wang C. Single-cell and spatial multiomic inference of gene regulatory networks using SCRIPro. Bioinformatics 2024; 40:btae466. [PMID: 39024032 PMCID: PMC11288411 DOI: 10.1093/bioinformatics/btae466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 06/05/2024] [Accepted: 07/17/2024] [Indexed: 07/20/2024] Open
Abstract
MOTIVATION The burgeoning generation of single-cell or spatial multiomic data allows for the characterization of gene regulation networks (GRNs) at an unprecedented resolution. However, the accurate reconstruction of GRNs from sparse and noisy single-cell or spatial multiomic data remains challenging. RESULTS Here, we present SCRIPro, a comprehensive computational framework that robustly infers GRNs for both single-cell and spatial multi-omics data. SCRIPro first improves sample coverage through a density clustering approach based on multiomic and spatial similarities. Additionally, SCRIPro scans transcriptional regulator (TR) importance by performing chromatin reconstruction and in silico deletion analyses using a comprehensive reference covering 1,292 human and 994 mouse TRs. Finally, SCRIPro combines TR-target importance scores derived from multiomic data with TR-target expression levels to ensure precise GRN reconstruction. We benchmarked SCRIPro on various datasets, including single-cell multiomic data from human B-cell lymphoma, mouse hair follicle development, Stereo-seq of mouse embryos, and Spatial-ATAC-RNA from mouse brain. SCRIPro outperforms existing motif-based methods and accurately reconstructs cell type-specific, stage-specific, and region-specific GRNs. Overall, SCRIPro emerges as a streamlined and fast method capable of reconstructing TR activities and GRNs for both single-cell and spatial multi-omic data. AVAILABILITY SCRIPro is available at https://github.com/wanglabtongji/SCRIPro. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhanhe Chang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yunfan Xu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
- Yawei Gao
- Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
- National Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University, Shanghai 200120, China
- Frontier Science Center for Intelligent Autonomous Systems, Tongji University, Shanghai 200120, China
24
|
Graham JP, Zhang Y, He L, Gonzalez-Fernandez T. CRISPR-GEM: A Novel Machine Learning Model for CRISPR Genetic Target Discovery and Evaluation. bioRxiv 2024:2024.07.01.601587. [PMID: 39005295 PMCID: PMC11244939 DOI: 10.1101/2024.07.01.601587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
CRISPR gene editing strategies are shaping cell therapies through precise and tunable control over gene expression. However, achieving reliable therapeutic effects with improved safety and efficacy requires informed target gene selection. This depends on a thorough understanding of the involvement of target genes in gene regulatory networks (GRNs) that regulate cell phenotype and function. Machine learning models have been previously used for GRN reconstruction using RNA-seq data, but current techniques are limited to single cell types and focus mainly on transcription factors. This restriction overlooks many potential CRISPR target genes, such as those encoding extracellular matrix components, growth factors, and signaling molecules, thus limiting the applicability of these models for CRISPR strategies. To address these limitations, we have developed CRISPR-GEM, a multi-layer perceptron (MLP)-based synthetic GRN constructed to accurately predict the downstream effects of CRISPR gene editing. First, input and output nodes are identified as differentially expressed genes between defined experimental and target cell/tissue types respectively. Then, MLP training learns regulatory relationships in a black-box approach allowing accurate prediction of output gene expression using only input gene expression. Finally, CRISPR-mimetic perturbations are made to each input gene individually and the resulting model predictions are compared to those for the target group to score and assess each input gene as a CRISPR candidate. The top scoring genes provided by CRISPR-GEM therefore best modulate experimental group GRNs to motivate transcriptomic shifts towards a target group phenotype. This machine learning model is the first of its kind for predicting optimal CRISPR target genes and serves as a powerful tool for enhanced CRISPR strategies across a range of cell therapies.
Affiliation(s)
- Josh P Graham
  - Department of Bioengineering, Lehigh University, Bethlehem, PA, USA
- Yu Zhang
  - Department of Bioengineering, Lehigh University, Bethlehem, PA, USA
  - Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA, USA
- Lifang He
  - Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA

25
Zhou X, Pan J, Chen L, Zhang S, Chen Y. DeepIMAGER: Deeply Analyzing Gene Regulatory Networks from scRNA-seq Data. Biomolecules 2024; 14:766. [PMID: 39062480 PMCID: PMC11274664 DOI: 10.3390/biom14070766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 06/22/2024] [Accepted: 06/25/2024] [Indexed: 07/28/2024] Open
Abstract
Understanding the dynamics of gene regulatory networks (GRNs) across diverse cell types poses a challenge yet holds immense value in unraveling the molecular mechanisms governing cellular processes. Current computational methods, which rely solely on expression changes from bulk RNA-seq and/or scRNA-seq data, often result in high rates of false positives and low precision. Here, we introduce an advanced computational tool, DeepIMAGER, for inferring cell-specific GRNs through deep learning and data integration. DeepIMAGER employs a supervised approach that transforms the co-expression patterns of gene pairs into image-like representations and leverages transcription factor (TF) binding information for model training. It is trained using comprehensive datasets that encompass scRNA-seq profiles and ChIP-seq data, capturing TF-gene pair information across various cell types. Comprehensive validations on six cell lines show that DeepIMAGER exhibits superior performance compared with ten popular GRN inference tools and is remarkably robust against dropout-zero events. DeepIMAGER was applied to scRNA-seq datasets of multiple myeloma (MM) and detected potential GRNs for the TFs RORC, MITF, and FOXD2 in MM dendritic cells. This technical innovation, combined with its capability to accurately decode GRNs from scRNA-seq, establishes DeepIMAGER as a valuable tool for unraveling complex regulatory networks in various cell types.
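One common way to turn the co-expression pattern of a gene pair into an image-like representation is a two-dimensional histogram of the pair's expression across cells. The sketch below assumes that construction; the function name `pair_histogram`, the bin count, and the toy expression values are hypothetical, and DeepIMAGER's actual input encoding may differ.

```python
def pair_histogram(expr_a, expr_b, bins=4):
    """Bin the joint expression of two genes across cells into a
    bins x bins count matrix (an 'image' of their co-expression)."""
    if len(expr_a) != len(expr_b):
        raise ValueError("both genes must be measured in the same cells")

    def bin_index(value, lo, hi):
        if hi == lo:                      # constant gene: everything in bin 0
            return 0
        idx = int((value - lo) / (hi - lo) * bins)
        return min(idx, bins - 1)         # put the maximum in the last bin

    lo_a, hi_a = min(expr_a), max(expr_a)
    lo_b, hi_b = min(expr_b), max(expr_b)
    image = [[0] * bins for _ in range(bins)]
    for a, b in zip(expr_a, expr_b):
        image[bin_index(a, lo_a, hi_a)][bin_index(b, lo_b, hi_b)] += 1
    return image

# Toy data: one value per cell for a TF and a candidate target gene.
tf_expr     = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
target_expr = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
image = pair_histogram(tf_expr, target_expr, bins=4)
```

A strongly co-expressed pair concentrates its counts near the diagonal of `image`, which is the kind of spatial pattern a convolutional classifier can pick up.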
Affiliation(s)
- Xiguo Zhou
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Jingyi Pan
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Liang Chen
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
  - Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA

26
Peng D, Cahan P. OneSC: A computational platform for recapitulating cell state transitions. bioRxiv 2024:2024.05.31.596831. [PMID: 38895453 PMCID: PMC11185539 DOI: 10.1101/2024.05.31.596831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Computational modelling of cell state transitions has been of great interest to many in the fields of developmental biology, cancer biology and cell fate engineering because it enables performing perturbation experiments in silico more rapidly and cheaply than could be achieved in a wet lab. Recent advancements in single-cell RNA sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico 'synthetic' cells that faithfully mimic the temporal trajectories. Here we present OneSC, a platform that can simulate synthetic cells across developmental trajectories using systems of stochastic differential equations governed by a core transcription factor (TF) regulatory network. Unlike current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and steady cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations.
Affiliation(s)
- Da Peng
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
- Patrick Cahan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
  - Institute for Cell Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, Maryland, 21205, USA

27
Wan R, Zhang Y, Peng Y, Tian F, Gao G, Tang F, Jia J, Ge H. Unveiling gene regulatory networks during cellular state transitions without linkage across time points. Sci Rep 2024; 14:12355. [PMID: 38811747 PMCID: PMC11137113 DOI: 10.1038/s41598-024-62850-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 05/22/2024] [Indexed: 05/31/2024] Open
Abstract
Time-stamped cross-sectional data, which lack linkage across time points, are commonly generated in single-cell transcriptional profiling. Many previous methods for inferring gene regulatory networks (GRNs) driving cell-state transitions relied on constructing single-cell temporal ordering. Introducing COSLIR (COvariance restricted Sparse LInear Regression), we presented a direct approach to reconstructing GRNs that govern cell-state transitions, utilizing only the first and second moments of samples between two consecutive time points. Simulations validated COSLIR's perfect accuracy in the oracle case and demonstrated its robust performance in real-world scenarios. When applied to single-cell RT-PCR and RNAseq datasets in developmental biology, COSLIR competed favorably with existing methods. Notably, its running time remained nearly independent of the number of cells. Therefore, COSLIR emerges as a promising addition to GRN reconstruction methods under cell-state transitions, bypassing the single-cell temporal ordering to enhance accuracy and efficiency in single-cell transcriptional profiling.
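The abstract says COSLIR needs only the first and second moments at two consecutive time points. A minimal sketch of why summary statistics can carry the required information, assuming a noise-free linear transition x2 = T x1 with T = I + A: under that assumption the mean maps to T·mean and the covariance to T·cov·Tᵀ. The linear model, the matrix `T`, and the toy cells are assumptions of this sketch and not the paper's formulation; the sparse regression that estimates A is not shown.

```python
def mean(samples):
    """Per-gene mean over cells (rows are cells, columns are genes)."""
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]

def covariance(samples):
    """Population covariance matrix over cells."""
    mu = mean(samples)
    n, d = len(samples), len(mu)
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
             for j in range(d)] for i in range(d)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Hypothetical two-gene transition T = I + A (gene 0 activates gene 1).
T = [[1.0, 0.0],
     [0.5, 1.0]]

cells_t1 = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
# Generated here only to check the identities; in real cross-sectional data
# the cells at the second time point are different, unlinked cells.
cells_t2 = [matvec(T, c) for c in cells_t1]

# Moments at time 2 predicted from moments at time 1 alone.
predicted_mean = matvec(T, mean(cells_t1))
predicted_cov = matmul(matmul(T, covariance(cells_t1)), transpose(T))
```

Because `predicted_mean` and `predicted_cov` agree with the moments computed directly from `cells_t2`, a candidate transition matrix can be judged against the two populations without knowing which cell became which.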
Affiliation(s)
- Ruosi Wan
  - Beijing International Center for Mathematical Research, Peking University, Beijing, China
- Yuhao Zhang
  - Biomedical Pioneering Innovation Center, Peking University, Beijing, China
- Yongli Peng
  - Beijing International Center for Mathematical Research, Peking University, Beijing, China
- Feng Tian
  - Biomedical Pioneering Innovation Center, Peking University, Beijing, China
- Ge Gao
  - Biomedical Pioneering Innovation Center, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics, Peking University, Beijing, China
- Fuchou Tang
  - Biomedical Pioneering Innovation Center, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics, Peking University, Beijing, China
- Jinzhu Jia
  - School of Public Health and Center for Statistical Science, Peking University, Beijing, China
- Hao Ge
  - Beijing International Center for Mathematical Research, Peking University, Beijing, China
  - Biomedical Pioneering Innovation Center, Peking University, Beijing, China

28
Lei Y, Huang XT, Guo X, Chan KHK, Gao L. DeepGRNCS: deep learning-based framework for jointly inferring gene regulatory networks across cell subpopulations. Brief Bioinform 2024; 25:bbae334. [PMID: 38980373 PMCID: PMC11232306 DOI: 10.1093/bib/bbae334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 06/03/2024] [Accepted: 07/01/2024] [Indexed: 07/10/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) allows us to obtain a deeper understanding of cellular function and disease pathogenesis. Recent advances in single-cell RNA sequencing (scRNA-seq) technology have improved the accuracy of GRN inference. However, many methods for inferring individual GRNs from scRNA-seq data are limited because they overlook intercellular heterogeneity and similarities between different cell subpopulations, which are often present in the data. Here, we propose a deep learning-based framework, DeepGRNCS, for jointly inferring GRNs across cell subpopulations. We follow the commonly accepted hypothesis that the expression of a target gene can be predicted based on the expression of transcription factors (TFs) due to underlying regulatory relationships. We initially processed scRNA-seq data by discretizing data scattering using the equal-width method. Then, we trained deep learning models to predict target gene expression from TFs. By individually removing each TF from the expression matrix, we used pre-trained deep model predictions to infer regulatory relationships between TFs and genes, thereby constructing the GRN. Our method outperforms existing GRN inference methods for various simulated and real scRNA-seq datasets. Finally, we applied DeepGRNCS to non-small cell lung cancer scRNA-seq data to identify key genes in each cell subpopulation and analyzed their biological relevance. In conclusion, DeepGRNCS effectively predicts cell subpopulation-specific GRNs. The source code is available at https://github.com/Nastume777/DeepGRNCS.
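The equal-width discretization mentioned in the abstract splits a gene's expression range into bins of identical width and replaces each value with its bin index. A minimal sketch, assuming per-gene binning over the observed minimum and maximum; the function name `equal_width_bins` and the bin count are illustrative choices, not taken from the DeepGRNCS code.

```python
def equal_width_bins(values, n_bins):
    """Map each expression value to the index of its equal-width bin
    over [min(values), max(values)]; the maximum falls in the last bin."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant gene: a single bin
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# One gene measured in six cells, discretized into four levels.
labels = equal_width_bins([0.0, 0.4, 1.1, 2.5, 3.9, 4.0], n_bins=4)
```

Equal-width bins keep the spacing of the original scale, so sparse scRNA-seq data with many zeros will typically fill the lowest bin far more than the others.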
Affiliation(s)
- Yahui Lei
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Xiao-Tai Huang
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Xingli Guo
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Kei Hang Katie Chan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
  - Department of Epidemiology and Center for Global Cardiometabolic Health, Brown University, Providence, RI, United States
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China

29
Singh R, Wu AP, Mudide A, Berger B. Causal gene regulatory analysis with RNA velocity reveals an interplay between slow and fast transcription factors. Cell Syst 2024; 15:462-474.e5. [PMID: 38754366 DOI: 10.1016/j.cels.2024.04.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/25/2023] [Accepted: 04/18/2024] [Indexed: 05/18/2024]
Abstract
Single-cell expression dynamics, from differentiation trajectories or RNA velocity, have the potential to reveal causal links between transcription factors (TFs) and their target genes in gene regulatory networks (GRNs). However, existing methods either overlook these expression dynamics or necessitate that cells be ordered along a linear pseudotemporal axis, which is incompatible with branching trajectories. We introduce Velorama, an approach to causal GRN inference that represents single-cell differentiation dynamics as a directed acyclic graph of cells, constructed from pseudotime or RNA velocity measurements. Additionally, Velorama enables the estimation of the speed at which TFs influence target genes. Applying Velorama, we uncover evidence that the speed of a TF's interactions is tied to its regulatory function. For human corticogenesis, we find that slow TFs are linked to gliomas, while fast TFs are associated with neuropsychiatric diseases. We expect Velorama to become a critical part of the RNA velocity toolkit for investigating the causal drivers of differentiation and disease.
Affiliation(s)
- Rohit Singh
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Alexander P Wu
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Anish Mudide
  - Phillips Exeter Academy, Exeter, NH 03883, USA; Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA
- Bonnie Berger
  - Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA

30
Zinati Y, Takiddeen A, Emad A. GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks. Nat Commun 2024; 15:4055. [PMID: 38744843 PMCID: PMC11525796 DOI: 10.1038/s41467-024-48516-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 05/01/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.
Affiliation(s)
- Yazdan Zinati
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Abdulrahman Takiddeen
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
  - Mila, Quebec AI Institute, Montreal, QC, Canada
  - The Rosalind and Morris Goodman Cancer Institute, Montreal, QC, Canada

31
Raharinirina NA, Sunkara V, von Kleist M, Fackeldey K, Weber M. Multi-Input data ASsembly for joint Analysis (MIASA): A framework for the joint analysis of disjoint sets of variables. PLoS One 2024; 19:e0302425. [PMID: 38728301 PMCID: PMC11086896 DOI: 10.1371/journal.pone.0302425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 04/04/2024] [Indexed: 05/12/2024] Open
Abstract
The joint analysis of two datasets [Formula: see text] and [Formula: see text] that describe the same phenomena (e.g. the cellular state), but measure disjoint sets of variables (e.g. mRNA vs. protein levels) is currently challenging. Traditional methods typically analyze single interaction patterns such as variance or covariance. However, problem-tailored external knowledge may contain multiple different kinds of information about the interaction between the measured variables. We introduce MIASA, a holistic framework for the joint analysis of multiple different variables. It consists of assembling multiple different kinds of information, such as similarity vs. association, expressed in terms of interaction-scores or distances, for subsequent clustering/classification. In addition, our framework includes a novel qualitative Euclidean embedding method (qEE-Transition) which enables using Euclidean-distance/vector-based clustering/classification methods on datasets that have a non-Euclidean-based interaction structure. As an alternative to conventional optimization-based multidimensional scaling methods, which are prone to uncertainties, our qEE-Transition generates a new vector representation for each element of the dataset union [Formula: see text] in a common Euclidean space while strictly preserving the original ordering of the assembled interaction-distances. To demonstrate our work, we applied the framework to three types of simulated datasets: samples from families of distributions, samples from correlated random variables, and time-courses of statistical moments for three different types of stochastic two-gene interaction models. We then compared different clustering methods with vs. without the qEE-Transition. For all examples, we found that the qEE-Transition followed by Ward clustering had superior performance compared to non-agglomerative clustering methods but had a varied performance against ultrametric-based agglomerative methods. We also tested the qEE-Transition followed by supervised and unsupervised machine learning methods and found promising results; however, more work is needed for optimal parametrization of these methods. As a future perspective, our framework points to the importance of further development and validation of distance-distribution models aiming to capture multiple complex interactions between different variables.
Affiliation(s)
- Nomenjanahary Alexia Raharinirina
  - Department of Mathematics & Computer Science, Freie Universität Berlin, Berlin, Germany
  - Department of Modeling and Simulation of Complex Processes, Zuse Institute Berlin, Berlin, Germany
- Vikram Sunkara
  - Department of Visual and Data-Centric Computing, Zuse Institute Berlin, Berlin, Germany
- Max von Kleist
  - Department of Mathematics & Computer Science, Freie Universität Berlin, Berlin, Germany
  - Project Groups, Robert-Koch Institute, Berlin, Germany
- Konstantin Fackeldey
  - Department of Modeling and Simulation of Complex Processes, Zuse Institute Berlin, Berlin, Germany
  - Institute of Mathematics, Technical University Berlin, Berlin, Germany
- Marcus Weber
  - Department of Modeling and Simulation of Complex Processes, Zuse Institute Berlin, Berlin, Germany

32
Lee J, Kim N, Cho KH. Decoding the principle of cell-fate determination for its reverse control. NPJ Syst Biol Appl 2024; 10:47. [PMID: 38710700 PMCID: PMC11074314 DOI: 10.1038/s41540-024-00372-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 04/16/2024] [Indexed: 05/08/2024] Open
Abstract
Understanding and manipulating cell fate determination is pivotal in biology. Cell fate is determined by intricate and nonlinear interactions among molecules, making mathematical model-based quantitative analysis indispensable for its elucidation. Nevertheless, obtaining the essential dynamic experimental data for model development has been a significant obstacle. However, recent advancements in large-scale omics data technology are providing the necessary foundation for developing such models. Based on accumulated experimental evidence, we can postulate that cell fate is governed by a limited number of core regulatory circuits. Following this concept, we present a conceptual control framework that leverages single-cell RNA-seq data for dynamic molecular regulatory network modeling, aiming to identify and manipulate core regulatory circuits and their master regulators to drive desired cellular state transitions. We illustrate the proposed framework by applying it to the reversion of lung cancer cell states, although it is more broadly applicable to understanding and controlling a wide range of cell-fate determination processes.
Affiliation(s)
- Jonghoon Lee
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Namhee Kim
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
  - biorevert, Inc., Daejeon, Republic of Korea
- Kwang-Hyun Cho
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea

33
Gan Y, Yu J, Xu G, Yan C, Zou G. Inferring gene regulatory networks from single-cell transcriptomics based on graph embedding. Bioinformatics 2024; 40:btae291. [PMID: 38810116 PMCID: PMC11142726 DOI: 10.1093/bioinformatics/btae291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/06/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION: Gene regulatory networks (GRNs) encode gene regulation in living organisms, and have become a critical tool to understand complex biological processes. However, due to the dynamic and complex nature of gene regulation, inferring GRNs from scRNA-seq data is still a challenging task. Existing computational methods usually focus on the close connections between genes, and ignore the global structure and distal regulatory relationships.
RESULTS: In this study, we develop a supervised deep learning framework, IGEGRNS, to infer GRNs from scRNA-seq data based on graph embedding. In the framework, contextual information of genes is captured by GraphSAGE, which aggregates gene features and neighborhood structures to generate low-dimensional embeddings for genes. Then, the k most influential nodes in the whole graph are filtered through Top-k pooling. Finally, potential regulatory relationships between genes are predicted by stacking CNNs. Compared with nine competing supervised and unsupervised methods, our method achieves better performance on six time-series scRNA-seq datasets.
AVAILABILITY AND IMPLEMENTATION: Our method IGEGRNS is implemented in Python using the PyTorch machine learning library, and it is freely available at https://github.com/DHUDBlab/IGEGRNS.
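The neighborhood aggregation that GraphSAGE performs can be sketched in its simplest form: each node's new representation combines its own features with the mean of its neighbors' features. The sketch below shows only that untrained aggregation step; the learned weight matrix and nonlinearity of a real GraphSAGE layer are omitted, and the gene names, features, and graph are hypothetical.

```python
def sage_mean_layer(features, neighbors):
    """One untrained GraphSAGE-style step: concatenate each node's own
    feature vector with the element-wise mean of its neighbors' vectors."""
    out = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            agg = [sum(features[n][k] for n in nbrs) / len(nbrs)
                   for k in range(len(own))]
        else:
            agg = [0.0] * len(own)        # isolated node: zero neighborhood
        out[node] = list(own) + agg
    return out

# Toy graph: one TF connected to two genes, each with two features.
features = {"TF1": [1.0, 0.0], "G1": [0.0, 2.0], "G2": [4.0, 2.0]}
neighbors = {"TF1": ["G1", "G2"], "G1": ["TF1"], "G2": ["TF1"]}
embeddings = sage_mean_layer(features, neighbors)
```

Stacking several such steps lets information from genes more than one hop away reach a node, which is how a graph-embedding approach can capture distal relationships.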
Affiliation(s)
- Yanglan Gan
  - School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Jiacheng Yu
  - School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Guangwei Xu
  - School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Cairong Yan
  - School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Guobing Zou
  - School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China

34
Stock M, Popp N, Fiorentino J, Scialdone A. Topological benchmarking of algorithms to infer gene regulatory networks from single-cell RNA-seq data. Bioinformatics 2024; 40:btae267. [PMID: 38627250 PMCID: PMC11096270 DOI: 10.1093/bioinformatics/btae267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 02/28/2024] [Accepted: 04/16/2024] [Indexed: 05/18/2024] Open
Abstract
MOTIVATION: In recent years, many algorithms for inferring gene regulatory networks from single-cell transcriptomic data have been published. Several studies have evaluated their accuracy in estimating the presence of an interaction between pairs of genes. However, these benchmarking analyses do not quantify the algorithms' ability to capture structural properties of networks, which are fundamental, e.g., for studying the robustness of a gene network to external perturbations. Here, we devise a three-step benchmarking pipeline called STREAMLINE that quantifies the ability of algorithms to capture topological properties of networks and identify hubs.
RESULTS: To this aim, we use data simulated from different types of networks as well as experimental data from three different organisms. We apply our benchmarking pipeline to four inference algorithms and provide guidance on which algorithm should be used depending on the global network property of interest.
AVAILABILITY AND IMPLEMENTATION: STREAMLINE is available at https://github.com/ScialdoneLab/STREAMLINE. The data generated in this study are available at https://doi.org/10.5281/zenodo.10710444.
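Evaluating whether an inferred network recovers hubs, rather than individual edges, can be sketched as follows. This is a minimal illustration that assumes hubs are the genes with the highest out-degree and that agreement is measured by Jaccard overlap; the function names and toy networks are hypothetical, and the metrics STREAMLINE actually uses may differ.

```python
from collections import Counter

def out_degree(edges):
    """Number of outgoing regulatory edges per source gene."""
    return Counter(src for src, _ in edges)

def top_hubs(edges, k):
    """The k genes with the highest out-degree (ties broken by name)."""
    ranked = sorted(out_degree(edges).items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene for gene, _ in ranked[:k]}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy directed networks given as (regulator, target) pairs.
reference = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
inferred  = [("A", "B"), ("A", "C"), ("C", "A"), ("C", "B"), ("C", "D"), ("B", "D")]

hub_overlap = jaccard(top_hubs(reference, 2), top_hubs(inferred, 2))
```

In this toy case the inferred network shares four of six edges with the reference yet agrees on only one of the two top hubs, which is the kind of discrepancy an edge-level score alone would not reveal.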
Affiliation(s)
- Marco Stock
  - Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
  - Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich 85354, Germany
- Niclas Popp
  - Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
  - Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Jonathan Fiorentino
  - Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
  - Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Antonio Scialdone
  - Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
  - Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany

35
Fiorentino J, Armaos A, Colantoni A, Tartaglia G. Prediction of protein-RNA interactions from single-cell transcriptomic data. Nucleic Acids Res 2024; 52:e31. [PMID: 38364867 PMCID: PMC11014251 DOI: 10.1093/nar/gkae076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/12/2024] [Accepted: 01/26/2024] [Indexed: 02/18/2024] Open
Abstract
Proteins are crucial in regulating every aspect of RNA life, yet understanding their interactions with coding and noncoding RNAs remains limited. Experimental studies are typically restricted to a small number of cell lines and a limited set of RNA-binding proteins (RBPs). Although computational methods based on physico-chemical principles can predict protein-RNA interactions accurately, they often lack the ability to consider cell-type-specific gene expression and the broader context of gene regulatory networks (GRNs). Here, we assess the performance of several GRN inference algorithms in predicting protein-RNA interactions from single-cell transcriptomic data, and propose a pipeline, called scRAPID (single-cell transcriptomic-based RnA Protein Interaction Detection), that integrates these methods with the catRAPID algorithm, which can identify direct physical interactions between RBPs and RNA molecules. Our approach demonstrates that RBP-RNA interactions can be predicted from single-cell transcriptomic data, with performances comparable or superior to those achieved for the well-established task of inferring transcription factor-target interactions. The incorporation of catRAPID significantly enhances the accuracy of identifying interactions, particularly with long noncoding RNAs, and enables the identification of hub RBPs and RNAs. Additionally, we show that interactions between RBPs can be detected based on their inferred RNA targets. The software is freely available at https://github.com/tartaglialabIIT/scRAPID.
Affiliation(s)
- Jonathan Fiorentino
  - Center for Life Nano- and Neuro-Science, RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
- Alexandros Armaos
  - Centre for Human Technologies (CHT), RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Alessio Colantoni
  - Center for Life Nano- and Neuro-Science, RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
  - Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, 00185 Rome, Italy
- Gian Gaetano Tartaglia
  - Center for Life Nano- and Neuro-Science, RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
  - Centre for Human Technologies (CHT), RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy

36
Pan X, Zhang X. Studying temporal dynamics of single cells: expression, lineage and regulatory networks. Biophys Rev 2024; 16:57-67. [PMID: 38495440 PMCID: PMC10937865 DOI: 10.1007/s12551-023-01090-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/17/2023] [Accepted: 06/27/2023] [Indexed: 03/19/2024] Open
Abstract
Learning how multicellular organs develop from single cells into different cell types is a fundamental problem in biology. With high-throughput scRNA-seq technology, computational methods have been developed to reveal the temporal dynamics of single cells from transcriptomic data, from phenomena observed on cell trajectories to the underlying mechanisms that form those trajectories. Several distinct families of computational methods are involved in such studies, including Trajectory Inference (TI), Lineage Tracing (LT), and Gene Regulatory Network (GRN) Inference. This review summarizes these computational approaches, which use scRNA-seq data to study cell differentiation and cell fate specification, as well as the advantages and limitations of the different methods. We further discuss how GRNs can potentially affect cell fate decisions and trajectory structures. Supplementary Information: The online version contains supplementary material available at 10.1007/s12551-023-01090-5.
Affiliation(s)
- Xinhai Pan
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
| | - Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
|
37
|
Li S, Liu Y, Shen LC, Yan H, Song J, Yu DJ. GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference. Brief Bioinform 2024; 25:bbad529. [PMID: 38261340 PMCID: PMC10805180 DOI: 10.1093/bib/bbad529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/08/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024]
Abstract
The recent advances of single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) on scRNA-seq data. Most methods for inferring GRNs suffer from the inability to eliminate transitive interactions or necessitate expensive computational resources. To address these, we present a novel method, termed GMFGRN, for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs GNN for matrix factorization and learns representative embeddings for genes. For transcription factor-gene pairs, it utilizes the learned embeddings to determine whether they interact with each other. The extensive suite of benchmarking experiments encompassing eight static scRNA-seq datasets alongside several state-of-the-art methods demonstrated mean improvements of 1.9 and 2.5% over the runner-up in area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). In addition, across four time-series datasets, maximum enhancements of 2.4 and 1.3% in AUROC and AUPRC were observed in comparison to the runner-up. Moreover, GMFGRN requires significantly less training time and memory consumption, with time and memory consumed <10% compared to the second-best method. These findings underscore the substantial potential of GMFGRN in the inference of GRNs. It is publicly available at https://github.com/Lishuoyy/GMFGRN.
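The two metrics quoted in the abstract above, AUROC and AUPRC, can be computed for any ranked list of candidate TF-gene edges. The following is a minimal sketch, not GMFGRN's evaluation code: the function names `auroc` and `average_precision` are hypothetical, AUPRC is approximated by average precision, and tied scores in the ranking are broken by input order.

```python
import numpy as np

def auroc(labels, scores):
    """Probability that a true edge outscores a non-edge (ties count 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one true edge and one non-edge")
    # Compare every (true edge, non-edge) pair of scores.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def average_precision(labels, scores):
    """Mean of precision@k taken at the rank of each true edge."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if not labels.any():
        raise ValueError("need at least one true edge")
    # Rank candidate edges from highest to lowest score.
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())

# Toy example: four candidate edges, two of them in the reference network.
edge_labels = [1, 0, 1, 0]
edge_scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(edge_labels, edge_scores), average_precision(edge_labels, edge_scores))
```

Because true edges are rare in GRN reference networks, the precision-based score usually reacts more strongly than AUROC to false positives near the top of the ranking.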
Affiliation(s)
- Shuo Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - Yan Liu
- School of Information Engineering, Yangzhou University, 196 West Huayang, Yangzhou, 225000, China
| | - Long-Chen Shen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - He Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
|
38
|
Louarn M, Collet G, Barré È, Fest T, Dameron O, Siegel A, Chatonnet F. Regulus infers signed regulatory relations from few samples' information using discretization and likelihood constraints. PLoS Comput Biol 2024; 20:e1011816. [PMID: 38252636 PMCID: PMC10833539 DOI: 10.1371/journal.pcbi.1011816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 02/01/2024] [Accepted: 01/08/2024] [Indexed: 01/24/2024]
Abstract
MOTIVATION Transcriptional regulation is performed by transcription factors (TF) binding to DNA in context-dependent regulatory regions and determines the activation or inhibition of gene expression. Current methods of transcriptional regulatory circuits inference, based on one or all of TF, regions and genes activity measurements require a large number of samples for ranking the candidate TF-gene regulation relations and rarely predict whether they are activations or inhibitions. We hypothesize that transcriptional regulatory circuits can be inferred from fewer samples by (1) fully integrating information on TF binding, gene expression and regulatory regions accessibility, (2) reducing data complexity and (3) using biology-based likelihood constraints to determine the global consistency between a candidate TF-gene relation and patterns of genes expressions and region activations, as well as qualify regulations as activations or inhibitions. RESULTS We introduce Regulus, a method which computes TF-gene relations from gene expressions, regulatory region activities and TF binding sites data, together with the genomic locations of all entities. After aggregating gene expressions and region activities into patterns, data are integrated into a RDF (Resource Description Framework) endpoint. A dedicated SPARQL (SPARQL Protocol and RDF Query Language) query retrieves all potential relations between expressed TF and genes involving active regulatory regions. These TF-region-gene relations are then filtered using biological likelihood constraints allowing to qualify them as activation or inhibition. Regulus provides signed relations consistent with public databases and, when applied to biological data, identifies both known and potential new regulators. 
Regulus is devoted to context-specific transcriptional circuits inference in human settings where samples are scarce and cell populations are closely related, using discretization into patterns and likelihood reasoning to decipher the most robust regulatory relations.
Affiliation(s)
- Marine Louarn
- Univ Rennes, CNRS, Inria, IRISA - UMR 6074, Rennes, France
- UMR_S 1236, Université Rennes 1, INSERM, Etablissement Français du Sang, Rennes, France
| | | | - Ève Barré
- Univ Rennes, CNRS, Inria, IRISA - UMR 6074, Rennes, France
| | - Thierry Fest
- UMR_S 1236, Université Rennes 1, INSERM, Etablissement Français du Sang, Rennes, France
- Laboratoire d’Hématologie, Pôle de Biologie, CHU de Rennes, Rennes, France
| | | | - Anne Siegel
- Univ Rennes, CNRS, Inria, IRISA - UMR 6074, Rennes, France
| | - Fabrice Chatonnet
- UMR_S 1236, Université Rennes 1, INSERM, Etablissement Français du Sang, Rennes, France
- Laboratoire d’Hématologie, Pôle de Biologie, CHU de Rennes, Rennes, France
|
39
|
Xin J, Wang M, Qu L, Chen Q, Wang W, Wang Z. BIC-LP: A Hybrid Higher-Order Dynamic Bayesian Network Score Function for Gene Regulatory Network Reconstruction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:188-199. [PMID: 38127613 DOI: 10.1109/tcbb.2023.3345317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Reconstructing gene regulatory networks (GRNs) is an increasingly hot topic in bioinformatics. The dynamic Bayesian network (DBN) is a stochastic graph model commonly used for GRN reconstruction. However, the probabilistic characteristics of biological networks and the existence of data noise pose great challenges to GRN reconstruction and always lead to many false positive/negative edges. ScoreLasso is a hybrid DBN score function combining DBN and linear regression with good performance. Its performance is, however, limited by its first-order assumption and its neglect of the initial network of the DBN. In this article, an integrated model based on a higher-order DBN model, a higher-order Lasso linear regression model and a Pearson correlation model is proposed. Based on this, a hybrid higher-order DBN score function for GRN reconstruction, namely BIC-LP, is proposed. The BIC-LP score function is constructed by adding terms based on Lasso linear regression coefficients and Pearson correlation coefficients to the classical BIC score function. It can therefore capture more information from the dataset and curb information loss, compared with both many existing Bayesian family score functions and many state-of-the-art methods for GRN reconstruction. Experimental results show that BIC-LP can reasonably eliminate some false positive edges while retaining most true positive edges, so as to achieve better GRN reconstruction performance.
|
40
|
Kim H, Choi H, Lee D, Kim J. A review on gene regulatory network reconstruction algorithms based on single cell RNA sequencing. Genes Genomics 2024; 46:1-11. [PMID: 38032470 DOI: 10.1007/s13258-023-01473-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND Understanding gene regulatory networks (GRNs) is essential for unraveling the molecular mechanisms governing cellular behavior. With the advent of high-throughput transcriptome measurement technology, researchers have aimed to reverse engineer biological systems, extracting gene regulatory rules from their outputs, which are represented by gene expression data. Bulk RNA sequencing, a widely used method for measuring gene expression, has been employed for GRN reconstruction. However, it falls short in capturing dynamic changes in gene expression at the level of individual cells since it averages gene expression across mixed cell populations. OBJECTIVE In this review, we provide an overview of 15 GRN reconstruction tools and discuss their respective strengths and limitations, particularly in the context of single cell RNA sequencing (scRNA-seq). METHODS Recent advancements in scRNA-seq break new ground in GRN reconstruction. They offer snapshots of individual cell transcriptomes and capture dynamic changes. We emphasize how these technological breakthroughs have enhanced GRN reconstruction. CONCLUSION GRN reconstructors can be classified based on their requirement for cellular trajectory, which represents a dynamical cellular process including differentiation, aging, or disease progression. Benchmarking studies support the superiority of GRN reconstructors that do not require trajectory analysis in identifying regulator-target relationships. However, methods equipped with trajectory analysis demonstrate better performance in identifying key regulatory factors. In conclusion, researchers should select a suitable GRN reconstructor based on their specific research objectives.
Affiliation(s)
- Hyeonkyu Kim
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
| | - Hwisoo Choi
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
| | - Daewon Lee
- School of Art and Technology, Chung-Ang University, 4726 Seodong-Daero, Anseong-Si, Gyeonggi-Do, 17546, Republic of Korea.
| | - Junil Kim
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea.
|
41
|
Abe H, Lin P, Zhou D, Ruderfer DM, Gamazon ER. Mapping the landscape of lineage-specific dynamic regulation of gene expression using single-cell transcriptomics and application to genetics of complex disease. medRxiv: the preprint server for health sciences 2023:2023.10.24.23297476. [PMID: 37961453 PMCID: PMC10635195 DOI: 10.1101/2023.10.24.23297476] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2023]
Abstract
Single-cell transcriptome data can provide insights into how genetic variation influences biological processes involved in human biology and disease. However, the identification of gene-level associations in distinct cell types faces several challenges, including the limited reference resource from population scale studies, data sparsity in single-cell RNA sequencing, and the complex cell-state pattern of expression within individual cell types. Here we develop genetic models of cell type specific and cell state adjusted gene expression in mid-brain neurons in the process of specializing from induced pluripotent stem cells. The resulting framework quantifies the dynamics of the genetic regulation of gene expression and estimates its cell type specificity. As an application, we show that the approach detects known and new genes associated with schizophrenia and enables insights into context-dependent disease mechanisms. We provide a genomic resource from a phenome-wide application of our models to more than 1500 phenotypes from the UK Biobank. Using longitudinal genetically determined expression, we implement a predictive causality framework, evaluating the prediction of future values of a target gene expression using prior values of a putative regulatory gene. Collectively, this work demonstrates the insights that can be gained into the molecular underpinnings of diseases by quantifying the genetic control of gene expression at single-cell resolution.
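The idea of predicting future values of a target gene from prior values of a putative regulator can be illustrated with a generic lagged least-squares comparison. The sketch below is not the authors' genetic model: the helper names `lagged_rss` and `predictive_gain`, the single lag, and the linear form are all assumptions made for illustration.

```python
import numpy as np

def lagged_rss(target, regulator=None, lag=1):
    """Residual sum of squares when predicting target[t] from values at t - lag."""
    target = np.asarray(target, dtype=float)
    y = target[lag:]
    # Baseline predictors: intercept and the target's own past value.
    columns = [np.ones_like(y), target[:-lag]]
    if regulator is not None:
        # Optionally add the putative regulator's past value.
        columns.append(np.asarray(regulator, dtype=float)[:-lag])
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return float(residual @ residual)

def predictive_gain(target, regulator, lag=1):
    """Fraction of baseline prediction error removed by the regulator's past."""
    restricted = lagged_rss(target, lag=lag)
    if restricted == 0.0:
        return 0.0  # nothing left to explain
    full = lagged_rss(target, regulator, lag=lag)
    return 1.0 - full / restricted

# Synthetic series in which x drives y with a one-step delay.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=199)
print(predictive_gain(y, x), predictive_gain(x, y))
```

A gain near 1 in one direction and near 0 in the other is the asymmetry such a framework looks for. With real expression data, a significance test and correction for confounders would be needed before reading the result as regulation.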
Affiliation(s)
- Hanna Abe
- Vanderbilt University, Nashville, TN
| | - Phillip Lin
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Dan Zhou
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Douglas M Ruderfer
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Department of Biomedical Informatics and Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
| | - Eric R Gamazon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Clare Hall, University of Cambridge, Cambridge, England
|
42
|
Cheng J, Cheng M, Lusis AJ, Yang X. Gene Regulatory Networks in Coronary Artery Disease. Curr Atheroscler Rep 2023; 25:1013-1023. [PMID: 38008808 PMCID: PMC11466510 DOI: 10.1007/s11883-023-01170-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/09/2023] [Indexed: 11/28/2023]
Abstract
PURPOSE OF REVIEW Coronary artery disease is a complex disorder and the leading cause of mortality worldwide. As technologies for the generation of high-throughput multiomics data have advanced, gene regulatory network modeling has become an increasingly powerful tool in understanding coronary artery disease. This review summarizes recent and novel gene regulatory network tools for bulk tissue and single cell data, existing databases for network construction, and applications of gene regulatory networks in coronary artery disease. RECENT FINDINGS New gene regulatory network tools can integrate multiomics data to elucidate complex disease mechanisms at unprecedented cellular and spatial resolutions. At the same time, updates to coronary artery disease expression data in existing databases have enabled researchers to build gene regulatory networks to study novel disease mechanisms. Gene regulatory networks have proven extremely useful in understanding CAD heritability beyond what is explained by GWAS loci and in identifying mechanisms and key driver genes underlying disease onset and progression. Gene regulatory networks can holistically and comprehensively address the complex nature of coronary artery disease. In this review, we discuss key algorithmic approaches to construct gene regulatory networks and highlight state-of-the-art methods that model specific modes of gene regulation. We also explore recent applications of these tools in coronary artery disease patient data repositories to understand disease heritability and shared and distinct disease mechanisms and key driver genes across tissues, between sexes, and between species.
Affiliation(s)
- Jenny Cheng
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA, 90095, USA
- Molecular, Cellular and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA, 90095, USA
| | - Michael Cheng
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA, 90095, USA
| | - Aldons J Lusis
- Department of Medicine, Division of Cardiology, University of California, Los Angeles, 650 Charles E Young Drive South, Los Angeles, CA, 90095, USA.
- Departments of Human Genetics & Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA.
| | - Xia Yang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA, 90095, USA.
- Molecular, Cellular and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA, 90095, USA.
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA, 90095, USA.
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA, 90095, USA.
|
43
|
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 124] [Impact Index Per Article: 62.0] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Affiliation(s)
- Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Lorna Wessels
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Sophia Müller-Dott
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Rémi Trimbour
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
| | - Ricardo O Ramirez Flores
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | | | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
|
44
|
Velten B, Stegle O. Principles and challenges of modeling temporal and spatial omics data. Nat Methods 2023; 20:1462-1474. [PMID: 37710019 DOI: 10.1038/s41592-023-01992-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 05/06/2022] [Accepted: 07/31/2023] [Indexed: 09/16/2023]
Abstract
Studies with temporal or spatial resolution are crucial to understand the molecular dynamics and spatial dependencies underlying a biological process or system. With advances in high-throughput omic technologies, time- and space-resolved molecular measurements at scale are increasingly accessible, providing new opportunities to study the role of timing or structure in a wide range of biological questions. At the same time, analyses of the data being generated in the context of spatiotemporal studies entail new challenges that need to be considered, including the need to account for temporal and spatial dependencies and compare them across different scales, biological samples or conditions. In this Review, we provide an overview of common principles and challenges in the analysis of temporal and spatial omics data. We discuss statistical concepts to model temporal and spatial dependencies and highlight opportunities for adapting existing analysis methods to data with temporal and spatial dimensions.
Affiliation(s)
- Britta Velten
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Cellular Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge, UK.
- Centre for Organismal Studies (COS) and Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany.
| | - Oliver Stegle
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Cellular Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge, UK.
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
|
45
|
Shojaee A, Huang SSC. Robust discovery of gene regulatory networks from single-cell gene expression data by Causal Inference Using Composition of Transactions. Brief Bioinform 2023; 24:bbad370. [PMID: 37897702 PMCID: PMC10612495 DOI: 10.1093/bib/bbad370] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2023] [Revised: 09/06/2023] [Accepted: 09/29/2023] [Indexed: 10/30/2023]
Abstract
Gene regulatory networks (GRNs) drive organism structure and functions, so the discovery and characterization of GRNs is a major goal in biological research. However, accurate identification of causal regulatory connections and inference of GRNs using gene expression datasets, more recently from single-cell RNA-seq (scRNA-seq), has been challenging. Here we employ the innovative method of Causal Inference Using Composition of Transactions (CICT) to uncover GRNs from scRNA-seq data. The basis of CICT is that if all gene expressions were random, a non-random regulatory gene should induce its targets at levels different from the background random process, resulting in distinct patterns in the whole relevance network of gene-gene associations. CICT proposes novel network features derived from a relevance network, which enable any machine learning algorithm to predict causal regulatory edges and infer GRNs. We evaluated CICT using simulated and experimental scRNA-seq data in a well-established benchmarking pipeline and showed that CICT outperformed existing network inference methods representing diverse approaches with many-fold higher accuracy. Furthermore, we demonstrated that GRN inference with CICT was robust to different levels of sparsity in scRNA-seq data, the characteristics of data and ground truth, the choice of association measure and the complexity of the supervised machine learning algorithm. Our results suggest aiming at directly predicting causality to recover regulatory relationships in complex biological networks substantially improves accuracy in GRN inference.
Affiliation(s)
- Abbas Shojaee
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
| | - Shao-shan Carol Huang
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
|
46
|
Mao G, Pang Z, Zuo K, Wang Q, Pei X, Chen X, Liu J. Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform 2023; 24:bbad414. [PMID: 37985457 PMCID: PMC10661972 DOI: 10.1093/bib/bbad414] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/03/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/22/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes from the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data present challenges for GRN inference. In recent years, the dramatic increase in data on experimentally validated transcription factors binding to DNA has made it possible to infer GRNs by supervised methods. In this study, we address the problem of GRN inference by framing it as a graph link prediction task. In this paper, we propose a novel framework called GNNLink, which leverages known GRNs to deduce the potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder to effectively refine gene features by capturing interdependencies between nodes in the network. Finally, the inference of GRN is obtained by performing matrix completion operation on node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods using seven scRNA-seq datasets. These datasets encompass diverse ground truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink on our GitHub repository: https://github.com/sdesignates/GNNLink.
Affiliation(s)
- Guo Mao
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
| | - Zhengbin Pang
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
| | - Ke Zuo
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
| | - Qinglin Wang
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
| | - Xiangdong Pei
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
| | - Xinhai Chen
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
| | - Jie Liu
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Laboratory of Software Engineering for Complex System, National University of Defense Technology, deya, 410073 Changsha, China
|
47
|
Zeng Y, He Y, Zheng R, Li M. Inferring single-cell gene regulatory network by non-redundant mutual information. Brief Bioinform 2023; 24:bbad326. [PMID: 37715282 DOI: 10.1093/bib/bbad326] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/19/2023] [Revised: 07/12/2023] [Accepted: 08/08/2023] [Indexed: 09/17/2023]
Abstract
Gene regulatory networks play a crucial role in controlling the biological processes of living creatures. Deciphering the complex gene regulatory networks from experimental data remains a major challenge in systems biology. Recent advances in single-cell RNA sequencing technology bring massive high-resolution data, enabling computational inference of cell-specific gene regulatory networks (GRNs). Many relevant algorithms have been developed to achieve this goal in the past years. However, GRN inference is still less than ideal due to the extra noise involved in pseudo-time information and the large number of dropouts in datasets. Here, we present a novel GRN inference method named Normi, which is based on non-redundant mutual information. Normi addresses these problems by applying a fixed-size sliding window over the entire trajectory and average-smoothing the gene expression of the cells in each window to obtain representative cells. To further alleviate the impact of dropouts, we utilize the mixed KSG estimator to quantify the high-order time-delayed mutual information among genes, then filter out the redundant edges by adopting the Max-Relevance and Min-Redundancy algorithm. Moreover, we determine the optimal time delay for each gene pair by distance correlation. Normi outperforms other state-of-the-art GRN inference methods on both simulated data and single-cell RNA sequencing (scRNA-seq) datasets, demonstrating its superiority in robustness. The performance of Normi in real scRNA-seq data further reveals its ability to identify the key regulators and crucial biological processes.
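The window-based smoothing step described in this abstract can be sketched as follows. This is an illustrative reading of the abstract, not Normi's released code: the function name `window_average`, the `window` and `step` parameters, and the cells-by-genes matrix layout are assumptions.

```python
import numpy as np

def window_average(expression, pseudotime, window=5, step=1):
    """Average expression over fixed-size windows of pseudotime-ordered cells.

    expression: array of shape (n_cells, n_genes)
    pseudotime: array of shape (n_cells,)
    Returns one averaged "representative cell" per window position.
    """
    expression = np.asarray(expression, dtype=float)
    # Order cells along the trajectory.
    order = np.argsort(np.asarray(pseudotime), kind="stable")
    ordered = expression[order]
    n_cells = ordered.shape[0]
    if not 1 <= window <= n_cells:
        raise ValueError("window must be between 1 and the number of cells")
    # Slide the window along the ordered cells and average within it.
    starts = range(0, n_cells - window + 1, step)
    return np.stack([ordered[s:s + window].mean(axis=0) for s in starts])

# Toy example: four cells, one gene, unordered pseudotime.
representatives = window_average([[0.0], [2.0], [4.0], [6.0]], [3, 0, 1, 2], window=2)
print(representatives.ravel())
```

Averaging within a window dilutes isolated zero counts from dropouts, at the cost of temporal resolution, so the window size trades noise reduction against smoothing away real dynamics.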
Affiliation(s)
- Yanping Zeng, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yongxin He, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ruiqing Zheng, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
48
|
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of single cell multi-omics and spatial data guided by gene regulatory networks and cell-cell interactions. Research Square 2023:rs.3.rs-3301625. [PMID: 37790516 PMCID: PMC10543280 DOI: 10.21203/rs.3.rs-3301625/v1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/05/2023]
Abstract
Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing simulators typically focus on modeling one or two specific biological factors or mechanisms that affect the output data, which limits their capacity to simulate the complexity and multi-modality of real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations, while accounting for the relationships between modalities. scMultiSim jointly models various biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, while also incorporating technical noise. Moreover, it allows users to adjust each factor's effect easily. We validated scMultiSim's simulated biological effects and demonstrated its applications by benchmarking a wide range of computational tasks, including multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference, and CCI inference using spatially resolved gene expression data, many of which had not been benchmarked before due to the lack of proper tools. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li, Georgia Institute of Technology, Atlanta, USA
- Ziqi Zhang, Georgia Institute of Technology, Atlanta, USA
- Xi Chen, Southern University of Science and Technology, Shenzhen, China
|
49
|
Wang J, Chen Y, Zou Q. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model. PLoS Genet 2023; 19:e1010942. [PMID: 37703293 PMCID: PMC10519590 DOI: 10.1371/journal.pgen.1010942] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 04/11/2023] [Revised: 09/25/2023] [Accepted: 08/29/2023] [Indexed: 09/15/2023]
Abstract
The gene regulatory structure of cells involves not only the regulatory relationship between two genes, but also the cooperative associations of multiple genes. However, most gene regulatory network inference methods for single-cell data infer only the regulatory relationships between pairs of genes, ignoring the global regulatory structure, which is crucial for identifying regulation in complex biological systems. Here, we propose a graph-based Deep learning model for Regulatory networks Inference among Genes (DeepRIG) from single-cell RNA-seq data. To learn the global regulatory structure, DeepRIG builds a prior regulatory graph by transforming the gene expression data into a co-expression mode. It then uses a graph autoencoder model to embed the global regulatory information contained in the graph into gene latent embeddings and to reconstruct the gene regulatory network. Extensive benchmarking results demonstrate that DeepRIG can accurately reconstruct gene regulatory networks and outperform existing methods on multiple simulated networks and real-cell regulatory networks. Additionally, we applied DeepRIG to samples of human peripheral blood mononuclear cells and triple-negative breast cancer, and showed that DeepRIG can provide accurate cell-type-specific gene regulatory network inference and identify novel regulators of progression and inhibition.
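The idea of turning expression data into a co-expression prior graph can be sketched as follows. This is only an illustration under stated assumptions, not DeepRIG's actual procedure: the abstract does not specify the co-expression measure, so Pearson correlation with a fixed absolute-value threshold is assumed here, and the names `pearson`, `coexpression_graph`, and `threshold` are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_graph(
    genes_by_cells: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[List[int]]:
    """Undirected adjacency matrix linking strongly co-expressed genes.

    genes_by_cells[g] is the expression of gene g across all cells.
    The threshold value is an illustrative assumption.
    """
    n = len(genes_by_cells)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(genes_by_cells[i], genes_by_cells[j])
            # Keep both positive and negative co-expression as edges.
            if abs(r) >= threshold:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


profiles = [
    [1.0, 2.0, 3.0, 4.0],    # gene 0
    [2.0, 4.0, 6.0, 8.0],    # gene 1, rises with gene 0
    [4.0, 3.0, 2.0, 1.0],    # gene 2, falls as gene 0 rises
    [1.0, -1.0, 1.0, -1.0],  # gene 3, weakly related to the others
]
prior = coexpression_graph(profiles)
```

A graph of this kind would serve only as the input prior; the graph autoencoder that embeds it and reconstructs the regulatory network is not shown.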
Affiliation(s)
- Jiacheng Wang, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Yaojia Chen, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
|
50
|
Dautle M, Zhang S, Chen Y. scTIGER: A Deep-Learning Method for Inferring Gene Regulatory Networks from Case versus Control scRNA-seq Datasets. Int J Mol Sci 2023; 24:13339. [PMID: 37686146 PMCID: PMC10488287 DOI: 10.3390/ijms241713339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 08/06/2023] [Accepted: 08/23/2023] [Indexed: 09/10/2023]
Abstract
Inferring gene regulatory networks (GRNs) from single-cell RNA-seq (scRNA-seq) data is an important computational problem for finding the regulatory mechanisms involved in fundamental cellular processes. Although many computational methods have been designed to predict GRNs from scRNA-seq data, they usually have high false positive rates, and none infers GRNs by directly using the paired datasets of case-versus-control experiments. Here we present a novel deep-learning-based method, named scTIGER, for GRN detection using the co-differential relationships of gene expression profiles in paired scRNA-seq datasets. scTIGER employs cell-type-based pseudotiming, an attention-based convolutional neural network method, and permutation-based significance testing to infer GRNs among gene modules. As state-of-the-art applications, we first applied scTIGER to scRNA-seq datasets of prostate cancer cells and successfully identified the dynamic regulatory networks of AR, ERG, PTEN and ATF3 for the same cell type between prostatic cancerous and normal conditions, and for two cell types within the prostatic cancerous environment. We then applied scTIGER to scRNA-seq data from neurons with and without fear memory and detected specific regulatory networks for BDNF, CREB1 and MAPK4. Additionally, scTIGER demonstrates robustness against high levels of dropout noise in scRNA-seq data.
Affiliation(s)
- Madison Dautle, Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Shaoqiang Zhang, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen, Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
|