1
da Silva JEH, Bernardino HS, de Oliveira IL, Camata JJ. A survey of the methodological process of modeling, inference, and evaluation of gene regulatory networks using scRNA-Seq data. Biosystems 2025; 253:105464. [PMID: 40409400 DOI: 10.1016/j.biosystems.2025.105464] [Received: 03/11/2024] [Revised: 03/20/2025] [Accepted: 04/17/2025] [Indexed: 05/25/2025]
Abstract
The advent of scRNA-Seq technology has provided unprecedented resolution in the analysis of gene regulatory networks (GRNs) at the single-cell level. However, new technical and methodological challenges have also emerged. Factors such as the large number of zeros reported in expression levels, the biological variation due to the stochastic nature of gene expression, the environmental niche, and effects created by the cell cycle make it difficult to correctly interpret the data obtained in the sequencing stage. Moreover, methods developed specifically for inferring GRNs from scRNA-Seq data have proved to be of similar quality to random predictors. The lack of adequate pre-processing of gene expression data, including selection steps for subsets of genes of interest, smoothing, and discretization of gene expression, in addition to the different ways of modeling networks and network motifs, affects the performance of inference approaches. Finally, the lack of knowledge about the ground-truth network and the absence of standardized metrics for measuring the quality of inferred networks make comparing performance between algorithms a major problem, given the unbalanced nature of the data and the interpretation bias caused by the chosen metric. This article brings these issues to light through comparative computational experiments, aiming to show how these factors influence both the inference process and the performance evaluation of inferred networks, and provides suggestions for a more robust methodological process for researchers dealing with inference of GRNs.
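The point about unbalanced data and metric choice can be illustrated with a minimal sketch. It is not taken from the survey: the toy network, its 5% edge density, and the helper names `precision_recall_at_k` and `random_baseline_precision` are assumptions made for illustration. The sketch shows that the expected precision of a random predictor equals the density of the ground-truth network, so a sparse network gives a very low baseline against which inferred edges should be judged.

```python
import random

def precision_recall_at_k(scores, truth, k):
    """Precision and recall of the k highest-scoring candidate edges."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for edge in ranked if edge in truth)
    return hits / k, hits / len(truth)

def random_baseline_precision(n_genes, truth):
    """Expected precision of a random predictor: the network density."""
    n_candidates = n_genes * (n_genes - 1)  # directed edges, no self-loops
    return len(truth) / n_candidates

rng = random.Random(0)
genes = [f"g{i}" for i in range(20)]
candidates = [(a, b) for a in genes for b in genes if a != b]

# Hypothetical sparse ground truth: 19 true edges out of 380 candidates.
truth = set(rng.sample(candidates, 19))
baseline = random_baseline_precision(len(genes), truth)

# A predictor that assigns random scores, and an oracle for comparison.
random_scores = {edge: rng.random() for edge in candidates}
perfect_scores = {edge: 1.0 if edge in truth else 0.0 for edge in candidates}

random_precision, random_recall = precision_recall_at_k(random_scores, truth, k=50)
perfect_precision, perfect_recall = precision_recall_at_k(perfect_scores, truth, k=19)
```

Reporting precision relative to `baseline` (rather than as a raw value) is one way to keep comparisons meaningful across networks of different density.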
Affiliation(s)
- José Eduardo H da Silva
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil.
- Heder S Bernardino
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- Itamar L de Oliveira
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- José J Camata
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
2
Pan Q, Ding L, Hladyshau S, Yao X, Zhou J, Yan L, Dhungana Y, Shi H, Qian C, Dong X, Burdyshaw C, Veloso JP, Khatamian A, Xie Z, Risch I, Yang X, Yang J, Huang X, Fang J, Jain A, Jain A, Rusch M, Brewer M, Peng J, Yan KK, Chi H, Yu J. scMINER: a mutual information-based framework for clustering and hidden driver inference from single-cell transcriptomics data. Nat Commun 2025; 16:4305. [PMID: 40341143 PMCID: PMC12062461 DOI: 10.1038/s41467-025-59620-6] [Received: 01/17/2025] [Accepted: 04/28/2025] [Indexed: 05/10/2025] Open
Abstract
Single-cell transcriptomics data present challenges due to their inherent stochasticity and sparsity, complicating both cell clustering and cell type-specific network inference. To address these challenges, we introduce scMINER (single-cell Mutual Information-based Network Engineering Ranger), an integrative framework for unsupervised cell clustering, transcription factor and signaling protein network inference, and identification of hidden drivers from single-cell transcriptomic data. scMINER demonstrates superior accuracy in cell clustering, outperforming five state-of-the-art algorithms and excelling in distinguishing closely related cell populations. For network inference, scMINER outperforms three established methods, as validated by ATAC-seq and CROP-seq. In particular, it surpasses SCENIC in revealing key transcription factor drivers involved in T cell exhaustion and Treg tissue specification. Moreover, scMINER enables the inference of signaling protein networks and drivers with high accuracy, which presents an advantage in multimodal single cell data analysis. In addition, we establish scMINER Portal, an interactive visualization tool to facilitate exploration of scMINER results.
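As a rough illustration of the quantity that mutual information-based frameworks build on, the sketch below computes a plug-in mutual information estimate between two discretized expression vectors. It is not scMINER's implementation: the estimator, the binary discretization, and the toy vectors `gene_a`, `gene_b`, and `gene_c` are assumptions chosen to keep the example small.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), joint_count in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (joint_count / n) * math.log2(joint_count * n / (px[a] * py[b]))
    return mi

# Hypothetical binarized expression of three genes across eight cells.
gene_a = [0, 0, 1, 1, 0, 1, 0, 1]
gene_b = [0, 0, 1, 1, 0, 1, 0, 1]  # identical to gene_a
gene_c = [0, 1, 0, 1, 1, 0, 0, 1]  # statistically independent of gene_a

mi_dependent = mutual_information(gene_a, gene_b)
mi_independent = mutual_information(gene_a, gene_c)
```

Unlike Pearson correlation, mutual information also captures nonlinear dependence, which is one reason it is a common choice for sparse, noisy single-cell data.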
Affiliation(s)
- Qingfei Pan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Liang Ding
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Siarhei Hladyshau
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xiangyu Yao
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Jiayu Zhou
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Lei Yan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Yogesh Dhungana
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Hao Shi
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Chenxi Qian
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xinran Dong
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai, 201102, P.R. China
- Chad Burdyshaw
- Department of Information Services, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Joao Pedro Veloso
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Alireza Khatamian
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Zhen Xie
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Physiology, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Isabel Risch
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xu Yang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Jiyuan Yang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xin Huang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Precision Research Center for Refractory Diseases, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201620, China
- Jason Fang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Anuj Jain
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Arihant Jain
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Michael Rusch
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Michael Brewer
- Department of Information Services, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Junmin Peng
- Department of Structural Biology and Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Koon-Kiu Yan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Hongbo Chi
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Jiyang Yu
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
3
Zhao W, Larschan E, Sandstede B, Singh R. Optimal transport reveals dynamic gene regulatory networks via gene velocity estimation. PLoS Comput Biol 2025; 21:e1012476. [PMID: 40341271 PMCID: PMC12118989 DOI: 10.1371/journal.pcbi.1012476] [Received: 09/10/2024] [Revised: 05/28/2025] [Accepted: 04/10/2025] [Indexed: 05/10/2025] Open
Abstract
Inferring gene regulatory networks from gene expression data is an important and challenging problem in the biology community. We propose OTVelo, a methodology that takes time-stamped single-cell gene expression data as input and predicts gene regulation across two time points. It is known that the rate of change of gene expression, which we will refer to as gene velocity, provides crucial information that enhances such inference; however, this information is not always available due to the limitations in sequencing depth. Our algorithm overcomes this limitation by estimating gene velocities using optimal transport. We then infer gene regulation using time-lagged correlation and Granger causality via regularized linear regression. Instead of providing an aggregated network across all time points, our method uncovers the underlying dynamical mechanism across time points. We validate our algorithm on 13 simulated datasets with both synthetic and curated networks and demonstrate its efficacy on 9 experimental data sets.
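The time-lagged correlation step mentioned in the abstract can be sketched in a few lines. This is a simplified illustration, not OTVelo itself: it skips the optimal transport and regression stages, and the function names and the toy `regulator` and `target` series (standing in for estimated gene velocities) are assumptions.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

def lagged_correlation(regulator, target, lag=1):
    """Correlate the regulator at time t with the target at time t + lag.

    Requires lag >= 1 and both series sampled at the same time points.
    """
    return pearson(regulator[:-lag], target[lag:])

# Hypothetical signals: the target follows the regulator one step later.
regulator = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
target = [0.0] + [2.0 * value for value in regulator[:-1]]

score_forward = lagged_correlation(regulator, target, lag=1)
score_reverse = lagged_correlation(target, regulator, lag=1)
```

A high forward score together with a lower reverse score is the kind of asymmetry that lets a time-lagged statistic suggest a direction of regulation, which a same-time correlation cannot do.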
Affiliation(s)
- Wenjun Zhao
- Division of Applied Mathematics, Brown University, Providence, Rhode Island, United States of America
- Department of Mathematics, University of British Columbia, Vancouver, Canada
- Erica Larschan
- Department of Molecular Biology, Cell Biology and Biochemistry, Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Björn Sandstede
- Division of Applied Mathematics, Brown University, Providence, Rhode Island, United States of America
- Ritambhara Singh
- Department of Computer Science, Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
4
Wang JC, Chen YJ, Zou Q. GRACE: Unveiling Gene Regulatory Networks With Causal Mechanistic Graph Neural Networks in Single-Cell RNA-Sequencing Data. IEEE Trans Neural Netw Learn Syst 2025; 36:9005-9017. [PMID: 38896510 DOI: 10.1109/tnnls.2024.3412753] [Indexed: 06/21/2024]
Abstract
Reconstructing gene regulatory networks (GRNs) using single-cell RNA sequencing (scRNA-seq) data holds great promise for unraveling cellular fate development and heterogeneity. While numerous machine-learning methods have been proposed to infer GRNs from scRNA-seq gene expression data, many of them operate solely in a statistical or black box manner, limiting their capacity for making causal inferences between genes. In this study, we introduce GRN inference with Accuracy and Causal Explanation (GRACE), a novel graph-based causal autoencoder framework that combines a structural causal model (SCM) with graph neural networks (GNNs) to enable GRN inference and gene causal reasoning from scRNA-seq data. By explicitly modeling causal relationships between genes, GRACE facilitates the learning of regulatory context and gene embeddings. With the learned gene signals, our model successfully decodes the causal structures and facilitates the accurate determination of multiple attributes of gene regulation that are important for determining regulatory levels. Through extensive evaluations on seven benchmarks, we demonstrate that GRACE outperforms 14 state-of-the-art GRN inference methods, with the incorporation of causal mechanisms significantly enhancing the accuracy of GRN and gene causality inference. Furthermore, the application to human peripheral blood mononuclear cell (PBMC) samples reveals cell type-specific regulators in monocyte phagocytosis and immune regulation, validated through network analysis and functional enrichment analysis.
5
Wei PJ, Jin HW, Gao Z, Su Y, Zheng CH. GAEDGRN: reconstruction of gene regulatory networks based on gravity-inspired graph autoencoders. Brief Bioinform 2025; 26:bbaf232. [PMID: 40415678 DOI: 10.1093/bib/bbaf232] [Received: 03/14/2025] [Revised: 04/25/2025] [Accepted: 05/04/2025] [Indexed: 05/27/2025] Open
Abstract
Reconstructing high-resolution gene regulatory networks (GRNs) based on single-cell RNA sequencing data provides an opportunity to gain insight into disease pathogenesis. At present, many GRN reconstruction methods are based on graph neural networks, and they achieve excellent performance in GRN inference by extracting network structure features. However, most of these methods fail to fully exploit directional characteristics, or even ignore them, when extracting network structural features. To this end, a novel framework called GAEDGRN is proposed, based on a gravity-inspired graph autoencoder (GIGAE), to infer potential causal relationships between genes. GIGAE captures the complex directed network topology in GRNs. Additionally, because the latent vectors generated by the graph autoencoder are unevenly distributed, a random walk-based method is used to regularize the latent vectors learned by the encoder. Furthermore, considering that some genes in a GRN usually have a significant impact on biological functions, GAEDGRN includes a gene importance score calculation method and pays attention to genes with high importance during GRN reconstruction. Experimental results on seven cell types across three GRN types show that GAEDGRN achieves high accuracy and strong robustness. Moreover, a case study on human embryonic stem cells demonstrates that GAEDGRN can help identify important genes.
Affiliation(s)
- Pi-Jing Wei
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Huai-Wan Jin
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Zhen Gao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Yansen Su
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Chun-Hou Zheng
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
6
Sebastian S, Roy S, Kalita J. Network-based analysis of Alzheimer's Disease genes using multi-omics network integration with graph diffusion. J Biomed Inform 2025; 164:104797. [PMID: 39993589 DOI: 10.1016/j.jbi.2025.104797] [Received: 10/23/2024] [Revised: 01/16/2025] [Accepted: 02/10/2025] [Indexed: 02/26/2025]
Abstract
Alzheimer's Disease (AD) is a complex neurodegenerative disorder affecting millions worldwide. Despite extensive research, the mechanisms behind AD remain elusive. Many studies suggest that disease-responsible genes often act as hub genes in biological networks. However, this assumption requires further investigation in the context of AD. To examine the network characteristics of known AD genes, it is crucial to construct a highly confident network, which is challenging to achieve using a single data source. This work integrates multi-omics networks inferred from microarray, single-cell RNA sequencing, and single-nuclei RNA sequencing expression data, weighted with protein interaction and gene ontology information. We generate a high-quality integrated network by utilizing various inference methods and combining them through a graph diffusion-based integration approach. This network is then analyzed to investigate the properties of known AD-specific genes. Our findings reveal that AD genes are not always high-degree or central hub nodes in the network. Instead, these genes are distributed across different quartiles of degree centrality while maintaining significant interconnections for effective regulation. Furthermore, our study highlights that peripheral genes, often overlooked, also play crucial roles by connecting to relevant disease nodes and hub genes. These findings challenge the conventional understanding that AD-responsible genes are primarily the hub genes in the network, offering new insights into the complex regulatory mechanisms of AD and suggesting novel directions for future research.
Affiliation(s)
- Softya Sebastian
- Network Reconstruction and Analysis (NetRA) Lab, Department of Computer Applications, Sikkim (Central) University, India; School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
- Swarup Roy
- Network Reconstruction and Analysis (NetRA) Lab, Department of Computer Applications, Sikkim (Central) University, India.
- Jugal Kalita
- Department of Computer Science, University of Colorado at Colorado Springs, USA
7
Han M, Chen X, Li X, Ma J, Chen T, Yang C, Wang J, Li Y, Guo W, Zhu Y. MulNet: a scalable framework for reconstructing intra- and intercellular signaling networks from bulk and single-cell RNA-seq data. Brief Bioinform 2025; 26:bbaf081. [PMID: 40095604 PMCID: PMC11912874 DOI: 10.1093/bib/bbaf081] [Received: 05/24/2024] [Revised: 02/02/2025] [Accepted: 02/13/2025] [Indexed: 03/19/2025] Open
Abstract
Gene expression involves complex interactions between DNA, RNA, proteins, and small molecules. However, most existing molecular networks are built on limited interaction types, resulting in a fragmented understanding of gene regulation. Here, we present MulNet, a framework that organizes diverse molecular interactions underlying gene expression data into a scalable multilayer network. Additionally, MulNet can accurately identify gene modules and key regulators within this network. When applied across diverse cancer datasets, MulNet outperformed state-of-the-art methods in identifying biologically relevant modules. MulNet analysis of RNA-seq data from colon cancer revealed numerous well-established cancer regulators and a promising new therapeutic target, miR-8485, along with several downstream pathways it governs to inhibit tumor growth. MulNet analysis of single-cell RNA-seq data from head and neck cancer revealed intricate communication networks between fibroblasts and malignant cells mediated by transcription factors and cytokines. Overall, MulNet enables high-resolution reconstruction of intra- and intercellular communication from both bulk and single-cell data. The MulNet code and application are available at https://github.com/free1234hm/MulNet.
Affiliation(s)
- Mingfei Han
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Xiaoqing Chen
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Xiao Li
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Jie Ma
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Tao Chen
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Chunyuan Yang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Juan Wang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Yingxing Li
- Central Research Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuaifuyuan Wangfujing Dongcheng District, Beijing 100730, China
- Wenting Guo
- Central Research Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuaifuyuan Wangfujing Dongcheng District, Beijing 100730, China
- Yunping Zhu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
8
Yu W, Lin Z, Lan M, Ou-Yang L. GCLink: a graph contrastive link prediction framework for gene regulatory network inference. Bioinformatics 2025; 41:btaf074. [PMID: 39960893 PMCID: PMC11881698 DOI: 10.1093/bioinformatics/btaf074] [Received: 07/23/2024] [Revised: 01/10/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025] Open
Abstract
MOTIVATION: Gene regulatory networks (GRNs) unveil the intricate interactions among genes, which are pivotal in elucidating the complex biological processes within cells. The advent of single-cell RNA-sequencing (scRNA-seq) enables the inference of GRNs at single-cell resolution. However, the majority of current supervised network inference methods concentrate on predicting pairwise gene regulatory interactions, thus failing to fully exploit correlations among all genes and exhibiting limited generalization performance.
RESULTS: To address these issues, we propose a graph contrastive link prediction (GCLink) model to infer potential gene regulatory interactions from scRNA-seq data. Based on known gene regulatory interactions and scRNA-seq data, GCLink introduces a graph contrastive learning strategy to aggregate the feature and neighborhood information of genes to learn their representations. This approach reduces the dependence of our model on sample size and enhances its ability to predict potential gene regulatory interactions. Extensive experiments on real scRNA-seq datasets demonstrate that GCLink outperforms other state-of-the-art methods in most cases. Furthermore, by pretraining GCLink on a source cell line with abundant known regulatory interactions and fine-tuning it on a target cell line with a limited number of known interactions, our GCLink model exhibits good performance in GRN inference, demonstrating its effectiveness in inferring GRNs from datasets with limited known interactions.
AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/Yoyiming/GCLink.
Affiliation(s)
- Weiming Yu
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
- Zerun Lin
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
- Miaofang Lan
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
- Le Ou-Yang
- Guangdong Laboratory of Machine Perception and Intelligent Computing, Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen 518116, China
9
Chen L, Dautle M, Gao R, Zhang S, Chen Y. Inferring gene regulatory networks from time-series scRNA-seq data via GRANGER causal recurrent autoencoders. Brief Bioinform 2025; 26:bbaf089. [PMID: 40062616 PMCID: PMC11891664 DOI: 10.1093/bib/bbaf089] [Received: 10/31/2024] [Revised: 01/26/2025] [Accepted: 02/18/2025] [Indexed: 05/13/2025] Open
Abstract
The development of single-cell RNA sequencing (scRNA-seq) technology provides valuable data resources for inferring gene regulatory networks (GRNs), enabling deeper insights into cellular mechanisms and diseases. While many methods exist for inferring GRNs from static scRNA-seq data, current approaches face challenges in accurately handling time-series scRNA-seq data due to high noise levels and data sparsity. The temporal dimension introduces additional complexity by requiring models to capture dynamic changes, increasing sensitivity to noise, and exacerbating data sparsity across time points. In this study, we introduce GRANGER, an unsupervised deep learning-based method that integrates multiple advanced techniques, including a recurrent variational autoencoder, Granger causality, sparsity-inducing penalties, and negative binomial (NB)-based loss functions, to infer GRNs. GRANGER was evaluated using multiple popular benchmarking datasets, where it demonstrated superior performance compared to eight well-known GRN inference methods. The integration of an NB-based loss function and sparsity-inducing penalties in GRANGER significantly enhanced its capacity to address dropout noise and sparsity in scRNA-seq data. Additionally, GRANGER exhibited robustness against high levels of dropout noise. We applied GRANGER to scRNA-seq data from the whole mouse brain obtained through the BRAIN Initiative project and identified GRNs for five transcription regulators: E2f7, Gbx1, Sox10, Prox1, and Onecut2, which play crucial roles in diverse brain cell types. The inferred GRNs not only recalled many known regulatory relationships but also revealed sets of novel regulatory interactions with functional potential. These findings demonstrate that GRANGER is a highly effective tool for real-world applications in discovering novel gene regulatory relationships.
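To make the negative binomial (NB) loss concrete, here is a minimal sketch of an NB negative log-likelihood in the common mean and dispersion parameterization (variance = mu + mu^2 / theta). It is a generic formulation, not code from the paper: the function names, the shared dispersion `theta`, and the toy `counts` and `means` are assumptions.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under NB with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )

def nb_negative_log_likelihood(counts, means, theta):
    """Sum of NB negative log-likelihoods, usable as a reconstruction loss."""
    return -sum(nb_log_pmf(x, mu, theta) for x, mu in zip(counts, means))

# Hypothetical observed counts (with many zeros) and model-predicted means.
counts = [0, 0, 3, 1, 0, 7]
means = [0.5, 0.2, 2.5, 1.0, 0.1, 6.0]
loss = nb_negative_log_likelihood(counts, means, theta=2.0)

# With theta = 1 and mu = 2, the probability of a zero is 1/3.
p_zero = math.exp(nb_log_pmf(0, mu=2.0, theta=1.0))
```

Because the NB distribution places substantial mass at zero for small `theta`, a loss of this form tolerates dropout-like zeros better than a squared-error loss on raw counts.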
Affiliation(s)
- Liang Chen
- College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Madison Dautle
- Department of Biological and Biomedical Sciences, Rowan University, 201 Mullica Hill Road, Glassboro, NJ 08028, United States
- Ruoying Gao
- College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, 201 Mullica Hill Road, Glassboro, NJ 08028, United States
10
Li A, Li M, Fei R, Mallik S, Hu B, Yu Y. EfficientNet-resDDSC: A Hybrid Deep Learning Model Integrating Residual Blocks and Dilated Convolutions for Inferring Gene Causality in Single-Cell Data. Interdiscip Sci 2025; 17:166-184. [PMID: 39578307 DOI: 10.1007/s12539-024-00667-2] [Received: 04/29/2024] [Revised: 10/07/2024] [Accepted: 10/09/2024] [Indexed: 11/24/2024]
Abstract
Gene Regulatory Networks (GRNs) reveal complex interactions between genes in organisms, which is crucial for understanding how living systems operate. The rapid development of biotechnology, especially single-cell RNA sequencing (scRNA-seq), has generated a large amount of scRNA-seq data, which can be analyzed to explore the regulatory relationships between genes at the single-cell level. Previous models used to construct GRNs mainly aim at constructing associative relationships between genes, but usually fail to accurately reveal the causality between genes. Therefore, we present a hybrid deep learning model called EfficientNet-resDDSC (the EfficientNet with Residual Blocks and Depthwise Separable Dilated Convolutions) to infer causality between genes. The model inherits the basic structure of EfficientNet-B0 and incorporates residual blocks as well as dilated convolutions. The model's ability to extract low-level features at the primary stage is enhanced by introducing residual blocks. The model combines Depthwise Separable Convolution (DSC) in the inverted linear bottleneck layers with the dilated convolutions to expand the model's receptive fields without increasing the computational effort. This design enables the model to comprehensively reveal potential relationships among different genes in high-dimensional and high-noise single-cell data. Compared with five existing deep learning models, EfficientNet-resDDSC performs significantly better overall on four datasets. In this study, EfficientNet-resDDSC was further applied to construct GRNs for breast cancer patients, focusing on the regulatory genes related to the key gene BRCA1, which contributes to the advancement of breast cancer research and treatment strategies.
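The claim that dilated and depthwise separable convolutions enlarge the receptive field without extra cost follows from standard arithmetic, sketched below. The formulas are the usual ones for stride-1 convolutions without bias terms; the layer configuration and channel sizes are hypothetical and are not taken from EfficientNet-resDDSC.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel; the number of weights is unchanged."""
    return dilation * (kernel_size - 1) + 1

def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers is a list of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
    return field

def standard_conv_params(in_channels, out_channels, kernel_size):
    """Weights in a standard 2D convolution (no bias)."""
    return in_channels * out_channels * kernel_size * kernel_size

def depthwise_separable_params(in_channels, out_channels, kernel_size):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    return in_channels * kernel_size * kernel_size + in_channels * out_channels

# Hypothetical stack: three 3 x 3 layers with dilations 1, 2, and 4.
plain_field = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated_field = receptive_field([(3, 1), (3, 2), (3, 4)])

standard = standard_conv_params(32, 64, 3)
separable = depthwise_separable_params(32, 64, 3)
```

In this toy configuration the dilated stack covers 15 positions instead of 7 with the same number of weights per layer, and the separable layer uses roughly an eighth of the weights of the standard one.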
Affiliation(s)
- Aimin Li
- School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
- Shaanxi Key Laboratory for Network Computing and Security Technology, Xi'an, 710048, China
- Mingyue Li
- School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
- Rong Fei
- School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China.
- Shaanxi Key Laboratory for Network Computing and Security Technology, Xi'an, 710048, China.
| | - Saurav Mallik
- Department of Environmental Health, Harvard University T H Chan School of Public Health, Boston, 02115, USA
| | - Bo Hu
- Hangzhou HollySys Automation Co., Ltd, Hangzhou, 100176, China
| | - Yue Yu
- School of Information Science and Technology, Northwestern University, Xi'an, 710127, China
11
|
Uthamacumaran A. Cell Fate Dynamics Reconstruction Identifies TPT1 and PTPRZ1 Feedback Loops as Master Regulators of Differentiation in Pediatric Glioblastoma-Immune Cell Networks. Interdiscip Sci 2025; 17:59-85. [PMID: 39420135 DOI: 10.1007/s12539-024-00657-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 09/09/2024] [Accepted: 09/10/2024] [Indexed: 10/19/2024]
Abstract
Pediatric glioblastoma is a complex dynamical disease that is difficult to treat due to its multiple adaptive behaviors driven largely by phenotypic plasticity. Integrated data science and network theory pipelines offer novel approaches to studying glioblastoma cell fate dynamics, particularly phenotypic transitions over time. Here we used various single-cell trajectory inference algorithms to infer signaling dynamics regulating pediatric glioblastoma-immune cell networks. We identified GATA2, PTPRZ1, TPT1, MTRNR2L1/2, OLIG1/2, SOX11, FXYD6, SEZ6L, PDGFRA, EGFR, S100B, WNT, TNF-α, and NF-kB as critical transition genes or signals regulating glioblastoma-immune network dynamics, revealing potential clinically relevant targets. Further, we reconstructed glioblastoma cell fate attractors and found complex bifurcation dynamics within glioblastoma phenotypic transitions, suggesting that a causal pattern may be driving glioblastoma evolution and cell fate decision-making. Together, our findings have implications for developing targeted therapies against glioblastoma, and the continued integration of quantitative approaches and artificial intelligence (AI) to understand pediatric glioblastoma tumor-immune interactions.
Affiliation(s)
- Abicumaran Uthamacumaran
- Department of Physics (Alumni), Concordia University, Montréal, H4B 1R6, Canada.
- Department of Psychology (Alumni), Concordia University, Montréal, H4B 1R6, Canada.
- Oxford Immune Algorithmics, Reading, RG1 8EQ, UK.
12
|
Xu J, Lu C, Jin S, Meng Y, Fu X, Zeng X, Nussinov R, Cheng F. Deep learning-based cell-specific gene regulatory networks inferred from single-cell multiome data. Nucleic Acids Res 2025; 53:gkaf138. [PMID: 40037709 PMCID: PMC11879466 DOI: 10.1093/nar/gkaf138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 01/03/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025] Open
Abstract
Gene regulatory networks (GRNs) provide a global representation of how genetic/genomic information is transferred in living systems and are a key component in understanding genome regulation. Single-cell multiome data provide unprecedented opportunities to reconstruct GRNs at fine-grained resolution. However, the inference of GRNs is hindered by insufficient single omic profiles due to the characteristic high loss rate of single-cell sequencing data. In this study, we developed scMultiomeGRN, a deep learning framework to infer transcription factor (TF) regulatory networks via unique integration of single-cell genomic (single-cell RNA sequencing) and epigenomic (single-cell ATAC sequencing) data. We create scMultiomeGRN to elucidate these networks by conceptualizing TF network graph structures. Specifically, we build modality-specific neighbor aggregators and cross-modal attention modules to learn latent representations of TFs from single-cell multi-omics. We demonstrate that scMultiomeGRN outperforms state-of-the-art models on multiple benchmark datasets involved in diseases and health. Via scMultiomeGRN, we identified an Alzheimer's disease-relevant regulatory network of SPI1 and RUNX1 for microglia. In summary, scMultiomeGRN offers a deep learning framework to identify cell type-specific gene regulatory networks from single-cell multiome data.
Affiliation(s)
- Junlin Xu
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
| | - Changcheng Lu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Shuting Jin
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
| | - Yajie Meng
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, Hubei 430200, China
| | - Xiangzheng Fu
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, MD 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Feixiong Cheng
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, United States
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, United States
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, United States
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, United States
13
|
Wang C, Liu ZP. Diffusion-based generation of gene regulatory networks from scRNA-seq data with DigNet. Genome Res 2025; 35:340-354. [PMID: 39694856 PMCID: PMC11874984 DOI: 10.1101/gr.279551.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 12/10/2024] [Indexed: 12/20/2024]
Abstract
A gene regulatory network (GRN) intricately encodes the interconnectedness of identities and functionalities of genes within cells, ultimately shaping cellular specificity. Despite decades of endeavors, reverse engineering of GRNs from gene expression profiling data remains a profound challenge, particularly when it comes to reconstructing cell-specific GRNs that are tailored to precise cellular and genetic contexts. Here, we propose a discrete diffusion generation model, called DigNet, capable of generating corresponding GRNs from high-throughput single-cell RNA sequencing (scRNA-seq) data. DigNet embeds the network generation process into a multistep recovery procedure with Markov properties. Each intermediate step has a specific model to recover a portion of the gene regulatory architectures. It thus can ensure compatibility between global network structures and regulatory modules through the unique multistep diffusion procedure. Furthermore, through iMetacell integration and non-Euclidean discrete space modeling, DigNet is robust to the presence of noise in scRNA-seq data and the sparsity of GRNs. Benchmark evaluation results against more than a dozen state-of-the-art network inference methods demonstrate that DigNet achieves superior performance across various single-cell GRN reconstruction experiments. Furthermore, DigNet provides unique insights into the immune response in breast cancer, derived from differential gene regulation identified in T cells. As an open-source software, DigNet offers a powerful and effective tool for generating cell-specific GRNs from scRNA-seq data.
Affiliation(s)
- Chuanyuan Wang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
14
|
Yuan Q, Duren Z. Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data. Nat Biotechnol 2025; 43:247-257. [PMID: 38609714 PMCID: PMC11825371 DOI: 10.1038/s41587-024-02182-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 02/26/2024] [Indexed: 04/14/2024]
Abstract
Existing methods for gene regulatory network (GRN) inference rely on gene expression data alone or on lower resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points still presents a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following the GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.
Affiliation(s)
- Qiuyue Yuan
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA
| | - Zhana Duren
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA.
15
|
Jung S. Advances in modeling cellular state dynamics: integrating omics data and predictive techniques. Anim Cells Syst (Seoul) 2025; 29:72-83. [PMID: 39807350 PMCID: PMC11727055 DOI: 10.1080/19768354.2024.2449518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 12/19/2024] [Accepted: 12/29/2024] [Indexed: 01/16/2025] Open
Abstract
Dynamic modeling of cellular states has emerged as a pivotal approach for understanding complex biological processes such as cell differentiation, disease progression, and tissue development. This review provides a comprehensive overview of current approaches for modeling cellular state dynamics, focusing on techniques ranging from dynamic or static biomolecular network models to deep learning models. We highlight how these approaches integrated with various omics data such as transcriptomics, and single-cell RNA sequencing could be used to capture and predict cellular behavior and transitions. We also discuss applications of these modeling approaches in predicting gene knockout effects, designing targeted interventions, and simulating organ development. This review emphasizes the importance of selecting appropriate modeling strategies based on scalability and resolution requirements, which vary according to the complexity and size of biological systems under study. By evaluating strengths, limitations, and recent advancements of these methodologies, we aim to guide future research in developing more robust and interpretable models for understanding and manipulating cellular state dynamics in various biological contexts, ultimately advancing therapeutic strategies and precision medicine.
Affiliation(s)
- Sungwon Jung
- Department of Genome Medicine and Science, Gachon University College of Medicine, Incheon, Republic of Korea
- Gachon Institute of Genome Medicine and Science, Gachon University Gil Medical Center, Incheon, Republic of Korea
16
|
Weng G, Martin P, Kim H, Won KJ. Integrating Prior Knowledge Using Transformer for Gene Regulatory Network Inference. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2025; 12:e2409990. [PMID: 39605181 PMCID: PMC11744656 DOI: 10.1002/advs.202409990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Revised: 10/23/2024] [Indexed: 11/29/2024]
Abstract
Gene regulatory network (GRN) inference, a process of reconstructing gene regulatory rules from experimental data, has the potential to discover new regulatory rules. However, existing methods often struggle to generalize across diverse cell types and account for unseen regulators. This work presents GRNPT, a novel Transformer-based framework that integrates large language model (LLM) embeddings from publicly accessible biological data and a temporal convolutional network (TCN) autoencoder to capture regulatory patterns from single-cell RNA sequencing (scRNA-seq) trajectories. GRNPT significantly outperforms both supervised and unsupervised methods in inferring GRNs, particularly when training data is limited. Notably, GRNPT exhibits exceptional generalizability, accurately predicting regulatory relationships in previously unseen cell types and even regulators. By combining the ability of LLMs to distill biological knowledge from text with deep learning methods that capture complex patterns in gene expression data, GRNPT overcomes the limitations of traditional GRN inference methods and enables a more accurate and comprehensive understanding of gene regulatory dynamics.
Affiliation(s)
- Guangzheng Weng
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Ole Maaløes Vej 5, Copenhagen 2200, Denmark
| | - Patrick Martin
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90069, USA
| | - Hyobin Kim
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90069, USA
| | - Kyoung Jae Won
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90069, USA
17
|
Khullar S, Huang X, Ramesh R, Svaren J, Wang D. NetREm: Network Regression Embeddings reveal cell-type transcription factor coordination for gene regulation. BIOINFORMATICS ADVANCES 2024; 5:vbae206. [PMID: 40260118 PMCID: PMC12011367 DOI: 10.1093/bioadv/vbae206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 10/22/2024] [Accepted: 12/18/2024] [Indexed: 04/23/2025]
Abstract
Motivation: Transcription factor (TF) coordination plays a key role in gene regulation via direct and/or indirect protein-protein interactions (PPIs) and co-binding to regulatory elements on DNA. Single-cell technologies facilitate gene expression measurement for individual cells and cell-type identification, yet the connection between TF-TF coordination and target gene (TG) regulation of various cell types remains unclear. Results: To address this, we introduce our innovative computational approach, Network Regression Embeddings (NetREm), to reveal cell-type TF-TF coordination activities for TG regulation. NetREm leverages network-constrained regularization, using prior knowledge of PPIs among TFs, to analyze single-cell gene expression data, uncovering cell-type coordinating TFs and identifying revolutionary TF-TG candidate regulatory network links. NetREm's performance is validated using simulation studies and benchmarked across several datasets in humans, mice, and yeast. Further, we showcase NetREm's ability to prioritize valid novel human TF-TF coordination links in 9 peripheral blood mononuclear and 42 immune cell sub-types. We apply NetREm to examine cell-type networks in central and peripheral nervous systems (e.g. neuronal, glial, Schwann cells) and in Alzheimer's disease versus Controls. Top predictions are validated with experimental data from rat, mouse, and human models. Additional functional genomics data help link genetic variants to our TF-TG regulatory and TF-TF coordination networks. Availability and implementation: https://github.com/SaniyaKhullar/NetREm.
Affiliation(s)
- Saniya Khullar
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53076, United States
| | - Xiang Huang
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
| | - Raghu Ramesh
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Comparative Biomedical Sciences Training Program, University of Wisconsin-Madison, Madison, WI 53706, United States
| | - John Svaren
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Department of Comparative Biosciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI 53706, United States
| | - Daifeng Wang
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53076, United States
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, United States
18
|
Yang B, Li J, Li X, Liu S. Gene regulatory network inference based on novel ensemble method. Brief Funct Genomics 2024; 23:866-878. [PMID: 39324652 DOI: 10.1093/bfgp/elae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 08/09/2024] [Accepted: 09/06/2024] [Indexed: 09/27/2024] Open
Abstract
Gene regulatory networks (GRNs) contribute toward understanding the function of genes and the development of cancer or the impact of key genes on diseases. Hence, this study proposes an ensemble method based on 13 basic classification methods and a flexible neural tree (FNT) to improve GRN identification accuracy. The primary classification methods contain ridge classification, stochastic gradient descent, Gaussian process classification, Bernoulli Naive Bayes, adaptive boosting, gradient boosting decision tree, hist gradient boosting classification, eXtreme gradient boosting (XGBoost), multilayer perceptron, light gradient boosting machine, random forest, support vector machine, and k-nearest neighbor algorithm, which are regarded as the input variable set of FNT model. Additionally, a hybrid evolutionary algorithm based on a gene programming variant and particle swarm optimization is developed to search for the optimal FNT model. Experiments on three simulation datasets and three real single-cell RNA-seq datasets demonstrate that the proposed ensemble feature outperforms 13 supervised algorithms, seven unsupervised algorithms (ARACNE, CLR, GENIE3, MRNET, PCACMI, GENECI, and EPCACMI) and four single cell-specific methods (SCODE, BiRGRN, LEAP, and BiGBoost) based on the area under the receiver operating characteristic curve, area under the precision-recall curve, and F1 metrics.
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
| | - Jing Li
- School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
| | - Xiang Li
- Information Department, Qingdao Eighth People's Hospital, No. 84 Fengshan Road, Qingdao 266121, China
| | - Sanrong Liu
- School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
19
|
Bonev B, Castelo-Branco G, Chen F, Codeluppi S, Corces MR, Fan J, Heiman M, Harris K, Inoue F, Kellis M, Levine A, Lotfollahi M, Luo C, Maynard KR, Nitzan M, Ramani V, Satijia R, Schirmer L, Shen Y, Sun N, Green GS, Theis F, Wang X, Welch JD, Gokce O, Konopka G, Liddelow S, Macosko E, Ali Bayraktar O, Habib N, Nowakowski TJ. Opportunities and challenges of single-cell and spatially resolved genomics methods for neuroscience discovery. Nat Neurosci 2024; 27:2292-2309. [PMID: 39627587 PMCID: PMC11999325 DOI: 10.1038/s41593-024-01806-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 09/23/2024] [Indexed: 12/13/2024]
Abstract
Over the past decade, single-cell genomics technologies have allowed scalable profiling of cell-type-specific features, which has substantially increased our ability to study cellular diversity and transcriptional programs in heterogeneous tissues. Yet our understanding of mechanisms of gene regulation or the rules that govern interactions between cell types is still limited. The advent of new computational pipelines and technologies, such as single-cell epigenomics and spatially resolved transcriptomics, has created opportunities to explore two new axes of biological variation: cell-intrinsic regulation of cell states and expression programs and interactions between cells. Here, we summarize the most promising and robust technologies in these areas, discuss their strengths and limitations and discuss key computational approaches for analysis of these complex datasets. We highlight how data sharing and integration, documentation, visualization and benchmarking of results contribute to transparency, reproducibility, collaboration and democratization in neuroscience, and discuss needs and opportunities for future technology development and analysis.
Affiliation(s)
- Boyan Bonev
- Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany
- Physiological Genomics, Biomedical Center, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Gonçalo Castelo-Branco
- Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
| | - Fei Chen
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - M Ryan Corces
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Jean Fan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Myriam Heiman
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA
- The Picower Institute for Learning and Memory, MIT, Cambridge, MA, USA
| | - Kenneth Harris
- UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Manolis Kellis
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ariel Levine
- Spinal Circuits and Plasticity Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
| | - Mo Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Chongyuan Luo
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Kristen R Maynard
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Department of Psychiatry, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Mor Nitzan
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Vijay Ramani
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, San Francisco, CA, USA
| | - Rahul Satijia
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Lucas Schirmer
- Department of Neurology, Mannheim Center for Translational Neuroscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Yin Shen
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Na Sun
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Gilad S Green
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Fabian Theis
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Xiao Wang
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joshua D Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Ozgun Gokce
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
- Department of Neurodegenerative Diseases and Geriatric Psychiatry, University Hospital Bonn, Bonn, Germany.
| | - Genevieve Konopka
- Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA.
- Peter O'Donnell Jr. Brain Institute, UT Southwestern Medical Center, Dallas, TX, USA.
| | - Shane Liddelow
- Neuroscience Institute, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Neuroscience & Physiology, NYU Grossman School of Medicine, New York, NY, USA.
- Parekh Center for Interdisciplinary Neurology, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Ophthalmology, NYU Grossman School of Medicine, New York, NY, USA.
| | - Evan Macosko
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA.
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA.
| | | | - Naomi Habib
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
| | - Tomasz J Nowakowski
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA.
- Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA.
- Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, USA.
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA.
20
|
Peng D, Cahan P. OneSC: a computational platform for recapitulating cell state transitions. Bioinformatics 2024; 40:btae703. [PMID: 39570626 PMCID: PMC11630913 DOI: 10.1093/bioinformatics/btae703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 11/13/2024] [Accepted: 11/19/2024] [Indexed: 11/22/2024] Open
Abstract
MOTIVATION: Computational modeling of cell state transitions has been of great interest to many in the fields of developmental biology, cancer biology, and cell fate engineering because it enables performing perturbation experiments in silico more rapidly and cheaply than could be achieved in a lab. Recent advancements in single-cell RNA-sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico "synthetic" cells that faithfully mimic the temporal trajectories. RESULTS: Here we present OneSC, a platform that can simulate cell state transitions using systems of stochastic differential equations governed by a regulatory network of core transcription factors (TFs). Unlike many current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and terminal cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes, and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations. AVAILABILITY AND IMPLEMENTATION: OneSC is implemented as a Python package on GitHub (https://github.com/CahanLab/oneSC) and on Zenodo (https://zenodo.org/records/14052421).
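The idea of a Boolean TF network that settles into terminal states, and of knocking out a TF in silico, can be illustrated with a toy example. This is not the OneSC API or its inference procedure, and it replaces the stochastic differential equations with a plain synchronous Boolean update. The two-gene toggle switch, the gene names `A` and `B`, and the helper names are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Optional

State = Dict[str, bool]

# Hypothetical toggle switch: each TF is ON only while its repressor is OFF.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}


def step(state: State, rules, knocked_out: FrozenSet[str] = frozenset()) -> State:
    """Synchronous update: every gene reads the previous state; knockouts stay OFF."""
    return {g: (False if g in knocked_out else rule(state)) for g, rule in rules.items()}


def run_to_steady_state(state: State, rules,
                        knocked_out: FrozenSet[str] = frozenset(),
                        max_steps: int = 50) -> Optional[State]:
    """Iterate until the state stops changing; return None if no fixed point is reached."""
    for _ in range(max_steps):
        nxt = step(state, rules, knocked_out)
        if nxt == state:
            return state
        state = nxt
    return None


start = {"A": True, "B": False}
print(run_to_steady_state(start, RULES))                    # A stays ON, B stays OFF
print(run_to_steady_state(start, RULES, frozenset({"A"})))  # knocking out A lets B win
```

In this toy network the knockout flips the terminal state from A-high to B-high, which is the kind of fate bias the abstract describes predicting. Starting with both genes ON oscillates under synchronous updates, so the function returns `None` there.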
Affiliation(s)
- Da Peng
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
| | - Patrick Cahan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Institute for Cell Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, United States
21
|
Shan X, Zhao H. Inferring Cell-Type-Specific Co-Expressed Genes from Single Cell Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.08.622700. [PMID: 39605403 PMCID: PMC11601408 DOI: 10.1101/2024.11.08.622700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Background: Cell-type-specific gene co-expression networks are widely used to characterize gene relationships. Although many methods have been developed to infer such co-expression networks from single-cell data, the lack of consideration of false positive control in many evaluations may lead to incorrect conclusions because higher reproducibility, higher functional coherence, and a larger overlap with known biological networks may not imply better performance if the false positives are not well controlled. Results: In this study, we have developed an efficient and effective simulation tool to derive empirical p-values in co-expression inference to appropriately control false positives in assessing method performance. We studied the power of the p-value-based approach in inferring cell-type-specific co-expressions from single-cell data using both simulated and real data. We also highlight the need to adjust for random overlaps between the inferred and known networks when the number of selected correlated gene pairs varies substantially across different methods. We further illustrate the expression level bias in known biological networks and the impact of such bias in method assessment. Conclusion: Our study indicates the importance of controlling false positives in the inference of co-expressed genes to achieve more reliable results and proposes a simulation-based p-value method to achieve this.
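An empirical p-value compares an observed co-expression statistic with the same statistic computed on null data. The abstract does not describe the authors' simulation tool, so the sketch below swaps in a simple permutation null (shuffling one gene's values across cells) purely to show the calculation. The function names, the choice of Pearson correlation, and the default of 999 null draws are assumptions.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def empirical_p_value(x, y, n_null=999, seed=0):
    """Two-sided empirical p-value for |corr(x, y)| under a permutation null."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_null):
        rng.shuffle(y_perm)  # break the pairing between the two genes
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_null + 1)


gene_a = [float(i) for i in range(20)]
gene_b = [2.0 * v + 1.0 for v in gene_a]   # perfectly co-expressed toy pair
print(empirical_p_value(gene_a, gene_b))   # smallest attainable value, 1 / (n_null + 1)
```

With `n_null` draws the smallest attainable p-value is `1 / (n_null + 1)`, so the number of null draws limits how strict a threshold can be applied once many gene pairs are tested.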
22
|
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024] Open
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay out the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
Affiliation(s)
- Karamveer
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
23
|
Dong J, Li J, Wang F. Deep Learning in Gene Regulatory Network Inference: A Survey. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:2089-2101. [PMID: 39137088 DOI: 10.1109/tcbb.2024.3442536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Understanding the intricate regulatory relationships among genes is crucial for comprehending the development, differentiation, and cellular response in living systems. Consequently, inferring gene regulatory networks (GRNs) based on observed data has gained significant attention as a fundamental goal in biological applications. The proliferation and diversification of available data present both opportunities and challenges in accurately inferring GRNs. Deep learning, a highly successful technique in various domains, holds promise in aiding GRN inference. Several GRN inference methods employing deep learning models have been proposed; however, the selection of an appropriate method remains a challenge for life scientists. In this survey, we provide a comprehensive analysis of 12 GRN inference methods that leverage deep learning models. We trace the evolution of these major methods and categorize them based on the types of applicable data. We delve into the core concepts and specific steps of each method, offering a detailed evaluation of their effectiveness and scalability across different scenarios. These insights enable us to make informed recommendations. Moreover, we explore the challenges faced by GRN inference methods utilizing deep learning and discuss future directions, providing valuable suggestions for the advancement of data scientists in this field.
24
|
Wang W, Wang Y, Lyu R, Grün D. Scalable identification of lineage-specific gene regulatory networks from metacells with NetID. Genome Biol 2024; 25:275. [PMID: 39425176 PMCID: PMC11488259 DOI: 10.1186/s13059-024-03418-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 10/08/2024] [Indexed: 10/21/2024] Open
Abstract
The identification of gene regulatory networks (GRNs) is crucial for understanding cellular differentiation. Single-cell RNA sequencing data encode gene-level covariations at high resolution, yet data sparsity and high dimensionality hamper accurate and scalable GRN reconstruction. To overcome these challenges, we introduce NetID, which leverages homogeneous metacells while avoiding spurious gene-gene correlations. Benchmarking demonstrates superior performance of NetID compared to imputation-based methods. By incorporating cell fate probability information, NetID facilitates the prediction of lineage-specific GRNs and recovers known network motifs governing bone marrow hematopoiesis, making it a powerful toolkit for deciphering gene regulatory control of cellular differentiation from large-scale single-cell transcriptome data.
Affiliation(s)
- Weixu Wang
  - Human Phenome Institute, Fudan University, Shanghai, China
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Yichen Wang
  - Cancer, Ageing and Somatic Mutation, Wellcome Sanger Institute, Hinxton, UK
- Ruiqi Lyu
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
- Dominic Grün
  - Würzburg Institute of Systems Immunology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany.
  - CAIDAS - Center for Artificial Intelligence and Data Science, Würzburg, Germany.
25
|
Guo Y, Xiao Z. Constructing the dynamic transcriptional regulatory networks to identify phenotype-specific transcription regulators. Brief Bioinform 2024; 25:bbae542. [PMID: 39451156 PMCID: PMC11503644 DOI: 10.1093/bib/bbae542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Revised: 09/25/2024] [Accepted: 10/10/2024] [Indexed: 10/26/2024] Open
Abstract
The transcriptional regulatory network (TRN) is a graph framework that helps understand the complex transcriptional regulation mechanisms in the transcription process. Identifying the phenotype-specific transcription regulators is vital to reveal the functional roles of transcription elements in associating the specific phenotypes. Although many methods have been developed towards detecting the phenotype-specific transcription elements based on the static TRN in the past decade, most of them are not satisfactory for elucidating the phenotype-related functional roles of transcription regulators in multiple levels, as the dynamic characteristics of transcription regulators are usually ignored in static models. In this study, we introduce a novel framework called DTGN to identify the phenotype-specific transcription factors (TFs) and pathways by constructing dynamic TRNs. We first design a graph autoencoder model to integrate the phenotype-oriented time-series gene expression data and static TRN to learn the temporal representations of genes. Then, based on the learned temporal representations of genes, we develop a statistical method to construct a series of dynamic TRNs associated with the development of specific phenotypes. Finally, we identify the phenotype-specific TFs and pathways from the constructed dynamic TRNs. Results from multiple phenotypic datasets show that the proposed DTGN framework outperforms most existing methods in identifying phenotype-specific TFs and pathways. Our framework offers a new approach to exploring the functional roles of transcription regulators that associate with specific phenotypes in a dynamic model.
Affiliation(s)
- Yang Guo
  - School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Zhiqiang Xiao
  - School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
26
|
K Lodi M, Chernikov A, Ghosh P. COFFEE: consensus single cell-type specific inference for gene regulatory networks. Brief Bioinform 2024; 25:bbae457. [PMID: 39311699 PMCID: PMC11418232 DOI: 10.1093/bib/bbae457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/22/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024] Open
Abstract
The inference of gene regulatory networks (GRNs) is crucial to understanding the regulatory mechanisms that govern biological processes. GRNs may be represented as edges in a graph, and hence have been inferred computationally from scRNA-seq data. A wisdom of crowds approach to integrate edges from several GRNs to create one composite GRN has demonstrated improved performance when compared with individual algorithm implementations on bulk RNA-seq and microarray data. In an effort to extend this approach to scRNA-seq data, we present COFFEE (COnsensus single cell-type speciFic inFerence for gEnE regulatory networks), a Borda voting-based consensus algorithm that integrates information from 10 established GRN inference methods. We conclude that COFFEE has improved performance across synthetic, curated, and experimental datasets when compared with baseline methods. Additionally, we show that a modified version of COFFEE can be leveraged to improve performance on newer cell-type specific GRN inference methods. Overall, our results demonstrate that consensus-based methods with pertinent modifications continue to be valuable for GRN inference at the single cell level. While COFFEE is benchmarked on 10 algorithms, it is a flexible strategy that can incorporate any set of GRN inference algorithms according to user preference. A Python implementation of COFFEE may be found on GitHub: https://github.com/lodimk2/coffee.
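As a rough illustration of Borda voting over ranked edge lists, the sketch below combines several methods' rankings into one consensus ranking. It is a generic Borda count under simple assumptions (unreported edges earn zero points, ties broken alphabetically), not the COFFEE implementation; the function name `borda_consensus` is hypothetical.

```python
from collections import defaultdict


def borda_consensus(rankings):
    """Combine ranked edge lists into one consensus ranking.

    rankings: list of lists; each inner list holds (regulator, target) edges
    ordered from most to least confident by one inference method.
    """
    all_edges = {edge for ranking in rankings for edge in ranking}
    n = len(all_edges)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, edge in enumerate(ranking):
            scores[edge] += n - position  # the top edge earns n points
        # Assumption: edges a method did not report earn 0 points from it.
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(all_edges, key=lambda e: (-scores[e], e))


method_a = [("A", "B"), ("B", "C"), ("A", "C")]
method_b = [("B", "C"), ("A", "B")]
method_c = [("A", "B"), ("A", "C"), ("B", "C")]
consensus = borda_consensus([method_a, method_b, method_c])
```

In this toy input the edge A→B collects 8 points, B→C collects 6, and A→C collects 3, so the consensus order follows the majority preference rather than any single method.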
Affiliation(s)
- Musaddiq K Lodi
  - Integrative Life Sciences, Virginia Commonwealth University, 1000 W Cary St, Richmond, VA 23284, United States
- Anna Chernikov
  - Center for Biological Data Science, Virginia Commonwealth University, 1015 Floyd Ave, Richmond, VA 23284, United States
- Preetam Ghosh
  - Department of Computer Science, Virginia Commonwealth University, 401 W Main St, Richmond, VA 23284, United States
27
|
Zhao W, Larschan E, Sandstede B, Singh R. Optimal transport reveals dynamic gene regulatory networks via gene velocity estimation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.12.612590. [PMID: 39345416 PMCID: PMC11429941 DOI: 10.1101/2024.09.12.612590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Inferring gene regulatory networks from gene expression data is an important and challenging problem in the biology community. We propose OTVelo, a methodology that takes time-stamped single-cell gene expression data as input and predicts gene regulation across two time points. It is known that the rate of change of gene expression, which we will refer to as gene velocity, provides crucial information that enhances such inference; however, this information is not always available due to the limitations in sequencing depth. Our algorithm overcomes this limitation by estimating gene velocities using optimal transport. We then infer gene regulation using time-lagged correlation and Granger causality via regularized linear regression. Instead of providing an aggregated network across all time points, our method uncovers the underlying dynamical mechanism across time points. We validate our algorithm on 13 simulated datasets with both synthetic and curated networks and demonstrate its efficacy on 4 experimental data sets.
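The time-lagged correlation step mentioned above can be sketched as follows. This is only an illustration on two already-aligned series; it does not estimate velocities with optimal transport and is not the OTVelo code. The function name `lagged_correlation` and the toy series are hypothetical.

```python
def lagged_correlation(regulator, target, lag=1):
    """Pearson correlation between regulator[t] and target[t + lag].

    Assumes lag >= 0 and that both series are sampled at the same time points.
    """
    x = regulator[:len(regulator) - lag]
    y = target[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series gives no usable signal
    return sxy / (sxx * syy) ** 0.5


# Toy series in which the target copies the regulator one step later.
regulator = [0, 1, 0, 1, 0, 1, 0]
target = [9, 0, 1, 0, 1, 0, 1]
same_time = lagged_correlation(regulator, target, lag=0)
one_step = lagged_correlation(regulator, target, lag=1)
```

Here the lag-1 correlation is high while the lag-0 correlation is negative, which is the kind of asymmetry a time-lagged analysis uses to suggest a direction of regulation.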
Affiliation(s)
- Wenjun Zhao
  - Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
- Erica Larschan
  - Department of Molecular Biology, Cell Biology and Biochemistry, Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Björn Sandstede
  - Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
- Ritambhara Singh
  - Department of Computer Science, Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
28
|
Chang LY, Hao TY, Wang WJ, Lin CY. Inference of single-cell network using mutual information for scRNA-seq data analysis. BMC Bioinformatics 2024; 25:292. [PMID: 39237886 PMCID: PMC11378379 DOI: 10.1186/s12859-024-05895-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 08/08/2024] [Indexed: 09/07/2024] Open
Abstract
BACKGROUND: With the advance in single-cell RNA sequencing (scRNA-seq) technology, deriving inherent biological system information from expression profiles at a single-cell resolution has become possible. It has been known that network modeling by estimating the associations between genes could better reveal dynamic changes in biological systems. However, accurately constructing a single-cell network (SCN) to capture the network architecture of each cell and further explore cell-to-cell heterogeneity remains challenging. RESULTS: We introduce SINUM, a method for constructing the SIngle-cell Network Using Mutual information, which estimates mutual information between any two genes from scRNA-seq data to determine whether they are dependent or independent in a specific cell. Experiments on various scRNA-seq datasets with different cell numbers based on eight performance indexes (e.g., adjusted rand index and F-measure index) validated the accuracy and robustness of SINUM in cell type identification, superior to the state-of-the-art SCN inference method. Additionally, the SINUM SCNs exhibit high overlap with the human interactome and possess the scale-free property. CONCLUSIONS: SINUM presents a view of biological systems at the network level to detect cell-type marker genes/gene pairs and investigate time-dependent changes in gene associations during embryo development. Codes for SINUM are freely available at https://github.com/SysMednet/SINUM.
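For readers unfamiliar with mutual information between two genes, a minimal sketch follows. It computes plain population-level mutual information on binarized expression, which is a simplification: SINUM builds cell-specific networks, and that per-cell estimation is not reproduced here. The helper names and the zero threshold are assumptions.

```python
import math
from collections import Counter


def binarize(expression, threshold=0.0):
    """Mark a gene as on (1) or off (0) in each cell."""
    return [1 if value > threshold else 0 for value in expression]


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


gene_a = binarize([0.0, 0.0, 3.2, 1.5])   # off, off, on, on
gene_b = binarize([0.0, 2.1, 0.0, 4.0])   # off, on, off, on
dependent = mutual_information(gene_a, gene_a)
independent = mutual_information(gene_a, gene_b)
```

A gene compared with itself reaches the maximum for a balanced binary variable (1 bit), while the two genes above are statistically independent in this toy sample and score 0.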
Affiliation(s)
- Lan-Yun Chang
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan
- Ting-Yi Hao
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan
- Wei-Jie Wang
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan
- Chun-Yu Lin
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan.
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan.
  - Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan.
  - Center for Intelligent Drug Systems and Smart Bio-Devices, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan.
  - Cancer and Immunology Research Center, National Yang Ming Chiao Tung University, Taipei, 112, Taiwan.
  - School of Dentistry, Kaohsiung Medical University, Kaohsiung, 807, Taiwan.
29
|
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks. Comput Biol Med 2024; 179:108850. [PMID: 39013340 DOI: 10.1016/j.compbiomed.2024.108850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND AND OBJECTIVE: Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression present in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques developed for inferring from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information theory measures and neural networks, among others. The diversity of implementations and their respective customization have led to the emergence of many tools and multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are challenging to detect a priori. This specialization has introduced significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds upon the basic idea of the previous proposal GENECI and optimizes consensus among different inference techniques, through a carefully refined multi-objective evolutionary algorithm guided by various objective functions, linked to the biological context at hand. METHODS: MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and sizes. The evaluation of MO-GENECI compared its performance to individual techniques using key metrics (AUROC and AUPR) for gene regulatory network inference. Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance. RESULTS: MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic input data characteristics. The best solution consistently emerged as the winner in all statistical tests, and in many cases, the median precision solution showed no statistically significant difference compared to the winner. CONCLUSIONS: MO-GENECI has not only achieved more accurate results than individual techniques, but has also overcome the uncertainty associated with the initial choice due to its flexibility and adaptability. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public repository at GitHub under MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a Python package available at PyPI: https://pypi.org/project/geneci/.
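The two evaluation metrics named above, AUROC and AUPR, can be sketched in a few lines. This is a generic illustration, not the evaluation code of MO-GENECI: AUPR is approximated here by average precision, at least one true edge and one non-edge are assumed, and the function names are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a true edge outscores a non-edge (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision, a common step-wise estimate of AUPR.

    Assumption: tied scores are kept in input order, so ties can shift the value.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recovered true edge
    return total / hits


# Confidence scores for five candidate edges; 1 marks a true edge.
edge_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
edge_labels = [1, 0, 1, 0, 0]
roc = auroc(edge_scores, edge_labels)
pr = average_precision(edge_scores, edge_labels)
```

Because true edges are usually a small minority of all candidate edges, the precision-based metric tends to be the more informative of the two on sparse ground-truth networks.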
Affiliation(s)
- Adrián Segura-Ortiz
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain.
- José García-Nieto
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- José F Aldana-Montes
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
30
|
Peng D, Cahan P. OneSC: A computational platform for recapitulating cell state transitions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.31.596831. [PMID: 38895453 PMCID: PMC11185539 DOI: 10.1101/2024.05.31.596831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Computational modelling of cell state transitions has been of great interest to many in the fields of developmental biology, cancer biology and cell fate engineering because it enables performing perturbation experiments in silico more rapidly and cheaply than could be achieved in a wet lab. Recent advancements in single-cell RNA sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico 'synthetic' cells that faithfully mimic the temporal trajectories. Here we present OneSC, a platform that can simulate synthetic cells across developmental trajectories using systems of stochastic differential equations governed by a core transcription factor (TF) regulatory network. Different from the current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and steady cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes and monocytes). Finally, through the in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations.
Affiliation(s)
- Da Peng
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
- Patrick Cahan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
  - Institute for Cell Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, Maryland, 21205, USA
31
|
Lei Y, Huang XT, Guo X, Hang Katie Chan K, Gao L. DeepGRNCS: deep learning-based framework for jointly inferring gene regulatory networks across cell subpopulations. Brief Bioinform 2024; 25:bbae334. [PMID: 38980373 PMCID: PMC11232306 DOI: 10.1093/bib/bbae334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 06/03/2024] [Accepted: 07/01/2024] [Indexed: 07/10/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) allows us to obtain a deeper understanding of cellular function and disease pathogenesis. Recent advances in single-cell RNA sequencing (scRNA-seq) technology have improved the accuracy of GRN inference. However, many methods for inferring individual GRNs from scRNA-seq data are limited because they overlook intercellular heterogeneity and similarities between different cell subpopulations, which are often present in the data. Here, we propose a deep learning-based framework, DeepGRNCS, for jointly inferring GRNs across cell subpopulations. We follow the commonly accepted hypothesis that the expression of a target gene can be predicted based on the expression of transcription factors (TFs) due to underlying regulatory relationships. We initially processed scRNA-seq data by discretizing data scattering using the equal-width method. Then, we trained deep learning models to predict target gene expression from TFs. By individually removing each TF from the expression matrix, we used pre-trained deep model predictions to infer regulatory relationships between TFs and genes, thereby constructing the GRN. Our method outperforms existing GRN inference methods for various simulated and real scRNA-seq datasets. Finally, we applied DeepGRNCS to non-small cell lung cancer scRNA-seq data to identify key genes in each cell subpopulation and analyzed their biological relevance. In conclusion, DeepGRNCS effectively predicts cell subpopulation-specific GRNs. The source code is available at https://github.com/Nastume777/DeepGRNCS.
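The equal-width discretization mentioned above can be illustrated with a short sketch. The number of bins and the handling of constant genes are assumptions made here for illustration; the abstract does not state the settings DeepGRNCS uses, and `equal_width_bins` is a hypothetical name.

```python
def equal_width_bins(values, n_bins=3):
    """Map each value to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant gene falls into a single bin
    width = (hi - lo) / n_bins
    # min(...) places the maximum value in the last bin instead of a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


expression = [0, 1, 2, 3, 4, 5, 6]
levels = equal_width_bins(expression, n_bins=3)  # low, medium, high
```

Equal-width binning splits the observed range evenly, so with sparse scRNA-seq counts most cells can end up in the lowest bin; that trade-off is worth keeping in mind when choosing the number of bins.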
Affiliation(s)
- Yahui Lei
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Xiao-Tai Huang
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Xingli Guo
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Kei Hang Katie Chan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
  - Department of Epidemiology and Center for Global Cardiometabolic Health, Brown University, Providence, RI, United States
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
32
|
Zinati Y, Takiddeen A, Emad A. GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks. Nat Commun 2024; 15:4055. [PMID: 38744843 PMCID: PMC11525796 DOI: 10.1038/s41467-024-48516-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 05/01/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.
Affiliation(s)
- Yazdan Zinati
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Abdulrahman Takiddeen
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada.
  - Mila, Quebec AI Institute, Montreal, QC, Canada.
  - The Rosalind and Morris Goodman Cancer Institute, Montreal, QC, Canada.
33
|
Lee J, Kim N, Cho KH. Decoding the principle of cell-fate determination for its reverse control. NPJ Syst Biol Appl 2024; 10:47. [PMID: 38710700 PMCID: PMC11074314 DOI: 10.1038/s41540-024-00372-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 04/16/2024] [Indexed: 05/08/2024] Open
Abstract
Understanding and manipulating cell fate determination is pivotal in biology. Cell fate is determined by intricate and nonlinear interactions among molecules, making mathematical model-based quantitative analysis indispensable for its elucidation. Nevertheless, obtaining the essential dynamic experimental data for model development has been a significant obstacle. However, recent advancements in large-scale omics data technology are providing the necessary foundation for developing such models. Based on accumulated experimental evidence, we can postulate that cell fate is governed by a limited number of core regulatory circuits. Following this concept, we present a conceptual control framework that leverages single-cell RNA-seq data for dynamic molecular regulatory network modeling, aiming to identify and manipulate core regulatory circuits and their master regulators to drive desired cellular state transitions. We illustrate the proposed framework by applying it to the reversion of lung cancer cell states, although it is more broadly applicable to understanding and controlling a wide range of cell-fate determination processes.
Affiliation(s)
- Jonghoon Lee
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Namhee Kim
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
  - biorevert, Inc., Daejeon, Republic of Korea
- Kwang-Hyun Cho
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
34
|
Rosebrock D, Vingron M, Arndt PF. Modeling gene expression cascades during cell state transitions. iScience 2024; 27:109386. [PMID: 38500834 PMCID: PMC10946328 DOI: 10.1016/j.isci.2024.109386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 12/14/2023] [Accepted: 02/27/2024] [Indexed: 03/20/2024] Open
Abstract
During cellular processes such as differentiation or response to external stimuli, cells exhibit dynamic changes in their gene expression profiles. Single-cell RNA sequencing (scRNA-seq) can be used to investigate these dynamic changes. To this end, cells are typically ordered along a pseudotemporal trajectory which recapitulates the progression of cells as they transition from one cell state to another. We infer transcriptional dynamics by modeling the gene expression profiles in pseudotemporally ordered cells using a Bayesian inference approach. This enables ordering genes along transcriptional cascades, estimating differences in the timing of gene expression dynamics, and deducing regulatory gene interactions. Here, we apply this approach to scRNA-seq datasets derived from mouse embryonic forebrain and pancreas samples. This analysis demonstrates the utility of the method to derive the ordering of gene dynamics and regulatory relationships critical for proper cellular differentiation and maturation across a variety of developmental contexts.
Affiliation(s)
- Daniel Rosebrock
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Martin Vingron
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Peter F. Arndt
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
35
|
Malekpour SA, Haghverdi L, Sadeghi M. Single-cell multi-omics analysis identifies context-specific gene regulatory gates and mechanisms. Brief Bioinform 2024; 25:bbae180. [PMID: 38653489 PMCID: PMC11036345 DOI: 10.1093/bib/bbae180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 01/29/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
There is a growing interest in inferring context-specific gene regulatory networks from single-cell RNA sequencing (scRNA-seq) data. This involves identifying the regulatory relationships between transcription factors (TFs) and genes in individual cells, and then characterizing these relationships at the level of specific cell types or cell states. In this study, we introduce scGATE (single-cell gene regulatory gate) as a novel computational tool for inferring TF-gene interaction networks and reconstructing Boolean logic gates involving regulatory TFs using scRNA-seq data. In contrast to current Boolean models, scGATE eliminates the need for individual formulations and likelihood calculations for each Boolean rule (e.g. AND, OR, XOR). By employing a Bayesian framework, scGATE infers the Boolean rule after fitting the model to the data, resulting in significant reductions in time complexity for logic-based studies. We applied single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq) data and TF DNA binding motifs to filter out non-relevant TFs in gene regulation. By integrating single-cell clustering with these external cues, scGATE is able to infer context-specific networks. The performance of scGATE is evaluated using synthetic and real single-cell multi-omics data from mouse tissues and human blood, demonstrating its superiority over existing tools for reconstructing TF-gene networks. Additionally, scGATE provides a flexible framework for understanding the complex combinatorial and cooperative relationships among TFs regulating target genes by inferring Boolean logic gates among them.
Affiliation(s)
- Seyed Amir Malekpour: School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), 19395-5746, Tehran, Iran
- Laleh Haghverdi: Berlin Institute for Medical Systems Biology, Max Delbrück Center (BIMSB-MDC) in the Helmholtz Association, Berlin, Germany
- Mehdi Sadeghi: Department of Medical Genetics, National Institute of Genetic Engineering and Biotechnology, 1497716316, Tehran, Iran
36
Pan X, Zhang X. Studying temporal dynamics of single cells: expression, lineage and regulatory networks. Biophys Rev 2024; 16:57-67. [PMID: 38495440 PMCID: PMC10937865 DOI: 10.1007/s12551-023-01090-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/17/2023] [Accepted: 06/27/2023] [Indexed: 03/19/2024] Open
Abstract
Learning how multicellular organs are developed from single cells to different cell types is a fundamental problem in biology. With the high-throughput scRNA-seq technology, computational methods have been developed to reveal the temporal dynamics of single cells from transcriptomic data, from phenomena on cell trajectories to the underlying mechanism that formed the trajectory. There are several distinct families of computational methods including Trajectory Inference (TI), Lineage Tracing (LT), and Gene Regulatory Network (GRN) Inference which are involved in such studies. This review summarizes these computational approaches which use scRNA-seq data to study cell differentiation and cell fate specification as well as the advantages and limitations of different methods. We further discuss how GRNs can potentially affect cell fate decisions and trajectory structures. Supplementary Information The online version contains supplementary material available at 10.1007/s12551-023-01090-5.
Affiliation(s)
- Xinhai Pan: School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
- Xiuwei Zhang: School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
37
Raja R, Khanum S, Aboulmouna L, Maurya MR, Gupta S, Subramaniam S, Ramkrishna D. Modeling transcriptional regulation of the cell cycle using a novel cybernetic-inspired approach. Biophys J 2024; 123:221-234. [PMID: 38102827 PMCID: PMC10808046 DOI: 10.1016/j.bpj.2023.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 09/18/2023] [Accepted: 12/12/2023] [Indexed: 12/17/2023] Open
Abstract
Quantitative understanding of cellular processes, such as cell cycle and differentiation, is impeded by various forms of complexity ranging from myriad molecular players and their multilevel regulatory interactions, cellular evolution with multiple intermediate stages, lack of elucidation of cause-effect relationships among the many system players, and the computational complexity associated with the profusion of variables and parameters. In this paper, we present a modeling framework based on the cybernetic concept that biological regulation is inspired by objectives embedding rational strategies for dimension reduction, process stage specification through the system dynamics, and innovative causal association of regulatory events with the ability to predict the evolution of the dynamical system. The elementary step of the modeling strategy involves stage-specific objective functions that are computationally determined from experiments, augmented with dynamical network computations involving endpoint objective functions, mutual information, change-point detection, and maximal clique centrality. We demonstrate the power of the method through application to the mammalian cell cycle, which involves thousands of biomolecules engaged in signaling, transcription, and regulation. Starting with a fine-grained transcriptional description obtained from RNA sequencing measurements, we develop an initial model, which is then dynamically modeled using the cybernetic-inspired method, based on the strategies described above. The cybernetic-inspired method is able to distill the most significant interactions from a multitude of possibilities. In addition to capturing the complexity of regulatory processes in a mechanistically causal and stage-specific manner, we identify the functional network modules, including novel cell cycle stages. Our model is able to predict future cell cycles consistent with experimental measurements. We posit that this innovative framework has the promise to extend to the dynamics of other biological processes, with a potential to provide novel mechanistic insights.
Affiliation(s)
- Rubesh Raja: The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
- Sana Khanum: The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
- Lina Aboulmouna: Department of Bioengineering, University of California San Diego, La Jolla, California
- Mano R Maurya: Department of Bioengineering, University of California San Diego, La Jolla, California
- Shakti Gupta: Department of Bioengineering, University of California San Diego, La Jolla, California
- Shankar Subramaniam: Department of Bioengineering, University of California San Diego, La Jolla, California; Departments of Computer Science and Engineering, Cellular and Molecular Medicine, San Diego Supercomputer Center, and the Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, California
- Doraiswami Ramkrishna: The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
38
Kim H, Choi H, Lee D, Kim J. A review on gene regulatory network reconstruction algorithms based on single cell RNA sequencing. Genes Genomics 2024; 46:1-11. [PMID: 38032470 DOI: 10.1007/s13258-023-01473-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND Understanding gene regulatory networks (GRNs) is essential for unraveling the molecular mechanisms governing cellular behavior. With the advent of high-throughput transcriptome measurement technology, researchers have aimed to reverse engineer biological systems, extracting gene regulatory rules from their outputs, which are represented by gene expression data. Bulk RNA sequencing, a widely used method for measuring gene expression, has been employed for GRN reconstruction. However, it falls short in capturing dynamic changes in gene expression at the level of individual cells since it averages gene expression across mixed cell populations. OBJECTIVE In this review, we provide an overview of 15 GRN reconstruction tools and discuss their respective strengths and limitations, particularly in the context of single cell RNA sequencing (scRNA-seq). METHODS Recent advancements in scRNA-seq break new ground for GRN reconstruction. They offer snapshots of individual cell transcriptomes and capture dynamic changes. We emphasize how these technological breakthroughs have enhanced GRN reconstruction. CONCLUSION GRN reconstructors can be classified based on their requirement for cellular trajectory, which represents a dynamical cellular process including differentiation, aging, or disease progression. Benchmarking studies support the superiority of GRN reconstructors that do not require trajectory analysis in identifying regulator-target relationships. However, methods equipped with trajectory analysis demonstrate better performance in identifying key regulatory factors. In conclusion, researchers should select a suitable GRN reconstructor based on their specific research objectives.
Affiliation(s)
- Hyeonkyu Kim: School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
- Hwisoo Choi: School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
- Daewon Lee: School of Art and Technology, Chung-Ang University, 4726 Seodong-Daero, Anseong-Si, Gyeonggi-Do, 17546, Republic of Korea
- Junil Kim: School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
39
Lizotte S, Young JG, Allard A. Hypergraph reconstruction from uncertain pairwise observations. Sci Rep 2023; 13:21364. [PMID: 38049512 PMCID: PMC10695935 DOI: 10.1038/s41598-023-48081-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 11/22/2023] [Indexed: 12/06/2023] Open
Abstract
The network reconstruction task aims to estimate a complex system's structure from various data sources such as time series, snapshots, or interaction counts. Recent work has examined this problem in networks whose relationships involve precisely two entities (the pairwise case). Here, using Bayesian inference, we investigate the general problem of reconstructing a network in which higher-order interactions are also present. We study a minimal example of this problem, focusing on the case of hypergraphs with interactions between pairs and triplets of vertices, measured imperfectly and indirectly. We derive a Metropolis-Hastings-within-Gibbs algorithm for this model to highlight the unique challenges that come with estimating higher-order models. We show that this approach tends to reconstruct empirical and synthetic networks more accurately than an equivalent graph model without higher-order interactions.
Affiliation(s)
- Simon Lizotte: Département de Physique, de génie Physique et d'optique, Université Laval, Québec, G1V 0A6, Canada; Centre Interdisciplinaire en Modélisation Mathématique, Université Laval, Québec, G1V 0A6, Canada
- Jean-Gabriel Young: Département de Physique, de génie Physique et d'optique, Université Laval, Québec, G1V 0A6, Canada; Department of Mathematics and Statistics, University of Vermont, Burlington, VT, 05405, USA; Vermont Complex Systems Center, University of Vermont, Burlington, VT, 05405, USA
- Antoine Allard: Département de Physique, de génie Physique et d'optique, Université Laval, Québec, G1V 0A6, Canada; Centre Interdisciplinaire en Modélisation Mathématique, Université Laval, Québec, G1V 0A6, Canada; Vermont Complex Systems Center, University of Vermont, Burlington, VT, 05405, USA
40
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 126] [Impact Index Per Article: 63.0] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Affiliation(s)
- Pau Badia-I-Mompel: Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Lorna Wessels: Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany; Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty, Mannheim Heidelberg University, Mannheim, Germany
- Sophia Müller-Dott: Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Rémi Trimbour: Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany; Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
- Ricardo O Ramirez Flores: Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Julio Saez-Rodriguez: Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
41
Shojaee A, Huang SSC. Robust discovery of gene regulatory networks from single-cell gene expression data by Causal Inference Using Composition of Transactions. Brief Bioinform 2023; 24:bbad370. [PMID: 37897702 PMCID: PMC10612495 DOI: 10.1093/bib/bbad370] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2023] [Revised: 09/06/2023] [Accepted: 09/29/2023] [Indexed: 10/30/2023] Open
Abstract
Gene regulatory networks (GRNs) drive organism structure and functions, so the discovery and characterization of GRNs is a major goal in biological research. However, accurate identification of causal regulatory connections and inference of GRNs using gene expression datasets, more recently from single-cell RNA-seq (scRNA-seq), has been challenging. Here we employ the innovative method of Causal Inference Using Composition of Transactions (CICT) to uncover GRNs from scRNA-seq data. The basis of CICT is that if all gene expressions were random, a non-random regulatory gene should induce its targets at levels different from the background random process, resulting in distinct patterns in the whole relevance network of gene-gene associations. CICT proposes novel network features derived from a relevance network, which enable any machine learning algorithm to predict causal regulatory edges and infer GRNs. We evaluated CICT using simulated and experimental scRNA-seq data in a well-established benchmarking pipeline and showed that CICT outperformed existing network inference methods representing diverse approaches with many-fold higher accuracy. Furthermore, we demonstrated that GRN inference with CICT was robust to different levels of sparsity in scRNA-seq data, the characteristics of data and ground truth, the choice of association measure and the complexity of the supervised machine learning algorithm. Our results suggest aiming at directly predicting causality to recover regulatory relationships in complex biological networks substantially improves accuracy in GRN inference.
Affiliation(s)
- Abbas Shojaee: Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
- Shao-shan Carol Huang: Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
42
Mao G, Pang Z, Zuo K, Wang Q, Pei X, Chen X, Liu J. Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform 2023; 24:bbad414. [PMID: 37985457 PMCID: PMC10661972 DOI: 10.1093/bib/bbad414] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/03/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/22/2023] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes from the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data present challenges for GRN inference. In recent years, the dramatic increase in data on experimentally validated transcription factors binding to DNA has made it possible to infer GRNs by supervised methods. In this study, we frame GRN inference as a graph link prediction task and propose a novel framework called GNNLink, which leverages known GRNs to deduce the potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder to effectively refine gene features by capturing interdependencies between nodes in the network. Finally, the inferred GRN is obtained by performing a matrix completion operation on node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods using seven scRNA-seq datasets. These datasets encompass diverse ground truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink on our GitHub repository: https://github.com/sdesignates/GNNLink.
Affiliation(s)
- Guo Mao: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Zhengbin Pang: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Ke Zuo: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Qinglin Wang: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xiangdong Pei: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xinhai Chen: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Jie Liu: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China; Laboratory of Software Engineering for Complex System, National University of Defense Technology, deya, 410073 Changsha, China
43
Zhao J, Wong CW, Ching WK, Cheng X. NG-SEM: an effective non-Gaussian structural equation modeling framework for gene regulatory network inference from single-cell RNA-seq data. Brief Bioinform 2023; 24:bbad369. [PMID: 37864293 DOI: 10.1093/bib/bbad369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/28/2023] [Revised: 09/25/2023] [Accepted: 09/29/2023] [Indexed: 10/22/2023] Open
Abstract
Inference of gene regulatory networks (GRNs) from gene expression profiles has been a central problem in systems biology and bioinformatics in the past decades. The rapid emergence of single-cell RNA sequencing (scRNA-seq) data brings new opportunities and challenges for GRN inference: the extensive dropouts and complicated noise structure may degrade the performance of contemporary gene regulatory models. Thus, there is an urgent need to develop more accurate methods for gene regulatory network inference in single-cell data while considering the noise structure at the same time. In this paper, we extend the traditional structural equation modeling (SEM) framework by considering a flexible noise modeling strategy, namely we use Gaussian mixtures to approximate the complex stochastic nature of a biological system, since the Gaussian mixture framework can arguably serve as a universal approximator for any continuous distribution. The proposed non-Gaussian SEM framework is called NG-SEM, which can be optimized by iteratively performing the Expectation-Maximization algorithm and the weighted least-squares method. Moreover, the Akaike Information Criterion is adopted to select the number of components of the Gaussian mixture. To probe the accuracy and stability of our proposed method, we design a comprehensive variety of control experiments to systematically investigate the performance of NG-SEM under various conditions, including simulations and real biological data sets. Results on synthetic data demonstrate that this strategy can improve the performance of the traditional Gaussian SEM model, and results on real biological data sets verify that NG-SEM outperforms five other state-of-the-art methods.
Affiliation(s)
- Jiaying Zhao: Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Chi-Wing Wong: Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Wai-Ki Ching: Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Xiaoqing Cheng: School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi, China
44
Zeng Y, He Y, Zheng R, Li M. Inferring single-cell gene regulatory network by non-redundant mutual information. Brief Bioinform 2023; 24:bbad326. [PMID: 37715282 DOI: 10.1093/bib/bbad326] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/19/2023] [Revised: 07/12/2023] [Accepted: 08/08/2023] [Indexed: 09/17/2023] Open
Abstract
Gene regulatory networks play a crucial role in controlling the biological processes of living creatures. Deciphering complex gene regulatory networks from experimental data remains a major challenge in systems biology. Recent advances in single-cell RNA sequencing technology bring massive high-resolution data, enabling computational inference of cell-specific gene regulatory networks (GRNs). Many relevant algorithms have been developed to achieve this goal in the past years. However, GRN inference is still less than ideal due to the extra noise involved in pseudo-time information and large amounts of dropouts in datasets. Here, we present a novel GRN inference method named Normi, which is based on non-redundant mutual information. Normi addresses these problems by employing a sliding fixed-size window approach on the entire trajectory and applies an average smoothing strategy to the gene expression of the cells in each window to obtain representative cells. To further alleviate the impact of dropouts, we utilize the mixed KSG estimator to quantify the high-order time-delayed mutual information among genes, then filter out the redundant edges by adopting the Max-Relevance and Min-Redundancy algorithm. Moreover, we determine the optimal time delay for each gene pair by distance correlation. Normi outperforms other state-of-the-art GRN inference methods on both simulated data and single-cell RNA sequencing (scRNA-seq) datasets, demonstrating its superiority in robustness. The performance of Normi in real scRNA-seq data further reveals its ability to identify the key regulators and crucial biological processes.
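The window-based smoothing step described in this abstract can be illustrated with a small sketch. It is not the Normi implementation: the function name `window_representatives` and the `window` and `step` parameters are hypothetical, and the code only shows the general idea of ordering cells by pseudotime and averaging expression inside fixed-size sliding windows to obtain "representative cells".

```python
from statistics import mean


def window_representatives(expr, pseudotime, window=3, step=1):
    """Average expression over fixed-size sliding windows of ordered cells.

    expr       -- list of per-cell expression vectors (cells x genes)
    pseudotime -- one pseudotime value per cell
    window     -- number of cells per window (hypothetical parameter)
    step       -- number of cells the window advances (hypothetical parameter)
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    if len(expr) != len(pseudotime):
        raise ValueError("expr and pseudotime must have the same length")

    # Order cells along the trajectory by increasing pseudotime.
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    ordered = [expr[i] for i in order]

    representatives = []
    for start in range(0, len(ordered) - window + 1, step):
        block = ordered[start:start + window]
        # Gene-wise mean across the cells in this window.
        representatives.append([mean(gene) for gene in zip(*block)])
    return representatives


# Toy data: four cells, two genes, unordered pseudotime.
cells = [[0.0, 4.0], [2.0, 0.0], [1.0, 2.0], [3.0, 6.0]]
pt = [0.1, 0.9, 0.4, 0.7]
reps = window_representatives(cells, pt, window=2, step=1)
print(reps)
```

Averaging within a window trades temporal resolution for robustness to dropouts: a zero in one cell is diluted by its pseudotime neighbours, at the cost of blurring changes that happen faster than the window width.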
Affiliation(s)
- Yanping Zeng: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yongxin He: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ruiqing Zheng: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li: School of Computer Science and Engineering, Central South University, Changsha 410083, China
45
Wang J, Chen Y, Zou Q. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model. PLoS Genet 2023; 19:e1010942. [PMID: 37703293 PMCID: PMC10519590 DOI: 10.1371/journal.pgen.1010942] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 04/11/2023] [Revised: 09/25/2023] [Accepted: 08/29/2023] [Indexed: 09/15/2023] Open
Abstract
The gene regulatory structure of cells involves not only the regulatory relationship between two genes, but also the cooperative associations of multiple genes. However, most gene regulatory network inference methods for single-cell data only infer the regulatory relationships of pairs of genes, ignoring the global regulatory structure, which is crucial for identifying regulation in complex biological systems. Here, we propose a graph-based Deep learning model for Regulatory networks Inference among Genes (DeepRIG) from single-cell RNA-seq data. To learn the global regulatory structure, DeepRIG builds a prior regulatory graph by transforming the gene expression data into the co-expression mode. Then it utilizes a graph autoencoder model to embed the global regulatory information contained in the graph into gene latent embeddings and to reconstruct the gene regulatory network. Extensive benchmarking results demonstrate that DeepRIG can accurately reconstruct the gene regulatory networks and outperform existing methods on multiple simulated networks and real-cell regulatory networks. Additionally, we applied DeepRIG to samples of human peripheral blood mononuclear cells and triple-negative breast cancer, and showed that DeepRIG can provide accurate cell-type-specific gene regulatory network inference and identify novel regulators of progression and inhibition.
Affiliation(s)
- Jiacheng Wang: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Yaojia Chen: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
46
Yuan Q, Duren Z. Continuous lifelong learning for modeling of gene regulation from single cell multiome data by leveraging atlas-scale external data. bioRxiv 2023:2023.08.01.551575. [PMID: 37577525 PMCID: PMC10418251 DOI: 10.1101/2023.08.01.551575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Accurate context-specific Gene Regulatory Network (GRN) inference from genomics data is a crucial task in computational biology. However, existing methods face limitations, such as reliance on gene expression data alone, lower resolution from bulk data, and data scarcity for specific cellular systems. Despite recent technological advancements, including single-cell sequencing and the integration of ATAC-seq and RNA-seq data, learning such complex mechanisms from limited independent data points still presents a daunting challenge, impeding GRN inference accuracy. To overcome this challenge, we present LINGER (LIfelong neural Network for GEne Regulation), a novel deep learning-based method to infer GRNs from single-cell multiome data with paired gene expression and chromatin accessibility data from the same cell. LINGER incorporates both 1) atlas-scale external bulk data across diverse cellular contexts and 2) the knowledge of transcription factor (TF) motif matching to cis-regulatory elements as a manifold regularization to address the challenge of limited data and extensive parameter space in GRN inference. Our results demonstrate that LINGER achieves two- to threefold higher accuracy than existing methods. LINGER reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Additionally, following the GRN inference from reference sc-multiome data, LINGER allows for the estimation of TF activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies. Overall, LINGER provides a comprehensive tool for robust gene regulation inference from genomics data, empowering deeper insights into cellular mechanisms.
Affiliation(s)
- Qiuyue Yuan
  - Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC 29646, USA
- Zhana Duren
  - Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC 29646, USA
47
Zhang S, Pyne S, Pietrzak S, Halberg S, McCalla SG, Siahpirani AF, Sridharan R, Roy S. Inference of cell type-specific gene regulatory networks on cell lineages from single cell omic datasets. Nat Commun 2023; 14:3064. [PMID: 37244909 PMCID: PMC10224950 DOI: 10.1038/s41467-023-38637-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/25/2022] [Accepted: 05/10/2023] [Indexed: 05/29/2023] Open
Abstract
Cell type-specific gene expression patterns are outputs of transcriptional gene regulatory networks (GRNs) that connect transcription factors and signaling proteins to target genes. Single-cell technologies such as single cell RNA-sequencing (scRNA-seq) and single cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), can examine cell-type specific gene regulation at unprecedented detail. However, current approaches to infer cell type-specific GRNs are limited in their ability to integrate scRNA-seq and scATAC-seq measurements and to model network dynamics on a cell lineage. To address this challenge, we have developed single-cell Multi-Task Network Inference (scMTNI), a multi-task learning framework to infer the GRN for each cell type on a lineage from scRNA-seq and scATAC-seq data. Using simulated and real datasets, we show that scMTNI is a broadly applicable framework for linear and branching lineages that accurately infers GRN dynamics and identifies key regulators of fate transitions for diverse processes such as cellular reprogramming and differentiation.
Affiliation(s)
- Shilu Zhang
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Saptarshi Pyne
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Stefan Pietrzak
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Spencer Halberg
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Sunnie Grace McCalla
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Alireza Fotuhi Siahpirani
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Rupa Sridharan
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
48
Xu J, Zhang A, Liu F, Zhang X. STGRNS: an interpretable transformer-based method for inferring gene regulatory networks from single-cell transcriptomic data. Bioinformatics 2023; 39:btad165. [PMID: 37004161 PMCID: PMC10085635 DOI: 10.1093/bioinformatics/btad165] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/25/2022] [Revised: 02/28/2023] [Accepted: 03/25/2023] [Indexed: 04/03/2023] Open
Abstract
MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) technologies provide an opportunity to infer cell-specific gene regulatory networks (GRNs), which is an important challenge in systems biology. Although numerous methods have been developed for inferring GRNs from scRNA-seq data, it is still a challenge to deal with cellular heterogeneity.
RESULTS: To address this challenge, we developed an interpretable transformer-based method namely STGRNS for inferring GRNs from scRNA-seq data. In this algorithm, gene expression motif technique was proposed to convert gene pairs into contiguous sub-vectors, which can be used as input for the transformer encoder. By avoiding missing phase-specific regulations in a network, gene expression motif can improve the accuracy of GRN inference for different types of scRNA-seq data. To assess the performance of STGRNS, we implemented the comparative experiments with some popular methods on extensive benchmark datasets including 21 static and 27 time-series scRNA-seq dataset. All the results show that STGRNS is superior to other comparative methods. In addition, STGRNS was also proved to be more interpretable than "black box" deep learning methods, which are well-known for the difficulty to explain the predictions clearly.
AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/zhanglab-wbgcas/STGRNS.
Affiliation(s)
- Jing Xu
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Aidi Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- Fang Liu
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- Xiujun Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074 China
49
Shen B, Coruzzi G, Shasha D. EnsInfer: a simple ensemble approach to network inference outperforms any single method. BMC Bioinformatics 2023; 24:114. [PMID: 36964499 PMCID: PMC10037858 DOI: 10.1186/s12859-023-05231-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/12/2022] [Accepted: 03/15/2023] [Indexed: 03/26/2023] Open
Abstract
This study evaluates both a variety of existing base causal inference methods and a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across different datasets, so a method that works poorly on one dataset may work well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to as good or better results than using the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on training data. The resulting ensemble model EnsInfer easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.
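As a rough illustration of the ensemble idea in this abstract, the sketch below trains a Naive Bayes classifier on the confidence scores that several base inference methods assign to each candidate edge. It is not the EnsInfer implementation: the Gaussian likelihood, the three hypothetical base methods, the toy scores, and all function names are assumptions made for illustration.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature mean/variance for each label."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict_edge_probability(model, x):
    """Posterior probability that a candidate edge is real (label 1)."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        s = math.log(prior)
        for v, m, var in zip(x, means, variances):
            s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[label] = s
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights[1] / sum(weights.values())

# Toy training data (assumed): each row holds the scores that three
# hypothetical base methods gave to one candidate edge; label 1 = true edge.
train_scores = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.7, 0.7, 0.9],
    [0.2, 0.1, 0.3],
    [0.1, 0.3, 0.2],
    [0.3, 0.2, 0.1],
]
labels = [1, 1, 1, 0, 0, 0]

model = fit_gaussian_nb(train_scores, labels)
p_high = predict_edge_probability(model, [0.85, 0.75, 0.80])
p_low = predict_edge_probability(model, [0.15, 0.20, 0.25])
```

In this toy setting, an edge that all base methods score highly receives a posterior near 1 and an edge they all score low receives a posterior near 0. The classifier treats each base method's score as an independent feature, which is the "naive" assumption.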
Affiliation(s)
- Bingran Shen
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, 10012 USA
- Gloria Coruzzi
  - Department of Biology, Center for Genomics and Systems Biology, New York University, 12 Waverly Pl, New York, 10003 USA
- Dennis Shasha
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, 10012 USA
50
Mao G, Pang Z, Zuo K, Liu J. Gene Regulatory Network Inference Using Convolutional Neural Networks from scRNA-seq Data. J Comput Biol 2023; 30:619-631. [PMID: 36877552 DOI: 10.1089/cmb.2022.0355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023] Open
Abstract
In recent years, the rapid development of single-cell sequencing technology has brought new opportunities and challenges for reconstructing gene regulatory networks. On the one hand, scRNA-seq data reveal statistical information of gene expression at single-cell resolution, which is beneficial for constructing gene expression regulatory networks. On the other hand, the noise and dropout of single-cell data bring great difficulties to the analysis of scRNA-seq data, resulting in lower accuracy of gene regulatory networks reconstructed by traditional methods. In this article, we propose a novel supervised convolutional neural network (CNNSE), which can extract gene expression information from 2D co-expression matrices of gene doublets and identify interactions between genes. Our method can avoid the loss of extreme point interference by constructing a 2D co-expression matrix of gene pairs and significantly improve the regulation precision between gene pairs. The CNNSE model is also able to obtain detailed and high-level semantic information from the 2D co-expression matrix. Our method achieves satisfactory results on simulated data [accuracy (ACC): 0.712, F1: 0.724]. On two real scRNA-seq datasets, our method exhibits higher stability and accuracy in inference tasks compared with other existing gene regulatory network inference algorithms.
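One plausible reading of the "2D co-expression matrix of gene pairs" mentioned in this abstract is a two-dimensional histogram of the two genes' expression values across cells. The sketch below builds such a matrix under that reading. The equal-width binning, the bin count, and the function name `coexpression_matrix` are illustrative assumptions, not details taken from the CNNSE paper.

```python
def coexpression_matrix(expr_a, expr_b, bins=4):
    """Count cells in each (gene A bin, gene B bin) square of a bins x bins grid.

    expr_a and expr_b hold the expression of two genes across the same cells.
    """
    if len(expr_a) != len(expr_b):
        raise ValueError("both genes must be measured in the same cells")

    def bin_index(value, lo, hi):
        if hi == lo:
            return 0  # constant expression: every cell falls in the first bin
        idx = int((value - lo) / (hi - lo) * bins)
        return min(idx, bins - 1)  # the maximum value belongs to the last bin

    lo_a, hi_a = min(expr_a), max(expr_a)
    lo_b, hi_b = min(expr_b), max(expr_b)

    matrix = [[0] * bins for _ in range(bins)]
    for a, b in zip(expr_a, expr_b):
        matrix[bin_index(a, lo_a, hi_a)][bin_index(b, lo_b, hi_b)] += 1
    return matrix

# Toy expression values (assumed) for two genes measured in six cells.
gene_a = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
gene_b = [0.0, 1.0, 1.0, 2.0, 3.0, 4.0]
matrix = coexpression_matrix(gene_a, gene_b, bins=2)
```

A matrix like this can be treated as a small single-channel image, which is what makes a convolutional network a natural classifier for gene pairs. Cells with co-varying expression concentrate along the diagonal.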
Affiliation(s)
- Guo Mao
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
- Zhengbin Pang
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
- Ke Zuo
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
- Jie Liu
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
  - Laboratory of Software Engineering for Complex System, National University of Defense Technology, Changsha, China