1
Wang Z, Meng J, Li H, Dai Q, Lin X, Luan Y. Attention-augmented multi-domain cooperative graph representation learning for molecular interaction prediction. Neural Netw 2025; 186:107265. [PMID: 39987715 DOI: 10.1016/j.neunet.2025.107265] [Received: 08/23/2024] [Revised: 01/23/2025] [Accepted: 02/07/2025] [Indexed: 02/25/2025]
Abstract
Accurate identification of molecular interactions is crucial for biological network analysis, which can provide valuable insights into fundamental regulatory mechanisms. Despite considerable progress driven by computational advancements, existing methods often rely on task-specific prior knowledge or inherent structural properties of molecules, which limits their generalizability and applicability. Recently, graph-based methods have emerged as a promising approach for predicting links in molecular networks. However, most of these methods focus primarily on aggregating topological information within individual domains, leading to an inadequate characterization of molecular interactions. To mitigate these challenges, we propose AMCGRL, a generalized multi-domain cooperative graph representation learning framework for multifarious molecular interaction prediction tasks. Concretely, AMCGRL incorporates multiple graph encoders to simultaneously learn molecular representations from both intra-domain and inter-domain graphs in a comprehensive manner. Then, a cross-domain decoder is employed to bridge these graph encoders and facilitate the extraction of task-relevant information across different domains. Furthermore, a hierarchical mutual attention mechanism is developed to capture complex pairwise interaction patterns between distinct types of molecules through inter-molecule communicative learning. Extensive experiments conducted on various datasets demonstrate the superior representation learning capability of AMCGRL compared to state-of-the-art methods, proving its effectiveness in advancing the prediction of molecular interactions.
Affiliation(s)
- Zhaowei Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Haibin Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Qiguo Dai
- School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China.
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian 116024, China.
2
Zhang H, Zhang W, Zheng X, Li Y. scRDEN: single-cell dynamic gene rank differential expression network and robust trajectory inference. Sci Rep 2025; 15:16963. [PMID: 40374885 DOI: 10.1038/s41598-025-01969-1] [Received: 07/15/2024] [Accepted: 05/09/2025] [Indexed: 05/18/2025] Open
Abstract
The remarkable advancement of single-cell RNA sequencing (scRNA-seq) technology has empowered researchers to probe gene expression at the single-cell level with unprecedented precision. To gain a profound understanding of the heterogeneity inherent in cell fate determination, a central challenge lies in the comprehensive analysis of the dynamic regulatory alterations that underlie transcriptional differences and the accurate inference of the differentiation trajectory. Here, we propose scRDEN, a robust framework that infers important cell subpopulations and differential expression networks of multiple genes along the differentiation directions of each branch. It does so by converting the unstable gene expression values in cells into relatively stable gene-gene interactions (global features), extracting the order of differential expression (network features), and further integrating the expression features of different dimension reduction methods. When applied to five published scRNA-seq datasets from human and mouse cell differentiation, scRDEN not only captures stable cell subpopulations with potential marker genes but also measures the transcriptional differences of gene pairs to identify the rank differential expression network along the differentiation direction of each branch. In addition, in multiple gene rank differential expression networks, the rank expression directly related to transcription factors/marker genes shows a significant strengthening and weakening trend along with their expression changes, and the distribution of diversity and cluster coefficient show a non-monotonic change trend, including the cases of increasing first and then decreasing or decreasing first and then increasing. This may correspond to the mechanism of cells gradually differentiating into stable functions.
It is particularly noteworthy that the scRDEN method yielded exceptional results when applied to the large-scale, multi-branched, double-batch mouse dentate gyrus data. This outstanding performance provides novel and valuable insights into large-scale, multi-batch trajectory inference and the study of transcriptional mechanism regulation during the processes of differentiation and development.
Affiliation(s)
- Han Zhang
- School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, 430073, China
- Wei Zhang
- School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, 430073, China
- Xiaoying Zheng
- School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, 430073, China.
- Yuanyuan Li
- School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, 430073, China.
3
Wang JC, Chen YJ, Zou Q. GRACE: Unveiling Gene Regulatory Networks With Causal Mechanistic Graph Neural Networks in Single-Cell RNA-Sequencing Data. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:9005-9017. [PMID: 38896510 DOI: 10.1109/tnnls.2024.3412753] [Indexed: 06/21/2024]
Abstract
Reconstructing gene regulatory networks (GRNs) using single-cell RNA sequencing (scRNA-seq) data holds great promise for unraveling cellular fate development and heterogeneity. While numerous machine-learning methods have been proposed to infer GRNs from scRNA-seq gene expression data, many of them operate solely in a statistical or black box manner, limiting their capacity for making causal inferences between genes. In this study, we introduce GRN inference with Accuracy and Causal Explanation (GRACE), a novel graph-based causal autoencoder framework that combines a structural causal model (SCM) with graph neural networks (GNNs) to enable GRN inference and gene causal reasoning from scRNA-seq data. By explicitly modeling causal relationships between genes, GRACE facilitates the learning of regulatory context and gene embeddings. With the learned gene signals, our model successfully decodes the causal structures and facilitates the accurate determination of multiple attributes of gene regulation, which are important for determining regulatory levels. Through extensive evaluations on seven benchmarks, we demonstrate that GRACE outperforms 14 state-of-the-art GRN inference methods, with the incorporation of causal mechanisms significantly enhancing the accuracy of GRN and gene causality inference. Furthermore, the application to human peripheral blood mononuclear cell (PBMC) samples reveals cell type-specific regulators in monocyte phagocytosis and immune regulation, validated through network analysis and functional enrichment analysis.
4
Wei PJ, Jin HW, Gao Z, Su Y, Zheng CH. GAEDGRN: reconstruction of gene regulatory networks based on gravity-inspired graph autoencoders. Brief Bioinform 2025; 26:bbaf232. [PMID: 40415678 DOI: 10.1093/bib/bbaf232] [Received: 03/14/2025] [Revised: 04/25/2025] [Accepted: 05/04/2025] [Indexed: 05/27/2025] Open
Abstract
Reconstructing high-resolution gene regulatory networks (GRNs) based on single-cell RNA sequencing data provides an opportunity to gain insight into disease pathogenesis. At present, there are a large number of GRN reconstruction methods based on graph neural networks, and they can obtain excellent performance in GRN inference by extracting network structure features. However, most of these methods fail to fully exploit the directional characteristics or even ignore them when extracting network structural features. To this end, a novel framework called GAEDGRN is proposed based on a gravity-inspired graph autoencoder (GIGAE) to infer potential causal relationships between genes. Here, GIGAE helps capture the complex directed network topology of the GRN. Additionally, because the latent vectors generated by the graph autoencoder are unevenly distributed, a random walk-based method is used to regularize the latent vectors learnt by the encoder. Furthermore, considering that some genes in a GRN usually have a significant impact on biological functions, GAEDGRN designs a gene importance score calculation method and pays attention to genes with high importance in the process of GRN reconstruction. Experimental results on seven cell types of three GRN types show that GAEDGRN achieves high accuracy and strong robustness. Moreover, a case study on human embryonic stem cells demonstrates that GAEDGRN can help identify important genes.
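For readers unfamiliar with gravity-inspired decoders, the following is a minimal sketch of the general idea, not GAEDGRN's actual implementation. It assumes the commonly used formulation in which each node has a latent position z and a scalar mass m, and the directed edge i -> j is scored as sigmoid(m_j - lam * log ||z_i - z_j||^2). The gene names, embedding values, and the `lam` and `eps` parameters are illustrative assumptions.

```python
import math

def gravity_decoder_score(z_i, z_j, mass_j, lam=1.0, eps=1e-9):
    """Score the directed edge i -> j from latent positions and the target's mass."""
    # Squared Euclidean distance between the two latent positions.
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    # Only the target's mass enters the logit, which makes the score asymmetric.
    logit = mass_j - lam * math.log(sq_dist + eps)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical embeddings: gene -> (latent position, mass). Values are made up.
genes = {
    "gene_a": ([0.0, 0.0], 0.1),
    "gene_b": ([1.0, 1.0], 2.0),
}

def score(src, dst):
    z_src, _ = genes[src]
    z_dst, m_dst = genes[dst]
    return gravity_decoder_score(z_src, z_dst, m_dst)

forward = score("gene_a", "gene_b")   # edge gene_a -> gene_b
backward = score("gene_b", "gene_a")  # edge gene_b -> gene_a
print(f"a->b: {forward:.3f}, b->a: {backward:.3f}")
```

Because the distance term is symmetric, the two directions differ only through the mass of the target node, which is what lets such a decoder represent directed edges.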
Affiliation(s)
- Pi-Jing Wei
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Huai-Wan Jin
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Zhen Gao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Yansen Su
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Chun-Hou Zheng
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
5
Wang K, Li Y, Liu F, Luan X, Wang X, Zhou J. GRLGRN: graph representation-based learning to infer gene regulatory networks from single-cell RNA-seq data. BMC Bioinformatics 2025; 26:108. [PMID: 40251476 PMCID: PMC12008888 DOI: 10.1186/s12859-025-06116-1] [Received: 12/22/2024] [Accepted: 03/18/2025] [Indexed: 04/20/2025] Open
Abstract
BACKGROUND A gene regulatory network (GRN) is a graph-level representation that describes the regulatory relationships between transcription factors and target genes in cells. The reconstruction of GRNs can help investigate cellular dynamics, drug design, and metabolic systems, and the rapid development of single-cell RNA sequencing (scRNA-seq) technology provides important opportunities while posing significant challenges for reconstructing GRNs. A number of methods for inferring GRNs have been proposed in recent years based on traditional machine learning and deep learning algorithms. However, inferring the GRN from scRNA-seq data remains challenging owing to cellular heterogeneity, measurement noise, and data dropout. RESULTS In this study, we propose a deep learning model called graph representational learning GRN (GRLGRN) to infer the latent regulatory dependencies between genes based on a prior GRN and single-cell gene expression profiles. GRLGRN uses a graph transformer network to extract implicit links from the prior GRN, and encodes the features of genes by using both an adjacency matrix of implicit links and a gene expression profile matrix. Moreover, it uses attention mechanisms to improve feature extraction, and feeds the refined gene embeddings into an output module to infer gene regulatory relationships. To evaluate the performance of GRLGRN, we compared it with prevalent models and performed ablation experiments on seven cell-line datasets with three ground-truth networks. The results showed that GRLGRN achieved the best predictions in AUROC and AUPRC on 78.6% and 80.9% of the datasets, and achieved an average improvement of 7.3% in AUROC and 30.7% in AUPRC. An interpretability analysis and network visualization were also conducted.
CONCLUSIONS The experimental results and case studies illustrate the considerable performance of GRLGRN in predicting gene interactions and provide interpretability for the prediction tasks, such as identifying hub genes in the network and uncovering implicit links.
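The AUROC and AUPRC figures quoted above are standard link-prediction metrics. As a rough illustration, assuming each candidate gene pair has a binary ground-truth label and a predicted score, they can be computed as follows. This is a generic sketch with made-up labels and scores, not the GRLGRN evaluation code; AUPRC is estimated here as average precision, which is one common convention.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean precision at the rank of each true edge (assumes untied scores, >= 1 positive)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical predictions for six candidate regulator-target pairs.
labels = [1, 0, 1, 0, 0, 1]              # 1 = edge present in the ground-truth network
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # model confidence for each pair

roc = auroc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUROC = {roc:.3f}, AUPRC = {ap:.3f}")  # AUROC = 0.556, AUPRC = 0.722
```

Ground-truth GRNs are typically sparse, so AUPRC tends to be the more sensitive of the two metrics when positives are rare.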
Affiliation(s)
- Kai Wang
- Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Yulong Li
- Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Fei Liu
- Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Xiaoli Luan
- Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Xinglong Wang
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Jingwen Zhou
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China.
- Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China.
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China.
- Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China.
6
Zhou Z, Wei J, Liu M, Zhuo L, Fu X, Zou Q. AnomalGRN: deciphering single-cell gene regulation network with graph anomaly detection. BMC Biol 2025; 23:73. [PMID: 40069807 PMCID: PMC11900578 DOI: 10.1186/s12915-025-02177-z] [Received: 11/03/2024] [Accepted: 02/25/2025] [Indexed: 03/14/2025] Open
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) is now essential for cellular-level gene expression studies and deciphering complex gene regulatory mechanisms. Deep learning methods, when combined with scRNA-seq technology, transform gene regulation research into graph link prediction tasks. However, these methods struggle to mitigate the impact of noisy data in gene regulatory networks (GRNs) and address the significant imbalance between positive and negative links. RESULTS Consequently, we introduce the AnomalGRN model, focusing on heterogeneity and sparsification to elucidate complex regulatory mechanisms within GRNs. Initially, we consider gene pairs as nodes to construct new networks, thereby converting gene regulation prediction into a node prediction task. Considering the imbalance between positive and negative links in GRNs, we further adapt this issue into a graph anomaly detection (GAD) task, marking the first application of anomaly detection to GRN analysis. Introducing the cosine metric rule enables the AnomalGRN model to differentiate between homogeneity and heterogeneity among nodes in the reconstructed GRNs. The adoption of graph structure sparsification technology reduces noisy data impact and optimizes node representation. CONCLUSIONS
Affiliation(s)
- Zhecheng Zhou
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, 325027, China
- Jinhang Wei
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, 325027, China
- Mingzhe Liu
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, 325027, China
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, 325027, China.
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, China.
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 611730, China.
7
Han M, Chen X, Li X, Ma J, Chen T, Yang C, Wang J, Li Y, Guo W, Zhu Y. MulNet: a scalable framework for reconstructing intra- and intercellular signaling networks from bulk and single-cell RNA-seq data. Brief Bioinform 2025; 26:bbaf081. [PMID: 40095604 PMCID: PMC11912874 DOI: 10.1093/bib/bbaf081] [Received: 05/24/2024] [Revised: 02/02/2025] [Accepted: 02/13/2025] [Indexed: 03/19/2025] Open
Abstract
Gene expression involves complex interactions between DNA, RNA, proteins, and small molecules. However, most existing molecular networks are built on limited interaction types, resulting in a fragmented understanding of gene regulation. Here, we present MulNet, a framework that organizes diverse molecular interactions underlying gene expression data into a scalable multilayer network. Additionally, MulNet can accurately identify gene modules and key regulators within this network. When applied across diverse cancer datasets, MulNet outperformed state-of-the-art methods in identifying biologically relevant modules. MulNet analysis of RNA-seq data from colon cancer revealed numerous well-established cancer regulators and a promising new therapeutic target, miR-8485, along with several downstream pathways it governs to inhibit tumor growth. MulNet analysis of single-cell RNA-seq data from head and neck cancer revealed intricate communication networks between fibroblasts and malignant cells mediated by transcription factors and cytokines. Overall, MulNet enables high-resolution reconstruction of intra- and intercellular communication from both bulk and single-cell data. The MulNet code and application are available at https://github.com/free1234hm/MulNet.
Affiliation(s)
- Mingfei Han
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Xiaoqing Chen
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Xiao Li
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Jie Ma
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Tao Chen
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Chunyuan Yang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Juan Wang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
- Yingxing Li
- Central Research Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuaifuyuan Wangfujing Dongcheng District, Beijing 100730, China
- Wenting Guo
- Central Research Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuaifuyuan Wangfujing Dongcheng District, Beijing 100730, China
- Yunping Zhu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No. 38, Life Science Park Road, Changping District, Beijing 102206, China
8
Yu W, Lin Z, Lan M, Ou-Yang L. GCLink: a graph contrastive link prediction framework for gene regulatory network inference. Bioinformatics 2025; 41:btaf074. [PMID: 39960893 PMCID: PMC11881698 DOI: 10.1093/bioinformatics/btaf074] [Received: 07/23/2024] [Revised: 01/10/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) unveil the intricate interactions among genes, pivotal in elucidating the complex biological processes within cells. The advent of single-cell RNA-sequencing (scRNA-seq) enables the inference of GRNs at single-cell resolution. However, the majority of current supervised network inference methods concentrate on predicting pairwise gene regulatory interactions, thus failing to fully exploit correlations among all genes and exhibiting limited generalization performance. RESULTS To address these issues, we propose a graph contrastive link prediction (GCLink) model to infer potential gene regulatory interactions from scRNA-seq data. Based on known gene regulatory interactions and scRNA-seq data, GCLink introduces a graph contrastive learning strategy to aggregate the feature and neighborhood information of genes to learn their representations. This approach reduces the dependence of our model on sample size and enhances its ability to predict potential gene regulatory interactions. Extensive experiments on real scRNA-seq datasets demonstrate that GCLink outperforms other state-of-the-art methods in most cases. Furthermore, by pretraining GCLink on a source cell line with abundant known regulatory interactions and fine-tuning it on a target cell line with a limited number of known interactions, our GCLink model exhibits good performance in GRN inference, demonstrating its effectiveness in inferring GRNs from datasets with limited known interactions. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/Yoyiming/GCLink.
Affiliation(s)
- Weiming Yu
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
- Zerun Lin
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
- Miaofang Lan
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
- Le Ou-Yang
- Guangdong Laboratory of Machine Perception and Intelligent Computing, Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen 518116, China
9
Chen L, Dautle M, Gao R, Zhang S, Chen Y. Inferring gene regulatory networks from time-series scRNA-seq data via GRANGER causal recurrent autoencoders. Brief Bioinform 2025; 26:bbaf089. [PMID: 40062616 PMCID: PMC11891664 DOI: 10.1093/bib/bbaf089] [Received: 10/31/2024] [Revised: 01/26/2025] [Accepted: 02/18/2025] [Indexed: 05/13/2025] Open
Abstract
The development of single-cell RNA sequencing (scRNA-seq) technology provides valuable data resources for inferring gene regulatory networks (GRNs), enabling deeper insights into cellular mechanisms and diseases. While many methods exist for inferring GRNs from static scRNA-seq data, current approaches face challenges in accurately handling time-series scRNA-seq data due to high noise levels and data sparsity. The temporal dimension introduces additional complexity by requiring models to capture dynamic changes, increasing sensitivity to noise, and exacerbating data sparsity across time points. In this study, we introduce GRANGER, an unsupervised deep learning-based method that integrates multiple advanced techniques, including a recurrent variational autoencoder, GRANGER causality, sparsity-inducing penalties, and negative binomial (NB)-based loss functions, to infer GRNs. GRANGER was evaluated using multiple popular benchmarking datasets, where it demonstrated superior performance compared to eight well-known GRN inference methods. The integration of an NB-based loss function and sparsity-inducing penalties in GRANGER significantly enhanced its capacity to address dropout noise and sparsity in scRNA-seq data. Additionally, GRANGER exhibited robustness against high levels of dropout noise. We applied GRANGER to scRNA-seq data from the whole mouse brain obtained through the BRAIN Initiative project and identified GRNs for five transcription regulators: E2f7, Gbx1, Sox10, Prox1, and Onecut2, which play crucial roles in diverse brain cell types. The inferred GRNs not only recalled many known regulatory relationships but also revealed sets of novel regulatory interactions with functional potential. These findings demonstrate that GRANGER is a highly effective tool for real-world applications in discovering novel gene regulatory relationships.
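To illustrate why an NB-based loss suits sparse count data, here is a minimal sketch of a negative binomial negative log-likelihood, assuming the mean and inverse-dispersion parameterization (`mu`, `theta`). The function name, parameter names, and values are illustrative assumptions and are not taken from the GRANGER code.

```python
import math

def nb_neg_log_likelihood(x, mu, theta):
    """Negative log-likelihood of count x under NB(mean=mu, inverse dispersion=theta).

    Requires mu > 0 and theta > 0. The variance of this distribution is
    mu + mu**2 / theta, so a smaller theta means stronger overdispersion.
    """
    log_prob = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * (math.log(theta) - math.log(theta + mu))
        + x * (math.log(mu) - math.log(theta + mu))
    )
    return -log_prob

mu, theta = 2.0, 1.5  # hypothetical mean expression and inverse dispersion

# Probability of observing a zero count under each model with the same mean.
nb_p0 = math.exp(-nb_neg_log_likelihood(0, mu, theta))
poisson_p0 = math.exp(-mu)
print(f"P(x=0): NB = {nb_p0:.3f}, Poisson = {poisson_p0:.3f}")

# Sanity check: the probabilities over all counts sum to (approximately) one.
total_mass = sum(math.exp(-nb_neg_log_likelihood(k, mu, theta)) for k in range(200))
```

With the same mean, the NB model assigns noticeably more probability to zero counts than a Poisson model does, which is one reason NB likelihoods are a frequent choice for scRNA-seq counts with many zeros.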
Affiliation(s)
- Liang Chen
- College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Madison Dautle
- Department of Biological and Biomedical Sciences, Rowan University, 201 Mullica Hill Road, Glassboro, NJ 08028, United States
- Ruoying Gao
- College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, 201 Mullica Hill Road, Glassboro, NJ 08028, United States
10
Gao Z, Su Y, Tang J, Jin H, Ding Y, Cao RF, Wei PJ, Zheng CH. AttentionGRN: a functional and directed graph transformer for gene regulatory network reconstruction from scRNA-seq data. Brief Bioinform 2025; 26:bbaf118. [PMID: 40116659 PMCID: PMC11926986 DOI: 10.1093/bib/bbaf118] [Received: 11/20/2024] [Revised: 02/12/2025] [Accepted: 02/27/2025] [Indexed: 03/23/2025] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) enables the reconstruction of cell type-specific gene regulatory networks (GRNs), offering detailed insights into gene regulation at high resolution. While graph neural networks have become widely used for GRN inference, their message-passing mechanisms are often limited by issues such as over-smoothing and over-squashing, which hinder the preservation of essential network structure. To address these challenges, we propose a novel graph transformer-based model, AttentionGRN, which leverages soft encoding to enhance model expressiveness and improve the accuracy of GRN inference from scRNA-seq data. Furthermore, the GRN-oriented message aggregation strategies are designed to capture both the directed network structure information and functional information inherent in GRNs. Specifically, we design directed structure encoding to facilitate the learning of directed network topologies and employ functional gene sampling to capture key functional modules and global network structure. Our extensive experiments, conducted on 88 datasets across two distinct tasks, demonstrate that AttentionGRN consistently outperforms existing methods. Furthermore, AttentionGRN has been successfully applied to reconstruct cell type-specific GRNs for human mature hepatocytes, revealing novel hub genes and previously unidentified transcription factor-target gene regulatory associations.
Affiliation(s)
- Zhen Gao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Yansen Su
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Jin Tang
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Huaiwan Jin
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Yun Ding
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
| | - Rui-Fen Cao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
| | - Pi-Jing Wei
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
| | - Chun-Hou Zheng
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
| |
Collapse
|
11
|
Stock M, Losert C, Zambon M, Popp N, Lubatti G, Hörmanseder E, Heinig M, Scialdone A. Leveraging prior knowledge to infer gene regulatory networks from single-cell RNA-sequencing data. Mol Syst Biol 2025; 21:214-230. [PMID: 39939367 PMCID: PMC11876610 DOI: 10.1038/s44320-025-00088-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 01/29/2025] [Accepted: 01/30/2025] [Indexed: 02/14/2025] Open
Abstract
Many studies have used single-cell RNA sequencing (scRNA-seq) to infer gene regulatory networks (GRNs), which are crucial for understanding complex cellular regulation. However, the inherent noise and sparsity of scRNA-seq data present significant challenges to accurate GRN inference. This review explores one promising approach that has been proposed to address these challenges: integrating prior knowledge into the inference process to enhance the reliability of the inferred networks. We categorize common types of prior knowledge, such as experimental data and curated databases, and discuss methods for representing priors, particularly through graph structures. In addition, we classify recent GRN inference algorithms based on their ability to incorporate these priors and assess their performance in different contexts. Finally, we propose a standardized benchmarking framework to evaluate algorithms more fairly, ensuring biologically meaningful comparisons. This review provides guidance for researchers selecting GRN inference methods and offers insights for developers looking to improve current approaches and foster innovation in the field.
Affiliation(s)
- Marco Stock
- Helmholtz Center Munich Institute of Epigenetics and Stem Cells, Munich, Germany
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Corinna Losert
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Matteo Zambon
- Helmholtz Center Munich Institute of Epigenetics and Stem Cells, Munich, Germany
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany
- Niclas Popp
- Helmholtz Center Munich Institute of Epigenetics and Stem Cells, Munich, Germany
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany
- Gabriele Lubatti
- Helmholtz Center Munich Institute of Epigenetics and Stem Cells, Munich, Germany
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany
- Eva Hörmanseder
- Helmholtz Center Munich Institute of Epigenetics and Stem Cells, Munich, Germany
- Matthias Heinig
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- German Centre for Cardiovascular Research (DZHK), Munich Heart Association, Partner Site Munich, Berlin, Germany
- Antonio Scialdone
- Helmholtz Center Munich Institute of Epigenetics and Stem Cells, Munich, Germany.
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany.
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany.
|
12
|
Xu J, Lu C, Jin S, Meng Y, Fu X, Zeng X, Nussinov R, Cheng F. Deep learning-based cell-specific gene regulatory networks inferred from single-cell multiome data. Nucleic Acids Res 2025; 53:gkaf138. [PMID: 40037709 PMCID: PMC11879466 DOI: 10.1093/nar/gkaf138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 01/03/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025] Open
Abstract
Gene regulatory networks (GRNs) provide a global representation of how genetic/genomic information is transferred in living systems and are a key component in understanding genome regulation. Single-cell multiome data provide unprecedented opportunities to reconstruct GRNs at fine-grained resolution. However, the inference of GRNs is hindered by insufficient single omic profiles due to the characteristic high loss rate of single-cell sequencing data. In this study, we developed scMultiomeGRN, a deep learning framework to infer transcription factor (TF) regulatory networks via unique integration of single-cell genomic (single-cell RNA sequencing) and epigenomic (single-cell ATAC sequencing) data. scMultiomeGRN elucidates these networks by conceptualizing TF network graph structures. Specifically, we build modality-specific neighbor aggregators and cross-modal attention modules to learn latent representations of TFs from single-cell multi-omics. We demonstrate that scMultiomeGRN outperforms state-of-the-art models on multiple benchmark datasets spanning disease and health. Using scMultiomeGRN, we identified an Alzheimer's disease-relevant regulatory network of SPI1 and RUNX1 in microglia. In summary, scMultiomeGRN offers a deep learning framework to identify cell type-specific gene regulatory networks from single-cell multiome data.
Affiliation(s)
- Junlin Xu
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
- Changcheng Lu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Shuting Jin
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
- Yajie Meng
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, Hubei 430200, China
- Xiangzheng Fu
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, MD 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Feixiong Cheng
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, United States
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, United States
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, United States
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, United States
|
13
|
Sun Y, Gao J. HGATLink: single-cell gene regulatory network inference via the fusion of heterogeneous graph attention networks and transformer. BMC Bioinformatics 2025; 26:49. [PMID: 39934680 PMCID: PMC11817978 DOI: 10.1186/s12859-025-06071-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Accepted: 01/29/2025] [Indexed: 02/13/2025] Open
Abstract
BACKGROUND: Gene regulatory networks (GRNs) involve complex regulatory relationships between genes and play important roles in the study of various biological systems and diseases. The introduction of single-cell RNA sequencing (scRNA-seq) technology has allowed gene regulation studies to be carried out on specific cell types, providing the opportunity to accurately infer gene regulatory networks. However, the sparsity and noise of scRNA-seq data pose challenges for GRN inference. Although many GRN inference methods have been proposed, they often fail to eliminate transitive interactions or do not adequately address multilevel relationships and nonlinear features in the graph data. RESULTS: To address these limitations, we propose a gene regulatory network inference framework named HGATLink. HGATLink combines a heterogeneous graph attention network and a simplified transformer to capture complex interactions between genes effectively in low-dimensional space via matrix decomposition techniques. This not only enhances the ability to model complex heterogeneous graph structures and alleviates transitive interactions, but also captures the long-range dependencies between genes to ensure more accurate prediction. CONCLUSIONS: Compared with 10 state-of-the-art GRN inference methods on 14 scRNA-seq datasets under two metrics, AUROC and AUPRC, HGATLink shows good stability and accuracy in gene regulatory network inference tasks.
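For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch of how AUROC and AUPRC can be computed for a ranked list of candidate regulatory edges. It is not the HGATLink evaluation code. The function names `auroc` and `auprc` and the toy labels and scores are illustrative assumptions, and AUPRC is estimated here by average precision, which is one common estimator among several.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the pairwise-ranking identity (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    # Fraction of (true edge, non-edge) pairs where the true edge scores higher.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one true edge")
    return total / hits


# Hypothetical toy data: 1 marks a known regulatory edge, 0 a non-edge.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(toy_labels, toy_scores), auprc(toy_labels, toy_scores))
```

On this toy input both metrics evaluate to 5/6. AUPRC is often reported alongside AUROC for GRN inference because true edges are typically far rarer than non-edges, and precision-recall curves are more sensitive to that imbalance.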
Affiliation(s)
- Yao Sun
- Department of Computer Science and Technology, College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, 010011, Inner Mongolia, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research, Hohhot, 010018, Inner Mongolia, China
- Jing Gao
- Department of Computer Science and Technology, College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, 010011, Inner Mongolia, China.
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research, Hohhot, 010018, Inner Mongolia, China.
|
14
|
Kommu S, Wang Y, Wang Y, Wang X. Prediction of Gene Regulatory Connections with Joint Single-Cell Foundation Models and Graph-Based Learning. bioRxiv 2025:2024.12.16.628715. [PMID: 39975293 PMCID: PMC11838224 DOI: 10.1101/2024.12.16.628715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) data offers unprecedented opportunities to infer gene regulatory networks (GRNs) at a fine-grained resolution, shedding light on cellular phenotypes at the molecular level. However, the high sparsity, noise, and dropout events inherent in scRNA-seq data pose significant challenges for accurate and reliable GRN inference. The rapid growth in experimentally validated transcription factor-DNA binding data (e.g., ChIP-seq) has enabled supervised machine learning methods, which rely on known gene regulatory interactions to learn patterns, and achieve high accuracy in GRN inference by framing it as a gene regulatory link prediction task. This study addresses the gene regulatory link prediction problem by learning informative vectorized representations at the gene level to predict missing regulatory interactions. However, a higher performance of supervised learning methods requires a large amount of known TF-DNA binding data, which is often experimentally expensive and therefore limited in amount. Advances in large-scale pre-training and transfer learning provide a transformative opportunity to address this challenge. In this study, we leverage large-scale pre-trained models, trained on extensive scRNA-seq datasets and known as single-cell foundation models (scFMs). These models are combined with joint graph-based learning to establish a robust foundation for gene regulatory link prediction. Results: We propose scRegNet, a novel and effective framework that leverages scFMs with joint graph-based learning for gene regulatory link prediction. scRegNet achieves state-of-the-art results in comparison with nine baseline methods on seven scRNA-seq benchmark datasets. In addition, scRegNet is more robust than the baseline methods on noisy training data. Availability: The source code is available at https://github.com/sindhura-cs/scRegNet.
Affiliation(s)
- Sindhura Kommu
- Department of Computer Science, Virginia Tech, Blacksburg, 24061, Virginia, USA
- Yizhi Wang
- Department of Electrical and Computer Engineering, Virginia Tech, Arlington, 22203, Virginia, USA
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Tech, Arlington, 22203, Virginia, USA
- Xuan Wang
- Department of Computer Science, Virginia Tech, Blacksburg, 24061, Virginia, USA
|
15
|
Cao G, Chen D. Unveiling Long Non-coding RNA Networks from Single-Cell Omics Data Through Artificial Intelligence. Methods Mol Biol 2025; 2883:257-279. [PMID: 39702712 DOI: 10.1007/978-1-0716-4290-0_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
Single-cell omics technologies have revolutionized the study of long non-coding RNAs (lncRNAs), offering unprecedented resolution in elucidating their expression dynamics, cell-type specificity, and associated gene regulatory networks (GRNs). Concurrently, the integration of artificial intelligence (AI) methodologies has significantly advanced our understanding of lncRNA functions and their implications in disease pathogenesis. This chapter discusses the progress in single-cell omics data analysis, emphasizing its pivotal role in unraveling the molecular mechanisms underlying cellular heterogeneity and the associated regulatory networks involving lncRNAs. Additionally, we provide a summary of single-cell omics resources and AI models for constructing single-cell gene regulatory networks (scGRNs). Finally, we discuss the challenges and prospects of studying scGRNs in the context of lncRNA biology.
Affiliation(s)
- Guangshuo Cao
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Dijun Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China.
|
16
|
Gao Y, Duan H, Meng F, Zhang C, Li X, Li F. scRSSL: Residual semi-supervised learning with deep generative models to automatically identify cell types. IET Syst Biol 2025; 19:e12107. [PMID: 40261690 PMCID: PMC12033026 DOI: 10.1049/syb2.12107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Revised: 10/18/2024] [Accepted: 10/27/2024] [Indexed: 04/24/2025] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) allows researchers to study cellular heterogeneity at the level of individual cells. In single-cell transcriptomics analysis, identifying the cell type of individual cells is a key task. At present, single-cell datasets often face the challenges of high dimensionality, large sample numbers, high sparsity and sample imbalance, which challenge traditional methods of cell type recognition. The authors propose a deep residual generative model based on semi-supervised learning (scRSSL) to address these challenges. scRSSL introduces residual networks into semi-supervised generative models. The authors take advantage of its semi-supervised learning to solve the problem of sample imbalance. During the training of the model, the authors use a residual neural network to accomplish the inference of cell types so that local features of single-cell data can be extracted. Because of the semi-supervised learning approach, it can automatically and accurately predict individual cell types in datasets, even with only a small number of cell labels. In experiments, the authors' method performed better than other methods.
Affiliation(s)
- Yanru Gao
- School of Computer Science, Qufu Normal University, Rizhao, China
- Hongyu Duan
- Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou, China
- Fanhao Meng
- School of Computer Science, Qufu Normal University, Rizhao, China
- Conghui Zhang
- School of Computer Science, Qufu Normal University, Rizhao, China
- Xiyue Li
- School of Computer Science, Qufu Normal University, Rizhao, China
- Feng Li
- School of Computer Science, Qufu Normal University, Rizhao, China
|
17
|
Cui W, Long Q, Liu W, Fang C, Wang X, Wang P, Zhou Y. Hierarchical Graph Transformer With Contrastive Learning for Gene Regulatory Network Inference. IEEE J Biomed Health Inform 2025; 29:690-699. [PMID: 39401117 DOI: 10.1109/jbhi.2024.3476490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Gene regulatory networks (GRNs) are crucial for understanding gene regulation and cellular processes. Inferring GRNs helps uncover regulatory pathways, shedding light on the regulation and development of cellular processes. With the rise of high-throughput sequencing and advancements in computational technology, computational models have emerged as cost-effective alternatives to traditional experimental studies. Moreover, the surge in ChIP-seq data for TF-DNA binding has catalyzed the development of graph neural network (GNN)-based methods, greatly advancing GRN inference capabilities. However, most existing GNN-based methods suffer from the inability to capture long-distance structural semantic correlations due to transitive interactions. In this paper, we introduce a novel GNN-based model named Hierarchical Graph Transformer with Contrastive Learning for GRN (HGTCGRN) inference. HGTCGRN excels at capturing structural semantics using a hierarchical graph Transformer, which introduces a series of gene family nodes representing gene functions as virtual nodes to interact with nodes in the GRNs. These semantic-aware virtual-node embeddings are aggregated to produce node representations with varying emphasis. Additionally, we leverage gene ontology information to construct gene interaction networks for contrastive learning optimization of GRNs. Experimental results demonstrate that HGTCGRN achieves superior performance in GRN inference.
|
18
|
Li R, Wu J, Li G, Liu J, Liu J, Xuan J, Deng Z. SIGRN: Inferring Gene Regulatory Network with Soft Introspective Variational Autoencoders. Int J Mol Sci 2024; 25:12741. [PMID: 39684451 DOI: 10.3390/ijms252312741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 11/21/2024] [Accepted: 11/25/2024] [Indexed: 12/18/2024] Open
Abstract
Gene regulatory networks (GRNs) capture the complex regulatory relationships among genes, which are essential for understanding developmental biology and uncovering the fundamental aspects of various biological phenomena. Inferring GRNs from single-cell RNA sequencing (scRNA-seq) data with computational methods is effective and economical. Recent research has addressed this problem using variational autoencoders (VAEs) and structural equation models (SEMs). Because VAEs tend to generate poor-quality data, this paper proposes SIGRN, a soft introspective adversarial model for unsupervised GRN inference that introduces an adversarial mechanism into a variational autoencoder. SIGRN applies a "soft" introspective adversarial mode to avoid training additional neural networks and adding training parameters. It demonstrates superior inference accuracy across most benchmark datasets when compared to nine leading-edge methods. In addition, SIGRN achieves better performance on representing cells and generating scRNA-seq data in most datasets, as verified by substantial experiments. The SIGRN method shows promise for generating scRNA-seq data and inferring GRNs.
Affiliation(s)
- Rongyuan Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Jingli Wu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Gaoshi Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Jiafei Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Jinlu Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Junbo Xuan
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Zheng Deng
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
|
19
|
Wu Y, Xu P, Wang L, Liu S, Hou Y, Lu H, Hu P, Li X, Yu X. scGO: interpretable deep neural network for cell status annotation and disease diagnosis. Brief Bioinform 2024; 26:bbaf018. [PMID: 39820437 PMCID: PMC11737892 DOI: 10.1093/bib/bbaf018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 12/16/2024] [Accepted: 01/10/2025] [Indexed: 01/19/2025] Open
Abstract
Machine learning has emerged as a transformative tool for elucidating cellular heterogeneity in single-cell RNA sequencing. However, a significant challenge lies in the "black box" nature of deep learning models, which obscures the decision-making process and limits interpretability in cell status annotation. In this study, we introduced scGO, a Gene Ontology (GO)-inspired deep learning framework designed to provide interpretable cell status annotation for scRNA-seq data. scGO employs sparse neural networks to leverage the intrinsic biological relationships among genes, transcription factors, and GO terms, significantly augmenting interpretability and reducing computational cost. scGO outperforms state-of-the-art methods in the precise characterization of cell subtypes across diverse datasets. Our extensive experimentation across a spectrum of scRNA-seq datasets underscored the remarkable efficacy of scGO in disease diagnosis, prediction of developmental stages, and evaluation of disease severity and cellular senescence status. Furthermore, we incorporated in silico individual gene manipulations into the scGO model, introducing an additional layer for discovering therapeutic targets. Our results provide an interpretable model for accurately annotating cell status, capturing latent biological knowledge, and informing clinical practice.
Affiliation(s)
- You Wu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Pengfei Xu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Liyuan Wang
- School of Agriculture and Biology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Shuai Liu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Yingnan Hou
- School of Agriculture and Biology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Hui Lu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Peng Hu
- Ministry of Education, Shanghai Ocean University, No. 999, Huchenghuan Road, Shanghai 201306, China
- Xiaofei Li
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Shanghai Pudong New Area People’s Hospital, No. 490, Chuanhuan South Road, Shanghai 201299, China
- Xiang Yu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
|
20
|
Liu Y, Zhong L, Yan B, Chen Z, Yu Y, Yu D, Qin J, Wang J. A self-attention-driven deep learning framework for inference of transcriptional gene regulatory networks. Brief Bioinform 2024; 26:bbae639. [PMID: 39679439 DOI: 10.1093/bib/bbae639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 10/15/2024] [Accepted: 11/25/2024] [Indexed: 12/17/2024] Open
Abstract
The interactions between transcription factors (TFs) and their target genes provide a basis for constructing gene regulatory networks (GRNs) for mechanistic understanding of various complex biological processes. It is highly desirable to infer TF-gene interactions (TGIs) from gene expression data using deep learning technologies, particularly from single-cell transcriptomic data, which contain rich cell-to-cell variation. Numerous models and software tools, including deep learning-based algorithms, have been designed to identify transcriptional regulatory relationships between TFs and their downstream genes. However, these methods do not significantly improve predictions of TGIs because of limitations in constructing the underlying interactive structures that link regulatory components. In this study, we introduce a deep learning framework, DeepTGI, that encodes gene expression profiles from single-cell and/or bulk transcriptomic data and predicts TGIs with high accuracy. Our approach fuses the features extracted from an auto-encoder with a self-attention mechanism and other networks, and transforms multihead attention modules to define representative features. Compared with other models and methods, DeepTGI identifies more potential TGIs and reconstructs GRNs more effectively, and could therefore provide broader perspectives for the discovery of more biologically meaningful TGIs and for understanding transcriptional gene regulatory mechanisms.
Affiliation(s)
- Yong Liu
- College of Electronic Information, Guangxi Minzu University, 188 East University Road, Nanning, Guangxi, 530006, China
- Le Zhong
- College of Electronic Information, Guangxi Minzu University, 188 East University Road, Nanning, Guangxi, 530006, China
- Bin Yan
- Division of Applied Oral Sciences & Community Dental Care, Faculty of Dentistry, The University of Hong Kong, 34 Hospital Road, Hong Kong SAR, China
- Zhuobin Chen
- School of Pharmaceutical Sciences (Shenzhen), Shenzhen Campus of Sun Yat-sen University, 66 Gongchang Road, Shenzhen, Guangdong, 518107, China
- Yanjia Yu
- College of Electronic Information, Guangxi Minzu University, 188 East University Road, Nanning, Guangxi, 530006, China
- Dan Yu
- Division of Applied Oral Sciences & Community Dental Care, Faculty of Dentistry, The University of Hong Kong, 34 Hospital Road, Hong Kong SAR, China
- Jing Qin
- School of Pharmaceutical Sciences (Shenzhen), Shenzhen Campus of Sun Yat-sen University, 66 Gongchang Road, Shenzhen, Guangdong, 518107, China
- Junwen Wang
- Division of Applied Oral Sciences & Community Dental Care, Faculty of Dentistry, The University of Hong Kong, 34 Hospital Road, Hong Kong SAR, China
- Department of Quantitative Health Sciences, Center for Individualized Medicine, and Mayo Clinic Comprehensive Cancer Center, Mayo Clinic, 13400 E Shea Blvd, Scottsdale, AZ, 85259, United States
|
21
|
Zhu W, Du Z, Xu Z, Yang D, Chen M, Song Q. SCRN: Single-Cell Gene Regulatory Network Identification in Alzheimer's Disease. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1886-1896. [PMID: 38976461 DOI: 10.1109/tcbb.2024.3424400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Alzheimer's disease (AD) is the most common neurodegenerative disease, and it consumes considerable medical resources as the number of patients increases every year. Mounting evidence shows that regulatory disruptions altering the intrinsic activity of genes in brain cells contribute to AD pathogenesis. To gain insights into the underlying gene regulation in AD, we proposed a graph learning method, Single-Cell based Regulatory Network (SCRN), to identify the regulatory mechanisms based on single-cell data. SCRN implements γ-decaying heuristic link prediction based on graph neural networks and can identify reliable gene regulatory networks using locally closed subgraphs. In this work, we first performed UMAP dimension reduction analysis on single-cell RNA sequencing (scRNA-seq) data of AD and normal samples. Then we used SCRN to construct the gene regulatory network based on three well-recognized AD genes (APOE, CX3CR1, and P2RY12). Enrichment analysis of the regulatory network revealed significant pathways including NGF signaling, ERBB2 signaling, and hemostasis. These findings demonstrate the feasibility of using SCRN to uncover potential biomarkers and therapeutic targets related to AD.
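As background on the term "γ-decaying heuristic" used in this abstract: the Katz index is one classical link-prediction score of that form, since it sums walks between two nodes with a weight that decays geometrically in walk length. The sketch below is illustrative only and is not SCRN's implementation. The function name `katz_scores`, the truncation length `max_len`, the decay value `beta`, and the three-gene toy graph are all assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def katz_scores(adj: Matrix, beta: float = 0.1, max_len: int = 4) -> Matrix:
    """Truncated Katz index: sum over l = 1..max_len of beta**l * (A**l)[i][j]."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    walks = [row[:] for row in adj]  # walks[i][j] = number of length-l walks i -> j
    weight = beta                    # beta**l, starting at l = 1
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * walks[i][j]
        # Advance to walks of length l + 1 by multiplying with the adjacency matrix.
        walks = [
            [sum(walks[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        weight *= beta
    return scores


# Hypothetical directed toy graph: gene 0 -> gene 1 -> gene 2.
toy_adj = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
toy_scores = katz_scores(toy_adj, beta=0.1)
print(toy_scores[0][1], toy_scores[0][2], toy_scores[2][0])
```

In this toy graph the direct edge 0 to 1 scores about 0.1, the two-step pair 0 to 2 scores about 0.01, and the unreachable pair 2 to 0 scores 0. The geometric decay means distant nodes contribute little, which is the usual motivation for scoring a candidate edge from a local subgraph around the node pair rather than from the whole network.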
|
22
|
Dong J, Li J, Wang F. Deep Learning in Gene Regulatory Network Inference: A Survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:2089-2101. [PMID: 39137088 DOI: 10.1109/tcbb.2024.3442536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Understanding the intricate regulatory relationships among genes is crucial for comprehending the development, differentiation, and cellular response in living systems. Consequently, inferring gene regulatory networks (GRNs) based on observed data has gained significant attention as a fundamental goal in biological applications. The proliferation and diversification of available data present both opportunities and challenges in accurately inferring GRNs. Deep learning, a highly successful technique in various domains, holds promise in aiding GRN inference. Several GRN inference methods employing deep learning models have been proposed; however, the selection of an appropriate method remains a challenge for life scientists. In this survey, we provide a comprehensive analysis of 12 GRN inference methods that leverage deep learning models. We trace the evolution of these major methods and categorize them based on the types of applicable data. We delve into the core concepts and specific steps of each method, offering a detailed evaluation of their effectiveness and scalability across different scenarios. These insights enable us to make informed recommendations. Moreover, we explore the challenges faced by GRN inference methods utilizing deep learning and discuss future directions, providing valuable suggestions for the advancement of data scientists in this field.
|
23
|
Yuan L, Zhao L, Jiang Y, Shen Z, Zhang Q, Zhang M, Zheng CH, Huang DS. scMGATGRN: a multiview graph attention network-based method for inferring gene regulatory networks from single-cell transcriptomic data. Brief Bioinform 2024; 25:bbae526. [PMID: 39417321 PMCID: PMC11484520 DOI: 10.1093/bib/bbae526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 07/09/2024] [Accepted: 10/03/2024] [Indexed: 10/19/2024] Open
Abstract
The gene regulatory network (GRN) plays a vital role in understanding the structure and dynamics of cellular systems, revealing complex regulatory relationships, and exploring disease mechanisms. Recently, deep learning (DL)-based methods have been proposed to infer GRNs from single-cell transcriptomic data and achieved impressive performance. However, these methods do not fully utilize graph topological information and high-order neighbor information from multiple receptive fields. To overcome those limitations, we propose a novel model based on multiview graph attention network, namely, scMGATGRN, to infer GRNs. scMGATGRN mainly consists of GAT, multiview, and view-level attention mechanism. GAT can extract essential features of the gene regulatory network. The multiview model can simultaneously utilize local feature information and high-order neighbor feature information of nodes in the gene regulatory network. The view-level attention mechanism dynamically adjusts the relative importance of node embedding representations and efficiently aggregates node embedding representations from two views. To verify the effectiveness of scMGATGRN, we compared its performance with 10 methods (five shallow learning algorithms and five state-of-the-art DL-based methods) on seven benchmark single-cell RNA sequencing (scRNA-seq) datasets from five cell lines (two in human and three in mouse) with four different kinds of ground-truth networks. The experimental results not only show that scMGATGRN outperforms competing methods but also demonstrate the potential of this model in inferring GRNs. The code and data of scMGATGRN are made freely available on GitHub (https://github.com/nathanyl/scMGATGRN).
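The view-level attention step described in this abstract can be illustrated with a minimal sketch. It assumes that each view yields one embedding per gene and that a scalar score per view is already available; in the published model these scores would be learned, and the names `fuse_views`, `local_view`, and `high_order_view` are hypothetical, not taken from the scMGATGRN code.

```python
import math


def softmax(scores):
    """Turn raw view scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_embeddings, view_scores):
    """Combine per-view embeddings of one gene into a single vector.

    view_embeddings: list of equal-length vectors, one per view.
    view_scores: one raw importance score per view (assumed given here).
    """
    weights = softmax(view_scores)
    dim = len(view_embeddings[0])
    fused = [
        sum(w * emb[d] for w, emb in zip(weights, view_embeddings))
        for d in range(dim)
    ]
    return fused, weights


# Hypothetical embeddings of one gene from a local view and a high-order view.
local_view = [1.0, 0.0, 2.0]
high_order_view = [0.0, 1.0, 4.0]

# Equal scores give equal weights, so the result is the plain average.
fused, weights = fuse_views([local_view, high_order_view], [0.0, 0.0])
```

Raising one view's score shifts the fused embedding toward that view, which is the sense in which the attention "adjusts the relative importance" of the two views.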
Affiliation(s)
- Lin Yuan
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, 250353, Shandong, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, 250353, Shandong, China
- Shandong Provincial Key Laboratory of Industrial Network and Information System Security, Shandong Fundamental Research Center for Computer Science, 3501 Daxue Road, 250353, Shandong, China
- Ling Zhao
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, 250353, Shandong, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, 250353, Shandong, China
- Shandong Provincial Key Laboratory of Industrial Network and Information System Security, Shandong Fundamental Research Center for Computer Science, 3501 Daxue Road, 250353, Shandong, China
- Yufeng Jiang
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, 250353, Shandong, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, 250353, Shandong, China
- Shandong Provincial Key Laboratory of Industrial Network and Information System Security, Shandong Fundamental Research Center for Computer Science, 3501 Daxue Road, 250353, Shandong, China
- Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, 80 Changjiang Road, 473004, Henan, China
- Qinhu Zhang
- Ningbo Institute of Digital Twin, Eastern Institute of Technology, 568 Tongxin Road, 315201, Zhejiang, China
- Ming Zhang
- Department of Pediatrics, Zhongshan Hospital Xiamen University, 201 Hubinnan Road, 361004, Fujian, China
- Chun-Hou Zheng
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- De-Shuang Huang
- Ningbo Institute of Digital Twin, Eastern Institute of Technology, 568 Tongxin Road, 315201, Zhejiang, China
- Institute for Regenerative Medicine, Medical Innovation Center and State Key Laboratory of Cardiology, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, 1239 Siping Road, 200123, Shanghai, China
|
24
|
Wei PJ, Bao JJ, Gao Z, Tan JY, Cao RF, Su Y, Zheng CH, Deng L. MEFFGRN: Matrix enhancement and feature fusion-based method for reconstructing the gene regulatory network of epithelioma papulosum cyprini cells by spring viremia of carp virus infection. Comput Biol Med 2024; 179:108835. [PMID: 38996550 DOI: 10.1016/j.compbiomed.2024.108835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 06/05/2024] [Accepted: 06/29/2024] [Indexed: 07/14/2024]
Abstract
Gene regulatory networks (GRNs) are crucial for understanding organismal molecular mechanisms and processes. Construction of GRN in the epithelioma papulosum cyprini (EPC) cells of cyprinid fish by spring viremia of carp virus (SVCV) infection helps understand the immune regulatory mechanisms that enhance the survival capabilities of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells following SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features, neglecting the valuable network structural information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells following SVCV infection. MEFFGRN considers both gene expression data and network structure information of known GRN and introduces a matrix enhancement method to address the sparsity issue of known GRN, extracting richer network structure information. To optimize the benefits of CNN (Convolutional Neural Network) in image processing, gene expression and enhanced GRN data were transformed into histogram images for each gene pair respectively. Subsequently, these histograms were separately fed into CNNs for training to obtain the corresponding gene expression and network structural features. Furthermore, a feature fusion mechanism was introduced to comprehensively integrate the gene expression and network structural features. This integration considers the specificity of each feature and their interactive information, resulting in a more comprehensive and precise feature representation during the fusion process. Experimental results from both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. Furthermore, study findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
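The idea of turning a gene pair into a histogram image can be sketched as a simple two-dimensional binning of paired expression values. This is only an illustration under stated assumptions: equal-width bins over each gene's observed range, raw counts as pixel values, and hypothetical names (`pair_histogram`, `gene_a`, `gene_b`). The binning and normalization actually used by MEFFGRN are not given in the abstract.

```python
def pair_histogram(expr_a, expr_b, bins=4):
    """Build a bins x bins count grid from paired expression values.

    expr_a[i] and expr_b[i] are the expression levels of two genes in
    sample i. Rows index bins of gene A, columns index bins of gene B.
    """
    if len(expr_a) != len(expr_b):
        raise ValueError("both genes need one value per sample")

    def bin_index(value, lo, hi):
        if hi == lo:  # constant expression: everything lands in bin 0
            return 0
        idx = int((value - lo) / (hi - lo) * bins)
        return min(idx, bins - 1)  # the maximum value goes in the last bin

    lo_a, hi_a = min(expr_a), max(expr_a)
    lo_b, hi_b = min(expr_b), max(expr_b)
    grid = [[0] * bins for _ in range(bins)]
    for a, b in zip(expr_a, expr_b):
        grid[bin_index(a, lo_a, hi_a)][bin_index(b, lo_b, hi_b)] += 1
    return grid


# Two perfectly co-expressed hypothetical genes: counts fall on the diagonal.
gene_a = [0.0, 1.0, 2.0, 3.0, 4.0]
gene_b = [0.0, 1.0, 2.0, 3.0, 4.0]
image = pair_histogram(gene_a, gene_b, bins=2)
```

A grid like `image` is what a convolutional network would receive as input for one gene pair; co-expressed pairs concentrate mass on the diagonal, while unrelated pairs spread it across the grid.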
Affiliation(s)
- Pi-Jing Wei
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jin-Jin Bao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Zhen Gao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jing-Yun Tan
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
- Rui-Fen Cao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yansen Su
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chun-Hou Zheng
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Li Deng
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
|
25
|
Jiang H, Wang Y, Yin C, Pan H, Chen L, Feng K, Chang Y, Sun H. SLIVER: Unveiling large scale gene regulatory networks of single-cell transcriptomic data through causal structure learning and modules aggregation. Comput Biol Med 2024; 178:108690. [PMID: 38879931 DOI: 10.1016/j.compbiomed.2024.108690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 05/19/2024] [Accepted: 06/01/2024] [Indexed: 06/18/2024]
Abstract
Prevalent Gene Regulatory Network (GRN) construction methods rely on generalized correlation analysis. However, in biological systems, regulation is essentially a causal relationship that cannot be adequately captured solely through correlation. Therefore, it is more reasonable to infer GRNs from a causal perspective. Existing causal discovery algorithms typically rely on Directed Acyclic Graphs (DAGs) to model causal relationships, but this often requires traversing the entire network, which results in computational demands skyrocketing as the number of nodes grows and makes causal discovery algorithms suitable only for small networks with one or two hundred nodes or fewer. In this study, we propose the SLIVER (cauSaL dIscovery Via dimEnsionality Reduction) algorithm, which integrates a causal structural equation model and graph decomposition. SLIVER introduces a set of factor nodes, serving as abstractions of different functional modules to integrate the regulatory relationships between genes based on their respective functions or pathways, thus reducing the GRN to the product of two low-dimensional matrices. Subsequently, we employ the structural causal model (SCM) to learn the GRN within the gene node space, enforce the DAG constraint in the low-dimensional space, and guide each factor to aggregate various functions through cosine similarity. We evaluate the performance of the SLIVER algorithm on 12 real single-cell transcriptomic datasets and demonstrate that it outperforms 12 other widely used methods in both GRN inference performance and computational resource usage. The analysis of the gene information integrated by factor nodes also demonstrates the biological explanation of factor nodes in GRNs. We apply it to scRNA-seq of Type 2 diabetes mellitus to capture the transcriptional regulatory structural changes of β cells under high insulin demand.
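The phrase "reducing the GRN to the product of two low-dimensional matrices" can be made concrete with a small sketch. It assumes a gene-to-factor matrix of shape (genes x factors) and a factor-to-gene matrix of shape (factors x genes); all names and weights below are hypothetical, and the sketch shows only the factorization, not the SCM learning or the DAG constraint.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


n_genes, n_factors = 5, 2

# Hypothetical weights: gene 0 feeds factor 0, gene 1 feeds factor 1.
gene_to_factor = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],
]
# Factor 0 regulates genes 2 and 3; factor 1 regulates gene 4.
factor_to_gene = [
    [0.0, 0.0, 0.5, 0.25, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
]

# Gene-by-gene regulatory weights implied by routing through the factors.
grn = matmul(gene_to_factor, factor_to_gene)

# Parameters to learn: a dense GRN versus the two factor matrices.
dense_params = n_genes * n_genes
factored_params = 2 * n_genes * n_factors
```

With n genes and k factors, the dense matrix has n² entries while the factored form has 2nk, so the saving grows quickly once n is much larger than k. Any acyclicity check can then be applied to a k x k factor-level matrix instead of the full n x n network.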
Affiliation(s)
- Hongyang Jiang
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Yuezhu Wang
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Chaoyi Yin
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Hao Pan
- College of Software, Jilin University, Changchun, 130012, China
- Liqun Chen
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Ke Feng
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Yi Chang
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China; International Center of Future Science, Jilin University, Changchun, China; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
- Huiyan Sun
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China; International Center of Future Science, Jilin University, Changchun, China; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
|
26
|
Loers JU, Vermeirssen V. A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data. Brief Bioinform 2024; 25:bbae382. [PMID: 39207727 PMCID: PMC11359808 DOI: 10.1093/bib/bbae382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024] Open
Abstract
Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples and even on the same cells has lifted the field of GRN inference to the next stage. Combinations of (single-cell) transcriptomics and chromatin accessibility allow the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data exemplified by state-of-the-art methods as well as open challenges and future developments. Moreover, we address preprocessing strategies, metacell generation and computational omics pairing, transcription factor binding site detection, and linear and three-dimensional approaches to identify chromatin interactions as well as dynamic and causal eGRN inference. We believe that the integration of transcriptomics together with epigenomics data at a single-cell level is the new standard for mechanistic network inference, and that it can be further advanced with integrating additional omics layers and spatiotemporal data, as well as with shifting the focus towards more quantitative and causal modeling strategies.
Affiliation(s)
- Jens Uwe Loers
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Vanessa Vermeirssen
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
|
27
|
Cui W, Long Q, Xiao M, Wang X, Feng G, Li X, Wang P, Zhou Y. Refining computational inference of gene regulatory networks: integrating knockout data within a multi-task framework. Brief Bioinform 2024; 25:bbae361. [PMID: 39082651 PMCID: PMC11289685 DOI: 10.1093/bib/bbae361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 05/09/2024] [Accepted: 07/16/2024] [Indexed: 08/03/2024] Open
Abstract
Constructing accurate gene regulatory networks (GRNs), which reflect the dynamic governing process between genes, is critical to understanding diverse cellular processes and unveiling the complexities in biological systems. With the development of computer science, computational approaches have been applied to the GRN inference task. However, current methodologies face challenges in effectively utilizing existing topological information and prior knowledge of gene regulatory relationships, hindering the comprehensive understanding and accurate reconstruction of GRNs. In response, we propose a novel graph neural network (GNN)-based Multi-Task Learning framework for GRN reconstruction, namely MTLGRN. Specifically, we first encode the gene promoter sequences and the gene biological features and concatenate the corresponding feature representations. Then, we construct a multi-task learning framework including GRN reconstruction, gene knockout prediction, and gene expression matrix reconstruction. With joint training, MTLGRN can optimize the gene latent representations by integrating gene knockout information, promoter characteristics, and other biological attributes. Extensive experimental results demonstrate superior performance compared with state-of-the-art baselines on the GRN reconstruction task, efficiently leveraging biological knowledge and comprehensively understanding the gene regulatory relationships. MTLGRN also pioneered attempts to simulate gene knockouts on bulk data by incorporating gene knockout information.
Affiliation(s)
- Wentao Cui
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Qingqing Long
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- Meng Xiao
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Xuezhi Wang
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Guihai Feng
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Xin Li
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Pengfei Wang
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Yuanchun Zhou
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
|
28
|
Wang Y, Zhou F, Guan J. SFINN: inferring gene regulatory network from single-cell and spatial transcriptomic data with shared factor neighborhood and integrated neural network. Bioinformatics 2024; 40:btae433. [PMID: 38950180 PMCID: PMC11236097 DOI: 10.1093/bioinformatics/btae433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 06/18/2024] [Accepted: 06/28/2024] [Indexed: 07/03/2024] Open
Abstract
MOTIVATION: The rise of single-cell RNA sequencing (scRNA-seq) technology presents new opportunities for constructing detailed cell type-specific gene regulatory networks (GRNs) to study cell heterogeneity. However, challenges caused by noises, technical errors, and dropout phenomena in scRNA-seq data pose significant obstacles to GRN inference, making the design of accurate GRN inference algorithms still essential. The recent growth of both single-cell and spatial transcriptomic sequencing data enables the development of supervised deep learning methods to infer GRNs on these diverse single-cell datasets. RESULTS: In this study, we introduce a novel deep learning framework based on shared factor neighborhood and integrated neural network (SFINN) for inferring potential interactions and causalities between transcription factors and target genes from single-cell and spatial transcriptomic data. SFINN utilizes shared factor neighborhood to construct cellular neighborhood network based on gene expression data and additionally integrates cellular network generated from spatial location information. Subsequently, the cell adjacency matrix and gene pair expression are fed into an integrated neural network framework consisting of a graph convolutional neural network and a fully-connected neural network to determine whether the genes interact. Performance evaluation in the tasks of gene interaction and causality prediction against the existing GRN reconstruction algorithms demonstrates the usability and competitiveness of SFINN across different kinds of data. SFINN can be applied to infer GRNs from conventional single-cell sequencing data and spatial transcriptomic data. AVAILABILITY AND IMPLEMENTATION: SFINN can be accessed at GitHub: https://github.com/JGuan-lab/SFINN.
Affiliation(s)
- Yongjie Wang
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Fengfan Zhou
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Jinting Guan
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai 200240, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
|
29
|
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 PMCID: PMC11444527 DOI: 10.1002/bies.202300210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
|
30
|
Zhou X, Pan J, Chen L, Zhang S, Chen Y. DeepIMAGER: Deeply Analyzing Gene Regulatory Networks from scRNA-seq Data. Biomolecules 2024; 14:766. [PMID: 39062480 PMCID: PMC11274664 DOI: 10.3390/biom14070766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 06/22/2024] [Accepted: 06/25/2024] [Indexed: 07/28/2024] Open
Abstract
Understanding the dynamics of gene regulatory networks (GRNs) across diverse cell types poses a challenge yet holds immense value in unraveling the molecular mechanisms governing cellular processes. Current computational methods, which rely solely on expression changes from bulk RNA-seq and/or scRNA-seq data, often result in high rates of false positives and low precision. Here, we introduce an advanced computational tool, DeepIMAGER, for inferring cell-specific GRNs through deep learning and data integration. DeepIMAGER employs a supervised approach that transforms the co-expression patterns of gene pairs into image-like representations and leverages transcription factor (TF) binding information for model training. It is trained using comprehensive datasets that encompass scRNA-seq profiles and ChIP-seq data, capturing TF-gene pair information across various cell types. Comprehensive validations on six cell lines show that DeepIMAGER exhibits superior performance compared with ten popular GRN inference tools and has remarkable robustness against dropout-zero events. DeepIMAGER was applied to scRNA-seq datasets of multiple myeloma (MM) and detected potential GRNs for the TFs RORC, MITF, and FOXD2 in MM dendritic cells. This technical innovation, combined with its capability to accurately decode GRNs from scRNA-seq, establishes DeepIMAGER as a valuable tool for unraveling complex regulatory networks in various cell types.
Affiliation(s)
- Xiguo Zhou
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Jingyi Pan
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Liang Chen
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
|
31
|
Wu S, Jin K, Tang M, Xia Y, Gao W. Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs. Interdiscip Sci 2024; 16:318-332. [PMID: 38342857 DOI: 10.1007/s12539-024-00604-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 11/26/2023] [Accepted: 01/03/2024] [Indexed: 02/13/2024]
Abstract
Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning regard only transcription factor (TF)-target gene interactions as pairwise relationships, and cannot model the many-to-many high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple heterogeneous biological information is integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, using hypergraph convolution networks to model higher order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate the information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.
Affiliation(s)
- Songyang Wu
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Kui Jin
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Mingjing Tang
  - School of Life Science, Yunnan Normal University, Kunming, 650500, China
  - Engineering Research Center of Sustainable Development and Utilization of Biomass Energy, Ministry of Education, Yunnan Normal University, Kunming, 650500, China
- Yuelong Xia
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Wei Gao
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
32
Wei PJ, Guo Z, Gao Z, Ding Z, Cao RF, Su Y, Zheng CH. Inference of gene regulatory networks based on directed graph convolutional networks. Brief Bioinform 2024; 25:bbae309. [PMID: 38935070 PMCID: PMC11209731 DOI: 10.1093/bib/bbae309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 05/17/2024] [Indexed: 06/28/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) is one of the important challenges in systems biology, and many outstanding computational methods have been proposed; however, challenges remain, especially on real datasets. In this study, we propose a directed graph convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure data of GRNs, a directed graph convolutional neural network is used, which retains the structural information of the directed graph while also making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to address the poor prediction accuracy caused by the large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of the gene sequence. At the training stage, a dynamic update strategy is used to convert the obtained edge prediction scores into edge weights to guide the subsequent training process of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than that of existing models. Furthermore, the case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.
Affiliation(s)
- Pi-Jing Wei
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Ziqiang Guo
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Zhen Gao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Zheng Ding
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Rui-Fen Cao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Yansen Su
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Chun-Hou Zheng
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
33
Lei Y, Huang XT, Guo X, Chan KHK, Gao L. DeepGRNCS: deep learning-based framework for jointly inferring gene regulatory networks across cell subpopulations. Brief Bioinform 2024; 25:bbae334. [PMID: 38980373 PMCID: PMC11232306 DOI: 10.1093/bib/bbae334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 06/03/2024] [Accepted: 07/01/2024] [Indexed: 07/10/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) allows us to obtain a deeper understanding of cellular function and disease pathogenesis. Recent advances in single-cell RNA sequencing (scRNA-seq) technology have improved the accuracy of GRN inference. However, many methods for inferring individual GRNs from scRNA-seq data are limited because they overlook intercellular heterogeneity and similarities between different cell subpopulations, which are often present in the data. Here, we propose a deep learning-based framework, DeepGRNCS, for jointly inferring GRNs across cell subpopulations. We follow the commonly accepted hypothesis that the expression of a target gene can be predicted based on the expression of transcription factors (TFs) due to underlying regulatory relationships. We initially processed scRNA-seq data by discretizing data scattering using the equal-width method. Then, we trained deep learning models to predict target gene expression from TFs. By individually removing each TF from the expression matrix, we used pre-trained deep model predictions to infer regulatory relationships between TFs and genes, thereby constructing the GRN. Our method outperforms existing GRN inference methods for various simulated and real scRNA-seq datasets. Finally, we applied DeepGRNCS to non-small cell lung cancer scRNA-seq data to identify key genes in each cell subpopulation and analyzed their biological relevance. In conclusion, DeepGRNCS effectively predicts cell subpopulation-specific GRNs. The source code is available at https://github.com/Nastume777/DeepGRNCS.
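The two steps named in this abstract, equal-width discretization and scoring each TF by removing it from the input, can be illustrated with a minimal sketch. This is not the DeepGRNCS implementation: the function names are hypothetical, the bin count is an arbitrary assumption, and a linear least-squares model stands in for the deep model described by the authors.

```python
import numpy as np

def equal_width_discretize(values, n_bins=4):
    """Map each value to an integer bin index (0..n_bins-1) using equal-width bins."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values identical: everything falls in a single bin.
        return np.zeros(values.shape, dtype=int)
    edges = np.linspace(lo, hi, n_bins + 1)
    # Use interior edges only, so the maximum value lands in the last bin.
    return np.digitize(values, edges[1:-1])

def tf_removal_scores(tf_expr, target_expr):
    """Score each TF by the rise in prediction error when its column is zeroed.

    tf_expr: array of shape (n_cells, n_tfs); target_expr: shape (n_cells,).
    ASSUMPTION: a linear least-squares fit replaces the deep model.
    """
    tf_expr = np.asarray(tf_expr, dtype=float)
    target_expr = np.asarray(target_expr, dtype=float)
    # Append an intercept column and fit once on the full TF matrix.
    X = np.column_stack([tf_expr, np.ones(len(tf_expr))])
    coef, *_ = np.linalg.lstsq(X, target_expr, rcond=None)
    base_err = np.mean((X @ coef - target_expr) ** 2)
    scores = []
    for j in range(tf_expr.shape[1]):
        X_removed = X.copy()
        X_removed[:, j] = 0.0  # "remove" TF j from the input
        err = np.mean((X_removed @ coef - target_expr) ** 2)
        scores.append(err - base_err)
    return np.array(scores)
```

A larger score suggests that the fitted model relies more heavily on that TF when predicting the target gene; ranking TFs by this score gives candidate regulators under the stated assumptions.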
Affiliation(s)
- Yahui Lei
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Xiao-Tai Huang
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Xingli Guo
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Kei Hang Katie Chan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
  - Department of Epidemiology and Center for Global Cardiometabolic Health, Brown University, Providence, RI, United States
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
34
Zinati Y, Takiddeen A, Emad A. GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks. Nat Commun 2024; 15:4055. [PMID: 38744843 PMCID: PMC11525796 DOI: 10.1038/s41467-024-48516-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 05/01/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.
Affiliation(s)
- Yazdan Zinati
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Abdulrahman Takiddeen
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
  - Mila, Quebec AI Institute, Montreal, QC, Canada
  - The Rosalind and Morris Goodman Cancer Institute, Montreal, QC, Canada
35
Guo C, Huang Z, Chen J, Yu G, Wang Y, Wang X. Identification of Novel Regulators of Leaf Senescence Using a Deep Learning Model. Plants (Basel) 2024; 13:1276. [PMID: 38732491 PMCID: PMC11085074 DOI: 10.3390/plants13091276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 04/26/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]
Abstract
Deep learning has emerged as a powerful tool for investigating intricate biological processes in plants by harnessing the potential of large-scale data. Gene regulation is a complex process in which transcription factors (TFs), cooperating with their target genes, participate in various aspects of biological processes. Despite its significance, the study of gene regulation has primarily focused on a limited number of notable instances, leaving numerous aspects and interactions yet to be explored comprehensively. Here, we developed DEGRN (Deep learning on Expression for Gene Regulatory Network), an innovative deep learning model designed to decipher gene interactions by leveraging high-dimensional expression data obtained from bulk RNA-Seq and scRNA-Seq data in the model plant Arabidopsis. DEGRN exhibited a comparable level of predictive power when applied to various datasets. Using DEGRN, we identified an extensive set of 3,053,363 high-quality interactions, encompassing 1430 TFs and 13,739 non-TF genes. Notably, DEGRN's predictive capabilities allowed us to uncover novel regulators involved in a range of complex biological processes, including development, metabolism, and stress responses. Using leaf senescence as an example, we revealed a complex network underpinning this process composed of diverse TF families, including bHLH, ERF, and MYB. We also identified a novel TF, named MAF5, whose expression showed a strong linear relationship with the progression of senescence. The mutant maf5 showed early leaf decay compared to the wild type, indicating a potential role in the regulation of leaf senescence. This hypothesis was further supported by the expression patterns observed across four stages of leaf development, as well as transcriptomics analysis. Overall, the comprehensive coverage provided by DEGRN expands our understanding of gene regulatory networks and paves the way for further investigations into their functional implications.
Affiliation(s)
- Xu Wang
  - Shanghai Collaborative Innovation Center of Agri-Seeds, Joint Center for Single Cell Biology, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China; (C.G.); (Z.H.); (J.C.); (G.Y.); (Y.W.)
36
Gan Y, Yu J, Xu G, Yan C, Zou G. Inferring gene regulatory networks from single-cell transcriptomics based on graph embedding. Bioinformatics 2024; 40:btae291. [PMID: 38810116 PMCID: PMC11142726 DOI: 10.1093/bioinformatics/btae291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/06/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) encode gene regulation in living organisms, and have become a critical tool to understand complex biological processes. However, due to the dynamic and complex nature of gene regulation, inferring GRNs from scRNA-seq data is still a challenging task. Existing computational methods usually focus on the close connections between genes, and ignore the global structure and distal regulatory relationships. RESULTS In this study, we develop a supervised deep learning framework, IGEGRNS, to infer GRNs from scRNA-seq data based on graph embedding. In the framework, contextual information of genes is captured by GraphSAGE, which aggregates gene features and neighborhood structures to generate low-dimensional embeddings for genes. Then, the k most influential nodes in the whole graph are selected through Top-k pooling. Finally, potential regulatory relationships between genes are predicted by stacked CNNs. Compared with nine competing supervised and unsupervised methods, our method achieves better performance on six time-series scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION Our method IGEGRNS is implemented in Python using the PyTorch machine learning library, and it is freely available at https://github.com/DHUDBlab/IGEGRNS.
Affiliation(s)
- Yanglan Gan
  - School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Jiacheng Yu
  - School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Guangwei Xu
  - School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Cairong Yan
  - School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Guobing Zou
  - School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
37
Wang Y, Chen X, Zheng Z, Huang L, Xie W, Wang F, Zhang Z, Wong KC. scGREAT: Transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics. iScience 2024; 27:109352. [PMID: 38510148 PMCID: PMC10951644 DOI: 10.1016/j.isci.2024.109352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 12/29/2023] [Accepted: 02/23/2024] [Indexed: 03/22/2024] Open
Abstract
Gene regulatory networks (GRNs) involve complex and multi-layer regulatory interactions between regulators and their target genes. Precise knowledge of GRNs is important in understanding cellular processes and molecular functions. Recent breakthroughs in single-cell sequencing technology have made it possible to infer GRNs at the single-cell level. Existing methods, however, are limited by expensive computations and sometimes simplistic assumptions. To overcome these obstacles, we propose scGREAT, a framework to infer GRNs from single-cell transcriptomics using gene embeddings and a transformer. scGREAT starts by constructing gene expression and gene biotext dictionaries from scRNA-seq data and gene text information. The representation of TF-gene pairs is learned by optimizing the embedding space with a transformer-based engine. Results show that scGREAT outperformed other contemporary methods on benchmarks. In addition, gene representations from scGREAT provide valuable gene regulation insights, and external validation on spatial transcriptomics illuminated the mechanism behind scGREAT annotation. Moreover, scGREAT identified several TF-target regulations corroborated in studies.
Affiliation(s)
- Yuchen Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xingjian Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
  - Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Zetian Zheng
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Lei Huang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Weidun Xie
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zhaolei Zhang
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
  - Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
38
Goshisht MK. Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS Omega 2024; 9:9921-9945. [PMID: 38463314 PMCID: PMC10918679 DOI: 10.1021/acsomega.3c05913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 01/19/2024] [Accepted: 01/30/2024] [Indexed: 03/12/2024]
Abstract
Machine learning (ML), particularly deep learning (DL), has made rapid and substantial progress in synthetic biology in recent years. Biotechnological applications of biosystems, including pathways, enzymes, and whole cells, are being probed with increasing frequency. The intricacy and interconnectedness of biosystems make it challenging to design them with the desired properties. ML and DL have a synergy with synthetic biology. Synthetic biology can be employed to produce large data sets for training models (for instance, by utilizing DNA synthesis), and ML/DL models can be employed to inform design (for example, by generating new parts or advising on the best experiments to perform). This potential has recently been brought to light by research at the intersection of engineering biology and ML/DL through achievements like the design of novel biological components, optimal experimental design, automated analysis of microscopy data, protein structure prediction, and biomolecular implementations of ANNs (Artificial Neural Networks). I have divided this review into three sections. In the first section, I describe the predictive potential and basics of ML along with myriad applications in synthetic biology, especially in engineering cells, activity of proteins, and metabolic pathways. In the second section, I describe fundamental DL architectures and their applications in synthetic biology. Finally, I describe the challenges that hinder progress in ML/DL and synthetic biology, along with their solutions.
Affiliation(s)
- Manoj Kumar Goshisht
  - Department of Chemistry, Natural and Applied Sciences, University of Wisconsin-Green Bay, Green Bay, Wisconsin 54311-7001, United States
39
Hassan J, Saeed SM, Deka L, Uddin MJ, Das DB. Applications of Machine Learning (ML) and Mathematical Modeling (MM) in Healthcare with Special Focus on Cancer Prognosis and Anticancer Therapy: Current Status and Challenges. Pharmaceutics 2024; 16:260. [PMID: 38399314 PMCID: PMC10892549 DOI: 10.3390/pharmaceutics16020260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/29/2024] [Accepted: 02/07/2024] [Indexed: 02/25/2024] Open
Abstract
The use of data-driven high-throughput analytical techniques, which has given rise to computational oncology, is undisputed. The widespread use of machine learning (ML) and mathematical modeling (MM)-based techniques is widely acknowledged. These two approaches have fueled the advancement in cancer research and eventually led to the uptake of telemedicine in cancer care. For diagnostic, prognostic, and treatment purposes concerning different types of cancer research, vast databases of varied information with manifold dimensions are required, and indeed, all this information can only be managed by an automated system developed utilizing ML and MM. In addition, MM is being used to probe the relationship between the pharmacokinetics and pharmacodynamics (PK/PD interactions) of anti-cancer substances to improve cancer treatment, and also to refine the quality of existing treatment models by being incorporated at all steps of research and development related to cancer and in routine patient care. This review will serve as a consolidation of the advancement and benefits of ML and MM techniques with a special focus on the area of cancer prognosis and anticancer therapy, leading to the identification of challenges (data quantity, ethical consideration, and data privacy) which are yet to be fully addressed in current studies.
Affiliation(s)
- Jasmin Hassan
  - Drug Delivery & Therapeutics Lab, Dhaka 1212, Bangladesh; (J.H.); (S.M.S.)
- Lipika Deka
  - Faculty of Computing, Engineering and Media, De Montfort University, Leicester LE1 9BH, UK
- Md Jasim Uddin
  - Department of Pharmaceutical Technology, Faculty of Pharmacy, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Diganta B. Das
  - Department of Chemical Engineering, Loughborough University, Loughborough LE11 3TU, UK
40
Guo Z, Liu J, Wang Y, Chen M, Wang D, Xu D, Cheng J. Diffusion models in bioinformatics and computational biology. Nat Rev Bioeng 2024; 2:136-154. [PMID: 38576453 PMCID: PMC10994218 DOI: 10.1038/s44222-023-00114-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Accepted: 08/25/2023] [Indexed: 04/06/2024]
Abstract
Denoising diffusion models embody a type of generative artificial intelligence that can be applied in computer vision, natural language processing and bioinformatics. In this Review, we introduce the key concepts and theoretical foundations of three diffusion modelling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks and score stochastic differential equations). We then explore their applications in bioinformatics and computational biology, including protein design and generation, drug and small-molecule design, protein-ligand interaction modelling, cryo-electron microscopy image data analysis and single-cell data analysis. Finally, we highlight open-source diffusion model tools and consider the future applications of diffusion models in bioinformatics.
Affiliation(s)
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Yanli Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Mengrui Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Duolin Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
41
Wu Z, Sinha S. SPREd: a simulation-supervised neural network tool for gene regulatory network reconstruction. Bioinform Adv 2024; 4:vbae011. [PMID: 38444538 PMCID: PMC10913396 DOI: 10.1093/bioadv/vbae011] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/21/2023] [Revised: 11/08/2023] [Accepted: 01/18/2024] [Indexed: 03/07/2024]
Abstract
SUMMARY Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd," is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g. correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show SPREd to outperform state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA, and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold-standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step toward incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction. AVAILABILITY AND IMPLEMENTATION Data and code are available from https://github.com/iiiime/SPREd.
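The input described in this abstract (expression relationships between the target gene and each TF, and between pairs of TFs) can be sketched as follows. This is only an illustration of the correlation variant: the function name and array layout are assumptions, and the actual SPREd feature construction may differ.

```python
import numpy as np

def expression_relationship_features(tf_expr, target_expr):
    """Return (TF-target correlations, TF-TF correlation matrix).

    tf_expr: array of shape (n_tfs, n_conditions), one row per TF.
    target_expr: array of shape (n_conditions,).
    ASSUMPTION: Pearson correlation is used as the expression relationship;
    mutual information would need a separate estimator.
    """
    tf_expr = np.asarray(tf_expr, dtype=float)
    target_expr = np.asarray(target_expr, dtype=float)
    # Stack the target gene as row 0 so one corrcoef call covers all pairs.
    stacked = np.vstack([target_expr, tf_expr])
    corr = np.corrcoef(stacked)  # rows are treated as variables
    tf_target = corr[0, 1:]      # shape (n_tfs,)
    tf_tf = corr[1:, 1:]         # shape (n_tfs, n_tfs)
    return tf_target, tf_tf
```

Rows with zero variance produce NaN correlations, so constant genes would need to be filtered out before such features are passed to a classifier.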
Affiliation(s)
- Zijun Wu
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
- Saurabh Sinha
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
  - H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
42
Li S, Liu Y, Shen LC, Yan H, Song J, Yu DJ. GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference. Brief Bioinform 2024; 25:bbad529. [PMID: 38261340 PMCID: PMC10805180 DOI: 10.1093/bib/bbad529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/08/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024] Open
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) from scRNA-seq data. Most methods for inferring GRNs either cannot eliminate transitive interactions or require expensive computational resources. To address these limitations, we present a novel method, termed GMFGRN, for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs a GNN for matrix factorization and learns representative embeddings for genes. For transcription factor-gene pairs, it utilizes the learned embeddings to determine whether they interact with each other. An extensive suite of benchmarking experiments encompassing eight static scRNA-seq datasets alongside several state-of-the-art methods demonstrated mean improvements of 1.9% and 2.5% over the runner-up in area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC), respectively. In addition, across four time-series datasets, maximum improvements of 2.4% and 1.3% in AUROC and AUPRC were observed in comparison to the runner-up. Moreover, GMFGRN requires significantly less training time and memory, consuming less than 10% of the time and memory of the second-best method. These findings underscore the substantial potential of GMFGRN in the inference of GRNs. It is publicly available at https://github.com/Lishuoyy/GMFGRN.
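The two metrics reported in this abstract, AUROC and AUPRC, can be computed for a set of scored TF-gene pairs roughly as sketched below. The function names are illustrative, and average precision is used here as one common estimator of AUPRC; the evaluation code used by the GMFGRN authors is not shown in the abstract and may differ.

```python
import numpy as np

def auroc(labels, scores):
    """Probability that a random true edge outscores a random non-edge."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every (positive, negative) pair; ties count as half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a true edge appears."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()

# Toy example: four candidate TF-gene pairs, two of them true edges.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores))              # 0.75
print(average_precision(labels, scores))  # (1/1 + 2/3) / 2
```

The pairwise AUROC above uses memory proportional to the number of positive-negative pairs, so for genome-scale edge lists a rank-based implementation would be preferable.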
Affiliation(s)
- Shuo Li
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Yan Liu
  - School of Information Engineering, Yangzhou University, 196 West Huayang, Yangzhou, 225000, China
- Long-Chen Shen
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- He Yan
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
43
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024; 25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND Modeling of gene regulatory networks (GRNs) is limited by a lack of direct measurements of genome-wide transcription factor activity (TFA), which makes it difficult to separate covariance from regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic because it disconnects GRN inference from TFA estimation and cannot account for, for example, contextual transcription factor-transcription factor interactions and other higher-order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis, yielding novel functional and regulatory insight.
Affiliation(s)
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Center for Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
  - Department of Neuro-Science, University of Wisconsin-Madison - Waisman Center, Madison, USA
- Maggie Beheler-Amass
  - Center for Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Christopher A Jackson
  - Center for Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Lionel A Christiaen
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
  - Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
  - Center for Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Richard Bonneau
  - Center for Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY, 10003, USA
  - Center for Data Science, NYU, New York, NY, 10008, USA
  - Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
|
44
|
Rashid A, Al-Obeidat F, Hafez W, Benakatti G, Malik RA, Koutentis C, Sharief J, Brierley J, Quraishi N, Malik ZA, Anwary A, Alkhzaimi H, Zaki SA, Khilnani P, Kadwa R, Phatak R, Schumacher M, Shaikh MG, Al-Dubai A, Hussain A. Advancing the understanding of clinical sepsis using gene expression-driven machine learning to improve patient outcomes. Shock 2024; 61:4-18. [PMID: 37752080 PMCID: PMC11841734 DOI: 10.1097/shk.0000000000002227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 09/05/2023] [Indexed: 09/28/2023]
Abstract
Sepsis remains a major challenge that necessitates improved approaches to enhance patient outcomes. This study explored the potential of machine learning (ML) techniques to bridge the gap between clinical data and gene expression information to better predict and understand sepsis. We discuss the application of ML algorithms, including neural networks, deep learning, and ensemble methods, to address key evidence gaps and overcome the challenges in sepsis research. The lack of a clear definition of sepsis is highlighted as a major hurdle, but ML models offer a workaround by focusing on endpoint prediction. We emphasize the significance of gene transcript information and its use in ML models to provide insights into sepsis pathophysiology and biomarker identification. Temporal analysis and integration of gene expression data further enhance the accuracy and predictive capabilities of ML models for sepsis. Although challenges such as interpretability and bias exist, ML research offers exciting prospects for addressing critical clinical problems, improving sepsis management, and advancing precision medicine approaches. Collaborative efforts between clinicians and data scientists are essential for the successful implementation and translation of ML models into clinical practice. Machine learning has the potential to revolutionize our understanding of sepsis and significantly improve patient outcomes. Further research and collaboration between clinicians and data scientists are needed to fully understand the potential of ML in sepsis management.
Affiliation(s)
- Asrar Rashid
  - School of Computing, Edinburgh Napier University, Edinburgh, UK
  - NMC Royal Hospital, Khalifa, Abu Dhabi, UAE
- Feras Al-Obeidat
  - College of Technological Innovation, Zayed University, Abu Dhabi, UAE
- Wael Hafez
  - NMC Royal Hospital, Khalifa, Abu Dhabi, UAE
  - Internal Medicine Department, The Medical Research Division, The National Research Centre, Cairo, Egypt
- Rayaz A. Malik
  - Institute of Cardiovascular Science, University of Manchester, Manchester, UK
  - Weill Cornell Medicine-Qatar, Doha, Qatar
- Christos Koutentis
  - Department of Anesthesiology, SUNY Downstate Medical Center, Brooklyn, New York
- Joe Brierley
  - University College London, NIHR Great Ormond Street Hospital Biomedical Research Centre, London, UK
- Nasir Quraishi
  - Centre for Spinal Studies & Surgery, Queen’s Medical Centre; The University of Nottingham, Nottingham, UK
- Zainab A. Malik
  - College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
- Arif Anwary
  - School of Computing, Edinburgh Napier University, Edinburgh, UK
- Syed Ahmed Zaki
  - All India Institute of Medical Sciences, Bibinagar, Hyderabad, India
- Raziya Kadwa
  - Department of Anesthesiology, SUNY Downstate Medical Center, Brooklyn, New York
- Rajesh Phatak
  - Pediatric Intensive Care, Burjeel Hospital, Najda, Abu Dhabi
- M. Guftar Shaikh
  - Department of Paediatric Endocrinology, Royal Hospital for Children, Glasgow, UK
- Ahmed Al-Dubai
  - School of Computing, Edinburgh Napier University, Edinburgh, UK
- Amir Hussain
  - School of Computing, Edinburgh Napier University, Edinburgh, UK
|
45
|
Wu Z, Sinha S. SPREd: A simulation-supervised neural network tool for gene regulatory network reconstruction. bioRxiv 2023:2023.11.09.566399. [PMID: 38014297 PMCID: PMC10680606 DOI: 10.1101/2023.11.09.566399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd", is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g., correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show that SPREd outperforms the state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step towards incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction.
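The input representation described in this abstract (expression relationships between the target gene and each TF, and between pairs of TFs) can be illustrated with a minimal sketch. It assumes Pearson correlation as the only relationship measure and a simple concatenated layout; the function names `pearson` and `pairwise_features` and the toy data are hypothetical and are not taken from the SPREd code.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def pairwise_features(target, tfs):
    """Build one input vector for a target gene (illustrative layout, an assumption).

    target: expression values of the target gene across conditions.
    tfs:    list of expression vectors, one per candidate TF.
    Returns the TF-target correlations followed by the TF-TF correlations.
    """
    tf_target = [pearson(tf, target) for tf in tfs]
    tf_tf = [pearson(tfs[i], tfs[j]) for i, j in combinations(range(len(tfs)), 2)]
    return tf_target + tf_tf


# Toy expression data: 4 conditions, 3 candidate TFs.
target = [1.0, 2.0, 3.0, 4.0]
tfs = [
    [2.0, 4.0, 6.0, 8.0],  # perfectly co-expressed with the target
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with the target
    [1.0, 0.0, 1.0, 0.0],
]
features = pairwise_features(target, tfs)
```

With three TFs the vector holds three TF-target values and three TF-TF values. A classifier trained on simulated data would then map such a vector to one binary label per TF; that training step is outside this sketch.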
Affiliation(s)
- Zijun Wu
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Saurabh Sinha
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
  - H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30318, USA
|
46
|
Bi X, Liang W, Zhao Q, Wang J. SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data. Bioinformatics 2023; 39:btad662. [PMID: 37941450 PMCID: PMC10666204 DOI: 10.1093/bioinformatics/btad662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/17/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023]
Abstract
MOTIVATION: Medical genomics faces significant challenges in interpreting disease phenotype and genetic heterogeneity. Despite the establishment of standardized disease phenotype databases, computational methods for predicting gene-phenotype associations still suffer from imbalanced category distribution and a lack of labeled data in small categories. RESULTS: To address the problem of labeled-data scarcity, we propose a self-supervised learning strategy for gene-phenotype association prediction, called SSLpheno. Our approach utilizes an attributed network that integrates protein-protein interactions and gene ontology data. We apply a Laplacian-based filter to ensure feature smoothness and use self-supervised training to optimize node feature representation. Specifically, we calculate the cosine similarity of feature vectors and select positive and negative sample nodes as training labels for reconstruction. We employ a deep neural network for multi-label classification of phenotypes in the downstream task. Our experimental results demonstrate that SSLpheno outperforms state-of-the-art methods, especially in categories with fewer annotations. Moreover, our case studies illustrate the potential of SSLpheno as an effective prescreening tool for gene-phenotype association identification. AVAILABILITY AND IMPLEMENTATION: https://github.com/bixuehua/SSLpheno.
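The pair-selection step mentioned in this abstract (cosine similarity of feature vectors used to pick positive and negative samples) can be sketched as follows. The two thresholds, the function names `cosine` and `select_pairs`, and the toy vectors are assumptions made for illustration; the abstract does not state how SSLpheno chooses its cut-offs.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_pairs(features, pos_threshold=0.9, neg_threshold=0.1):
    """Split node pairs into pseudo-labelled positives and negatives.

    Pairs at or above pos_threshold become positives, pairs at or below
    neg_threshold become negatives, and everything in between is left
    unlabelled. Both thresholds are hypothetical values.
    """
    positives, negatives = [], []
    for i, j in combinations(range(len(features)), 2):
        s = cosine(features[i], features[j])
        if s >= pos_threshold:
            positives.append((i, j))
        elif s <= neg_threshold:
            negatives.append((i, j))
    return positives, negatives


# Toy node features: nodes 0 and 1 are similar, node 2 is orthogonal to both.
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
positives, negatives = select_pairs(features)
```

The resulting pairs act as labels that need no manual annotation, which is what makes the training self-supervised; the Laplacian-based smoothing of the features is not shown here.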
Affiliation(s)
- Xuehua Bi
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Medical Engineering and Technology College, Xinjiang Medical University, Urumqi 830017, China
- Weiyang Liang
  - College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Qichang Zhao
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
47
|
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023]
Abstract
MOTIVATION: Gene regulatory networks (GRNs) are a way of describing the interactions between genes, which contributes to revealing the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS: In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. First, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method achieves a sensible improvement over state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION: The proposed method is written in the Python language and is available at: https://github.com/lab319/iLSGRN.
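The idea of fusing feature importances from two models, as described in this abstract, can be sketched in a few lines. The abstract does not give the fusion rule, so the normalise-then-weighted-average scheme below is an assumption, as are the function names and the importance values, which stand in for scores that would normally come from fitted XGBoost and Random Forest models.

```python
def normalize(scores):
    """Scale importance scores so they sum to 1 (all zeros if the total is 0)."""
    total = sum(scores.values())
    if total == 0:
        return {gene: 0.0 for gene in scores}
    return {gene: s / total for gene, s in scores.items()}


def fuse_importances(imp_a, imp_b, weight_a=0.5):
    """Combine two importance dictionaries into one ranked list of regulators.

    Each input maps a candidate regulator to its importance for one target
    gene. weight_a is a hypothetical mixing weight, not a published value.
    """
    a, b = normalize(imp_a), normalize(imp_b)
    genes = set(a) | set(b)
    fused = {
        gene: weight_a * a.get(gene, 0.0) + (1 - weight_a) * b.get(gene, 0.0)
        for gene in genes
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up importance scores on different scales for three candidate regulators.
xgb_importance = {"G1": 6.0, "G2": 3.0, "G3": 1.0}
rf_importance = {"G1": 0.3, "G2": 0.5, "G3": 0.2}
ranking = fuse_importances(xgb_importance, rf_importance)
```

Normalising first keeps one model from dominating simply because its scores are on a larger scale; the ranked list can then be read as candidate regulatory edges for the target gene.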
Affiliation(s)
- Yiming Wu
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
  - Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
|
48
|
Gong M, He Y, Wang M, Zhang Y, Ding C. Interpretable single-cell transcription factor prediction based on deep learning with attention mechanism. Comput Biol Chem 2023; 106:107923. [PMID: 37598467 DOI: 10.1016/j.compbiolchem.2023.107923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 07/01/2023] [Accepted: 07/12/2023] [Indexed: 08/22/2023]
Abstract
Predicting transcription factor binding sites (TFBSs) across the whole genome is essential for exploring the rules of gene transcription control. Although many deep learning methods to predict TFBSs have been proposed, prediction using single-cell ATAC-seq data and embedded attention mechanisms still needs improvement. To this end, we present IscPAM, an interpretable method based on deep learning with an attention mechanism to predict single-cell transcription factors. Our model adopts a convolutional neural network to extract data features and optimize the pre-trained model. In particular, the model achieves faster training and prediction due to the embedded attention mechanism. For datasets, we use ATAC-seq, ChIP-seq, and DNA sequence data for the pre-trained model, and single-cell ATAC-seq data to predict the TF binding graph in a given cell. We verify the interpretability of the model through ablation experiments and sensitivity analysis. IscPAM can efficiently predict the combination of whole-genome transcription factors in single cells and study cellular heterogeneity through chromatin accessibility of related diseases.
Affiliation(s)
- Meiqin Gong
  - West China Second University Hospital, Sichuan University, Chengdu 610041, China
- Yuchen He
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Maocheng Wang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Yongqing Zhang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Chunli Ding
  - Sichuan Institute of Computer Sciences, Chengdu 610041, China
|
49
|
Cao Z, Li C, Wang K, He K, Wang X, Yu W. A fast and accurate identification model for Rhinolophus bats based on fine-grained information. Sci Rep 2023; 13:16375. [PMID: 37773197 PMCID: PMC10541429 DOI: 10.1038/s41598-023-42577-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Accepted: 09/12/2023] [Indexed: 10/01/2023]
Abstract
Bats are a crucial component of ecosystems, providing valuable ecosystem services such as pollination and pest control. In practical conservation efforts, the classification and identification of bats are essential for developing effective conservation management programs for bats and their habitats. Traditionally, the identification of bats has been a manual and time-consuming process. With the development of artificial intelligence technology, the accuracy and speed of fine-grained image identification tasks such as bat identification can be greatly improved. Bat identification relies on the fine features of their beaks and faces, so mining the fine-grained information in images is crucial to improving identification accuracy. This paper presents a deep learning-based model designed for the rapid and precise identification of common horseshoe bats (Chiroptera: Rhinolophidae: Rhinolophus) from Southern China. The model was developed using a comprehensive dataset of 883 high-resolution images of seven distinct Rhinolophus species, collected during surveys conducted between 2010 and 2022. An improved EfficientNet model with an attention mechanism module is designed to mine the fine-grained appearance of these Rhinolophus species. The model outperformed other classical models, including SqueezeNet, AlexNet, VGG16_BN, ShuffleNetV2, GoogleNet, ResNet50 and EfficientNet_B0, in terms of precision, recall, accuracy and F1-score. Our model achieved the highest identification accuracy of 94.22% and an F1-score of 0.948 with low computational complexity. Heat maps obtained with Grad-CAM show that our model meets the morphological identification criteria for Rhinolophus. Our study highlights the potential of artificial intelligence technology for the identification of small mammals and for facilitating fast species identification in the future.
Affiliation(s)
- Zhong Cao
  - School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, 510006, China
- Chuxian Li
  - School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, 510006, China
- Kunhui Wang
  - School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, 510006, China
- Kai He
  - School of Life Sciences, Guangzhou University, Guangzhou, 510006, China
- Xiaoyun Wang
  - School of Life Sciences, Guangzhou University, Guangzhou, 510006, China
- Wenhua Yu
  - School of Life Sciences, Guangzhou University, Guangzhou, 510006, China
|
50
|
Mao G, Pang Z, Zuo K, Wang Q, Pei X, Chen X, Liu J. Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform 2023; 24:bbad414. [PMID: 37985457 PMCID: PMC10661972 DOI: 10.1093/bib/bbad414] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/03/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/22/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes from the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data present challenges for GRN inference. In recent years, the dramatic increase in data on experimentally validated transcription factors binding to DNA has made it possible to infer GRNs by supervised methods. In this study, we address the problem of GRN inference by framing it as a graph link prediction task. We propose a novel framework called GNNLink, which leverages known GRNs to deduce the potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder to effectively refine gene features by capturing interdependencies between nodes in the network. Finally, the inferred GRN is obtained by performing a matrix completion operation on the node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods using seven scRNA-seq datasets. These datasets encompass diverse ground truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink on our GitHub repository: https://github.com/sdesignates/GNNLink.
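The encoder-then-completion pipeline described in this abstract can be illustrated with a generic sketch: one graph convolution step refines gene features using a known network, and pairwise scores are then read off the refined features. This is not the GNNLink implementation. The symmetric adjacency, the identity weight matrix, the sigmoid inner-product scoring and all function names are assumptions chosen to keep the example small.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    z = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


def link_scores(embeddings):
    """Score every gene pair as sigmoid(h_i . h_j), an assumed decoder."""
    def sigmoid(t):
        return 1.0 / (1.0 + math.exp(-t))
    return [[sigmoid(sum(a * b for a, b in zip(hi, hj))) for hj in embeddings]
            for hi in embeddings]


# Toy network: genes 0 and 1 share a known link, gene 2 is isolated.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # untrained identity weights, for illustration

embeddings = gcn_layer(normalized_adjacency(adj), features, weights)
scores = link_scores(embeddings)
```

In this toy case the linked genes 0 and 1 receive a higher score than the unlinked pair 0 and 2. In a trained model the weights would be learned from known regulatory links, and a real GRN would use a directed adjacency rather than the symmetric one assumed here.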
Affiliation(s)
- Guo Mao
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Zhengbin Pang
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Ke Zuo
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Qinglin Wang
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xiangdong Pei
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xinhai Chen
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Jie Liu
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
  - Laboratory of Software Engineering for Complex System, National University of Defense Technology, deya, 410073 Changsha, China
|