1. Wang JC, Chen YJ, Zou Q. GRACE: Unveiling Gene Regulatory Networks With Causal Mechanistic Graph Neural Networks in Single-Cell RNA-Sequencing Data. IEEE Trans Neural Netw Learn Syst 2025; 36:9005-9017. [PMID: 38896510 DOI: 10.1109/tnnls.2024.3412753] [Indexed: 06/21/2024]
Abstract
Reconstructing gene regulatory networks (GRNs) using single-cell RNA sequencing (scRNA-seq) data holds great promise for unraveling cellular fate development and heterogeneity. While numerous machine-learning methods have been proposed to infer GRNs from scRNA-seq gene expression data, many of them operate solely in a statistical or black-box manner, limiting their capacity to make causal inferences between genes. In this study, we introduce GRN inference with Accuracy and Causal Explanation (GRACE), a novel graph-based causal autoencoder framework that combines a structural causal model (SCM) with graph neural networks (GNNs) to enable GRN inference and gene causal reasoning from scRNA-seq data. By explicitly modeling causal relationships between genes, GRACE facilitates the learning of regulatory context and gene embeddings. With the learned gene signals, our model successfully decodes the causal structures and facilitates the accurate determination of multiple attributes of gene regulation that are important for determining regulatory levels. Through extensive evaluations on seven benchmarks, we demonstrate that GRACE outperforms 14 state-of-the-art GRN inference methods, with the incorporation of causal mechanisms significantly enhancing the accuracy of GRN and gene causality inference. Furthermore, the application to human peripheral blood mononuclear cell (PBMC) samples reveals cell type-specific regulators in monocyte phagocytosis and immune regulation, validated through network analysis and functional enrichment analysis.
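A small example helps make "structural causal model over genes" concrete. The sketch below samples expression-like data from a linear SCM X = XW + Z, assuming linear regulatory effects and Gaussian noise. It is not GRACE's model, which embeds the SCM in a GNN-based causal autoencoder. The functions `sample_linear_scm` and `is_dag` and the toy weight matrix `W` are hypothetical illustrations.

```python
import numpy as np


def is_dag(W):
    """Return True if W (W[i, j] != 0 means gene i -> gene j) contains no directed cycle.

    A directed graph on d nodes is acyclic exactly when its 0/1 adjacency
    matrix A satisfies A**d == 0, i.e. no walk of length d exists.
    """
    A = (W != 0).astype(float)
    return not np.any(np.linalg.matrix_power(A, W.shape[0]))


def sample_linear_scm(W, n_cells, noise_scale=1.0, seed=0):
    """Draw n_cells samples from the linear SCM X = X @ W + Z with Gaussian noise Z."""
    if not is_dag(W):
        raise ValueError("W must encode a directed acyclic graph")
    d = W.shape[0]
    rng = np.random.default_rng(seed)
    Z = rng.normal(scale=noise_scale, size=(n_cells, d))
    # X = X W + Z  =>  X (I - W) = Z  =>  X = Z (I - W)^{-1}
    return Z @ np.linalg.inv(np.eye(d) - W)


# Hypothetical toy cascade: gene 0 activates gene 1, gene 1 represses gene 2.
W = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 0.0, 0.0]])
X = sample_linear_scm(W, n_cells=500)
print(X.shape)
```

When W is acyclic it is nilpotent, so I - W is always invertible and the solve is well-defined. Causal GRN methods work in the opposite direction: they try to recover W (the regulatory graph) from X.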
2. Wang J, Ye F, Chai H, Jiang Y, Wang T, Ran X, Xia Q, Xu Z, Fu Y, Zhang G, Wu H, Guo G, Guo H, Ruan Y, Wang Y, Xing D, Xu X, Zhang Z. Advances and applications in single-cell and spatial genomics. Sci China Life Sci 2025; 68:1226-1282. [PMID: 39792333 DOI: 10.1007/s11427-024-2770-x] [Received: 08/20/2024] [Accepted: 10/10/2024] [Indexed: 01/12/2025]
Abstract
The recent application of single-cell and spatial technologies has revolutionized our understanding of cellular states and the cellular heterogeneity inherent in complex biological systems. These advances offer unprecedented resolution for examining the functional genomics of individual cells and their spatial context within tissues. In this review, we comprehensively discuss the historical development and recent progress in the field of single-cell and spatial genomics. We review breakthroughs in single-cell multi-omics technologies, spatial genomics methods, and the computational strategies used to analyze single-cell atlas data. Furthermore, we highlight advances in constructing cellular atlases and their clinical applications, particularly in the context of disease. Finally, we discuss the emerging trends, challenges, and opportunities in this rapidly evolving field.
Affiliation(s)
- Jingjing Wang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Fang Ye
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Haoxi Chai
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, 310058, China
- Yujia Jiang
  - BGI Research, Shenzhen, 518083, China
  - BGI Research, Hangzhou, 310030, China
- Teng Wang
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Xia Ran
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
- Qimin Xia
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
- Ziye Xu
  - Department of Laboratory Medicine of The First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Yuting Fu
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Guodong Zhang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Hanyu Wu
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Guoji Guo
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Zhejiang Provincial Key Lab for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
- Hongshan Guo
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
- Yijun Ruan
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, 310058, China
- Yongcheng Wang
  - Department of Laboratory Medicine of The First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Dong Xing
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, 100871, China
- Xun Xu
  - BGI Research, Shenzhen, 518083, China
  - BGI Research, Hangzhou, 310030, China
  - Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen, 518083, China
- Zemin Zhang
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
3. Wei PJ, Jin HW, Gao Z, Su Y, Zheng CH. GAEDGRN: reconstruction of gene regulatory networks based on gravity-inspired graph autoencoders. Brief Bioinform 2025; 26:bbaf232. [PMID: 40415678 DOI: 10.1093/bib/bbaf232] [Received: 03/14/2025] [Revised: 04/25/2025] [Accepted: 05/04/2025] [Indexed: 05/27/2025]
Abstract
Reconstructing high-resolution gene regulatory networks (GRNs) from single-cell RNA sequencing data provides an opportunity to gain insight into disease pathogenesis. Many current GRN reconstruction methods are based on graph neural networks, and they achieve excellent GRN inference performance by extracting network structure features. However, most of these methods fail to fully exploit the directionality of edges when extracting network structural features, and some ignore it entirely. To this end, a novel framework called GAEDGRN is proposed, based on a gravity-inspired graph autoencoder (GIGAE), to infer potential causal relationships between genes. GIGAE helps capture the complex directed network topology of the GRN. Additionally, because the latent vectors generated by the graph autoencoder are unevenly distributed, a random walk-based method is used to regularize the latent vectors learned by the encoder. Furthermore, since some genes in a GRN usually have a significant impact on biological functions, GAEDGRN introduces a gene importance score and prioritizes high-importance genes during GRN reconstruction. Experimental results on seven cell types across three GRN types show that GAEDGRN achieves high accuracy and strong robustness. Moreover, a case study on human embryonic stem cells demonstrates that GAEDGRN can help identify important genes.
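GAEDGRN's abstract names a gravity-inspired graph autoencoder as the component that captures edge direction. The NumPy sketch below illustrates the general gravity-inspired decoder idea from the directed link-prediction literature (Salha et al., 2019). In that formulation the score for an edge i → j is σ(m_j − λ·log‖z_i − z_j‖²). Nearby embeddings raise the score in both directions, while the learned "mass" m_j of the target node breaks the symmetry. This is an illustration under that assumption, not GAEDGRN's implementation. The names `gravity_decoder`, `mass`, and `lam` are hypothetical.

```python
import numpy as np


def gravity_decoder(Z, mass, lam=1.0, eps=1e-8):
    """Score every directed pair (i -> j) from node embeddings Z and per-node masses.

    probs[i, j] = sigmoid(mass[j] - lam * log(||Z[i] - Z[j]||^2)); the diagonal is zeroed.
    """
    diff = Z[:, None, :] - Z[None, :, :]          # (n, n, dim) pairwise differences
    sq_dist = np.sum(diff ** 2, axis=-1)          # (n, n) squared Euclidean distances
    logits = mass[None, :] - lam * np.log(sq_dist + eps)
    probs = 1.0 / (1.0 + np.exp(-logits))
    np.fill_diagonal(probs, 0.0)                  # no self-regulation scores
    return probs


# Hypothetical two-gene example: gene 1 is a "heavier" target than gene 0.
Z = np.array([[0.0, 0.0],
              [1.0, 0.0]])
mass = np.array([0.0, 2.0])
P = gravity_decoder(Z, mass)
print(P.round(3))
```

P[i, j] depends on mass[j] but not on mass[i], so P[0, 1] and P[1, 0] differ even though the two genes are equidistant. This asymmetry is what allows such a decoder to express regulator → target direction, which an inner-product decoder cannot.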
Affiliation(s)
- Pi-Jing Wei
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Huai-Wan Jin
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Zhen Gao
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Yansen Su
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Chun-Hou Zheng
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
4. Wang K, Li Y, Liu F, Luan X, Wang X, Zhou J. GRLGRN: graph representation-based learning to infer gene regulatory networks from single-cell RNA-seq data. BMC Bioinformatics 2025; 26:108. [PMID: 40251476 PMCID: PMC12008888 DOI: 10.1186/s12859-025-06116-1] [Received: 12/22/2024] [Accepted: 03/18/2025] [Indexed: 04/20/2025]
Abstract
BACKGROUND A gene regulatory network (GRN) is a graph-level representation that describes the regulatory relationships between transcription factors and target genes in cells. Reconstructing GRNs can help investigate cellular dynamics, drug design, and metabolic systems, and the rapid development of single-cell RNA sequencing (scRNA-seq) technology provides important opportunities while posing significant challenges for GRN reconstruction. Several methods for inferring GRNs based on traditional machine learning and deep learning algorithms have been proposed in recent years. However, inferring GRNs from scRNA-seq data remains challenging owing to cellular heterogeneity, measurement noise, and data dropout. RESULTS In this study, we propose a deep learning model called graph representational learning GRN (GRLGRN) to infer latent regulatory dependencies between genes from a prior GRN and single-cell gene expression profiles. GRLGRN uses a graph transformer network to extract implicit links from the prior GRN, and encodes gene features using both the adjacency matrix of implicit links and the gene expression matrix. Moreover, it uses attention mechanisms to improve feature extraction and feeds the refined gene embeddings into an output module to infer gene regulatory relationships. To evaluate GRLGRN, we compared it with prevalent models and performed ablation experiments on seven cell-line datasets with three ground-truth networks. GRLGRN achieved the best AUROC and AUPRC on 78.6% and 80.9% of the datasets, respectively, with average improvements of 7.3% in AUROC and 30.7% in AUPRC. Model interpretation and network visualization analyses were also conducted.
CONCLUSIONS The experimental results and case studies illustrate the strong performance of GRLGRN in predicting gene interactions and show that it provides interpretability for the prediction task, such as identifying hub genes in the network and uncovering implicit links.
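GRLGRN is evaluated with AUROC and AUPRC (HGATLink also reports these metrics). The pure-Python sketch below shows both, assuming each candidate regulator-target pair has a predicted score and a 0/1 label from the ground-truth network. AUROC is computed as the probability that a random true edge outranks a random non-edge, with ties counting one half. AUPRC is estimated by average precision, which is one of several common estimators. The function names are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auprc(labels, scores):
    """Average precision: mean precision at the rank of each true edge (descending scores)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / len(precisions)


# Hypothetical predictions for four candidate TF-target pairs.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores), auprc(labels, scores))
```

A random scorer has an expected AUROC of 0.5. Its expected AUPRC, by contrast, is roughly the fraction of positive pairs, which is small for sparse GRNs. This is why the two metrics are usually reported together.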
Affiliation(s)
- Kai Wang
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Yulong Li
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Fei Liu
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Xiaoli Luan
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Xinglong Wang
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
  - Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Jingwen Zhou
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
  - Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
5. Huang K, Tian J, Sun L, Hu H, Huang X, Zhou S, Deng A, Zhou Z, Jiang M, Li G, Xie P, Wang Y, Jiang X. TransGeneSelector: using a transformer approach to mine key genes from small transcriptomic datasets in plant responses to various environments. BMC Genomics 2025; 26:259. [PMID: 40098114 PMCID: PMC11912617 DOI: 10.1186/s12864-025-11434-y] [Received: 06/18/2024] [Accepted: 03/04/2025] [Indexed: 03/19/2025]
Abstract
Gene mining is crucial for understanding the regulatory mechanisms underlying complex biological processes, particularly in plants responding to environmental conditions. Traditional machine learning methods, while useful, often overlook important gene relationships because they rely on manual feature selection and have limited ability to capture complex inter-gene regulatory dynamics. Deep learning approaches, while powerful, are often unsuitable for small sample sizes. This study introduces TransGeneSelector, the first deep learning framework specifically designed for mining key genes from small transcriptomic datasets. By integrating a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) for sample generation and a Transformer-based network for classification, TransGeneSelector efficiently addresses the challenges of small-sample transcriptomic data, capturing both global gene regulatory interactions and specific biological processes. Evaluated in Arabidopsis thaliana, the model achieved high classification accuracy in predicting seed germination and heat stress conditions, outperforming traditional methods such as Random Forest and Support Vector Machines (SVM). Moreover, Shapley Additive Explanations (SHAP) analysis and gene regulatory network construction revealed that TransGeneSelector effectively identified genes that appear to have upstream regulatory functions; these genes were enriched in multiple key pathways critical for seed germination and heat stress response. RT-qPCR validation further confirmed the accuracy of the model's gene selection, showing consistent expression patterns across varying germination conditions. The findings underscore the potential of TransGeneSelector as a robust tool for gene mining, offering deeper insights into gene regulation and organism adaptation under diverse environmental conditions.
This work provides a framework that leverages deep learning for key gene identification in small transcriptomic datasets.
Affiliation(s)
- Kerui Huang
  - Key Laboratory of Agricultural Products Processing and Food Safety in Hunan Higher Education, Hunan University of Arts and Science, Changde, 415000, China
- Jianhong Tian
  - College of Life Sciences, Hunan Normal University, Changsha, 410081, China
- Lei Sun
  - Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, College of Biological and Food Engineering, Huaihua University, Huaihua, 418000, China
- Haoliang Hu
  - Key Laboratory of Agricultural Products Processing and Food Safety in Hunan Higher Education, Hunan University of Arts and Science, Changde, 415000, China
- Xuebin Huang
  - Key Laboratory of Agricultural Products Processing and Food Safety in Hunan Higher Education, Hunan University of Arts and Science, Changde, 415000, China
- Shiqi Zhou
  - Rice Research Institute of Jiangxi Academy of Agricultural Sciences, Nanchang, 330000, China
- Aihua Deng
  - Key Laboratory of Agricultural Products Processing and Food Safety in Hunan Higher Education, Hunan University of Arts and Science, Changde, 415000, China
- Zhibo Zhou
  - Key Laboratory of Agricultural Products Processing and Food Safety in Hunan Higher Education, Hunan University of Arts and Science, Changde, 415000, China
- Ming Jiang
  - Key Laboratory of Agricultural Products Processing and Food Safety in Hunan Higher Education, Hunan University of Arts and Science, Changde, 415000, China
- Guiwu Li
  - College of Life Sciences, Hunan Normal University, Changsha, 410081, China
- Peng Xie
  - Key Laboratory of Agricultural Products Processing and Food Safety in Hunan Higher Education, Hunan University of Arts and Science, Changde, 415000, China
- Yun Wang
  - Key Laboratory of Agricultural Products Processing and Food Safety in Hunan Higher Education, Hunan University of Arts and Science, Changde, 415000, China
- Xiaocheng Jiang
  - College of Life Sciences, Hunan Normal University, Changsha, 410081, China
6. Yu W, Lin Z, Lan M, Ou-Yang L. GCLink: a graph contrastive link prediction framework for gene regulatory network inference. Bioinformatics 2025; 41:btaf074. [PMID: 39960893 PMCID: PMC11881698 DOI: 10.1093/bioinformatics/btaf074] [Received: 07/23/2024] [Revised: 01/10/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025]
Abstract
MOTIVATION Gene regulatory networks (GRNs) unveil the intricate interactions among genes and are pivotal in elucidating the complex biological processes within cells. The advent of single-cell RNA-sequencing (scRNA-seq) enables the inference of GRNs at single-cell resolution. However, most current supervised network inference methods concentrate on predicting pairwise gene regulatory interactions, thus failing to fully exploit correlations among all genes and exhibiting limited generalization performance. RESULTS To address these issues, we propose a graph contrastive link prediction (GCLink) model to infer potential gene regulatory interactions from scRNA-seq data. Based on known gene regulatory interactions and scRNA-seq data, GCLink introduces a graph contrastive learning strategy that aggregates the feature and neighborhood information of genes to learn their representations. This approach reduces the model's dependence on sample size and enhances its ability to predict potential gene regulatory interactions. Extensive experiments on real scRNA-seq datasets demonstrate that GCLink outperforms other state-of-the-art methods in most cases. Furthermore, when pretrained on a source cell line with abundant known regulatory interactions and fine-tuned on a target cell line with a limited number of known interactions, GCLink performs well in GRN inference, demonstrating its effectiveness for inferring GRNs from datasets with few known interactions. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/Yoyiming/GCLink.
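GCLink's abstract describes a graph contrastive learning strategy but does not state its loss. A widely used contrastive objective is InfoNCE, which pulls together two views of the same gene and pushes apart views of different genes. The NumPy sketch below assumes that `z1[i]` and `z2[i]` are embeddings of gene i from two graph views (for example, after random edge dropping). The `info_nce` function and its `temperature` default are illustrative and not taken from GCLink's code.

```python
import numpy as np


def info_nce(z1, z2, temperature=0.5):
    """One-directional InfoNCE: row i of z1 should match row i of z2 among all rows of z2."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # unit vectors -> cosine similarity
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature                      # (n, n) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))                   # positive pairs sit on the diagonal


# Hypothetical gene embeddings from two views of the same graph.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce(z, z[::-1])   # pairs each gene with a different gene
print(aligned, mismatched)
```

Every other gene in the batch acts as a negative. Contrastive objectives of this kind can learn useful representations without relying only on the scarce labeled regulator-target pairs, which is in line with the abstract's point about reducing dependence on sample size.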
Affiliation(s)
- Weiming Yu
  - Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
- Zerun Lin
  - Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
- Miaofang Lan
  - Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
- Le Ou-Yang
  - Guangdong Laboratory of Machine Perception and Intelligent Computing, Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen 518116, China
7. Gao Z, Su Y, Tang J, Jin H, Ding Y, Cao RF, Wei PJ, Zheng CH. AttentionGRN: a functional and directed graph transformer for gene regulatory network reconstruction from scRNA-seq data. Brief Bioinform 2025; 26:bbaf118. [PMID: 40116659 PMCID: PMC11926986 DOI: 10.1093/bib/bbaf118] [Received: 11/20/2024] [Revised: 02/12/2025] [Accepted: 02/27/2025] [Indexed: 03/23/2025]
Abstract
Single-cell RNA sequencing (scRNA-seq) enables the reconstruction of cell type-specific gene regulatory networks (GRNs), offering detailed insights into gene regulation at high resolution. While graph neural networks have become widely used for GRN inference, their message-passing mechanisms are often limited by issues such as over-smoothing and over-squashing, which hinder the preservation of essential network structure. To address these challenges, we propose a novel graph transformer-based model, AttentionGRN, which leverages soft encoding to enhance model expressiveness and improve the accuracy of GRN inference from scRNA-seq data. Furthermore, GRN-oriented message aggregation strategies are designed to capture both the directed network structure information and the functional information inherent in GRNs. Specifically, we design a directed structure encoding to facilitate the learning of directed network topologies and employ functional gene sampling to capture key functional modules and global network structure. Extensive experiments on 88 datasets across two distinct tasks demonstrate that AttentionGRN consistently outperforms existing methods. Furthermore, AttentionGRN has been successfully applied to reconstruct cell type-specific GRNs for human mature hepatocytes, revealing novel hub genes and previously unidentified transcription factor-target gene regulatory associations.
Affiliation(s)
- Zhen Gao
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Yansen Su
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Jin Tang
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Huaiwan Jin
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Yun Ding
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Rui-Fen Cao
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Pi-Jing Wei
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Chun-Hou Zheng
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
8. Chen L, Dautle M, Gao R, Zhang S, Chen Y. Inferring gene regulatory networks from time-series scRNA-seq data via GRANGER causal recurrent autoencoders. Brief Bioinform 2025; 26:bbaf089. [PMID: 40062616 PMCID: PMC11891664 DOI: 10.1093/bib/bbaf089] [Received: 10/31/2024] [Revised: 01/26/2025] [Accepted: 02/18/2025] [Indexed: 05/13/2025]
Abstract
The development of single-cell RNA sequencing (scRNA-seq) technology provides valuable data resources for inferring gene regulatory networks (GRNs), enabling deeper insights into cellular mechanisms and diseases. While many methods exist for inferring GRNs from static scRNA-seq data, current approaches face challenges in accurately handling time-series scRNA-seq data due to high noise levels and data sparsity. The temporal dimension introduces additional complexity: models must capture dynamic changes, sensitivity to noise increases, and data sparsity is exacerbated across time points. In this study, we introduce GRANGER, an unsupervised deep learning-based method that integrates multiple advanced techniques, including a recurrent variational autoencoder, Granger causality, sparsity-inducing penalties, and negative binomial (NB)-based loss functions, to infer GRNs. GRANGER was evaluated on multiple popular benchmark datasets, where it outperformed eight well-known GRN inference methods. The integration of an NB-based loss function and sparsity-inducing penalties in GRANGER significantly enhanced its capacity to address dropout noise and sparsity in scRNA-seq data. Additionally, GRANGER exhibited robustness against high levels of dropout noise. We applied GRANGER to scRNA-seq data from the whole mouse brain obtained through the BRAIN Initiative project and identified GRNs for five transcription regulators (E2f7, Gbx1, Sox10, Prox1, and Onecut2) that play crucial roles in diverse brain cell types. The inferred GRNs not only recalled many known regulatory relationships but also revealed sets of novel regulatory interactions with functional potential. These findings demonstrate that GRANGER is a highly effective tool for real-world applications in discovering novel gene regulatory relationships.
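GRANGER uses negative binomial (NB)-based loss functions to cope with sparse, over-dispersed counts. The function below sketches what such a loss computes. It returns the NB negative log-likelihood of an observed count y under a mean μ / inverse-dispersion θ parameterization, a form commonly used for scRNA-seq counts:

−log p(y) = −[lnΓ(y+θ) − lnΓ(θ) − lnΓ(y+1) + θ·ln(θ/(θ+μ)) + y·ln(μ/(θ+μ))]

Whether GRANGER uses exactly this parameterization is an assumption, and `nb_nll` and `mean_nb_loss` are hypothetical names.

```python
import math


def nb_nll(y, mu, theta, eps=1e-8):
    """Negative log-likelihood of count y under NB(mean=mu, inverse dispersion=theta).

    The variance is mu + mu**2 / theta, so a small theta means heavier over-dispersion.
    eps guards the logarithms against mu == 0.
    """
    log_theta_mu = math.log(theta + mu + eps)
    log_pmf = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * (math.log(theta + eps) - log_theta_mu)
        + y * (math.log(mu + eps) - log_theta_mu)
    )
    return -log_pmf


def mean_nb_loss(counts, means, theta):
    """Average NB loss over observed counts, e.g. a reconstruction term in an autoencoder."""
    return sum(nb_nll(y, m, theta) for y, m in zip(counts, means)) / len(counts)


print(nb_nll(0, mu=1.0, theta=1.0))   # equals log(2) up to eps, since P(0) = (1/2)**1
```

Compared with a squared-error loss on raw counts, the NB likelihood models the variance growing with the mean. Frequent zeros are therefore penalized less harshly than under a Poisson or Gaussian model.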
Affiliation(s)
- Liang Chen
  - College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Madison Dautle
  - Department of Biological and Biomedical Sciences, Rowan University, 201 Mullica Hill Road, Glassboro, NJ 08028, United States
- Ruoying Gao
  - College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Shaoqiang Zhang
  - College of Computer and Information Engineering, Tianjin Normal University, 393 Binshui W Ave, Tianjin, Tianjin 300387, China
- Yong Chen
  - Department of Biological and Biomedical Sciences, Rowan University, 201 Mullica Hill Road, Glassboro, NJ 08028, United States
9. Sun Y, Gao J. HGATLink: single-cell gene regulatory network inference via the fusion of heterogeneous graph attention networks and transformer. BMC Bioinformatics 2025; 26:49. [PMID: 39934680 PMCID: PMC11817978 DOI: 10.1186/s12859-025-06071-x] [Received: 08/15/2024] [Accepted: 01/29/2025] [Indexed: 02/13/2025]
Abstract
BACKGROUND Gene regulatory networks (GRNs) involve complex regulatory relationships between genes and play important roles in the study of various biological systems and diseases. The introduction of single-cell RNA sequencing (scRNA-seq) technology has allowed gene regulation studies to be carried out on specific cell types, providing the opportunity to accurately infer gene regulatory networks. However, the sparsity and noise of single-cell sequencing data pose challenges for gene regulatory network inference, and although many inference methods have been proposed, they often fail to eliminate transitive interactions or do not adequately address multilevel relationships and nonlinear features in graph data. RESULTS To address these limitations, we propose a gene regulatory network inference framework named HGATLink. HGATLink combines a heterogeneous graph attention network with a simplified transformer to effectively capture complex interactions between genes in a low-dimensional space via matrix decomposition techniques. This design not only enhances its ability to model complex heterogeneous graph structures and alleviate transitive interactions, but also captures long-range dependencies between genes, enabling more accurate prediction. CONCLUSIONS Compared with 10 state-of-the-art GRN inference methods on 14 scRNA-seq datasets under two metrics, AUROC and AUPRC, HGATLink shows good stability and accuracy in gene regulatory network inference tasks.
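HGATLink builds on heterogeneous graph attention. The NumPy sketch below illustrates only the underlying mechanism: single-head attention coefficients in the style of the original graph attention network (GAT, Veličković et al., 2018) on a homogeneous gene graph. Each coefficient is e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]), normalized with a softmax over each gene's neighbours. The sketch omits HGATLink's heterogeneous, type-specific projections, its matrix decomposition, and its transformer component. All names are hypothetical.

```python
import numpy as np


def gat_attention(H, W, a, adj, negative_slope=0.2):
    """Single-head GAT layer on a homogeneous graph.

    H: (n, f_in) gene features; W: (f_in, f_out) projection; a: (2 * f_out,) attention vector;
    adj: (n, n) 0/1 mask that should include self-loops so every row has a neighbour.
    Returns the attention matrix alpha and the aggregated features alpha @ (H @ W).
    """
    Wh = H @ W
    f_out = Wh.shape[1]
    # a^T [Wh_i || Wh_j] splits into a term for node i plus a term for neighbour j.
    e = (Wh @ a[:f_out])[:, None] + (Wh @ a[f_out:])[None, :]
    e = np.where(e > 0, e, negative_slope * e)        # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)                 # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)              # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha, alpha @ Wh


# Hypothetical 3-gene chain 0 - 1 - 2 with self-loops and random parameters.
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]])
H = np.eye(3)
rng = np.random.default_rng(0)
alpha, out = gat_attention(H, rng.normal(size=(3, 4)), rng.normal(size=8), adj)
print(alpha.round(3))
```

Unlike a fixed, degree-normalized aggregation, the learned coefficients let each gene weight its neighbours differently. A heterogeneous variant would learn separate parameters per node or edge type.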
Affiliation(s)
- Yao Sun
- Department of Computer Science and Technology, College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, 010011, Inner Mongolia, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research, Hohhot, 010018, Inner Mongolia, China
- Jing Gao
- Department of Computer Science and Technology, College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, 010011, Inner Mongolia, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research, Hohhot, 010018, Inner Mongolia, China
10
Asim MN, Ibrahim MA, Asif T, Dengel A. RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models. Heliyon 2025; 11:e41488. [PMID: 39897847 PMCID: PMC11783440 DOI: 10.1016/j.heliyon.2024.e41488] [Received: 09/28/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 02/04/2025]
Abstract
Deciphering the information encoded in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequences, such as dysregulation and mutations, can drive a diverse spectrum of diseases, including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential to transform traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insight into biological functions, detect diseases at early stages, and develop potent therapeutics, researchers perform diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming, and error-prone. Empowering wet-lab experimental methods with Artificial Intelligence (AI) applications to enable large-scale RNA sequence analysis requires scientists to have comprehensive knowledge of both the DNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack the basic foundations of RNA sequence analysis tasks. Given the absence of comprehensive literature that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, this manuscript makes several contributions. It equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks. It sets the stage for the development of benchmark datasets for these 47 tasks by summarizing the essentials of 64 different biological databases. It presents applications of word embeddings and language models across the 47 tasks. It streamlines the development of new predictors by surveying the performance of 58 word-embedding-based and 70 language-model-based predictive pipelines, as well as top-performing predictors based on traditional sequence encoding, across the 47 RNA sequence analysis tasks.
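A common first step in the word-embedding and language-model pipelines surveyed here is to split an RNA sequence into overlapping k-mers, which play the role of "words", and map each k-mer to an integer id that an embedding layer can look up. The following is a minimal sketch of that tokenization step, assuming k = 3, a stride of 1, the alphabet ACGU, and id 0 reserved for k-mers containing other characters (such as N); none of these choices come from the review itself, and the function names are hypothetical.

```python
from itertools import product


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mer 'words'."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k: int = 3, alphabet: str = "ACGU") -> dict[str, int]:
    """Assign an integer id to every possible k-mer; id 0 is left for unknown k-mers."""
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq: str, k: int = 3) -> list[int]:
    """Map a sequence to the id sequence an embedding layer would consume."""
    vocab = build_vocab(k)
    return [vocab.get(tok, 0) for tok in kmer_tokens(seq, k)]
```

For instance, `kmer_tokens("AUGGCU")` yields `['AUG', 'UGG', 'GGC', 'GCU']`, and `encode("AAAN")` yields `[1, 0]` because `AAN` is out of vocabulary. A larger k gives more specific words but a vocabulary that grows as 4^k.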
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Tayyaba Asif
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Andreas Dengel
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
11
Cao G, Chen D. Unveiling Long Non-coding RNA Networks from Single-Cell Omics Data Through Artificial Intelligence. Methods Mol Biol 2025; 2883:257-279. [PMID: 39702712 DOI: 10.1007/978-1-0716-4290-0_11] [Indexed: 12/21/2024]
Abstract
Single-cell omics technologies have revolutionized the study of long non-coding RNAs (lncRNAs), offering unprecedented resolution in elucidating their expression dynamics, cell-type specificity, and associated gene regulatory networks (GRNs). Concurrently, the integration of artificial intelligence (AI) methodologies has significantly advanced our understanding of lncRNA functions and their implications in disease pathogenesis. This chapter discusses recent progress in single-cell omics data analysis, emphasizing its pivotal role in unraveling the molecular mechanisms underlying cellular heterogeneity and the associated regulatory networks involving lncRNAs. Additionally, we summarize single-cell omics resources and AI models for constructing single-cell gene regulatory networks (scGRNs). Finally, we discuss the challenges and prospects of studying scGRNs in the context of lncRNA biology.
Affiliation(s)
- Guangshuo Cao
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Dijun Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
12
Li R, Wu J, Li G, Liu J, Liu J, Xuan J, Deng Z. SIGRN: Inferring Gene Regulatory Network with Soft Introspective Variational Autoencoders. Int J Mol Sci 2024; 25:12741. [PMID: 39684451 DOI: 10.3390/ijms252312741] [Received: 10/04/2024] [Revised: 11/21/2024] [Accepted: 11/25/2024] [Indexed: 12/18/2024]
Abstract
Gene regulatory networks (GRNs) describe the complex regulatory relationships among genes, which are essential for understanding developmental biology and uncovering fundamental aspects of various biological phenomena. Inferring GRNs from single-cell RNA sequencing (scRNA-seq) data with computational methods is an effective and economical approach. Recent studies have tackled this problem using variational autoencoders (VAEs) and structural equation models (SEMs). Because VAEs tend to generate poor-quality data, this paper proposes SIGRN, a soft introspective adversarial model for unsupervised gene regulatory network inference that introduces an adversarial mechanism into a variational autoencoder. SIGRN uses a "soft" introspective adversarial mode, which avoids training additional neural networks or adding training parameters. It demonstrates superior inference accuracy on most benchmark datasets compared with nine leading-edge methods. In addition, SIGRN achieves better performance in representing cells and generating scRNA-seq data on most datasets, as verified by extensive experiments. SIGRN shows promise for generating scRNA-seq data and inferring GRNs.
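The abstract mentions VAE- and SEM-based GRN inference without giving SIGRN's exact formulation. As an assumption-labelled sketch, the code below shows the linear structural equation model that VAE-SEM approaches of this kind commonly build on: expression X (cells by genes) satisfies X = XA + Z, where A[i, j] is the weight of gene i on gene j and Z is independent noise, so an encoder can map X to Z = X(I - A) and a decoder can map Z back to X = Z(I - A)^-1. In a trained model A is a learnable matrix, usually wrapped in nonlinear layers, and the soft introspective adversarial training that distinguishes SIGRN is not shown; here A is fixed and all names are hypothetical.

```python
import numpy as np


def simulate_sem(adj: np.ndarray, n_cells: int,
                 seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Sample expression X (cells x genes) from the linear SEM X = X A + Z.

    adj[i, j] is the (hypothetical) regulatory weight of gene i on gene j;
    I - adj must be invertible, which holds for any acyclic network.
    """
    rng = np.random.default_rng(seed)
    n_genes = adj.shape[0]
    z = rng.normal(size=(n_cells, n_genes))        # independent noise per gene
    x = z @ np.linalg.inv(np.eye(n_genes) - adj)   # X (I - A) = Z  =>  X = Z (I - A)^-1
    return x, z


def sem_encode(x: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """Encoder direction: recover the noise term Z = X (I - A)."""
    return x @ (np.eye(adj.shape[0]) - adj)


def sem_decode(z: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """Decoder direction: rebuild X = Z (I - A)^-1 without forming the inverse."""
    return np.linalg.solve((np.eye(adj.shape[0]) - adj).T, z.T).T
```

With a three-gene chain such as `A = [[0, 0.8, 0], [0, 0, -0.5], [0, 0, 0]]`, encoding the simulated X recovers Z and decoding Z recovers X up to floating-point error; in models of this kind, the large-magnitude entries of a learned A are what get read out as candidate regulatory edges.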
Affiliation(s)
- Rongyuan Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Jingli Wu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Gaoshi Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Jiafei Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Jinlu Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Junbo Xuan
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
- Zheng Deng
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China
13
Li J, Zhang H, Wang J, Deng M, Li Z, Jiang W, Xu K, Wu L, Dong Z, Liu J, Ding Q, Yu H. Development and Validation of an AI-Driven System for Automatic Literature Analysis and Molecular Regulatory Network Construction. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2405395. [PMID: 39373342 PMCID: PMC11600262 DOI: 10.1002/advs.202405395] [Received: 05/17/2024] [Revised: 09/06/2024] [Indexed: 10/08/2024]
Abstract
Decoding gene regulatory networks is essential for understanding the mechanisms underlying many complex diseases. We developed GENET, an automated system designed to extract and visualize extensive molecular relationships from published biomedical literature. Using natural language processing, entities and relations were identified from a randomly selected set of 1788 scientific articles and visualized in a filterable knowledge graph. The performance of GENET was evaluated and compared with existing methods. The named entity recognition model achieved an overall precision of 94.23% (4835/5131; 93.56-94.84%), recall of 97.72% (4835/4948; 97.27-98.10%), and F1 score of 95.94%. The relation extraction model achieved an overall precision of 91.63% (2593/2830; 90.55-92.59%), recall of 89.17% (2593/2908; 87.99-90.25%), and F1 score of 90.38%. GENET significantly outperformed existing methods in extracting molecular relationships (P < 0.001). Additionally, GENET successfully predicted that WNT family member 4 regulates insulin-like growth factor 2 via signal transducer and activator of transcription 3 in colon cancer. This prediction was validated with RNA sequencing data and multiple immunofluorescence, supporting the feasibility of GENET.
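The F1 scores reported above follow directly from the precision and recall counts: F1 is their harmonic mean, which simplifies to 2TP / (predicted + gold). The short check below recomputes all three figures for both models from the counts in the abstract; the function name `prf1` is illustrative, and the confidence intervals in parentheses are not reproduced because the abstract does not state how they were computed.

```python
def prf1(tp: int, n_predicted: int, n_gold: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from raw counts.

    tp: correct predictions; n_predicted: all predictions made;
    n_gold: all annotations in the gold standard.
    """
    precision = tp / n_predicted
    recall = tp / n_gold
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Counts as reported in the GENET abstract: (TP, predicted, gold)
reported = {"NER": (4835, 5131, 4948), "RE": (2593, 2830, 2908)}
for name, counts in reported.items():
    p, r, f = prf1(*counts)
    print(f"{name}: precision={p:.2%} recall={r:.2%} F1={f:.2%}")
# NER: precision=94.23% recall=97.72% F1=95.94%
# RE: precision=91.63% recall=89.17% F1=90.38%
```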
Affiliation(s)
- Jia Li
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Key Laboratory of Digestive System, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Hailin Zhang
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Key Laboratory of Digestive System, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Jiamin Wang
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Key Laboratory of Digestive System, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Mei Deng
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Key Laboratory of Digestive System, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Zhiyong Li
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Wei Jiang
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Kejin Xu
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Lianlian Wu
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Key Laboratory of Digestive System, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Zehua Dong
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Key Laboratory of Digestive System, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Jun Liu
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Nursing Department of Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Qianshan Ding
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Key Laboratory of Digestive System, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
- Honggang Yu
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Key Laboratory of Digestive System, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P. R. China
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, Hubei 430060, P. R. China
14
Cui W, Long Q, Xiao M, Wang X, Feng G, Li X, Wang P, Zhou Y. Refining computational inference of gene regulatory networks: integrating knockout data within a multi-task framework. Brief Bioinform 2024; 25:bbae361. [PMID: 39082651 PMCID: PMC11289685 DOI: 10.1093/bib/bbae361] [Received: 01/23/2024] [Revised: 05/09/2024] [Accepted: 07/16/2024] [Indexed: 08/03/2024]
Abstract
Constructing accurate gene regulatory networks (GRNs), which reflect the dynamic governing processes between genes, is critical to understanding diverse cellular processes and unveiling the complexities of biological systems. With the development of computer science, computational approaches have been applied to the GRN inference task. However, current methodologies struggle to effectively utilize existing topological information and prior knowledge of gene regulatory relationships, hindering the comprehensive understanding and accurate reconstruction of GRNs. In response, we propose MTLGRN, a novel graph neural network (GNN)-based multi-task learning framework for GRN reconstruction. Specifically, we first encode the gene promoter sequences and the gene biological features and concatenate the corresponding feature representations. Then, we construct a multi-task learning framework comprising GRN reconstruction, gene knockout prediction, and gene expression matrix reconstruction. Through joint training, MTLGRN optimizes the gene latent representations by integrating gene knockout information, promoter characteristics, and other biological attributes. Extensive experiments demonstrate superior performance compared with state-of-the-art baselines on the GRN reconstruction task, showing that MTLGRN efficiently leverages biological knowledge to capture gene regulatory relationships. MTLGRN also represents a pioneering attempt to simulate gene knockouts on bulk data by incorporating gene knockout information.
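MTLGRN trains GRN reconstruction, gene knockout prediction, and expression-matrix reconstruction jointly so that all three tasks shape the shared gene representations. A common way to do this is to minimize a weighted sum of per-task losses. The sketch below shows such an objective under stated assumptions: binary cross-entropy for edge prediction, mean squared error for the knockout and expression tasks, and hypothetical weights (1.0, 0.5, 0.5). The abstract does not specify MTLGRN's actual losses or weights, and the shared GNN encoder that would produce the predictions is omitted.

```python
import numpy as np


def bce(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted edge probabilities and 0/1 labels."""
    p = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error, used here for both regression-style tasks."""
    return float(np.mean((pred - target) ** 2))


def joint_loss(edge_pred: np.ndarray, edge_true: np.ndarray,
               ko_pred: np.ndarray, ko_true: np.ndarray,
               expr_pred: np.ndarray, expr_true: np.ndarray,
               weights: tuple[float, float, float] = (1.0, 0.5, 0.5)) -> float:
    """Hypothetical multi-task objective: weighted sum of the three task losses."""
    w_grn, w_ko, w_rec = weights
    loss_grn = bce(edge_pred, edge_true)   # task 1: GRN (edge) reconstruction
    loss_ko = mse(ko_pred, ko_true)        # task 2: knockout effect prediction
    loss_rec = mse(expr_pred, expr_true)   # task 3: expression matrix reconstruction
    return w_grn * loss_grn + w_ko * loss_ko + w_rec * loss_rec
```

In a full model every term depends on the same gene embeddings, so gradients from the knockout and reconstruction terms act as auxiliary supervision for the edge predictor; the weights set how strongly each task pulls on those embeddings.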
Affiliation(s)
- Wentao Cui
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Qingqing Long
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- Meng Xiao
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Xuezhi Wang
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Guihai Feng
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Xin Li
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Pengfei Wang
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Yuanchun Zhou
- Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
15
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023]
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and in plants it provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships they infer are biased. Eliminating indirect effects in GRNs remains a significant challenge. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which removes the indirect effects caused by confounding factors. The method eliminates the influence of confounding factors on regulators and target genes by measuring the similarity between their residuals. Validation on simulation studies, the gold-standard networks provided by the DREAM3 Challenge, and the real gene networks of Escherichia coli demonstrates that EIEPCF achieves significantly higher accuracy than other popular computational methods for inferring GRNs. As a case study, we used EIEPCF to reconstruct a cold-resistance-specific GRN from cold-resistance gene expression data of Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
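EIEPCF is described as removing the influence of confounding factors on a regulator and its target and then measuring the similarity of what remains. Under the assumption that the confounder effect is removed by linear least-squares regression and that similarity is Pearson correlation (which makes the score a partial correlation), a minimal sketch looks like this. EIEPCF's actual regression and similarity measure may differ; the repository linked in the abstract is the authority, and the function names here are hypothetical.

```python
import numpy as np


def residual(y: np.ndarray, confounders: np.ndarray) -> np.ndarray:
    """Part of y not explained by a linear fit on the confounders (plus intercept)."""
    design = np.column_stack([np.ones(len(y)), confounders])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def residual_similarity(regulator: np.ndarray, target: np.ndarray,
                        confounders: np.ndarray) -> float:
    """Pearson correlation of the two residuals, i.e. the partial correlation
    of regulator and target given the confounders."""
    r_reg = residual(regulator, confounders)
    r_tgt = residual(target, confounders)
    return float(np.corrcoef(r_reg, r_tgt)[0, 1])
```

For example, if a shared confounder c generates a = 2c + noise and b = -1.5c + noise, the raw correlation of a and b is strongly negative even though neither regulates the other, while `residual_similarity(a, b, c)` is close to zero; that spurious, confounder-driven association is the kind of indirect effect such a method aims to discard.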
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
16
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 PMCID: PMC11444527 DOI: 10.1002/bies.202300210] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
17
Zhou X, Pan J, Chen L, Zhang S, Chen Y. DeepIMAGER: Deeply Analyzing Gene Regulatory Networks from scRNA-seq Data. Biomolecules 2024; 14:766. [PMID: 39062480 PMCID: PMC11274664 DOI: 10.3390/biom14070766] [Received: 05/11/2024] [Revised: 06/22/2024] [Accepted: 06/25/2024] [Indexed: 07/28/2024]
Abstract
Understanding the dynamics of gene regulatory networks (GRNs) across diverse cell types is challenging yet immensely valuable for unraveling the molecular mechanisms governing cellular processes. Current computational methods, which rely solely on expression changes from bulk RNA-seq and/or scRNA-seq data, often result in high false-positive rates and low precision. Here, we introduce DeepIMAGER, an advanced computational tool for inferring cell-specific GRNs through deep learning and data integration. DeepIMAGER employs a supervised approach that transforms the co-expression patterns of gene pairs into image-like representations and leverages transcription factor (TF) binding information for model training. It is trained on comprehensive datasets that encompass scRNA-seq profiles and ChIP-seq data, capturing TF-gene pair information across various cell types. Comprehensive validations on six cell lines show that DeepIMAGER outperforms ten popular GRN inference tools and is remarkably robust against dropout-zero events. Applied to scRNA-seq datasets of multiple myeloma (MM), DeepIMAGER detected potential GRNs for the TFs RORC, MITF, and FOXD2 in MM dendritic cells. This technical innovation, combined with its ability to accurately decode GRNs from scRNA-seq data, establishes DeepIMAGER as a valuable tool for unraveling complex regulatory networks in various cell types.
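DeepIMAGER converts the co-expression pattern of each gene pair into an image-like input for a supervised model. The abstract does not give the exact construction; a common one in image-based GRN methods, assumed here, is a 2D histogram of the pair's log-transformed expression across cells, normalized to a joint density and log-scaled so that the large mass of dropout zeros does not swamp the rest of the image. The bin count (32), the log1p transform, the small constant `eps`, and the function name are all illustrative choices.

```python
import numpy as np


def pair_to_image(x: np.ndarray, y: np.ndarray, bins: int = 32,
                  eps: float = 1e-4) -> np.ndarray:
    """Encode the joint expression of two genes across cells as a bins x bins image.

    x, y: non-negative expression values of gene A and gene B in the same cells.
    """
    # log1p compresses the heavy right tail of count data and keeps zeros at 0
    hist, _, _ = np.histogram2d(np.log1p(x), np.log1p(y), bins=bins)
    # normalize counts to a joint probability over the grid
    density = hist / hist.sum()
    # log-scale so sparse non-zero regions stay visible next to the dropout corner
    return np.log10(density + eps)
```

In a supervised setup like the one the abstract outlines, each image would then be labeled by whether the TF-gene pair is supported by binding evidence (ChIP-seq) and passed to an image classifier, for example a convolutional network.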
Affiliation(s)
- Xiguo Zhou
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Jingyi Pan
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Liang Chen
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
18
Wagle MM, Long S, Chen C, Liu C, Yang P. Interpretable deep learning in single-cell omics. Bioinformatics 2024; 40:btae374. [PMID: 38889275 PMCID: PMC11211213 DOI: 10.1093/bioinformatics/btae374] [Received: 02/25/2024] [Revised: 05/11/2024] [Accepted: 06/12/2024] [Indexed: 06/20/2024]
Abstract
MOTIVATION Single-cell omics technologies have enabled the quantification of molecular profiles in individual cells at an unparalleled resolution. Deep learning, a rapidly evolving sub-field of machine learning, has attracted significant interest in single-cell omics research due to its remarkable success in analysing heterogeneous high-dimensional single-cell omics data. Nevertheless, the inherent multi-layer nonlinear architecture of deep learning models often makes them 'black boxes', as the reasoning behind their predictions is unknown and not transparent to the user. This has stimulated a growing body of research addressing the lack of interpretability in deep learning models, especially in single-cell omics data analyses, where the identification and understanding of molecular regulators are crucial for interpreting model predictions and directing downstream experimental validations. RESULTS In this work, we introduce the basics of single-cell omics technologies and the concept of interpretable deep learning. This is followed by a review of recent interpretable deep learning models applied to various areas of single-cell omics research. Lastly, we highlight the current limitations and discuss potential future directions.
Affiliation(s)
- Manoj M Wagle
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Siqu Long
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Carissa Chen
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Chunlei Liu
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Pengyi Yang
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Camperdown, NSW 2006, Australia
19
Li R, Du K, Zhang C, Shen X, Yun L, Wang S, Li Z, Sun Z, Wei J, Li Y, Guo B, Sun C. Single-cell transcriptome profiling reveals the spatiotemporal distribution of triterpenoid saponin biosynthesis and transposable element activity in Gynostemma pentaphyllum shoot apexes and leaves. Frontiers in Plant Science 2024; 15:1394587. [PMID: 38779067 PMCID: PMC11109411 DOI: 10.3389/fpls.2024.1394587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 04/24/2024] [Indexed: 05/25/2024]
Abstract
Gynostemma pentaphyllum (Thunb.) Makino is an important producer of dammarane-type triterpenoid saponins. These saponins (gypenosides) exhibit diverse pharmacological benefits, such as anticancer, antidiabetic, and immunomodulatory effects, and have major potential in the pharmaceutical and health care industries. Here, we employed single-cell RNA sequencing (scRNA-seq) to profile the transcriptomes of more than 50,000 cells derived from G. pentaphyllum shoot apexes and leaves. Following cell clustering and annotation, we identified five major cell types in shoot apexes and four in leaves. Each cell type displayed substantial transcriptomic heterogeneity both within and between tissues. Examining gene expression patterns across cell types revealed that gypenoside biosynthesis occurred predominantly in mesophyll cells, with higher activity in shoot apexes than in leaves. Furthermore, we explored the impact of transposable elements (TEs) on the G. pentaphyllum transcriptomic landscape. Our findings highlighted the unbalanced expression of certain TE families across different cell types in shoot apexes and leaves, marking the first investigation of TE expression at the single-cell level in plants. Additionally, we observed dynamic expression of genes involved in gypenoside biosynthesis and of specific TE families during epidermal and vascular cell development. The involvement of TE expression in regulating cell differentiation and gypenoside biosynthesis warrants further exploration. Overall, this study not only provides new insights into the spatiotemporal organization of gypenoside biosynthesis and TE activity in G. pentaphyllum shoot apexes and leaves but also offers valuable cellular and genetic resources for a deeper understanding of developmental and physiological processes at single-cell resolution in this species.
Affiliation(s)
- Rucan Li
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Ke Du
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Chuyi Zhang
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiaofeng Shen
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Lingling Yun
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Shu Wang
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Ziqin Li
- College of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan, Shandong, China
- Zhiying Sun
- College of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan, Shandong, China
- Jianhe Wei
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Ying Li
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Baolin Guo
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Chao Sun
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
20
Gan Y, Yu J, Xu G, Yan C, Zou G. Inferring gene regulatory networks from single-cell transcriptomics based on graph embedding. Bioinformatics 2024; 40:btae291. [PMID: 38810116 PMCID: PMC11142726 DOI: 10.1093/bioinformatics/btae291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/06/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024]
Abstract
MOTIVATION Gene regulatory networks (GRNs) encode gene regulation in living organisms and have become a critical tool for understanding complex biological processes. However, because gene regulation is dynamic and complex, inferring GRNs from scRNA-seq data remains challenging. Existing computational methods usually focus on close connections between genes and ignore the global structure and distal regulatory relationships. RESULTS In this study, we develop IGEGRNS, a supervised deep learning framework that infers GRNs from scRNA-seq data based on graph embedding. In the framework, the contextual information of genes is captured by GraphSAGE, which aggregates gene features and neighborhood structures to generate low-dimensional gene embeddings. The k most influential nodes in the whole graph are then selected through Top-k pooling. Finally, potential regulatory relationships between genes are predicted by stacked CNNs. Compared with nine competing supervised and unsupervised methods, our method achieves better performance on six time-series scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION IGEGRNS is implemented in Python using the PyTorch machine learning library and is freely available at https://github.com/DHUDBlab/IGEGRNS.
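The GraphSAGE and Top-k pooling steps described above can be illustrated with a small pure-Python sketch. It is not taken from the IGEGRNS code: the mean aggregator, the toy three-gene graph, the untrained weight matrix `W` and the score vector `p` are all assumptions. The Top-k step follows the common "project onto a score vector, keep the k best nodes, gate by tanh(score)" formulation rather than anything confirmed for IGEGRNS, and the stacked-CNN link predictor is omitted.

```python
import math

def mat_vec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sage_mean_layer(features, adj, W):
    """One GraphSAGE-style layer with a mean aggregator:
    h_v' = ReLU(W [h_v ; mean of h_u over neighbors u of v])."""
    out = {}
    for v, h_v in features.items():
        nbrs = adj.get(v, [])
        if nbrs:
            h_n = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(len(h_v))]
        else:
            h_n = [0.0] * len(h_v)  # isolated gene: zero neighborhood summary
        z = mat_vec(W, h_v + h_n)  # list + list concatenates self and neighbor parts
        out[v] = [max(0.0, zi) for zi in z]  # ReLU
    return out

def top_k_pool(emb, p, k):
    """Keep the k nodes with the largest projection onto score vector p,
    gating each kept embedding by tanh(score)."""
    norm = math.sqrt(sum(pi * pi for pi in p))
    scores = {v: sum(a * b for a, b in zip(h, p)) / norm for v, h in emb.items()}
    keep = sorted(scores, key=scores.get, reverse=True)[:k]
    return {v: [x * math.tanh(scores[v]) for x in emb[v]] for v in keep}

# Toy example: three genes with 2-d features and an undirected prior graph.
genes = {"G1": [1.0, 0.0], "G2": [0.0, 1.0], "G3": [1.0, 1.0]}
adj = {"G1": ["G2", "G3"], "G2": ["G1"], "G3": ["G1"]}
W = [[1.0, 0.0, 0.5, 0.0],   # hypothetical, untrained 2 x 4 weight matrix
     [0.0, 1.0, 0.0, 0.5]]
emb = sage_mean_layer(genes, adj, W)
pooled = top_k_pool(emb, p=[1.0, 1.0], k=2)
print(emb)             # {'G1': [1.25, 0.5], 'G2': [0.5, 1.0], 'G3': [1.5, 1.0]}
print(sorted(pooled))  # ['G1', 'G3']
```

A trained model would learn `W` and `p` by backpropagation (the abstract states that PyTorch is used) and would normally L2-normalize each layer's output, as in the original GraphSAGE formulation.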
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Jiacheng Yu
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Guangwei Xu
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Cairong Yan
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Guobing Zou
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
21
Xu J, Huang D, Zhang X. scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2307835. [PMID: 38483032 PMCID: PMC11109621 DOI: 10.1002/advs.202307835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/24/2024] [Indexed: 05/23/2024]
Abstract
Transformer-based models have revolutionized single-cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here, a novel single-cell multi-modal/multi-task transformer (scmFormer) is proposed to fill the existing gap in integrating single-cell proteomics with other omics data. Systematic benchmarking demonstrates that scmFormer excels at integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving both the information shared across batches and distinct biological information. scmFormer achieves a 54.5% higher average F1 score than the second-best method in transferring cell-type labels from single-cell transcriptomics to proteomics data. Using COVID-19 datasets, scmFormer is shown to integrate over 1.48 million cells on a personal computer. Moreover, scmFormer outperforms existing methods at generating the unmeasured modality and is well suited to spatial multi-omic data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.
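The label-transfer comparison above is reported as an "average F1 score". Assuming this means the macro-averaged F1 across cell types (the abstract does not define it), the metric can be sketched as follows; the cell-type labels in the example are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean over all labels of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical transferred labels for four cells: one B cell mislabeled as T.
y_true = ["B", "B", "T", "T"]
y_pred = ["B", "T", "T", "T"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Macro averaging weights every cell type equally, so rare cell types influence the score as much as abundant ones. Whether the reported 54.5% is a relative gain or a difference in percentage points is not specified in the abstract.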
Affiliation(s)
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- De‐Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo 315200, China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
22
Liu F, Shi F, Du F, Cao X, Yu Z. CoT: a transformer-based method for inferring tumor clonal copy number substructure from scDNA-seq data. Brief Bioinform 2024; 25:bbae187. [PMID: 38670159 PMCID: PMC11052634 DOI: 10.1093/bib/bbae187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 03/08/2024] [Accepted: 04/16/2024] [Indexed: 04/28/2024]
Abstract
Single-cell DNA sequencing (scDNA-seq) has been an effective means of unscrambling intra-tumor heterogeneity, but jointly inferring tumor clones and their respective copy number profiles remains challenging because scDNA-seq data are noisy. We introduce a new bioinformatics method, CoT, for deciphering clonal copy number substructure. The backbone of CoT is a Copy number Transformer autoencoder that uses a multi-head attention mechanism to explore correlations between different genomic regions and thus capture global features to create latent embeddings of the cells. CoT first infers cell subpopulations from the learned embeddings and then estimates single-cell copy numbers through joint analysis of the read count data of cells belonging to the same cluster. Exploiting clonal substructure in copy number analysis in this way helps alleviate the effect of non-uniform read counts and yields robust estimates of tumor copy numbers. Performance evaluation on synthetic and real datasets shows that CoT outperforms state-of-the-art methods and is highly useful for deciphering clonal copy number substructure.
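The multi-head attention idea, in which every genomic region attends to every other region, can be sketched in pure Python. This is a generic scaled dot-product self-attention, softmax(QK^T / sqrt(d))V, applied separately to each head. To keep it short it assumes identity query/key/value projections and no output projection, whereas a trained Transformer such as CoT would learn those matrices; the bin features are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_attention(Q, K, V):
    """Q, K, V: lists of row vectors (one row per genomic region).
    Returns softmax(Q K^T / sqrt(d)) V and the attention weights."""
    d = len(K[0])
    out, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))])
    return out, weights

def multi_head_self_attention(X, n_heads):
    """Split each region's feature vector into n_heads equal chunks and attend per head.
    Identity Q/K/V projections are an assumption made to keep the sketch short."""
    dim = len(X[0])
    if dim % n_heads:
        raise ValueError("feature dimension must be divisible by n_heads")
    hd = dim // n_heads
    heads = []
    for h in range(n_heads):
        Xh = [row[h * hd:(h + 1) * hd] for row in X]
        out_h, _ = scaled_dot_attention(Xh, Xh, Xh)
        heads.append(out_h)
    # Concatenate the per-head outputs back along the feature axis.
    return [sum((heads[h][i] for h in range(n_heads)), []) for i in range(len(X))]

# Toy input: 3 genomic bins, each with a made-up 4-d feature vector.
X = [[1.0, 0.0, 0.5, 0.5],
     [0.9, 0.1, 0.4, 0.6],
     [0.0, 1.0, 0.2, 0.8]]
Z = multi_head_self_attention(X, n_heads=2)
print(len(Z), len(Z[0]))  # 3 4
```

Because each output row mixes information from every bin, correlations between distant genomic regions can shape a cell's embedding, which is the "global features" property the abstract attributes to attention.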
Affiliation(s)
- Furui Liu
- School of Information Engineering, Ningxia University, 750021, Ningxia, China
- Fangyuan Shi
- School of Information Engineering, Ningxia University, 750021, Ningxia, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, 750021, Ningxia, China
- Fang Du
- School of Information Engineering, Ningxia University, 750021, Ningxia, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, 750021, Ningxia, China
- Xiangmei Cao
- Basic Medical School, Ningxia Medical University, 750001, Ningxia, China
- Zhenhua Yu
- School of Information Engineering, Ningxia University, 750021, Ningxia, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, 750021, Ningxia, China
23
Li S, Liu Y, Shen LC, Yan H, Song J, Yu DJ. GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference. Brief Bioinform 2024; 25:bbad529. [PMID: 38261340 PMCID: PMC10805180 DOI: 10.1093/bib/bbad529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/08/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024]
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) from scRNA-seq data. Most GRN inference methods either cannot eliminate transitive interactions or require expensive computational resources. To address these issues, we present GMFGRN, a novel method for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs a GNN for matrix factorization and learns representative embeddings for genes. For each transcription factor-gene pair, it uses the learned embeddings to determine whether the two interact. Extensive benchmarking on eight static scRNA-seq datasets against several state-of-the-art methods showed mean improvements of 1.9% and 2.5% over the runner-up in the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), respectively. In addition, across four time-series datasets, maximum improvements of 2.4% and 1.3% in AUROC and AUPRC were observed relative to the runner-up. Moreover, GMFGRN requires significantly less training time and memory, consuming less than 10% of the time and memory used by the second-best method. These findings underscore the substantial potential of GMFGRN for GRN inference. It is publicly available at https://github.com/Lishuoyy/GMFGRN.
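AUROC and AUPRC, the two metrics used in the comparison above, can be computed from a ranked list of candidate TF-gene edges. The sketch below uses the rank (Mann-Whitney) form of AUROC and estimates AUPRC as average precision; other AUPRC estimators, such as trapezoidal interpolation of the precision-recall curve, give slightly different values, and the benchmark's exact choice is not stated in the abstract. The labels and scores are made up.

```python
def auroc(labels, scores):
    """Probability that a random positive edge outscores a random negative (ties = 1/2).
    The O(P*N) pairwise loop is fine for a sketch, too slow for genome-scale edge lists."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """AUPRC estimated as average precision: mean precision at the rank of each true edge.
    Ties in score are broken by input order, a simplification."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive edge")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos

# Hypothetical predictions for five candidate TF-gene pairs (1 = edge in ground truth).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(auroc(labels, scores), 3), round(average_precision(labels, scores), 3))  # 0.833 0.833
```

On the sparse ground-truth networks typical of GRN benchmarks, AUPRC is usually far more sensitive than AUROC to how the few true edges are ranked, which is why both are reported.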
Affiliation(s)
- Shuo Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Yan Liu
- School of Information Engineering, Yangzhou University, 196 West Huayang, Yangzhou, 225000, China
- Long-Chen Shen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- He Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
24
Hu Y, Xiao K, Yang H, Liu X, Zhang C, Shi Q. Spatially contrastive variational autoencoder for deciphering tissue heterogeneity from spatially resolved transcriptomics. Brief Bioinform 2024; 25:bbae016. [PMID: 38324623 PMCID: PMC10849194 DOI: 10.1093/bib/bbae016] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/22/2023] [Revised: 12/29/2023] [Indexed: 02/09/2024]
Abstract
Recent advances in spatially resolved transcriptomics (SRT) have brought ever-increasing opportunities to characterize the expression landscape in the context of tissue spatial organization. Nevertheless, accurately detecting spatial functional regions in tissue still poses multiple challenges. Here, we present a novel contrastive learning framework, SPAtially Contrastive variational AutoEncoder (SpaCAE), which contrasts the transcriptomic signals of each spot with those of its spatial neighbors to achieve fine-grained detection of tissue structures. By employing a graph embedding variational autoencoder and incorporating a deep contrastive strategy, SpaCAE balances local spatial information with global expression information, enabling effective learning of representations under spatial constraints. In particular, SpaCAE provides a graph deconvolutional decoder to counter the smoothing effect of local spatial structure on self-supervised learning of expression, an aspect often overlooked by current graph neural networks. We demonstrate that SpaCAE performs effectively on SRT data generated by multiple technologies for spatial domain identification and data denoising, making it a valuable tool for obtaining novel insights from SRT studies.
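The core contrastive idea, pulling each spot's embedding toward its spatial neighbors and away from other spots, can be sketched as an InfoNCE-style loss. This is a generic illustration, not SpaCAE's published objective: using the neighbor mean as the positive, cosine similarity, the temperature `tau` and the toy embeddings are all assumptions, and the variational autoencoder and deconvolutional decoder are omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def neighbor_infonce(z, neighbors, tau=0.5):
    """Mean InfoNCE loss over spots. For spot i the positive is the mean embedding of
    its spatial neighbors; all non-neighboring spots act as negatives."""
    losses = []
    for i, zi in enumerate(z):
        nbrs = neighbors[i]
        if not nbrs:
            continue  # spots without neighbors contribute no term
        pos = [sum(z[j][d] for j in nbrs) / len(nbrs) for d in range(len(zi))]
        pos_logit = cosine(zi, pos) / tau
        neg_logits = [cosine(zi, z[j]) / tau for j in range(len(z)) if j != i and j not in nbrs]
        logits = [pos_logit] + neg_logits
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denom - pos_logit)
    if not losses:
        raise ValueError("no spot has any spatial neighbor")
    return sum(losses) / len(losses)

# Toy check: 4 spots, neighbor pairs (0, 1) and (2, 3).
neighbors = [[1], [0], [3], [2]]
coherent = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]   # neighbors look alike
scrambled = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]  # neighbors look different
print(neighbor_infonce(coherent, neighbors) < neighbor_infonce(scrambled, neighbors))  # True
```

Minimizing such a loss rewards embeddings in which spatially adjacent spots agree, which is the spatial constraint the abstract describes; the temperature controls how sharply the loss concentrates on the hardest negatives.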
Affiliation(s)
- Yaofeng Hu
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, Hangzhou 310024; University of Chinese Academy of Sciences, China
- Kai Xiao
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Hengyu Yang
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, Hangzhou 310024; University of Chinese Academy of Sciences, China
- Xiaoping Liu
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, Hangzhou 310024; University of Chinese Academy of Sciences, China
- Chuanchao Zhang
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, Hangzhou 310024; University of Chinese Academy of Sciences, China
- Qianqian Shi
- Hubei Engineering Technology Research Center of Agricultural Big Data, Huazhong Agricultural University, Wuhan 430070, Hubei, China
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
25
Mao G, Pang Z, Zuo K, Wang Q, Pei X, Chen X, Liu J. Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform 2023; 24:bbad414. [PMID: 37985457 PMCID: PMC10661972 DOI: 10.1093/bib/bbad414] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/03/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/22/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes at the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data make GRN inference challenging. In recent years, the dramatic increase in experimentally validated data on transcription factor binding to DNA has made it possible to infer GRNs with supervised methods. In this study, we frame GRN inference as a graph link prediction task and propose a novel framework, GNNLink, which leverages known GRNs to deduce potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder that refines gene features by capturing interdependencies between nodes in the network. Finally, the GRN is inferred by performing a matrix completion operation on the node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate GNNLink, we compare it with six existing GRN reconstruction methods on seven scRNA-seq datasets. These datasets encompass diverse ground-truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results show that GNNLink achieves comparable or superior performance across these datasets, demonstrating its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, the data and source code of GNNLink are available in our GitHub repository: https://github.com/sdesignates/GNNLink.
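The encoder-then-completion pipeline described above can be sketched with one graph convolution layer followed by an inner-product decoder that fills in a dense gene-by-gene score matrix. This is a common GCN link-prediction pattern, not GNNLink's published architecture: the symmetric normalization, the identity weights, the sigmoid decoder and the toy prior network are all assumptions.

```python
import math

def gcn_layer(A, H, W):
    """Graph convolution ReLU(D^-1/2 (A + I) D^-1/2 H W) on dense lists of lists.
    A: n x n symmetric 0/1 adjacency, H: n x f features, W: f x g weights."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # degree including the self-loop, so never zero
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f, g = len(W), len(W[0])
    AH = [[sum(norm[i][k] * H[k][c] for k in range(n)) for c in range(f)] for i in range(n)]
    return [[max(0.0, sum(AH[i][c] * W[c][o] for c in range(f))) for o in range(g)]
            for i in range(n)]

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def inner_product_scores(Z):
    """Completed score matrix: entry (i, j) = sigmoid(z_i . z_j)."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in Z] for zi in Z]

# Toy prior network: gene 0 linked to genes 1 and 2, gene 3 isolated.
A = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0]]
H = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1], [0.0, 1.0]]  # made-up gene features
W = [[1.0, 0.0], [0.0, 1.0]]                          # identity weights, untrained
S = inner_product_scores(gcn_layer(A, H, W))
for row in S:
    print([round(s, 3) for s in row])
```

Because z_i · z_j = z_j · z_i, this decoder scores both directions of a gene pair identically; predicting which gene is the regulator would require, for example, separate regulator and target embeddings. How GNNLink handles edge direction is not described in the abstract.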
Affiliation(s)
- Guo Mao
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Zhengbin Pang
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Ke Zuo
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Qinglin Wang
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xiangdong Pei
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xinhai Chen
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Jie Liu
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Laboratory of Software Engineering for Complex System, National University of Defense Technology, deya, 410073 Changsha, China
26
Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Biology 2023; 12:1033. [PMID: 37508462 PMCID: PMC10376273 DOI: 10.3390/biology12071033] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 06/20/2023] [Revised: 07/18/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. Because genome sequences are analogous to language texts, techniques that have succeeded in natural language processing can be applied to genomic data. This review provides a comprehensive analysis of the most recent advances in applying transformer architectures and attention mechanisms to genome and transcriptome data. It focuses on critically evaluating these techniques and discusses their advantages and limitations in the context of genome data analysis. Given the swift pace of development in deep learning methodologies, it is vital to continually assess the current standing and future direction of the research. This review therefore aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of recent advances and elucidating state-of-the-art applications in the field. Furthermore, by critically evaluating studies from 2019 to 2023, it highlights potential areas of future investigation, acting as a stepping-stone for further research.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea