1
|
Yu W, Lin Z, Lan M, Ou-Yang L. GCLink: a graph contrastive link prediction framework for gene regulatory network inference. Bioinformatics 2025; 41:btaf074. [PMID: 39960893 PMCID: PMC11881698 DOI: 10.1093/bioinformatics/btaf074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2024] [Revised: 01/10/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) unveil the intricate interactions among genes, pivotal in elucidating the complex biological processes within cells. The advent of single-cell RNA-sequencing (scRNA-seq) enables the inference of GRNs at single-cell resolution. However, the majority of current supervised network inference methods typically concentrate on predicting pairwise gene regulatory interaction, thus failing to fully exploit correlations among all genes and exhibiting limited generalization performance. RESULTS To address these issues, we propose a graph contrastive link prediction (GCLink) model to infer potential gene regulatory interactions from scRNA-seq data. Based on known gene regulatory interactions and scRNA-seq data, GCLink introduces a graph contrastive learning strategy to aggregate the feature and neighborhood information of genes to learn their representations. This approach reduces the dependence of our model on sample size and enhance its ability in predicting potential gene regulatory interactions. Extensive experiments on real scRNA-seq datasets demonstrate that GCLink outperforms other state-of-the-art methods in most cases. Furthermore, by pretraining GCLink on a source cell line with abundant known regulatory interactions and fine-tuning it on a target cell line with limited amount of known interactions, our GCLink model exhibits good performance in GRN inference, demonstrating its effectiveness in inferring GRNs from datasets with limited known interactions. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/Yoyiming/GCLink.
Collapse
Affiliation(s)
- Weiming Yu
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
| | - Zerun Lin
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
| | - Miaofang Lan
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
| | - Le Ou-Yang
- Guangdong Laboratory of Machine Perception and Intelligent Computing, Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen 518116, China
| |
Collapse
|
2
|
Choi JJ, Svaren J, Wang D. CoTF-reg reveals cooperative transcription factors in oligodendrocyte gene regulation using single-cell multi-omics. Commun Biol 2025; 8:181. [PMID: 39910206 PMCID: PMC11799153 DOI: 10.1038/s42003-025-07570-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2024] [Accepted: 01/17/2025] [Indexed: 02/07/2025] Open
Abstract
Oligodendrocytes are the myelinating cells within the central nervous system, but the mechanisms by which transcription factors (TFs) cooperate for gene regulation in oligodendrocytes remain unclear. We introduce coTF-reg, an analytical framework that integrates scRNA-seq and scATAC-seq data to identify cooperative TFs co-regulating the target gene (TG). First, we identify co-binding TF pairs in the same oligodendrocyte-specific regulatory regions. Next, we train a deep learning model to predict each TG expression using the co-binding TFs' expressions. Shapley interaction scores reveal high interactions between co-binding TF pairs, such as SOX10-TCF12. Validation using oligodendrocyte eQTLs and their eGenes that are regulated by these cooperative TFs show potential regulatory roles for genetic variants. Experimental validation using ChIP-seq data confirms some cooperative TF pairs, such as SOX10-OLIG2. Prediction performance of our models is evaluated through holdout data and additional datasets, and an ablation study is also conducted. The results demonstrate stable and consistent performance.
Collapse
Affiliation(s)
- Jerome J Choi
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- Department of Population Health Sciences, University of Wisconsin-Madison, Madison, WI, USA
| | - John Svaren
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- Department of Comparative Biosciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, USA
| | - Daifeng Wang
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA.
| |
Collapse
|
3
|
Niloofar P, Aghdam R, Eslahchi C. GAEM: Genetic Algorithm based Expectation-Maximization for inferring Gene Regulatory Networks from incomplete data. Comput Biol Med 2024; 183:109238. [PMID: 39426072 DOI: 10.1016/j.compbiomed.2024.109238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2024] [Revised: 09/02/2024] [Accepted: 09/30/2024] [Indexed: 10/21/2024]
Abstract
In Bioinformatics, inferring the structure of a Gene Regulatory Network (GRN) from incomplete gene expression data is a difficult task. One popular method for inferring the structure GRNs is to apply the Path Consistency Algorithm based on Conditional Mutual Information (PCA-CMI). Although PCA-CMI excels at extracting GRN skeletons, it struggles with missing values in datasets. As a result, applying PCA-CMI to infer GRNs, necessitates a preprocessing method for data imputation. In this paper, we present the GAEM algorithm, which uses an iterative approach based on a combination of Genetic Algorithm and Expectation-Maximization to infer the structure of GRN from incomplete gene expression datasets. GAEM learns the GRN structure from the incomplete dataset via an algorithm that iteratively updates the imputed values based on the learnt GRN until the convergence criteria are met. We evaluate the performance of this algorithm under various missingness mechanisms (ignorable and nonignorable) and percentages (5%, 15%, and 40%). The traditional approach to handling missing values in gene expression datasets involves estimating them first and then constructing the GRN. However, our methodology differs in that both missing values and the GRN are updated iteratively until convergence. Results from the DREAM3 dataset demonstrate that the GAEM algorithm appears to be a more reliable method overall, especially for smaller network sizes, GAEM outperforms methods where the incomplete dataset is imputed first, followed by learning the GRN structure from the imputed data. We have implemented the GAEM algorithm within the GAEM R package, which is accessible at the following GitHub repository: https://github.com/parniSDU/GAEM.
Collapse
Affiliation(s)
- Parisa Niloofar
- Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Campusvej 55, Odense, 5230, Denmark.
| | - Rosa Aghdam
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, WI, Madison, USA; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Iran
| | - Changiz Eslahchi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Iran
| |
Collapse
|
4
|
Yu J, Leng J, Yuan F, Sun D, Wu LY. Reverse network diffusion to remove indirect noise for better inference of gene regulatory networks. Bioinformatics 2024; 40:btae435. [PMID: 38963312 PMCID: PMC11236096 DOI: 10.1093/bioinformatics/btae435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2024] [Revised: 06/24/2024] [Accepted: 07/03/2024] [Indexed: 07/05/2024] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) are vital tools for delineating regulatory relationships between transcription factors and their target genes. The boom in computational biology and various biotechnologies has made inferring GRNs from multi-omics data a hot topic. However, when networks are constructed from gene expression data, they often suffer from false-positive problem due to the transitive effects of correlation. The presence of spurious noise edges obscures the real gene interactions, which makes downstream analyses, such as detecting gene function modules and predicting disease-related genes, difficult and inefficient. Therefore, there is an urgent and compelling need to develop network denoising methods to improve the accuracy of GRN inference. RESULTS In this study, we proposed a novel network denoising method named REverse Network Diffusion On Random walks (RENDOR). RENDOR is designed to enhance the accuracy of GRNs afflicted by indirect effects. RENDOR takes noisy networks as input, models higher-order indirect interactions between genes by transitive closure, eliminates false-positive effects using the inverse network diffusion method, and produces refined networks as output. We conducted a comparative assessment of GRN inference accuracy before and after denoising on simulated networks and real GRNs. Our results emphasized that the network derived from RENDOR more accurately and effectively captures gene interactions. This study demonstrates the significance of removing network indirect noise and highlights the effectiveness of the proposed method in enhancing the signal-to-noise ratio of noisy networks. AVAILABILITY AND IMPLEMENTATION The R package RENDOR is provided at https://github.com/Wu-Lab/RENDOR and other source code and data are available at https://github.com/Wu-Lab/RENDOR-reproduce.
Collapse
Affiliation(s)
- Jiating Yu
- School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jiacheng Leng
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Zhejiang Lab, Hangzhou 311121, China
| | - Fan Yuan
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Duanchen Sun
- School of Mathematics, Shandong University, Jinan 250100, China
| | - Ling-Yun Wu
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| |
Collapse
|
5
|
Häusler S. Correlations reveal the hierarchical organization of biological networks with latent variables. Commun Biol 2024; 7:678. [PMID: 38831002 PMCID: PMC11148204 DOI: 10.1038/s42003-024-06342-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2023] [Accepted: 05/16/2024] [Indexed: 06/05/2024] Open
Abstract
Deciphering the functional organization of large biological networks is a major challenge for current mathematical methods. A common approach is to decompose networks into largely independent functional modules, but inferring these modules and their organization from network activity is difficult, given the uncertainties and incompleteness of measurements. Typically, some parts of the overall functional organization, such as intermediate processing steps, are latent. We show that the hidden structure can be determined from the statistical moments of observable network components alone, as long as the functional relevance of the network components lies in their mean values and the mean of each latent variable maps onto a scaled expectation of a binary variable. Whether the function of biological networks permits a hierarchical modularization can be falsified by a correlation-based statistical test that we derive. We apply the test to gene regulatory networks, dendrites of pyramidal neurons, and networks of spiking neurons.
Collapse
Affiliation(s)
- Stefan Häusler
- Faculty of Biology and Bernstein Center for Computational Neuroscience, Ludwig-Maximilians-Universität München, Munich, Germany.
| |
Collapse
|
6
|
Zhang D, Gao S, Liu ZP, Gao R. LogicGep: Boolean networks inference using symbolic regression from time-series transcriptomic profiling data. Brief Bioinform 2024; 25:bbae286. [PMID: 38886006 PMCID: PMC11182660 DOI: 10.1093/bib/bbae286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2024] [Revised: 05/09/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open
Abstract
Reconstructing the topology of gene regulatory network from gene expression data has been extensively studied. With the abundance functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregated to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression and solve it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topology structure ones, as traditional methods often do, a set of promising Boolean formulas for each target gene is evolved firstly, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets including both synthetic and real-world experimental data. The results elucidate that LogicGep adeptly infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, the execution of LogicGep is hundreds of times faster than other methods, especially in the case of large network inference.
Collapse
Affiliation(s)
- Dezhen Zhang
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Shuhua Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Zhi-Ping Liu
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Rui Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| |
Collapse
|
7
|
Malekpour SA, Haghverdi L, Sadeghi M. Single-cell multi-omics analysis identifies context-specific gene regulatory gates and mechanisms. Brief Bioinform 2024; 25:bbae180. [PMID: 38653489 PMCID: PMC11036345 DOI: 10.1093/bib/bbae180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2023] [Revised: 01/29/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
There is a growing interest in inferring context specific gene regulatory networks from single-cell RNA sequencing (scRNA-seq) data. This involves identifying the regulatory relationships between transcription factors (TFs) and genes in individual cells, and then characterizing these relationships at the level of specific cell types or cell states. In this study, we introduce scGATE (single-cell gene regulatory gate) as a novel computational tool for inferring TF-gene interaction networks and reconstructing Boolean logic gates involving regulatory TFs using scRNA-seq data. In contrast to current Boolean models, scGATE eliminates the need for individual formulations and likelihood calculations for each Boolean rule (e.g. AND, OR, XOR). By employing a Bayesian framework, scGATE infers the Boolean rule after fitting the model to the data, resulting in significant reductions in time-complexities for logic-based studies. We have applied assay for transposase-accessible chromatin with sequencing (scATAC-seq) data and TF DNA binding motifs to filter out non-relevant TFs in gene regulations. By integrating single-cell clustering with these external cues, scGATE is able to infer context specific networks. The performance of scGATE is evaluated using synthetic and real single-cell multi-omics data from mouse tissues and human blood, demonstrating its superiority over existing tools for reconstructing TF-gene networks. Additionally, scGATE provides a flexible framework for understanding the complex combinatorial and cooperative relationships among TFs regulating target genes by inferring Boolean logic gates among them.
Collapse
Affiliation(s)
- Seyed Amir Malekpour
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), 19395-5746, Tehran, Iran
| | - Laleh Haghverdi
- Berlin Institute for Medical Systems Biology, Max Delbrück Center (BIMSB-MDC) in the Helmholtz Association, Berlin, Germany
| | - Mehdi Sadeghi
- Department of Medical Genetics, National Institute of Genetic Engineering and Biotechnology, 1497716316, Tehran, Iran
| |
Collapse
|