1
Roy S, Sheikh SZ, Furey TS. CoVar: A generalizable machine learning approach to identify the coordinated regulators driving variational gene expression. PLoS Comput Biol 2024; 20:e1012016. [PMID: 38630807 PMCID: PMC11057768 DOI: 10.1371/journal.pcbi.1012016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 04/29/2024] [Accepted: 03/22/2024] [Indexed: 04/19/2024]
Abstract
Network inference is used to model transcriptional, signaling, and metabolic interactions among genes, proteins, and metabolites that identify biological pathways influencing disease pathogenesis. Advances in machine learning (ML)-based inference models exhibit the predictive capabilities of capturing latent patterns in genomic data. Such models are emerging as an alternative to the statistical models identifying causative factors driving complex diseases. We present CoVar, an ML-based framework that builds upon the properties of existing inference models, to find the central genes driving perturbed gene expression across biological states. Unlike differentially expressed genes (DEGs) that capture changes in individual gene expression across conditions, CoVar focuses on identifying variational genes that undergo changes in their expression network interaction profiles, providing insights into changes in the regulatory dynamics, such as in disease pathogenesis. Subsequently, it finds core genes from among the nearest neighbors of these variational genes, which are central to the variational activity and influence the coordinated regulatory processes underlying the observed changes in gene expression. Through the analysis of simulated as well as yeast expression data perturbed by the deletion of the mitochondrial genome, we show that CoVar captures the intrinsic variationality and modularity in the expression data, identifying key driver genes not found through existing differential analysis methodologies.
Affiliation(s)
- Satyaki Roy
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Shehzad Z. Sheikh
  - Departments of Medicine and Genetics, Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Terrence S. Furey
  - Departments of Genetics and Biology, Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, United States of America
2
Leng J, Wu LY. Interaction-based transcriptome analysis via differential network inference. Brief Bioinform 2022; 23:6768051. [PMID: 36274239 PMCID: PMC9677477 DOI: 10.1093/bib/bbac466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/13/2022] [Accepted: 09/28/2022] [Indexed: 12/14/2022]
Abstract
Gene-based transcriptome analysis, such as differential expression analysis, can identify the key factors causing disease production, cell differentiation and other biological processes. However, this is not enough because basic life activities are mainly driven by the interactions between genes. Although there have been already many differential network inference methods for identifying the differential gene interactions, currently, most studies still only use the information of nodes in the network for downstream analyses. To investigate the insight into differential gene interactions, we should perform interaction-based transcriptome analysis (IBTA) instead of gene-based analysis after obtaining the differential networks. In this paper, we illustrated a workflow of IBTA by developing a Co-hub Differential Network inference (CDN) algorithm, and a novel interaction-based metric, pivot APC2. We confirmed the superior performance of CDN through simulation experiments compared with other popular differential network inference algorithms. Furthermore, three case studies are given using colorectal cancer, COVID-19 and triple-negative breast cancer datasets to demonstrate the ability of our interaction-based analytical process to uncover causative mechanisms.
Affiliation(s)
- Jiacheng Leng
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ling-Yun Wu
  - Corresponding author. Ling-Yun Wu, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
3
Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics 2021; 37:2423-2431. [PMID: 33576787 DOI: 10.1093/bioinformatics/btab099] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/04/2020] [Revised: 01/18/2021] [Accepted: 02/10/2021] [Indexed: 11/12/2022]
Abstract
Motivation: Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN methods have been developed, most have focused on the verification of the specific data set. However, it is difficult to establish directed topological networks that are both suitable for time-series and non-time-series datasets due to the complexity and diversity of biological networks.
Results: Here, we proposed a novel method, GNIPLR (Gene networks inference based on projection and lagged regression) to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projected gene data twice using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation to produce a linear and monotonous pseudo-time series, and then determined the direction of regulation in combination with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. Moreover, we also applied the GNIPLR algorithm to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods.
Availability: The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuelei Zhang
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China; Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China; School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
- Xiao Chang
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
- Xiaoping Liu
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China; School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
4
5
Ou-Yang L, Zhang XF, Hu X, Yan H. Differential Network Analysis via Weighted Fused Conditional Gaussian Graphical Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2162-2169. [PMID: 31247559 DOI: 10.1109/tcbb.2019.2924418] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
The development and prognosis of complex diseases usually involve changes in regulatory relationships among biomolecules. Understanding how the regulatory relationships change with genetic alterations can help to reveal the underlying biological mechanisms for complex diseases. Although several models have been proposed to estimate the differential network between two different states, they are not suitable for situations where the molecules of interest are affected by other covariates. Nor can they make use of prior information that provides insights about the structures of biomolecular networks. In this study, we introduce a novel weighted fused conditional Gaussian graphical model to jointly estimate two state-specific biomolecular regulatory networks and their difference between two different states. Unlike previous differential network estimation methods, our model can take into account the related covariates and the prior network information when inferring differential networks. The effectiveness of our proposed model is first evaluated based on simulation studies. Experimental results demonstrate that our model outperforms other state-of-the-art differential network estimation models in all cases. We then apply our model to identify the differential gene network between two subtypes of glioblastoma based on gene expression and miRNA expression data. Our model is able to discover known mechanisms of glioblastoma and provide interesting predictions.
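As a rough illustration of what "estimating two state-specific networks and their difference" can mean in practice, the sketch below computes a partial-correlation network per state and subtracts them. This is not the weighted fused conditional Gaussian graphical model from the abstract: it ignores covariates and prior network information, and it swaps the joint estimation for two independent ridge-regularized inverse covariance estimates. The function names, the `ridge` value, and the `threshold` cut-off are all assumptions chosen for illustration.

```python
import numpy as np


def partial_correlation(samples, ridge=1e-3):
    """Partial correlations from a ridge-regularized inverse covariance.

    samples: array of shape (n_samples, n_genes).
    ridge: small diagonal term (an assumed value) that keeps the
           covariance matrix invertible.
    """
    cov = np.cov(samples, rowvar=False)
    precision = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    scale = np.sqrt(np.diag(precision))
    # Partial correlation between i and j is -P_ij / sqrt(P_ii * P_jj).
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def differential_network(state_a, state_b, threshold=0.3, ridge=1e-3):
    """Difference of the two state-specific networks (state B minus state A).

    Returns the difference matrix and a boolean mask of gene pairs whose
    partial correlation changes by at least `threshold` (an assumed cut-off).
    """
    diff = partial_correlation(state_b, ridge) - partial_correlation(state_a, ridge)
    np.fill_diagonal(diff, 0.0)
    edges = np.abs(diff) >= threshold
    return diff, edges
```

Calling `differential_network(expr_a, expr_b)` on two expression matrices with the same gene columns returns the signed change in each pairwise partial correlation together with the mask of changed edges. A fused or jointly penalized estimator, as in the paper, would additionally share information between the two states rather than estimating them separately.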
6
Abstract
Network theory provides an intuitively appealing framework for studying relationships among interconnected brain mechanisms and their relevance to behaviour. As the space of its applications grows, so does the diversity of meanings of the term network model. This diversity can cause confusion, complicate efforts to assess model validity and efficacy, and hamper interdisciplinary collaboration. In this Review, we examine the field of network neuroscience, focusing on organizing principles that can help overcome these challenges. First, we describe the fundamental goals in constructing network models. Second, we review the most common forms of network models, which can be described parsimoniously along the following three primary dimensions: from data representations to first-principles theory; from biophysical realism to functional phenomenology; and from elementary descriptions to coarse-grained approximations. Third, we draw on biology, philosophy and other disciplines to establish validation principles for these models. We close with a discussion of opportunities to bridge model types and point to exciting frontiers for future pursuits.
Affiliation(s)
- Danielle S Bassett
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.
  - Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA, USA.
  - Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA.
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA.
- Perry Zurn
  - Department of Philosophy, American University, Washington, DC, USA
- Joshua I Gold
  - Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
7
Schlauch D, Glass K, Hersh CP, Silverman EK, Quackenbush J. Estimating drivers of cell state transitions using gene regulatory network models. BMC Systems Biology 2017; 11:139. [PMID: 29237467 PMCID: PMC5729420 DOI: 10.1186/s12918-017-0517-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/28/2017] [Accepted: 11/21/2017] [Indexed: 12/12/2022]
Abstract
Background: Specific cellular states are often associated with distinct gene expression patterns. These states are plastic, changing during development, or in the transition from health to disease. One relatively simple extension of this concept is to recognize that we can classify different cell-types by their active gene regulatory networks and that, consequently, transitions between cellular states can be modeled by changes in these underlying regulatory networks.
Results: Here we describe MONSTER, MOdeling Network State Transitions from Expression and Regulatory data, a regression-based method for inferring transcription factor drivers of cell state conditions at the gene regulatory network level. As a demonstration, we apply MONSTER to four different studies of chronic obstructive pulmonary disease to identify transcription factors that alter the network structure as the cell state progresses toward the disease-state.
Conclusions: We demonstrate that MONSTER can find strong regulatory signals that persist across studies and tissues of the same disease and that are not detectable using conventional analysis methods based on differential expression. An R package implementing MONSTER is available at github.com/QuackenbushLab/MONSTER.
Electronic supplementary material: The online version of this article (doi:10.1186/s12918-017-0517-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Daniel Schlauch
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, 02115, MA, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, 02115, MA, USA
- Kimberly Glass
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, 02115, MA, USA; Department of Medicine, Harvard Medical School, Boston, 02115, MA, USA
- Craig P Hersh
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, 02115, MA, USA; Department of Medicine, Harvard Medical School, Boston, 02115, MA, USA; Pulmonary and Critical Care Division, Brigham and Women's Hospital, Boston, 02115, MA, USA
- Edwin K Silverman
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, 02115, MA, USA; Department of Medicine, Harvard Medical School, Boston, 02115, MA, USA; Pulmonary and Critical Care Division, Brigham and Women's Hospital, Boston, 02115, MA, USA
- John Quackenbush
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, 02115, MA, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, 02115, MA, USA