1
Zhang J, Hu C, Zhang Q. Gene regulatory network inference based on a nonhomogeneous dynamic Bayesian network model with an improved Markov Monte Carlo sampling. BMC Bioinformatics 2023; 24:264. [PMID: 37355560 DOI: 10.1186/s12859-023-05381-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Accepted: 06/07/2023] [Indexed: 06/26/2023]
Abstract
A nonhomogeneous dynamic Bayesian network model, which combines the dynamic Bayesian network and the multi-change point process, solves the limitations of the dynamic Bayesian network in modeling non-stationary gene expression data to a certain extent. However, certain problems persist, such as the low network reconstruction accuracy and poor model convergence. Therefore, we propose an MD-birth move based on the Manhattan distance of the data points to increase the rationality of the multi-change point process. The underlying concept of the MD-birth move is that the direction of movement of the change point is assumed to have a larger Manhattan distance between the variance and the mean of its left and right data points. Considering the data instability characteristics, we propose a Markov chain Monte Carlo sampling method based on node-dependent particle filtering in addition to the multi-change point process. The candidate parent nodes to be sampled, which are close to the real state, are pushed to the high probability area through the particle filter, and the candidate parent node set to be sampled that is far from the real state is pushed to the low probability area and then sampled. In terms of reconstructing the gene regulatory network, the model proposed in this paper (FC-DBN) has better network reconstruction accuracy and model convergence speed than other corresponding models on the Saccharomyces cerevisiae data and RAF data.
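One possible reading of the Manhattan-distance idea in this abstract is sketched below: summarize the data on each side of a candidate change point by its mean and variance, then take the Manhattan (L1) distance between the two summaries. The function name `segment_distance` and the choice of population variance are assumptions made for illustration; the abstract does not specify how the MD-birth move computes its score.

```python
from statistics import fmean, pvariance


def segment_distance(series, k):
    """Manhattan distance between the (mean, variance) summaries of
    series[:k] and series[k:], for a candidate change point at index k.

    Hypothetical helper: one interpretation of the abstract, not the
    published MD-birth move.
    """
    if not 0 < k < len(series):
        raise ValueError("k must leave at least one point on each side")
    left, right = series[:k], series[k:]
    mean_gap = abs(fmean(left) - fmean(right))
    var_gap = abs(pvariance(left) - pvariance(right))
    return mean_gap + var_gap
```

A larger value suggests that the two sides of index `k` differ more in level or spread. How such a score would feed into the proposal distribution of the sampler is left open here.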
Affiliation(s)
- Jiayao Zhang: College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
- Chunling Hu: College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
- Qianqian Zhang: College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
2
Song Q, Ruffalo M, Bar-Joseph Z. Using single cell atlas data to reconstruct regulatory networks. Nucleic Acids Res 2023; 51:e38. [PMID: 36762475 PMCID: PMC10123116 DOI: 10.1093/nar/gkad053] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/10/2022] [Revised: 12/16/2022] [Accepted: 01/19/2023] [Indexed: 02/11/2023]
Abstract
Inference of global gene regulatory networks from omics data is a long-term goal of systems biology. Most methods developed for inferring transcription factor (TF)-gene interactions either relied on a small dataset or used snapshot data, which is not suitable for inferring a process that is inherently temporal. Here, we developed a new computational method that combines neural networks and multi-task learning to predict RNA velocity rather than gene expression values. This allows our method to overcome many of the problems faced by prior methods, leading to a more accurate and more comprehensive set of identified regulatory interactions. Application of our method to atlas-scale single-cell data from 6 HuBMAP tissues led to several validated and novel predictions and greatly improved on prior methods proposed for this task.
Affiliation(s)
- Qi Song: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Matthew Ruffalo: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Ziv Bar-Joseph: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
3
Imani M, Ghoreishi SF. Graph-Based Bayesian Optimization for Large-Scale Objective-Based Experimental Design. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5913-5925. [PMID: 33877989 DOI: 10.1109/tnnls.2021.3071958] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
Design is an inseparable part of most scientific and engineering tasks, including real and simulation-based experimental design processes and parameter/hyperparameter tuning/optimization. Several model-based experimental design techniques have been developed for design in domains with partially available knowledge about the underlying process. This article focuses on a powerful class of model-based experimental design called the mean objective cost of uncertainty (MOCU). The MOCU-based techniques are objective-based, meaning that they take the main objective of the process into account during the experimental design process. However, the lack of scalability of MOCU-based techniques prevents their application to most practical problems, including large discrete or combinatorial spaces. To achieve a scalable objective-based experimental design, this article proposes a graph-based MOCU-based Bayesian optimization framework. The correlations among samples in the large design space are accounted for using a graph-based Gaussian process, and an efficient closed-form sequential selection is achieved through the well-known expected improvement policy. The proposed framework's performance is assessed through structural intervention in gene regulatory networks, aiming to drive the network away from the states associated with cancer.
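The closed-form expected improvement policy mentioned in this abstract can be illustrated with the textbook formula for a Gaussian posterior. The sketch below assumes a maximization problem and a posterior N(mu, sigma^2) at a single candidate point; the function name and argument names are illustrative and are not taken from the article.

```python
import math


def expected_improvement(mu, sigma, best):
    """Textbook expected improvement for maximization.

    mu, sigma: posterior mean and standard deviation at a candidate point.
    best: best objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf
```

In a sequential design loop, the candidate with the largest value of this function would be evaluated next. The graph-based Gaussian process that supplies `mu` and `sigma` in the article is not reproduced here.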
4
Ajmal HB, Madden MG. Dynamic Bayesian Network Learning to Infer Sparse Models From Time Series Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2794-2805. [PMID: 34181549 DOI: 10.1109/tcbb.2021.3092879] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/13/2023]
Abstract
One of the key challenges in systems biology is to derive gene regulatory networks (GRNs) from complex high-dimensional sparse data. Bayesian networks (BNs) and dynamic Bayesian networks (DBNs) have been widely applied to infer GRNs from gene expression data. GRNs are typically sparse but traditional approaches of BN structure learning to elucidate GRNs often produce many spurious (false positive) edges. We present two new BN scoring functions, which are extensions to the Bayesian Information Criterion (BIC) score, with additional penalty terms and use them in conjunction with DBN structure search methods to find a graph structure that maximises the proposed scores. Our BN scoring functions offer better solutions for inferring networks with fewer spurious edges compared to the BIC score. The proposed methods are evaluated extensively on auto regressive and DREAM4 benchmarks. We found that they significantly improve the precision of the learned graphs, relative to the BIC score. The proposed methods are also evaluated on three real time series gene expression datasets. The results demonstrate that our algorithms are able to learn sparse graphs from high-dimensional time series data. The implementation of these algorithms is open source and is available in form of an R package on GitHub at https://github.com/HamdaBinteAjmal/DBN4GRN, along with the documentation and tutorials.
5
Dirmeier S, Beerenwinkel N. Structured hierarchical models for probabilistic inference from perturbation screening data. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Simon Dirmeier: Department of Biosystems Science and Engineering, ETH Zurich
6
Computational Phosphorylation Network Reconstruction: An Update on Methods and Resources. Methods Mol Biol 2021. [PMID: 34270057 DOI: 10.1007/978-1-0716-1625-3_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2023]
Abstract
Most proteins undergo some form of modification after translation, and phosphorylation is one of the most relevant and ubiquitous post-translational modifications. The succession of protein phosphorylation and dephosphorylation catalyzed by protein kinase and phosphatase, respectively, constitutes a key mechanism of molecular information flow in cellular systems. The protein interactions of kinases, phosphatases, and their regulatory subunits and substrates are the main part of phosphorylation networks. To elucidate the landscape of phosphorylation events has been a central goal pursued by both experimental and computational approaches. Substrate specificity (e.g., sequence, structure) or the phosphoproteome has been utilized in an array of different statistical learning methods to infer phosphorylation networks. In this chapter, different computational phosphorylation network inference-related methods and resources are summarized and discussed.
7
Mahmoodi SH, Aghdam R, Eslahchi C. An order independent algorithm for inferring gene regulatory network using quantile value for conditional independence tests. Sci Rep 2021; 11:7605. [PMID: 33828122 PMCID: PMC8027014 DOI: 10.1038/s41598-021-87074-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/24/2020] [Accepted: 03/24/2021] [Indexed: 10/31/2022]
Abstract
In recent years, due to the difficulty and inefficiency of experimental methods, numerous computational methods have been introduced for inferring the structure of Gene Regulatory Networks (GRNs). The Path Consistency (PC) algorithm is one of the popular methods to infer the structure of GRNs. However, this group of methods still has limitations and there is potential for improvement in this field. First, the PC-based algorithms are still sensitive to the ordering of nodes, i.e., different node orders result in different network structures. Second, the networks inferred by these methods are highly dependent on the threshold used for independence testing. Also, it is still a challenge to select the set of conditional genes in an optimal way, which affects the performance and computational complexity of the PC-based algorithm. We introduce a novel algorithm, namely the Order Independent PC-based algorithm using Quantile value (OIPCQ), which improves the accuracy of the learning process of GRNs and solves the order dependency issue. Quantile-based thresholds are considered for different orders of CMI tests. For conditional gene selection, we consider the paths between genes with length equal to or greater than 2, while other well-known PC-based methods only consider the paths of length 2. We applied OIPCQ to the various networks of the DREAM3 and DREAM4 in silico challenges. As a real-world case study, we used OIPCQ to reconstruct the SOS DNA network obtained from Escherichia coli and the GRN for acute myeloid leukemia based on the RNA sequencing data from The Cancer Genome Atlas. The results show that OIPCQ produces the same network structure for all the permutations of the genes and improves the resulting GRN through accurately quantifying the causal regulation strength in comparison with other well-known PC-based methods.
According to the GRN constructed by OIPCQ, for acute myeloid leukemia, two regulators BCLAF1 and NRSF reported previously are significantly important. However, the highest degree nodes in this GRN are ZBTB7A and PU1 which play a significant role in cancer, especially in leukemia. OIPCQ is freely accessible at https://github.com/haammim/OIPCQ-and-OIPCQ2 .
Affiliation(s)
- Sayyed Hadi Mahmoodi: Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Rosa Aghdam: Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran; School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Changiz Eslahchi: Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran; School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
8
Ajmal HB, Madden MG. Inferring dynamic gene regulatory networks with low-order conditional independencies – an evaluation of the method. Stat Appl Genet Mol Biol 2020. [DOI: 10.1515/sagmb-2020-0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Over a decade ago, Lèbre (2009) proposed an inference method, G1DBN, to learn the structure of gene regulatory networks (GRNs) from high dimensional, sparse time-series gene expression data. Their approach is based on the concept of low-order conditional independence graphs that they extend to dynamic Bayesian networks (DBNs). They present results to demonstrate that their method yields better structural accuracy compared to the related Lasso and Shrinkage methods, particularly where the data is sparse, that is, the number of time measurements n is much smaller than the number of genes p. This paper challenges these claims using a careful experimental analysis, to show that the GRNs reverse engineered from time-series data using the G1DBN approach are less accurate than claimed by Lèbre (2009). We also show that the Lasso method yields higher structural accuracy for graphs learned from the simulated data, compared to the G1DBN method, particularly when the data is sparse ($n \ll p$). The Lasso method is also better than G1DBN at identifying the transcription factors (TFs) involved in the cell cycle of Saccharomyces cerevisiae.
Affiliation(s)
- Hamda B. Ajmal: School of Computer Science, National University of Ireland, Galway, Ireland
- Michael G. Madden: School of Computer Science, National University of Ireland, Galway, Ireland
9
Nasrin S, Drobitch J, Shukla P, Tulabandhula T, Bandyopadhyay S, Trivedi AR. Bayesian reasoning machine on a magneto-tunneling junction network. Nanotechnology 2020; 31:484001. [PMID: 32936787 DOI: 10.1088/1361-6528/abae97] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/11/2023]
Abstract
The recent trend in adapting ultra-energy-efficient (but error-prone) nanomagnetic devices to non-Boolean computing and information processing (e.g. stochastic/probabilistic computing, neuromorphic, belief networks, etc) has resulted in rapid strides in new computing modalities. Of particular interest are Bayesian networks (BN) which may see revolutionary advances when adapted to a specific type of nanomagnetic devices. Here, we develop a novel nanomagnet-based computing substrate for BN that allows high-speed sampling from an arbitrary Bayesian graph. We show that magneto-tunneling junctions (MTJs) can be used for electrically programmable 'sub-nanosecond' probability sample generation by co-optimizing voltage-controlled magnetic anisotropy and spin transfer torque. We also discuss that just by engineering local magnetostriction in the soft layers of MTJs, one can stochastically couple them for programmable conditional sample generation as well. This obviates the need for extensive energy-inefficient hardware like OP-AMPS, gates, shift-registers, etc to generate the correlations. Based on the above findings, we present an architectural design and computation flow of the MTJ network to map an arbitrary Bayesian graph where we develop circuits to program and induce switching and interactions among MTJs. Our discussed framework can lead to a new generation of stochastic computing hardware for various other computing models, such as stochastic programming and Bayesian deep learning. This can spawn a novel genre of ultra-energy-efficient, extremely powerful computing paradigms, which is a transformational advance.
10
Liu W, Sun X, Peng L, Zhou L, Lin H, Jiang Y. RWRNET: A Gene Regulatory Network Inference Algorithm Using Random Walk With Restart. Front Genet 2020; 11:591461. [PMID: 33101398 PMCID: PMC7545090 DOI: 10.3389/fgene.2020.591461] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/04/2020] [Accepted: 09/02/2020] [Indexed: 11/30/2022]
Abstract
Inferring gene regulatory networks from expression data is essential in identifying complex regulatory relationships among genes and revealing the mechanism of certain diseases. Various computation methods have been developed for inferring gene regulatory networks. However, these methods focus on the local topology of the network rather than on the global topology. From network optimisation standpoint, emphasising the global topology of the network also reduces redundant regulatory relationships. In this study, we propose a novel network inference algorithm using Random Walk with Restart (RWRNET) that combines local and global topology relationships. The method first captures the local topology through three elements of random walk and then combines the local topology with the global topology by Random Walk with Restart. The Markov Blanket discovery algorithm is then used to deal with isolated genes. The proposed method is compared with several state-of-the-art methods on the basis of six benchmark datasets. Experimental results demonstrated the effectiveness of the proposed method.
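The random walk with restart named in this abstract can be illustrated in its generic form: a walker follows edges with probability 1 - r and jumps back to a seed node with probability r, and the stationary visiting probabilities rank nodes by proximity to the seed. The sketch below is that generic iteration only, with an assumed function name, a dense adjacency matrix, and a default restart probability of 0.3; it is not the RWRNET pipeline itself.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * e_seed until convergence.

    adj: square list-of-lists of non-negative edge weights (adj[i][j] is the
    weight of the step from node j to node i). W is adj with each column
    normalized to sum to 1; all-zero columns are skipped.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            flow = sum(adj[i][j] / col_sums[j] * p[j]
                       for j in range(n) if col_sums[j] > 0)
            nxt.append((1.0 - restart) * flow + restart * start[i])
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

For a three-gene path 0-1-2 with the walk seeded at gene 0, the scores decrease with distance from the seed, which is the sense in which the walk blends local and global topology.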
Affiliation(s)
- Wei Liu: School of Computer Science, Xiangtan University, Xiangtan, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
- Xingen Sun: School of Computer Science, Xiangtan University, Xiangtan, China
- Li Peng: School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
- Lili Zhou: School of Computer Science, Xiangtan University, Xiangtan, China
- Hui Lin: School of Computer Science, Xiangtan University, Xiangtan, China
- Yi Jiang: School of Computer Science, Xiangtan University, Xiangtan, China
11
Che D, Guo S, Jiang Q, Chen L. PFBNet: a priori-fused boosting method for gene regulatory network inference. BMC Bioinformatics 2020; 21:308. [PMID: 32664870 PMCID: PMC7362553 DOI: 10.1186/s12859-020-03639-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/29/2019] [Accepted: 07/02/2020] [Indexed: 02/08/2023]
Abstract
BACKGROUND Inferring gene regulatory networks (GRNs) from gene expression data remains a challenge in systems biology. In the past decade, numerous methods have been developed for the inference of GRNs. It remains a challenge because the data are noisy and high dimensional, and there is a large number of potential interactions. RESULTS We present a novel method, namely the priori-fused boosting network inference method (PFBNet), to infer GRNs from time-series expression data by using the non-linear model of Boosting and a fusion scheme for prior information (e.g., knockout data). Specifically, PFBNet first calculates the confidences of the regulation relationships using the boosting-based model, where the information about the accumulated impact of the gene expressions at previous time points is taken into account. Then, a newly defined strategy is applied to fuse the information from the prior data by elevating the confidences of the regulation relationships from the corresponding regulators. CONCLUSIONS The experiments on the benchmark datasets from the DREAM challenge as well as the E. coli datasets show that PFBNet achieves significantly better performance than other state-of-the-art methods (Jump3, GENIE3-lag, HiDi, iRafNet and BiXGBoost).
Affiliation(s)
- Dandan Che: Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Shun Guo: Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Qingshan Jiang: Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Lifei Chen: School of Mathematics and Computer Science, Fujian Normal University, Fujian, 350117 China
12
Bakhteh S, Ghaffari-Hadigheh A, Chaparzadeh N. Identification of Minimum Set of Master Regulatory Genes in Gene Regulatory Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:999-1009. [PMID: 30334767 DOI: 10.1109/tcbb.2018.2875692] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/08/2023]
Abstract
Identification of master regulatory genes is one of the primary challenges in systems biology. The minimum dominating set problem is a powerful paradigm in analyzing such complex networks. In these models, genes stand as nodes and their interactions are assumed as edges. Here, members of a minimal dominating set could be regarded as master genes. As several minimum dominating sets may exist in a network, it is difficult to identify which one represents the most appropriate set of master genes. In this paper, we develop a weighted gene regulatory network problem with two objectives as a version of the dominating set problem. The collective influence of each gene is considered as its weight. The first objective aims to find a master regulatory gene set with minimum cardinality, and the second objective identifies the one with maximum weight. The model is converted to a single objective using a parameter varying between zero and one. The model is implemented on three human networks, and the results are reported and compared with the existing model of the weighted network. Parametric programming in linear optimization and logistic regression are also applied to the resulting relaxed problem to provide a deeper understanding of the results. The parametric analysis shows that, for some ranges of priorities in the objectives, the identified master regulatory genes are invariant, while some genes are identified for all priorities. This would be an indication that such genes are more likely to be master regulatory ones, especially in noisy networks.
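The two objectives described in this abstract (fewest master genes, then largest total weight) can be illustrated on a toy network by exhaustive search. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets and resolves the two objectives lexicographically, which is a simplification: the article combines them with a single parameter between zero and one and solves an optimization model. All function names are hypothetical.

```python
from itertools import combinations


def is_dominating(adj, chosen):
    """True if every node is in `chosen` or adjacent to a node in `chosen`."""
    covered = set(chosen)
    for node in chosen:
        covered |= adj[node]
    return len(covered) == len(adj)


def min_dominating_sets(adj):
    """All dominating sets of minimum cardinality (brute force, small graphs only)."""
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        found = [set(c) for c in combinations(nodes, size) if is_dominating(adj, c)]
        if found:
            return found
    return []


def pick_master_set(adj, weight):
    """Among the minimum dominating sets, return one with the largest total weight."""
    return max(min_dominating_sets(adj), key=lambda s: sum(weight[v] for v in s))
```

The search is exponential in the number of genes, so it only serves to make the definitions concrete. On the path a-b-c-d there are four minimum dominating sets of size two, and the weights decide among them.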
13
Fan A, Wang H, Xiang H, Zou X. Inferring Large-Scale Gene Regulatory Networks Using a Randomized Algorithm Based on Singular Value Decomposition. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1997-2008. [PMID: 29993839 DOI: 10.1109/tcbb.2018.2825446] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Reconstructing large-scale gene regulatory networks (GRNs) is a challenging problem in the field of computational biology. Various methods for inferring GRNs have been developed, but they fail to accurately infer GRNs with a large number of genes. Additionally, the existing evaluation indexes for evaluating the constructed networks have obvious disadvantages because GRNs in most biological systems are sparse. In this paper, we develop a new method for inferring GRNs based on randomized singular value decomposition (RSVD) and ordinary differential equation (ODE)-based optimization, denoted as IGRSVD, from large-scale time series data with noise. The three major contributions of this paper are as follows. First, the IGRSVD algorithm uses the RSVD to handle the noise and reduce the original large-scale data into small-scale problems. Second, we propose two new evaluated indexes, the expected value accuracy (EVA) and the expected value error (EVE), to evaluate the performance of inferred networks by considering the sparse features in the network. Finally, the proposed IGRSVD algorithm is compared with the existing SVD algorithm and PCA_CMI algorithm using four subsets from E. coli and datasets from DREAM challenge. The experimental results demonstrate that the IGRSVD algorithm is effective and more suitable for reconstructing large-scale networks.
14
Muldoon JJ, Yu JS, Fassia MK, Bagheri N. Network inference performance complexity: a consequence of topological, experimental and algorithmic determinants. Bioinformatics 2019; 35:3421-3432. [PMID: 30932143 PMCID: PMC6748731 DOI: 10.1093/bioinformatics/btz105] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/12/2018] [Revised: 01/24/2019] [Accepted: 02/11/2019] [Indexed: 12/21/2022]
Abstract
MOTIVATION Network inference algorithms aim to uncover key regulatory interactions governing cellular decision-making, disease progression and therapeutic interventions. Having an accurate blueprint of this regulation is essential for understanding and controlling cell behavior. However, the utility and impact of these approaches are limited because the ways in which various factors shape inference outcomes remain largely unknown. RESULTS We identify and systematically evaluate determinants of performance (including network properties, experimental design choices and data processing) by developing new metrics that quantify confidence across algorithms in comparable terms. We conducted a multifactorial analysis that demonstrates how stimulus target, regulatory kinetics, induction and resolution dynamics, and noise differentially impact widely used algorithms in significant and previously unrecognized ways. The results show that even if high-quality data are paired with high-performing algorithms, inferred models are sometimes susceptible to giving misleading conclusions. Lastly, we validate these findings and the utility of the confidence metrics using realistic in silico gene regulatory networks. This new characterization approach provides a way to more rigorously interpret how algorithms infer regulation from biological datasets. AVAILABILITY AND IMPLEMENTATION Code is available at http://github.com/bagherilab/networkinference/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Joseph J Muldoon: Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA; Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
- Jessica S Yu: Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Mohammad-Kasim Fassia: Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA; Department of Biomedical Engineering, Northwestern University, Evanston, IL, USA
- Neda Bagheri: Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA; Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA; Center for Synthetic Biology, Northwestern University, Evanston, IL, USA; Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA; Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
15
Kizhakkethil Youseph AS, Chetty M, Karmakar G. Reverse engineering genetic networks using nonlinear saturation kinetics. Biosystems 2019; 182:30-41. [PMID: 31185246 DOI: 10.1016/j.biosystems.2019.103977] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/16/2018] [Revised: 04/25/2019] [Accepted: 05/27/2019] [Indexed: 01/01/2023]
Abstract
A gene regulatory network (GRN) represents a set of genes along with their regulatory interactions. Cellular behavior is driven by genetic level interactions. Dynamics of such systems show nonlinear saturation kinetics which can be best modeled by Michaelis-Menten (MM) and Hill equations. Although MM equation is being widely used for modeling biochemical processes, it has been applied rarely for reverse engineering GRNs. In this paper, we develop a complete framework for a novel model for GRN inference using MM kinetics. A set of coupled equations is first proposed for modeling GRNs. In the coupled model, Michaelis-Menten constant associated with regulation by a gene is made invariant irrespective of the gene being regulated. The parameter estimation of the proposed model is carried out using an evolutionary optimization method, namely, trigonometric differential evolution (TDE). Subsequently, the model is further improved and the regulations of different genes by a given gene are made distinct by allowing varying values of Michaelis-Menten constants for each regulation. Apart from making the model more relevant biologically, the improvement results in a decoupled GRN model with fast estimation of model parameters. Further, to enhance exploitation of the search, we propose a local search algorithm based on hill climbing heuristics. A novel mutation operation is also proposed to avoid population stagnation and premature convergence. Real life benchmark data sets generated in vivo are used for validating the proposed model. Further, we also analyze realistic in silico datasets generated using GeneNetweaver. The comparison of the performance of proposed model with other existing methods shows the potential of the proposed model.
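The saturation kinetics named in this abstract follow two standard rate laws, shown below in their textbook single-regulator form. The function and parameter names are chosen for illustration; the coupled and decoupled GRN models of the article, and their parameter estimation by trigonometric differential evolution, are not reproduced.

```python
def michaelis_menten(x, vmax, km):
    """Michaelis-Menten rate: rises with x and saturates at vmax.

    km is the regulator level at which the rate reaches vmax / 2.
    """
    return vmax * x / (km + x)


def hill(x, vmax, k, n):
    """Hill rate with coefficient n; n = 1 recovers Michaelis-Menten."""
    xn = x ** n
    return vmax * xn / (k ** n + xn)
```

Larger Hill coefficients make the response more switch-like around `x = k`, while the Michaelis-Menten form gives a gradual approach to saturation.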
Affiliation(s)
- Madhu Chetty: School of Science, Engineering and Information Technology, Federation University Australia, Gippsland 3842, Australia
- Gour Karmakar: School of Science, Engineering and Information Technology, Federation University Australia, Gippsland 3842, Australia
16
Maeda K. Gene grouping strategy for network modeling from a small time-series dataset: An illustrative analysis of human organogenesis. Biosystems 2019; 179:24-29. [PMID: 30797967 DOI: 10.1016/j.biosystems.2019.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2019] [Revised: 02/19/2019] [Accepted: 02/20/2019] [Indexed: 10/27/2022]
Abstract
Several algorithms have been proposed for modeling a gene regulatory network from a time-series expression dataset, but these have been used in relatively few studies because experimental cost often restricts the number of sampling time points to less than that of genes by more than one order of magnitude. In order to reduce the number of parameters for network modeling, we propose a method for grouping genes by both temporal expression pattern and biological function, modeling interactions between the gene groups by a dynamic Bayesian network approach. Results from applying the method to a gene expression dataset on human organogenesis demonstrate that more biologically plausible results can be obtained by modeling an interaction network for groups of genes than by modeling that for single genes.
Affiliation(s)
- Kiyohiro Maeda
- Imaging Technology Center, Fujifilm Corporation, 798 Miyanodai, Kaisei-machi, Ashigarakami-gun, Kanagawa, 258-8538, Japan.
|
17
|
Wu L, Qiu X, Yuan YX, Wu H. Parameter Estimation and Variable Selection for Big Systems of Linear Ordinary Differential Equations: A Matrix-Based Approach. J Am Stat Assoc 2019; 114:657-667. [PMID: 34385718 DOI: 10.1080/01621459.2017.1423074] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
Abstract
Ordinary differential equations (ODEs) are widely used to model the dynamic behavior of a complex system. Parameter estimation and variable selection for a "Big System" with linear ODEs are very challenging due to the need of nonlinear optimization in an ultra-high dimensional parameter space. In this article, we develop a parameter estimation and variable selection method based on the ideas of similarity transformation and separable least squares (SLS). Simulation studies demonstrate that the proposed matrix-based SLS method could be used to estimate the coefficient matrix more accurately and perform variable selection for a linear ODE system with thousands of dimensions and millions of parameters much better than the direct least squares (LS) method and the vector-based two-stage method that are currently available. We applied this new method to two real data sets: a yeast cell cycle gene expression data set with 30 dimensions and 930 unknown parameters and the Standard & Poor 1500 index stock price data with 1250 dimensions and 1,563,750 unknown parameters, to illustrate the utility and numerical performance of the proposed parameter estimation and variable selection method for big systems in practice.
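The two parameter counts quoted in this abstract are both consistent with n² + n for an n-dimensional linear system, for example an n × n coefficient matrix plus n further values such as initial conditions. That reading is an inference from the numbers, not something the abstract states, and the function name below is hypothetical.

```python
def linear_ode_parameter_count(n):
    """Count n*n coefficient-matrix entries plus n additional parameters.

    Assumption: the extra n parameters are something like initial
    conditions; the abstract only reports the totals.
    """
    return n * n + n


yeast_total = linear_ode_parameter_count(30)     # yeast cell cycle data set
stock_total = linear_ode_parameter_count(1250)   # stock price data set
```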
Affiliation(s)
- Leqin Wu
- Department of Mathematics, Jinan University, Guangzhou, China
| | - Xing Qiu
- Department of Biostatistics and Computational Biology University of Rochester, Rochester, New York, U.S.A
| | - Ya-Xiang Yuan
- Academy of Mathematics and System Sciences Chinese Academy of Sciences, Beijing, China
| | - Hulin Wu
- Department of Biostatistics, University of Texas Health Science Center at Houston, Houston, TX, U.S.A
|
18
|
Wang Z, Gudibanda A, Ugwuowo U, Trail F, Townsend JP. Using evolutionary genomics, transcriptomics, and systems biology to reveal gene networks underlying fungal development. FUNGAL BIOL REV 2018. [DOI: 10.1016/j.fbr.2018.02.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
|
19
|
Xing L, Guo M, Liu X, Wang C, Zhang L. Gene Regulatory Networks Reconstruction Using the Flooding-Pruning Hill-Climbing Algorithm. Genes (Basel) 2018; 9:E342. [PMID: 29986472 PMCID: PMC6071145 DOI: 10.3390/genes9070342] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/23/2018] [Revised: 06/28/2018] [Accepted: 07/02/2018] [Indexed: 11/17/2022] Open
Abstract
The explosion of genomic data provides new opportunities to improve the task of gene regulatory network reconstruction. Because of its inherent probabilistic character, the Bayesian network is one of the most promising methods. However, excessive computation time and the requirement for a large number of biological samples reduce its effectiveness and application to gene regulatory network reconstruction. In this paper, the Flooding-Pruning Hill-Climbing algorithm (FPHC) is proposed as a novel hybrid method based on Bayesian networks for gene regulatory network reconstruction. On the basis of our previous work, we propose the concept of DPI Level, based on the data processing inequality (DPI), to better identify the neighbors of each gene when biological samples are scarce. Then, we use the search-and-score approach to learn the final network structure in the restricted search space. We first analyze and validate the effectiveness of FPHC in theory. Then, extensive comparison experiments are carried out on known Bayesian networks and biological networks from the DREAM (Dialogue on Reverse Engineering Assessment and Methods) challenge. The results show that the FPHC algorithm, under recommended parameters, outperforms, on average, the original hill climbing and Max-Min Hill-Climbing (MMHC) methods with respect to the network structure and running time. In addition, our results show that FPHC is more suitable for gene regulatory network reconstruction with limited data.
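For readers unfamiliar with the data processing inequality, the sketch below shows the conventional way it is used to prune a mutual-information network: in every fully connected gene triple, the weakest edge is treated as indirect and dropped. This is the generic DPI pruning step, not the DPI Level concept or the FPHC algorithm described in the abstract; the function name, the `tolerance` parameter and the toy scores are assumptions.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected triple.

    mi maps frozenset({gene_a, gene_b}) to a mutual-information score.
    """
    nodes = sorted({gene for pair in mi for gene in pair})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(edge in mi for edge in edges):
            continue  # DPI is only applied to closed triangles
        weakest = min(edges, key=lambda edge: mi[edge])
        strongest_of_rest = min(mi[edge] for edge in edges if edge != weakest)
        if mi[weakest] < strongest_of_rest * (1.0 - tolerance):
            to_remove.add(weakest)
    return {edge: score for edge, score in mi.items() if edge not in to_remove}


# Toy scores (made up for illustration).
scores = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.2,
}
pruned = dpi_prune(scores)
```

Edges are marked first and removed afterwards so that the result does not depend on the order in which triples are visited.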
Affiliation(s)
- Linlin Xing
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
| | - Maozu Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China.
- Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044, China.
| | - Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
| | - Lei Zhang
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China.
|
20
|
Nedungadi P, Iyer A, Gutjahr G, Bhaskar J, Pillai AB. Data-Driven Methods for Advancing Precision Oncology. CURRENT PHARMACOLOGY REPORTS 2018; 4:145-156. [PMID: 33520605 PMCID: PMC7845924 DOI: 10.1007/s40495-018-0127-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Abstract
PURPOSE OF REVIEW This article discusses the advances, methods, challenges, and future directions of data-driven methods in advancing precision oncology for biomedical research, drug discovery, clinical research, and practice. RECENT FINDINGS Precision oncology provides individually tailored cancer treatment by considering an individual's genetic makeup, clinical, environmental, social, and lifestyle information. Challenges include voluminous, heterogeneous, and disparate data generated by different technologies with multiple modalities such as Omics, electronic health records, clinical registries and repositories, medical imaging, demographics, wearables, and sensors. Statistical and machine learning methods have been continuously adapting to the ever-increasing size and complexity of data. Precision Oncology supportive analytics have improved turnaround time in biomarker discovery and time-to-application of new and repurposed drugs. Precision oncology additionally seeks to identify target patient populations based on genomic alterations that are sensitive or resistant to conventional or experimental treatments. Predictive models have been developed for cancer progression and survivorship, drug sensitivity and resistance, and identification of the most suitable combination treatments for individual patient scenarios. In the future, clinical decision support systems need to be revamped to better incorporate knowledge from precision oncology, thus enabling clinical practitioners to provide precision cancer care. SUMMARY Open Omics datasets, machine learning algorithms, and predictive models have enabled the advancement of precision oncology. Clinical decision support systems with integrated electronic health record and Omics data are needed to provide data-driven recommendations to assist clinicians in disease prevention, early identification, and individualized treatment. 
Additionally, as cancer is a constantly evolving disorder, clinical decision systems will need to be continually updated based on more recent knowledge and datasets.
Affiliation(s)
- Prema Nedungadi
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Department of Computer Science, School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
| | - Akshay Iyer
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
| | - Georg Gutjahr
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
| | - Jasmine Bhaskar
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Department of Computer Science, School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
| | - Asha B. Pillai
- Division of Pediatric Hematology/Oncology, Departments of Pediatrics and Microbiology and Immunology, University of Miami Miller School of Medicine, Miami, FL, USA
|
21
|
Improving Gene Regulatory Network Inference by Incorporating Rates of Transcriptional Changes. Sci Rep 2017; 7:17244. [PMID: 29222512 PMCID: PMC5722905 DOI: 10.1038/s41598-017-17143-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 08/31/2017] [Accepted: 11/22/2017] [Indexed: 11/18/2022] Open
Abstract
Organisms respond to changes in their environment through transcriptional regulatory networks (TRNs). The regulatory hierarchy of these networks can be inferred from expression data. Computational approaches to identify TRNs can be applied in any species where quality RNA can be acquired. However, ChIP-Seq and similar validation methods are challenging to employ in non-model species. Improving the accuracy of computational inference methods can significantly reduce the cost and time of subsequent validation experiments. We have developed ExRANGES, an approach that improves the ability to computationally infer TRNs from time series expression data. ExRANGES utilizes both the rate of change in expression and the absolute expression level to identify TRN connections. We evaluated ExRANGES in five data sets from different model systems. ExRANGES improved the identification of experimentally validated transcription factor targets for all species tested, even in unevenly spaced and sparse data sets. This improved ability to predict known regulator-target relationships enhances the utility of network inference approaches in non-model species where experimental validation is challenging. We integrated ExRANGES with two different network construction approaches, and it has been implemented as an R package available here: http://github.com/DohertyLab/ExRANGES. To install the package, type: devtools::install_github("DohertyLab/ExRANGES").
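The rate of change referred to in this abstract can be approximated from a time series with a finite difference that accounts for uneven spacing between samples. The sketch below shows only that basic calculation; it is not the ExRANGES implementation (see the R package named in the abstract for that), and the function name and sample values are hypothetical.

```python
def rate_of_change(times, expression):
    """Slope between consecutive samples, valid for unevenly spaced times."""
    if len(times) != len(expression):
        raise ValueError("times and expression must have the same length")
    return [
        (expression[i + 1] - expression[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]


# Made-up series sampled at hours 0, 2 and 3.
slopes = rate_of_change([0.0, 2.0, 3.0], [1.0, 5.0, 4.0])
```

Dividing by the actual interval, rather than assuming unit spacing, keeps slopes comparable when sampling is sparse or irregular.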
|
22
|
Inferring Gene Regulatory Networks Based on a Hybrid Parallel Genetic Algorithm and the Threshold Restriction Method. Interdiscip Sci 2017; 10:221-232. [DOI: 10.1007/s12539-017-0269-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/24/2017] [Revised: 09/24/2017] [Accepted: 10/19/2017] [Indexed: 12/14/2022]
|
23
|
Xing L, Guo M, Liu X, Wang C, Wang L, Zhang Y. An improved Bayesian network method for reconstructing gene regulatory network based on candidate auto selection. BMC Genomics 2017; 18:844. [PMID: 29219084 PMCID: PMC5773867 DOI: 10.1186/s12864-017-4228-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The reconstruction of gene regulatory networks (GRNs) from gene expression data can uncover regulatory relationships among genes and provide deep insights into the complicated regulation mechanisms of life. However, it is still a great challenge in systems biology and bioinformatics. In recent years, numerous computational approaches have been developed for this goal, and Bayesian network (BN) methods have drawn the most attention among them because of their inherent probabilistic characteristics. However, Bayesian network methods are time consuming and cannot handle large-scale networks due to their high computational complexity, while mutual information-based methods are highly effective but directionless and have a high false-positive rate. RESULTS To solve these problems, we propose a Candidate Auto Selection algorithm (CAS) based on mutual information and breakpoint detection to restrict the search space in order to accelerate the learning process of Bayesian networks. First, the proposed CAS algorithm automatically selects the neighbor candidates of each node before searching for the best structure of the GRN. Then, based on the CAS algorithm, we propose a globally optimal greedy search method (CAS + G), which focuses on finding the highest rated network structure, and a local learning method (CAS + L), which focuses on learning the structure faster with little loss of quality. CONCLUSION Results show that the proposed CAS algorithm can effectively reduce the search space of Bayesian networks by identifying the neighbor candidates of each node. In our experiments, the CAS + G method outperforms the state-of-the-art method on simulation data for inferring GRNs, and the CAS + L method is significantly faster than the state-of-the-art method with little loss of accuracy. Hence, the CAS-based methods effectively decrease the computational complexity of Bayesian networks and are more suitable for GRN inference.
Affiliation(s)
- Linlin Xing
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.
| | - Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Lei Wang
- Institute of Health Service and Medical Information, Academy of Military Medical Sciences, Beijing, China
| | - Yin Zhang
- Institute of Health Service and Medical Information, Academy of Military Medical Sciences, Beijing, China
|
24
|
Yu B, Xu JM, Li S, Chen C, Chen RX, Wang L, Zhang Y, Wang MH. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method. Oncotarget 2017; 8:80373-80392. [PMID: 29113310 PMCID: PMC5655205 DOI: 10.18632/oncotarget.21268] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/20/2017] [Accepted: 08/27/2017] [Indexed: 01/31/2023] Open
Abstract
Gene regulatory network (GRN) research reveals complex life phenomena from the perspective of gene interaction and is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. To make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on the dynamic Bayesian network (DBN) to construct multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm to learn network structure profiles, namely to construct the search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reducing the false positive rate. Next, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on the benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Jia-Meng Xu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Rui-Xin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Lei Wang
- Key Laboratory of Eco-chemical Engineering, Ministry of Education, Laboratory of Inorganic Synthesis and Applied Chemistry, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
| | - Yan Zhang
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Ming-Hui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
|
25
|
Ko S, Lim H, Kim DW. Reverse engineering for causal discovery based on monotonic characteristic of causal structure. Pattern Recognit Lett 2017. [DOI: 10.1016/j.patrec.2017.06.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/01/2022]
|
26
|
Khanteymoori AR, Olyaee MH, Abbaszadeh O, Valian M. A novel method for Bayesian networks structure learning based on Breeding Swarm algorithm. Soft comput 2017. [DOI: 10.1007/s00500-017-2557-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 10/19/2022]
|
27
|
Liu W, Zhu W, Liao B, Chen H, Ren S, Cai L. Improving gene regulatory network structure using redundancy reduction in the MRNET algorithm. RSC Adv 2017. [DOI: 10.1039/c7ra01557g] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022] Open
Abstract
Inferring gene regulatory networks from expression data is a central problem in systems biology.
Affiliation(s)
- Wei Liu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Wen Zhu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Haowen Chen
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Siqi Ren
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, China
|
28
|
Young WC, Raftery AE, Yeung KY. A posterior probability approach for gene regulatory network inference in genetic perturbation data. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2016; 13:1241-1251. [PMID: 27775378 DOI: 10.3934/mbe.2016041] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/06/2023]
Abstract
Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities, incorporating prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR and discuss the merits and limitations of this approach.
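How prior information enters a posterior edge probability can be shown with Bayes' rule in odds form: posterior odds equal prior odds times the Bayes factor. The sketch below is a generic illustration under that assumption; it does not reproduce the likelihood model used for the LINCS knockdown data, and the function name and numbers are hypothetical.

```python
def posterior_edge_probability(prior, bayes_factor):
    """Bayes' rule in odds form for a single candidate edge.

    prior: prior probability that the edge exists, strictly between 0 and 1.
    bayes_factor: ratio of data likelihood with the edge to without it.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    posterior_odds = (prior / (1.0 - prior)) * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Same evidence, different priors (illustrative values only).
with_neutral_prior = posterior_edge_probability(0.5, 3.0)
with_sceptical_prior = posterior_edge_probability(0.01, 3.0)
```

The same evidence yields a much lower posterior under a sceptical prior, which is the sense in which prior knowledge of known edges can sharpen the ranking.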
Affiliation(s)
- William Chad Young
- University of Washington, Department of Statistics, Box 354322, Seattle, WA 98195-4322, United States.
|
29
|
BGRMI: A method for inferring gene regulatory networks from time-course gene expression data and its application in breast cancer research. Sci Rep 2016; 6:37140. [PMID: 27876826 PMCID: PMC5120305 DOI: 10.1038/srep37140] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 09/08/2016] [Accepted: 10/24/2016] [Indexed: 02/06/2023] Open
Abstract
Reconstructing gene regulatory networks (GRNs) from gene expression data is a challenging problem. Existing GRN reconstruction algorithms can be broadly divided into model-free and model-based methods. Typically, model-free methods have high accuracy but are computation intensive, whereas model-based methods are fast but less accurate. We propose Bayesian Gene Regulation Model Inference (BGRMI), a model-based method for inferring GRNs from time-course gene expression data. BGRMI uses a Bayesian framework to calculate the probability of different models of GRNs and a heuristic search strategy to scan the model space efficiently. Using benchmark datasets, we show that BGRMI has higher or comparable accuracy at a fraction of the computational cost of competing algorithms. Additionally, it can incorporate prior knowledge of potential gene regulation mechanisms and TF hetero-dimerization processes in the GRN reconstruction process. We incorporated existing ChIP-seq data and known protein interactions between TFs in BGRMI as sources of prior knowledge to reconstruct transcription regulatory networks of proliferating and differentiating breast cancer (BC) cells from time-course gene expression data. The reconstructed networks revealed key driver genes of proliferation and differentiation in BC cells. Some of these genes were not previously studied in the context of BC, but may have clinical relevance in BC treatment.
|
30
|
Li Y, Chen H, Zheng J, Ngom A. The Max-Min High-Order Dynamic Bayesian Network for Learning Gene Regulatory Networks with Time-Delayed Regulations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:792-803. [PMID: 26336144 DOI: 10.1109/tcbb.2015.2474409] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]
Abstract
Accurately reconstructing gene regulatory networks (GRNs) from gene expression data is a challenging task in systems biology. Although some progress has been made, the performance of GRN reconstruction still has much room for improvement. Because many regulatory events are asynchronous, learning gene interactions with multiple time delays is an effective way to improve the accuracy of GRN reconstruction. Here, we propose a new approach, called Max-Min high-order dynamic Bayesian network (MMHO-DBN), by extending the Max-Min hill-climbing Bayesian network technique originally devised for learning a Bayesian network's structure from static data. Our MMHO-DBN can explicitly model the time lags between regulators and targets in an efficient manner. It first uses constraint-based ideas to limit the space of potential structures, and then applies search-and-score ideas to search for an optimal HO-DBN structure. The performance of MMHO-DBN for GRN reconstruction was evaluated using both synthetic and real gene expression time-series data. Results show that MMHO-DBN is more accurate than current time-delayed GRN learning methods, and has an intermediate computing performance. Furthermore, it is able to learn long time-delayed relationships between genes. We applied sensitivity analysis to our model to study the performance variation across different parameter settings. The result provides hints on the setting of parameters of MMHO-DBN.
|
31
|
Effective gene expression data generation framework based on multi-model approach. Artif Intell Med 2016; 70:41-61. [PMID: 27431036 DOI: 10.1016/j.artmed.2016.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/20/2015] [Accepted: 05/27/2016] [Indexed: 11/20/2022]
Abstract
OBJECTIVE To overcome the shortage of samples in gene expression data sets, which have thousands of genes but only a small number of samples, a limitation that challenges the computational methods using them. METHODS AND MATERIAL This paper introduces a multi-model artificial gene expression data generation framework where different gene regulatory network (GRN) models contribute to the final set of samples based on the characteristics of their underlying paradigms. In the first stage, we build different GRN models, and sample data from each of them separately. Then, we pool the generated samples into a rich set of gene expression samples, and finally try to select the best of the generated samples based on a multi-objective selection method measuring the quality of the generated samples from three different aspects: compatibility, diversity and coverage. We use four alternative GRN models, namely, ordinary differential equations, probabilistic Boolean networks, multi-objective genetic algorithm and hierarchical Markov model. RESULTS We conducted a comprehensive set of experiments based on both real-life biological and synthetic gene expression data sets. We show that our multi-objective sample selection mechanism effectively combines samples from different models having up to 95% compatibility, 10% diversity and 50% coverage. We show that the samples generated by our framework have up to 1.5x higher compatibility, 2x higher diversity and 2x higher coverage than the samples generated by the individual models that the multi-model framework uses. Moreover, the results show that the GRNs inferred from the samples generated by our framework can have 2.4x higher precision, 12x higher recall, and 5.4x higher f-measure values than the GRNs inferred from the original gene expression samples.
CONCLUSIONS Therefore, we show that we can significantly improve the quality of generated gene expression samples by integrating different computational models into one unified framework without dealing with complex internal details of each individual model. Moreover, the rich set of artificial gene expression samples is able to capture some biological relations that cannot be captured even by the original gene expression data set.
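The precision, recall and f-measure figures quoted in this abstract are standard edge-level scores for an inferred network against a reference network. A minimal sketch of how such scores are usually computed is given below, assuming directed edges written as (regulator, target) pairs; the function name and the toy edge sets are hypothetical and are not taken from the paper.

```python
def edge_metrics(predicted_edges, reference_edges):
    """Return (precision, recall, f_measure) for directed edge sets."""
    predicted = set(predicted_edges)
    reference = set(reference_edges)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy networks (made up for illustration).
inferred = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
reference = [("A", "B"), ("B", "C"), ("D", "E")]
precision, recall, f_measure = edge_metrics(inferred, reference)
```

The f-measure here is the harmonic mean of precision and recall, so it is pulled towards whichever of the two is lower.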
|
32
|
Pikovsky A. Reconstruction of a neural network from a time series of firing rates. Phys Rev E 2016; 93:062313. [PMID: 27415286 DOI: 10.1103/physreve.93.062313] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/03/2016] [Indexed: 06/06/2023]
Abstract
Randomly coupled neural fields demonstrate irregular variation of firing rates, if the coupling is strong enough, as has been shown by Sompolinsky et al. [Phys. Rev. Lett. 61, 259 (1988)]. We present a method for reconstruction of the coupling matrix from a time series of irregular firing rates. The approach is based on the particular property of the nonlinearity in the coupling, as the latter is determined by a sigmoidal gain function. We demonstrate that for a large enough data set and a small measurement noise, the method gives an accurate estimation of the coupling matrix and of other parameters of the system, including the gain function.
Affiliation(s)
- A Pikovsky
- Institute for Physics and Astronomy, University of Potsdam, Karl-Liebknecht-Strasse 24/25, 14476 Potsdam-Golm, Germany and Department of Control Theory, Nizhni Novgorod State University, Gagarin Avenue 23, 606950 Nizhni Novgorod, Russia
|
33
|
Lo LY, Wong ML, Lee KH, Leung KS. High-order dynamic Bayesian Network learning with hidden common causes for causal gene regulatory network. BMC Bioinformatics 2015; 16:395. [PMID: 26608050 PMCID: PMC4659244 DOI: 10.1186/s12859-015-0823-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 05/23/2015] [Accepted: 11/11/2015] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Inferring gene regulatory networks (GRNs) has been an important topic in bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, the High-Order Dynamic Bayesian Network (HO-DBN) is a good model of a GRN. However, previous GRN inference methods assume causal sufficiency, i.e. no unobserved common cause. This assumption is convenient but unrealistic, because it is possible that relevant factors have not even been conceived of and are therefore unmeasured. Therefore, an inference method that also handles hidden common cause(s) is highly desirable. Also, previous methods for discovering hidden common causes either do not handle multi-step time delays or require that the parents of hidden common causes not be observed genes. RESULTS We have developed a discrete HO-DBN learning algorithm that can also infer hidden common cause(s) from discrete time series expression data, with some assumptions on the conditional distribution, but is less restrictive than previous methods. We assume that each hidden variable has only observed variables as children and parents, with at least two children and possibly no parents. We also make the simplifying assumption that children of hidden variable(s) are not linked to each other. Moreover, our proposed algorithm can also utilize multiple short time series (not necessarily of the same length), as long time series are difficult to obtain. CONCLUSIONS We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. Experimental results show that our proposed algorithm can recover the causal GRNs adequately given the incomplete data. Using the limited real expression data and small subnetworks of the YEASTRACT network, we have also demonstrated the potential of our algorithm on real data, though more time series expression data is needed.
Affiliation(s)
- Leung-Yau Lo
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong.
- Man-Leung Wong
- Department of Computing and Decision Sciences, Lingnan University, Tuen Mun, Hong Kong.
- Kin-Hong Lee
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong.
- Kwong-Sak Leung
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong.
34
Isewon I, Oyelade J, Brors B, Adebiyi E. In Silico Gene Regulatory Network of the Maurer's Cleft Pathway in Plasmodium falciparum. Evol Bioinform Online 2015; 11:231-8. [PMID: 26526876 PMCID: PMC4620995 DOI: 10.4137/ebo.s25585] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/05/2015] [Revised: 07/28/2015] [Accepted: 08/03/2015] [Indexed: 11/15/2022] Open
Abstract
The Maurer's clefts (MCs) are very important for the survival of Plasmodium falciparum within an infected cell as they are induced by the parasite itself in the erythrocyte for protein trafficking. The MCs form an interesting part of the parasite's biology as they shed more light on how the parasite remodels the erythrocyte leading to host pathogenesis and death. Here, we predicted and analyzed the genetic regulatory network of genes identified to belong to the MCs using regularized graphical Gaussian model. Our network shows four major activators, their corresponding target genes, and predicted binding sites. One of these master activators is the serine repeat antigen 5 (SERA5), predominantly expressed among the SERA multigene family of P. falciparum, which is one of the blood-stage malaria vaccine candidates. Our results provide more details about functional interactions and the regulation of the genes in the MCs’ pathway of P. falciparum.
Affiliation(s)
- Itunuoluwa Isewon
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Jelili Oyelade
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Benedikt Brors
- Department of Applied Bioinformatics, German Cancer Research Centre (DKFZ), Heidelberg, Germany
- Ezekiel Adebiyi
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Department of Applied Bioinformatics, German Cancer Research Centre (DKFZ), Heidelberg, Germany
35
Time-Delayed Models of Gene Regulatory Networks. Computational and Mathematical Methods in Medicine 2015; 2015:347273. [PMID: 26576197 PMCID: PMC4632181 DOI: 10.1155/2015/347273] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 05/14/2015] [Revised: 08/31/2015] [Accepted: 09/14/2015] [Indexed: 11/17/2022]
Abstract
We discuss different mathematical models of gene regulatory networks as relevant to the onset and development of cancer. After discussion of alternative modelling approaches, we use a paradigmatic two-gene network to focus on the role played by time delays in the dynamics of gene regulatory networks. We contrast the dynamics of the reduced model arising in the limit of fast mRNA dynamics with that of the full model. The review concludes with the discussion of some open problems.
36
Lo LY, Wong ML, Lee KH, Leung KS. Time Delayed Causal Gene Regulatory Network Inference with Hidden Common Causes. PLoS One 2015; 10:e0138596. [PMID: 26394325 PMCID: PMC4578777 DOI: 10.1371/journal.pone.0138596] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/03/2015] [Accepted: 09/01/2015] [Indexed: 01/07/2023] Open
Abstract
Inferring the gene regulatory network (GRN) is crucial to understanding the workings of the cell. Many computational methods attempt to infer the GRN from time series expression data, instead of through expensive and time-consuming experiments. However, existing methods make the convenient but unrealistic assumption of causal sufficiency, i.e. that all the relevant factors in the causal network have been observed and that there is no unobserved common cause. In the real world, it is impossible to be certain that all relevant factors or common causes have been observed, because some factors may not have been conceived of and are therefore impossible to measure. In view of this, we have developed a novel algorithm named HCC-CLINDE to infer a GRN from time series data while allowing for the presence of hidden common cause(s). We assume there is a sparse causal graph (possibly with cycles) of interest, where the variables are continuous and each causal link has a delay (possibly more than one time step). A small but unknown number of variables are not observed. Each unobserved variable has only observed variables as children and parents, with at least two children, and the children are not linked to each other. Since it is difficult to obtain very long time series, our algorithm is also capable of utilizing multiple short time series, which is more realistic. To our knowledge, our algorithm is far less restrictive than previous works. We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The results show that our algorithm can adequately recover the true causal GRN and is robust to slight deviations from a Gaussian distribution in the error terms. We have also demonstrated the potential of our algorithm on small YEASTRACT subnetworks using limited real data.
Affiliation(s)
- Leung-Yau Lo
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Man-Leung Wong
- Department of Computing and Decision Sciences, Lingnan University, Tuen Mun, Hong Kong
- Kin-Hong Lee
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Kwong-Sak Leung
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong
37
Lo LY, Leung KS, Lee KH. Inferring Time-Delayed Causal Gene Network Using Time-Series Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:1169-1182. [PMID: 26451828 DOI: 10.1109/tcbb.2015.2394442] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
Inferring the gene regulatory network (GRN) from microarray expression data is an important problem in bioinformatics, because knowing the GRN is an essential first step in understanding the inner workings of the cell and the related diseases. Time delays exist in the regulatory effects from one gene to another because of the time needed for transcription, translation, and accumulation of a sufficient number of the needed proteins. It is also known that the delays are important for oscillatory phenomena. Therefore, it is crucial to develop a causal gene network model, preferably as a function of time. In this paper, we propose an algorithm, CLINDE, to infer causal directed links in a GRN, together with the time delays and regulatory effects of the links, from time-series microarray gene expression data. It is one of the most comprehensive in terms of features compared to the state-of-the-art discrete gene network models. We have tested CLINDE on synthetic data, the in vivo IRMA (On and Off) datasets and the [1] yeast expression data validated using KEGG pathways. Results show that CLINDE can effectively recover the links, the time delays and the regulatory effects in the synthetic data, and outperforms other algorithms on the IRMA in vivo datasets.
38
Kim JR, Choo SM, Choi HS, Cho KH. Identification of Gene Networks with Time Delayed Regulation Based on Temporal Expression Profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:1161-1168. [PMID: 26451827 DOI: 10.1109/tcbb.2015.2394312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
There are fundamental limitations in inferring the functional interaction structure of a gene (regulatory) network only from sequence information such as binding motifs. To overcome such limitations, various approaches have been developed to infer the functional interaction structure from expression profiles. However, most of them have not been so successful due to the experimental limitations and computational complexity. Hence, there is a pressing need to develop a simple but effective methodology that can systematically identify the functional interaction structure of a gene network from time-series expression profiles. In particular, we need to take into account the different time delay effects in gene regulation since they are ubiquitously present. We have considered a new experiment that measures the overall expression changes after a perturbation on a specific gene. Based on this experiment, we have proposed a new inference method that can take account of the time delay induced while the perturbation affects its primary target genes. Specifically, we have developed an algebraic equation from which we can identify the subnetwork structure around the perturbed gene. We have also analyzed the influence of time delay on the inferred network structure. The proposed method is particularly useful for identification of a gene network with small variations in the time delay of gene regulation.
39
Chen C, Yao Y, Zhang L, Xu M, Jiang J, Dou T, Lin W, Zhao G, Huang M, Zhou Y. A Comprehensive Analysis of the Transcriptomes of Marssonina brunnea and Infected Poplar Leaves to Capture Vital Events in Host-Pathogen Interactions. PLoS One 2015. [PMID: 26222429 PMCID: PMC4519268 DOI: 10.1371/journal.pone.0134246] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022] Open
Abstract
Background Understanding host-pathogen interaction mechanisms helps to elucidate the entire infection process and focus on important events, and it is a promising approach for improvement of disease control and selection of treatment strategy. Time-course host-pathogen transcriptome analyses and network inference have been applied to unravel the direct or indirect relationships of gene expression alterations. However, time series analyses can suffer from absent time points due to technical problems such as RNA degradation, which limits the application of algorithms that require strict sequential sampling. Here, we introduce an efficient method using independence test to infer an independent network that is exclusively concerned with the frequency of gene expression changes. Results Highly resistant NL895 poplar leaves and weakly resistant NL214 leaves were infected with highly active and weakly active Marssonina brunnea, respectively, and were harvested at different time points. The independent network inference illustrated the top 1,000 vital fungus-poplar relationships, which contained 768 fungal genes and 54 poplar genes. These genes could be classified into three categories: a fungal gene surrounded by many poplar genes; a poplar gene connected to many fungal genes; and other genes (possessing low degrees of connectivity). Notably, the fungal gene M6_08342 (a metalloprotease) was connected to 10 poplar genes, particularly including two disease-resistance genes. These core genes, which are surrounded by other genes, may be of particular importance in complicated infection processes and worthy of further investigation. Conclusions We provide a clear framework of the interaction network and identify a number of candidate key effectors in this process, which might assist in functional tests, resistant clone selection, and disease control in the future.
Affiliation(s)
- Chengwen Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, People’s Republic of China
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, People's Republic of China
- Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
- Ye Yao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, People’s Republic of China
- Center for Computational Systems Biology and School of Mathematical Sciences, Fudan University, Shanghai, People’s Republic of China
- Liang Zhang
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, People's Republic of China
- Minjie Xu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, People’s Republic of China
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, People's Republic of China
- Jianping Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, People’s Republic of China
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, People's Republic of China
- Tonghai Dou
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, People’s Republic of China
- Wei Lin
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, People’s Republic of China
- Center for Computational Systems Biology and School of Mathematical Sciences, Fudan University, Shanghai, People’s Republic of China
- Guoping Zhao
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, People's Republic of China
- Minren Huang
- Jiangsu Key Laboratory for Poplar Germplasm Enhancement and Variety Improvement, Nanjing Forestry University, Nanjing, People’s Republic of China
- Yan Zhou
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, People’s Republic of China
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, People's Republic of China
40
Zhang W, Zhou T. A Sparse Reconstruction Approach for Identifying Gene Regulatory Networks Using Steady-State Experiment Data. PLoS One 2015. [PMID: 26207991 PMCID: PMC4514654 DOI: 10.1371/journal.pone.0130979] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022] Open
Abstract
Motivation Identifying gene regulatory networks (GRNs), which consist of a large number of interacting units, has become a problem of paramount importance in systems biology. In many situations, the causal interactions among these units must be reconstructed from measured expression data and other a priori information. Though numerous classical methods have been developed to unravel the interactions of GRNs, these methods have either high computational complexity or low estimation accuracy. Note that great similarities exist between the identification of genes that directly regulate a specific gene and sparse vector reconstruction, which often involves determining the number, location and magnitude of the nonzero entries of an unknown vector by solving an underdetermined system of linear equations y = Φx. Based on these similarities, we propose a novel sparse reconstruction framework to identify the structure of a GRN, so as to increase the accuracy of causal regulation estimates and to reduce their computational complexity. Results In this paper, a sparse reconstruction framework is proposed on the basis of steady-state experiment data to identify GRN structure. Unlike traditional methods, this approach is well suited to large-scale underdetermined problems in which a sparse vector must be inferred. We investigate how to combine noisy steady-state experiment data and a sparse reconstruction algorithm to identify causal relationships. The efficiency of this method is tested on an artificial linear network, a mitogen-activated protein kinase (MAPK) pathway network and the in silico networks of the DREAM challenges. The performance of the suggested approach is compared with two state-of-the-art algorithms, the widely adopted total least-squares (TLS) method and the results available from the DREAM project. The results show that, with a lower computational cost, the proposed method can significantly enhance estimation accuracy and greatly reduce false positive and false negative errors. Furthermore, numerical calculations demonstrate that the proposed algorithm may have faster convergence and smaller fluctuations than other methods when either estimation error or estimation bias is considered.
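To make the sparse-vector analogy in this abstract concrete, the sketch below recovers a sparse x from an underdetermined system y = Φx using orthogonal matching pursuit (OMP). OMP is a generic greedy technique chosen here for illustration only; it is not claimed to be the algorithm used by Zhang and Zhou, and the function name `omp` and the sparsity parameter `k` are assumptions of this sketch.

```python
import numpy as np

def omp(Phi, y, k):
    """Greedy orthogonal matching pursuit: find a k-sparse x with y ≈ Phi @ x.

    Illustrative only; `k` (the assumed number of regulators) is a
    hypothetical parameter and must be chosen by the user.
    """
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []                      # indices of selected columns
    coef = np.zeros(0)
    for _ in range(k):
        # Score each column by its correlation with the current residual.
        scores = np.abs(Phi.T @ residual)
        if support:
            scores[support] = -np.inf  # never pick the same column twice
        support.append(int(np.argmax(scores)))
        # Refit by least squares on the selected columns only.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x
```

In a GRN setting, each column of Φ would hold the expression of one candidate regulator and y the expression of the target gene, so the nonzero entries of x mark the inferred regulators. Exact recovery is not guaranteed for arbitrary Φ.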
Affiliation(s)
- Wanhong Zhang
- School of Chemical Machinery, Qinghai University, Qinghai, China
- Department of Automation, Tsinghua University, Beijing, China
- Tong Zhou
- School of Chemical Machinery, Qinghai University, Qinghai, China
- Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
41
Inferring Broad Regulatory Biology from Time Course Data: Have We Reached an Upper Bound under Constraints Typical of In Vivo Studies? PLoS One 2015; 10:e0127364. [PMID: 25984725 PMCID: PMC4435750 DOI: 10.1371/journal.pone.0127364] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/08/2014] [Accepted: 04/13/2015] [Indexed: 12/21/2022] Open
Abstract
There is a growing appreciation for the network biology that regulates the coordinated expression of molecular and cellular markers; however, questions persist regarding the identifiability of these networks. Here we explore some of the issues relevant to recovering directed regulatory networks from time course data collected under experimental constraints typical of in vivo studies. NetSim simulations of sparsely connected biological networks were used to evaluate two simple feature selection techniques used in the construction of linear Ordinary Differential Equation (ODE) models, namely truncation of terms versus latent vector projection. Performance was compared with ODE-based Time Series Network Identification (TSNI) integral, and the information-theoretic Time-Delay ARACNE (TD-ARACNE). Projection-based techniques and TSNI integral outperformed truncation-based selection and TD-ARACNE on aggregate networks with edge densities of 10-30%, i.e. transcription factor, protein-protein cliques and immune signaling networks. All were more robust to noise than truncation-based feature selection. Performance was comparable on the in silico 10-node DREAM 3 network, a 5-node Yeast synthetic network designed for In vivo Reverse-engineering and Modeling Assessment (IRMA) and a 9-node human HeLa cell cycle network of similar size and edge density. Performance was more sensitive to the number of time courses than to sample frequency and extrapolated better to larger networks by grouping experiments. In all cases performance declined rapidly in larger networks with lower edge density. The limited recovery and high false positive rates obtained overall bring into question our ability to generate informative time course data rather than the design of any particular reverse engineering algorithm.
42
Zhang X, Zhao J, Hao JK, Zhao XM, Chen L. Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks. Nucleic Acids Res 2015; 43:e31. [PMID: 25539927 PMCID: PMC4357691 DOI: 10.1093/nar/gku1315] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 09/15/2014] [Revised: 12/03/2014] [Accepted: 12/05/2014] [Indexed: 11/13/2022] Open
Abstract
Mutual information (MI), a quantity describing the nonlinear dependence between two random variables, has been widely used to construct gene regulatory networks (GRNs). Despite its good performance, MI cannot separate the direct regulations from indirect ones among genes. Although the conditional mutual information (CMI) is able to identify the direct regulations, it generally underestimates the regulation strength, i.e. it may result in false negatives when inferring gene regulations. In this work, to overcome the problems, we propose a novel concept, namely conditional mutual inclusive information (CMI2), to describe the regulations between genes. Furthermore, with CMI2, we develop a new approach, namely CMI2NI (CMI2-based network inference), for reverse-engineering GRNs. In CMI2NI, CMI2 is used to quantify the mutual information between two genes given a third one through calculating the Kullback-Leibler divergence between the postulated distributions of including and excluding the edge between the two genes. The benchmark results on the GRNs from DREAM challenge as well as the SOS DNA repair network in Escherichia coli demonstrate the superior performance of CMI2NI. Specifically, even for gene expression data with small sample size, CMI2NI can not only infer the correct topology of the regulation networks but also accurately quantify the regulation strength between genes. As a case study, CMI2NI was also used to reconstruct cancer-specific GRNs using gene expression data from The Cancer Genome Atlas (TCGA). CMI2NI is freely accessible at http://www.comp-sysbio.org/cmi2ni.
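The distinction this abstract draws between MI and CMI can be illustrated with a small sketch. It computes ordinary MI and CMI from sample covariances under a joint-Gaussian assumption, using the standard log-determinant formulas. It does not implement CMI2, whose definition is not given here; the function name `gaussian_cmi` is hypothetical.

```python
import numpy as np

def gaussian_cmi(x, y, z=None):
    """I(X;Y) or I(X;Y|Z) in nats, assuming the variables are jointly Gaussian.

    This is plain MI/CMI, not the CMI2 measure described in the abstract.
    """
    def logdet(*cols):
        # log-determinant of the sample covariance of the stacked variables
        c = np.atleast_2d(np.cov(np.vstack(cols)))
        return np.linalg.slogdet(c)[1]

    if z is None:
        return 0.5 * (logdet(x) + logdet(y) - logdet(x, y))
    return 0.5 * (logdet(x, z) + logdet(y, z) - logdet(z) - logdet(x, y, z))
```

For a regulatory chain X → Z → Y, `gaussian_cmi(x, y)` is clearly positive even though X does not regulate Y directly, while `gaussian_cmi(x, y, z)` is close to zero. This is the sense in which conditioning separates direct from indirect regulation.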
Affiliation(s)
- Xiujun Zhang
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Department of Mathematics, Xinyang Normal University, Xinyang 464000, China
- School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore 637459, Singapore
- Juan Zhao
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jin-Kao Hao
- LERIA, Department of Computer Science, University of Angers, Angers 49045, France
- Xing-Ming Zhao
- Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Luonan Chen
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Collaborative Research Center for Innovative Mathematical Modelling, Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan
43
Yang C, Wei H. Designing microarray and RNA-Seq experiments for greater systems biology discovery in modern plant genomics. Molecular Plant 2015; 8:196-206. [PMID: 25680773 DOI: 10.1016/j.molp.2014.11.012] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/06/2014] [Revised: 10/31/2014] [Accepted: 11/02/2014] [Indexed: 05/07/2023]
Abstract
Microarray and RNA-seq experiments have become an important part of modern genomics and systems biology. Obtaining meaningful biological data from these experiments is an arduous task that demands close attention to many details. Negligence at any step can lead to gene expression data containing inadequate or composite information that is recalcitrant for pattern extraction. Therefore, it is imperative to carefully consider experimental design before launching a time-consuming and costly experiment. Contemporarily, most genomics experiments have two objectives: (1) to generate two or more groups of comparable data for identifying differentially expressed genes, gene families, biological processes, or metabolic pathways under experimental conditions; (2) to build local gene regulatory networks and identify hierarchically important regulators governing biological processes and pathways of interest. Since the first objective aims to identify the active molecular identities and the second provides a basis for understanding the underlying molecular mechanisms through inferring causality relationships mediated by treatment, an optimal experiment is to produce biologically relevant and extractable data to meet both objectives without substantially increasing the cost. This review discusses the major issues that researchers commonly face when embarking on microarray or RNA-seq experiments and summarizes important aspects of experimental design, which aim to help researchers deliberate how to generate gene expression profiles with low background noise but with more interaction to facilitate novel biological discoveries in modern plant genomics.
Affiliation(s)
- Chuanping Yang
- State Key Laboratory of Forest Tree Genetics and Breeding, Northeast Forestry University, Harbin, Heilongjiang 150040, China
- Hairong Wei
- State Key Laboratory of Forest Tree Genetics and Breeding, Northeast Forestry University, Harbin, Heilongjiang 150040, China
- Biotechnology Research Center, School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA.
44
Abstract
The succession of protein activation and deactivation mediated by phosphorylation and dephosphorylation events constitutes a key mechanism of molecular information transfer in cellular systems. To deduce the details of those molecular information cascades and networks has been a central goal pursued by both experimental and computational approaches. Many computational network reconstruction methods employing an array of different statistical learning methods have been developed to infer phosphorylation networks based on different types of molecular data sets such as protein sequence, protein structure, or phosphoproteomics data. In this chapter, different computational network inference methods and resources for biological network reconstruction with a particular focus on phosphorylation networks are surveyed.
45
Astola L, Molenaar J. A New Modified Histogram Matching Normalization for Time Series Microarray Analysis. Microarrays 2014; 3:203-11. [PMID: 27600344 PMCID: PMC4996360 DOI: 10.3390/microarrays3030203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2014] [Revised: 06/19/2014] [Accepted: 06/25/2014] [Indexed: 11/16/2022]
Abstract
Microarray data are often used to infer regulatory networks. Quantile normalization (QN) is a popular method to reduce array-to-array variation. We show that in the context of time series measurements QN may not be the best choice for this task, especially when the inference is based on a continuous-time ODE model. We propose an alternative normalization method that is better suited for network inference from time series data.
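For readers unfamiliar with the baseline this abstract critiques, the sketch below shows standard quantile normalization. It is not the modified histogram matching method the authors propose. The layout (rows as genes, columns as arrays) and the stable tie-breaking rule are assumptions of this sketch.

```python
import numpy as np

def quantile_normalize(data):
    """Standard quantile normalization; rows = genes, columns = arrays.

    Every column is forced onto the same reference distribution, namely the
    mean of the column-wise sorted values. Ties are broken by row order.
    """
    data = np.asarray(data, dtype=float)
    order = np.argsort(data, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")   # rank of each entry within its column
    reference = np.sort(data, axis=0).mean(axis=1)     # mean of the k-th smallest values
    return reference[ranks]
```

Because every column ends up with an identical set of values, any genuine shift in the overall expression distribution between time points is erased. This property is the likely source of the concern the abstract raises for time series data.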
Affiliation(s)
- Laura Astola
- Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven 5612 AZ, The Netherlands.
- Jaap Molenaar
- Biometris, Wageningen University and Research Centre, Wageningen 6708 PB, The Netherlands.
- Wageningen Centre for Systems Biology, Wageningen 6700 AC, The Netherlands.
46
Young WC, Raftery AE, Yeung KY. Fast Bayesian inference for gene regulatory networks using ScanBMA. BMC Systems Biology 2014; 8:47. [PMID: 24742092 PMCID: PMC4006459 DOI: 10.1186/1752-0509-8-47] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 11/27/2013] [Accepted: 04/04/2014] [Indexed: 11/22/2022]
Abstract
Background Genome-wide time-series data provide a rich set of information for discovering gene regulatory relationships. As genome-wide data for mammalian systems are being generated, it is critical to develop network inference methods that can handle tens of thousands of genes efficiently, provide a systematic framework for the integration of multiple data sources, and yield robust, accurate and compact gene-to-gene relationships. Results We developed and applied ScanBMA, a Bayesian inference method that incorporates external information to improve the accuracy of the inferred network. In particular, we developed a new strategy to efficiently search the model space, applied data transformations to reduce the effect of spurious relationships, and adopted the g-prior to guide the search for candidate regulators. Our method is highly computationally efficient, thus addressing the scalability issue with network inference. The method is implemented as the ScanBMA function in the networkBMA Bioconductor software package. Conclusions We compared ScanBMA to other popular methods using time series yeast data as well as time-series simulated data from the DREAM competition. We found that ScanBMA produced more compact networks with a greater proportion of true positives than the competing methods. Specifically, ScanBMA generally produced more favorable areas under the Receiver-Operating Characteristic and Precision-Recall curves than other regression-based methods and mutual-information based methods. In addition, ScanBMA is competitive with other network inference methods in terms of running time.
Affiliation(s)
- Ka Yee Yeung
- Department of Microbiology, University of Washington, Box 357735, 98195-7735, Seattle WA, USA.
47
Zheng Z, Christley S, Chiu WT, Blitz IL, Xie X, Cho KWY, Nie Q. Inference of the Xenopus tropicalis embryonic regulatory network and spatial gene expression patterns. BMC Systems Biology 2014; 8:3. [PMID: 24397936 PMCID: PMC3896677 DOI: 10.1186/1752-0509-8-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/16/2013] [Accepted: 12/19/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND: During embryogenesis, signaling molecules produced by one cell population direct gene regulatory changes in neighboring cells and influence their developmental fates and spatial organization. One of the earliest events in the development of the vertebrate embryo is the establishment of three germ layers, consisting of the ectoderm, mesoderm and endoderm. Attempts to measure gene expression in vivo in different germ layers and cell types are typically complicated by the heterogeneity of cell types within biological samples (i.e., embryos), as the responses of individual cell types are intermingled into an aggregate observation of heterogeneous cell types. Here, we propose a novel method to elucidate gene regulatory circuits from these aggregate measurements in embryos of the frog Xenopus tropicalis using gene network inference algorithms and then test the ability of the inferred networks to predict spatial gene expression patterns.
RESULTS: We use two inference models with different underlying assumptions that incorporate existing network information, an ODE model for steady-state data and a Markov model for time series data, and contrast the performance of the two models. We apply our method to both control and knockdown embryos at multiple time points to reconstruct the core mesoderm and endoderm regulatory circuits. Those inferred networks are then used in combination with known dorsal-ventral spatial expression patterns of a subset of genes to predict spatial expression patterns for other genes. Both models are able to predict spatial expression patterns for some of the core mesoderm and endoderm genes, but interestingly of different gene subsets, suggesting that neither model is sufficient to recapitulate all of the spatial patterns, yet they are complementary for the patterns that they do capture.
CONCLUSION: The presented methodology of gene network inference combined with spatial pattern prediction provides an additional layer of validation to elucidate the regulatory circuits controlling the spatial-temporal dynamics in embryonic development.
Affiliation(s)
- Qing Nie
- Department of Mathematics, University of California, Irvine, CA 92697, USA.
|
48
|
Extrapolating In Vitro Results to Predict Human Toxicity. Methods in Pharmacology and Toxicology 2014. [DOI: 10.1007/978-1-4939-0521-8_24] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/09/2023]
|
49
|
Using gene expression programming to infer gene regulatory networks from time-series data. Comput Biol Chem 2013; 47:198-206. [DOI: 10.1016/j.compbiolchem.2013.09.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 02/04/2013] [Revised: 09/19/2013] [Accepted: 09/21/2013] [Indexed: 11/22/2022]
|
50
|
Zaslavsky E, Nudelman G, Marquez S, Hershberg U, Hartmann BM, Thakar J, Sealfon SC, Kleinstein SH. Reconstruction of regulatory networks through temporal enrichment profiling and its application to H1N1 influenza viral infection. BMC Bioinformatics 2013; 14 Suppl 6:S1. [PMID: 23734902 PMCID: PMC3633009 DOI: 10.1186/1471-2105-14-s6-s1] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 12/15/2022] Open
Abstract
Background: H1N1 influenza viruses were responsible for the 1918 pandemic that caused millions of deaths worldwide and the 2009 pandemic that caused approximately twenty thousand deaths. The cellular response to such virus infections involves extensive genetic reprogramming resulting in an antiviral state that is critical to infection control. Identifying the underlying transcriptional network driving these changes, and how this program is altered by virally-encoded immune antagonists, is a fundamental challenge in systems immunology.
Results: Genome-wide gene expression patterns were measured in human monocyte-derived dendritic cells (DCs) infected in vitro with seasonal H1N1 influenza A/New Caledonia/20/1999. To provide a mechanistic explanation for the timing of gene expression changes over the first 12 hours post-infection, we developed a statistically rigorous enrichment approach integrating genome-wide expression kinetics and time-dependent promoter analysis. Our approach, TIme-Dependent Activity Linker (TIDAL), generates a regulatory network that connects transcription factors associated with each temporal phase of the response into a coherent linked cascade. TIDAL infers 12 transcription factors and 32 regulatory connections that drive the antiviral response to influenza. To demonstrate the generality of this approach, TIDAL was also used to generate a network for the DC response to measles infection. The software implementation of TIDAL is freely available at http://tsb.mssm.edu/primeportal/?q=tidal_prog.
Conclusions: We apply TIDAL to reconstruct the transcriptional programs activated in monocyte-derived human dendritic cells in response to influenza and measles infections. The application of this time-centric network reconstruction method in each case produces a single transcriptional cascade that recapitulates the known biology of the response with high precision and recall, in addition to identifying potentially novel antiviral factors. The ability to reconstruct antiviral networks with TIDAL enables comparative analysis of antiviral responses, such as the differences between pandemic and seasonal influenza infections.
Affiliation(s)
- Elena Zaslavsky
- Center for Translational Systems Biology and Department of Neurology, Mount Sinai School of Medicine, New York, NY 10029, USA.
|