1. Yang G, Lei S, Yang G. Robust Model-Free Identification of the Causal Networks Underlying Complex Nonlinear Systems. Entropy (Basel) 2024; 26:1063. [PMID: 39766692 PMCID: PMC11675911 DOI: 10.3390/e26121063] [Received: 10/08/2024] [Revised: 11/28/2024] [Accepted: 11/30/2024] [Indexed: 01/11/2025]
Abstract
Inferring causal networks from noisy observations is of vital importance in various fields. Because system modeling is complex, developing universal and feasible inference algorithms is a key challenge for network reconstruction. In this study, without any assumptions, we develop a novel model-free framework that uncovers only the direct relationships in networked systems from observations of their nonlinear dynamics. The proposed methods are termed multiple-order Polynomial Conditional Granger Causality (PCGC) and sparse PCGC (SPCGC). PCGC mainly uses polynomial functions to approximate the whole system model, and the interactions among nodes are then judged through subsequent nonlinear Granger causality analysis. In SPCGC, Lasso optimization is first used for dimension reduction, and PCGC is then executed to obtain the final network. Specifically, the conditional variables are fused into this general, model-free framework regardless of their formulations in the system model, which can effectively distinguish direct interactions from indirect influences. The performance of PCGC and SPCGC is analyzed and verified on many classical dynamical systems. Overall, the proposed framework is promising as guidance for data-driven modeling when the system model is unknown.
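The direct-versus-indirect distinction that conditional Granger causality draws can be illustrated with a short sketch. It is a linear stand-in, not the authors' PCGC: it regresses the target on lagged values only, whereas PCGC uses polynomial terms. The helper names (`granger_index`, `simulate_chain`) and the three-node chain x0 → x1 → x2 are hypothetical. The index is ln(var_restricted / var_full), where the full model adds the driver's lags to the restricted one.

```python
import math
import random


def least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gaussian elimination
    with partial pivoting (adequate for a handful of regressors)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (A[i][p] - tail) / A[i][i]
    return beta


def residual_variance(X, y):
    beta = least_squares(X, y)
    resid = [t - sum(b * v for b, v in zip(beta, row)) for row, t in zip(X, y)]
    return sum(r * r for r in resid) / len(resid)


def granger_index(series, driver, target, conditioning, max_lag=2):
    """Linear (conditional) Granger index ln(var_restricted / var_full).

    Restricted model: intercept + lags 1..max_lag of `target` and of every node
    in `conditioning`. Full model: the same regressors plus lags of `driver`.
    An empty `conditioning` list gives ordinary pairwise Granger causality.
    """
    n = len(series[target])
    base_nodes = [target] + [k for k in conditioning if k not in (target, driver)]
    rows = range(max_lag, n)
    y = [series[target][t] for t in rows]
    restricted = [[1.0] + [series[k][t - l] for k in base_nodes
                           for l in range(1, max_lag + 1)] for t in rows]
    full = [r + [series[driver][t - l] for l in range(1, max_lag + 1)]
            for r, t in zip(restricted, rows)]
    return math.log(residual_variance(restricted, y) / residual_variance(full, y))


def simulate_chain(n=2000, seed=0):
    """Hypothetical chain x0 -> x1 -> x2 with lag-1 couplings and unit Gaussian noise."""
    rng = random.Random(seed)
    x0, x1, x2 = [0.0], [0.0], [0.0]
    for _ in range(n - 1):
        x0.append(rng.gauss(0, 1))
        x1.append(0.8 * x0[-2] + rng.gauss(0, 1))  # x0[-2] is x0 at t-1
        x2.append(0.8 * x1[-2] + rng.gauss(0, 1))  # x1[-2] is x1 at t-1
    return [x0, x1, x2]
```

On the simulated chain, the indirect pair 0 → 2 scores clearly above zero with an empty conditioning set. Conditioning on node 1 drives it toward zero, while the direct links 0 → 1 and 1 → 2 stay strong.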
Affiliation(s)
- Guanxue Yang: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Shimin Lei: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Guanxiao Yang: College of Automation, Jiangsu University of Science and Technology, Zhenjiang 212100, China

2. Wu Y, Zhou D, Hu J. Reconstruction of gene regulatory networks for Caenorhabditis elegans using tree-shaped gene expression data. Brief Bioinform 2024; 25:bbae396. [PMID: 39133097 PMCID: PMC11318059 DOI: 10.1093/bib/bbae396] [Received: 03/27/2024] [Revised: 06/11/2024] [Accepted: 08/07/2024] [Indexed: 08/13/2024]
Abstract
Constructing gene regulatory networks is a widely adopted approach for investigating gene regulation, with diverse applications in biology and medicine. Much research focuses on using time series data or single-cell RNA-sequencing data to infer gene regulatory networks. However, such gene expression data lack either cellular or temporal information. Fortunately, time-lapse confocal laser microscopy enables biologists to obtain tree-shaped gene expression data of Caenorhabditis elegans with both cellular and temporal resolution. Although such tree-shaped data provide abundant knowledge, they pose challenges such as non-pairwise time series, which can make downstream analysis inaccurate. To address this issue, a comprehensive data-integration framework and a novel Bayesian approach based on Boolean networks with time delay are proposed. A pre-screening process and a Markov Chain Monte Carlo algorithm are applied to obtain the parameter estimates. Simulation studies show that our method outperforms existing Boolean network inference algorithms. Using the proposed approach, gene regulatory networks for five subtrees are reconstructed from the real tree-shaped datasets of Caenorhabditis elegans, recovering some gene regulatory relationships confirmed in previous genetic studies. Heterogeneity of regulatory relationships across different cell lineage subtrees is also detected. Furthermore, potential gene regulatory relationships of importance in human diseases are explored. All source code is available at the GitHub repository https://github.com/edawu11/BBTD.git.
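The forward model behind a time-delayed Boolean network can be sketched as below. The sketch assumes that each gene's next state is a Boolean function of chosen regulators read a fixed number of steps back, and that genes without a rule hold their value. It does not implement the authors' Bayesian MCMC inference, only the kind of dynamics such inference fits. The rule format, `simulate`, and genes A and B are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule format: target gene -> ([(regulator, delay), ...], Boolean update function).
Rule = Tuple[List[Tuple[str, int]], Callable[..., bool]]


def simulate(rules: Dict[str, Rule], initial: Dict[str, List[bool]],
             steps: int) -> Dict[str, List[bool]]:
    """Synchronously advance a time-delayed Boolean network by `steps` steps.

    Every gene's initial history must be at least as long as the largest delay.
    """
    traj = {gene: list(history) for gene, history in initial.items()}
    for _ in range(steps):
        t = len(next(iter(traj.values())))  # index of the state being computed
        new_state = {}
        for gene, (inputs, update) in rules.items():
            args = [traj[reg][t - delay] for reg, delay in inputs]
            new_state[gene] = bool(update(*args))
        for gene, history in traj.items():
            # Genes without a rule keep their previous value.
            history.append(new_state.get(gene, history[-1]))
    return traj


# Example: A toggles every step; B copies A with a two-step delay.
example = simulate(
    {"A": ([("A", 1)], lambda a: not a), "B": ([("A", 2)], lambda a: a)},
    {"A": [True, False], "B": [False, False]},
    steps=4,
)
```

In the example, B[t] equals A[t - 2] once the initial history has been used up. This is the kind of delayed dependency the inference has to recover.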
Affiliation(s)
- Yida Wu: School of Mathematical Sciences, Xiamen University, Zengcuo'an West Road, Siming District, Xiamen 361000, China
- Da Zhou: School of Mathematical Sciences, Xiamen University, Zengcuo'an West Road, Siming District, Xiamen 361000, China
- Jie Hu: School of Mathematical Sciences, Xiamen University, Zengcuo'an West Road, Siming District, Xiamen 361000, China

3. Yang G, Hu W, He L, Dou L. Nonlinear causal network learning via Granger causality based on extreme support vector regression. Chaos 2024; 34:023127. [PMID: 38377295 DOI: 10.1063/5.0183537] [Received: 10/23/2023] [Accepted: 01/22/2024] [Indexed: 02/22/2024]
Abstract
For complex networked systems, a novel general method of nonlinear causal network learning that accounts for both nonlinearity and causality, termed extreme support vector regression Granger causality (ESVRGC), is proposed. The nonuniform time-delayed influence of the driving nodes on the target node is given particular consideration. Next, the restricted and unrestricted Granger causality models are formulated with extreme support vector regression, which uses the selected time-delayed components of the system variables as the inputs of kernel functions. The nonlinear conditional Granger causality index is then calculated to determine the strength of a causal interaction. In simulations of a nonlinear vector autoregressive model and nonlinear discrete time-delayed dynamic systems, ESVRGC performs better than other popular methods. The validity and robustness of ESVRGC are further verified across different network types, sample sizes, noise intensities, and coupling strengths. Finally, the superiority of ESVRGC is confirmed by an experimental study on real benchmark datasets.
Affiliation(s)
- Guanxue Yang: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Weiwei Hu: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Lidong He: School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
- Liya Dou: Department of Automation, Beijing University of Chemical Technology, Beijing 100029, China

4. Das P, Babadi B. Non-Asymptotic Guarantees for Reliable Identification of Granger Causality via the LASSO. IEEE Trans Inf Theory 2023; 69:7439-7460. [PMID: 38646067 PMCID: PMC11025718 DOI: 10.1109/tit.2023.3296336] [Indexed: 04/23/2024]
Abstract
Granger causality is among the widely used data-driven approaches for causal analysis of time series data with applications in various areas including economics, molecular biology, and neuroscience. Two of the main challenges of this methodology are: 1) over-fitting as a result of limited data duration, and 2) correlated process noise as a confounding factor, both leading to errors in identifying the causal influences. Sparse estimation via the LASSO has successfully addressed these challenges for parameter estimation. However, the classical statistical tests for Granger causality resort to asymptotic analysis of ordinary least squares, which require long data duration to be useful and are not immune to confounding effects. In this work, we address this disconnect by introducing a LASSO-based statistic and studying its non-asymptotic properties under the assumption that the true models admit sparse autoregressive representations. We establish fundamental limits for reliable identification of Granger causal influences using the proposed LASSO-based statistic. We further characterize the false positive error probability and test power of a simple thresholding rule for identifying Granger causal effects and provide two methods to set the threshold in a data-driven fashion. We present simulation studies and application to real data to compare the performance of our proposed method to ordinary least squares and existing LASSO-based methods in detecting Granger causal influences, which corroborate our theoretical results.
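A minimal sketch of LASSO-based Granger screening follows. It assumes a lag-1 sparse VAR, zero-mean series, and a hand-picked penalty `lam` and threshold. Each series is regressed on all lagged series by cyclic coordinate descent, and j → i is declared Granger-causal when the off-diagonal coefficient magnitude exceeds the threshold. This shows only the general idea of thresholding a LASSO fit. It is not the authors' statistic and does not reproduce their data-driven threshold rules. All function names and the simulated system are hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Scalar soft-thresholding operator S(z, gamma)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_iter=200, tol=1e-8):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    No intercept: the series are assumed to be zero-mean."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 at the start
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(row[j] * r for row, r in zip(X, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i, row in enumerate(X):
                    resid[i] -= row[j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_granger_edges(series, lam=0.05, threshold=0.1):
    """Regress each series on the lag-1 values of all series and report
    (driver, target) pairs whose off-diagonal |coefficient| exceeds `threshold`."""
    k, n = len(series), len(series[0])
    X = [[series[j][t - 1] for j in range(k)] for t in range(1, n)]
    edges = set()
    for target in range(k):
        y = [series[target][t] for t in range(1, n)]
        for driver, b in enumerate(lasso_cd(X, y, lam)):
            if driver != target and abs(b) > threshold:
                edges.add((driver, target))
    return edges


def simulate_var(n=1000, seed=1):
    """Hypothetical sparse VAR(1): x0 -> x1 -> x2, with x3 isolated."""
    rng = random.Random(seed)
    x = [[0.0] for _ in range(4)]
    for _ in range(n - 1):
        prev = [s[-1] for s in x]
        x[0].append(0.5 * prev[0] + rng.gauss(0, 1))
        x[1].append(0.6 * prev[0] + rng.gauss(0, 1))
        x[2].append(-0.6 * prev[1] + rng.gauss(0, 1))
        x[3].append(rng.gauss(0, 1))
    return x
```

On the simulated system, `lasso_granger_edges(simulate_var())` is expected to recover the two true links (0, 1) and (1, 2). Raising `lam` or the threshold trades power for fewer false positives.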
Affiliation(s)
- Proloy Das: Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Boston, MA, 02114 USA
- Behtash Babadi: Department of Electrical and Computer Engineering and the Institute for Systems Research, University of Maryland, College Park, MD, 20742 USA

5. Kathpalia A, Nagaraj N. Granger causality for compressively sensed sparse signals. Phys Rev E 2023; 107:034308. [PMID: 37072975 DOI: 10.1103/physreve.107.034308] [Received: 11/03/2022] [Accepted: 02/26/2023] [Indexed: 04/20/2023]
Abstract
Compressed sensing is a scheme that allows sparse signals to be acquired, transmitted, and stored using far fewer measurements than conventional acquisition based on the Nyquist sampling theorem. Since many naturally occurring signals are sparse (in some domain), compressed sensing has rapidly gained popularity in a number of applied physics and engineering applications, particularly in designing signal and image acquisition strategies, e.g., magnetic resonance imaging, quantum state tomography, scanning tunneling microscopy, and analog-to-digital conversion technologies. Contemporaneously, causal inference has become an important tool for the analysis and understanding of processes and their interactions in many disciplines of science, especially those dealing with complex systems. Causal analysis performed directly on compressively sensed data is needed to avoid first reconstructing the compressed data. Also, for some sparse signals, such as sparse temporal data, it may be difficult to discover causal relations directly using available data-driven or model-free causality estimation techniques. In this work, we provide a mathematical proof that structured compressed sensing matrices, specifically circulant and Toeplitz, preserve causal relationships in the compressed signal domain, as measured by Granger causality (GC). We then verify this theorem on a number of bivariate and multivariate coupled sparse signal simulations which are compressed using these matrices. We also demonstrate a real world application of network causal connectivity estimation from sparse neural spike train recordings from rat prefrontal cortex. In addition to demonstrating the effectiveness of structured matrices for GC estimation from sparse signals, we also show a computational time advantage of the proposed strategy for causal inference from compressed signals of both sparse and regular autoregressive processes as compared to standard GC estimation from original signals.
Affiliation(s)
- Aditi Kathpalia: Department of Complex Systems, Institute of Computer Science of the Czech Academy of Sciences, Prague 18200, Czech Republic
- Nithin Nagaraj: Consciousness Studies Programme, National Institute of Advanced Studies, Bengaluru 560012, India

6. Tank A, Covert I, Foti N, Shojaie A, Fox EB. Neural Granger Causality. IEEE Trans Pattern Anal Mach Intell 2022; 44:4267-4279. [PMID: 33705309 PMCID: PMC9739174 DOI: 10.1109/tpami.2021.3065601] [Indexed: 06/01/2023]
Abstract
While most classical approaches to Granger causality detection assume linear dynamics, many interactions in real-world applications, like neuroscience and genomics, are inherently nonlinear. In these cases, using linear models may lead to inconsistent estimation of Granger causal interactions. We propose a class of nonlinear methods by applying structured multilayer perceptrons (MLPs) or recurrent neural networks (RNNs) combined with sparsity-inducing penalties on the weights. By encouraging specific sets of weights to be zero (in particular, through the use of convex group-lasso penalties), we can extract the Granger causal structure. To further contrast with traditional approaches, our framework naturally enables us to efficiently capture long-range dependencies between series either via our RNNs or through an automatic lag selection in the MLP. We show that our neural Granger causality methods outperform state-of-the-art nonlinear Granger causality methods on the DREAM3 challenge data. These data consist of nonlinear gene expression and regulation time courses with only a limited number of time points. The successes we show in this challenging dataset provide a powerful example of how deep learning can be useful in cases that go beyond prediction on large datasets. We likewise illustrate our methods in detecting nonlinear interactions in a human motion capture dataset.
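The proximal operator of a group-lasso penalty shows how the penalty can remove an entire input series. The first-layer weights fed by one input series form a group, and the proximal update either shrinks the whole group or sets it exactly to zero. This is a generic illustration of group soft-thresholding under an assumed weight layout (one flat list per input series) and step size. It is not the authors' training code, and `prox_step` and `granger_parents` are hypothetical names.

```python
import math
from typing import List


def group_soft_threshold(group: List[float], threshold: float) -> List[float]:
    """Proximal operator of threshold * ||w||_2 for one weight group."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        return [0.0] * len(group)  # the whole group is switched off
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]


def prox_step(first_layer: List[List[float]], lam: float, lr: float) -> List[List[float]]:
    """Group-lasso proximal update applied after a gradient step with learning rate `lr`.
    first_layer[j] holds every first-layer weight fed by input series j."""
    return [group_soft_threshold(weights, lr * lam) for weights in first_layer]


def granger_parents(first_layer: List[List[float]]) -> List[int]:
    """Input series whose weight group is not identically zero."""
    return [j for j, weights in enumerate(first_layer) if any(w != 0.0 for w in weights)]
```

For example, with `lam=1.0` and `lr=0.1` the threshold is 0.1. A group with norm 0.5 is scaled by 0.8, while a group with norm of about 0.054 is zeroed, so that series drops out of the estimated parent set.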

7.
Abstract
Interpretable machine learning has become a very active area of research due to the rising popularity of machine learning algorithms and their inherently challenging interpretability. Most work in this area has been focused on the interpretation of single features in a model. However, for researchers and practitioners, it is often equally important to quantify the importance or visualize the effect of feature groups. To address this research gap, we provide a comprehensive overview of how existing model-agnostic techniques can be defined for feature groups to assess the grouped feature importance, focusing on permutation-based, refitting, and Shapley-based methods. We also introduce an importance-based sequential procedure that identifies a stable and well-performing combination of features in the grouped feature space. Furthermore, we introduce the combined features effect plot, which is a technique to visualize the effect of a group of features based on a sparse, interpretable linear combination of features. We used simulation studies and real data examples to analyze, compare, and discuss these methods.
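Permutation-based grouped importance, one of the families the overview covers, can be sketched as follows. All columns in a feature group are permuted with one shared row permutation, which preserves dependence within the group while breaking its link to the target, and the mean increase in loss is reported. The sketch assumes an already-fitted model and mean squared error. The function names are hypothetical, and the refitting and Shapley-based variants are not shown.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def mse(model: Model, X: List[List[float]], y: List[float]) -> float:
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def grouped_permutation_importance(model: Model, X: List[List[float]], y: List[float],
                                   group: Sequence[int], n_repeats: int = 10,
                                   seed: int = 0) -> float:
    """Mean increase in MSE when all columns in `group` are permuted jointly."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    total = 0.0
    for _ in range(n_repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        X_perm = [list(row) for row in X]
        for i, src in enumerate(order):
            for j in group:
                # One shared row permutation keeps within-group dependence intact.
                X_perm[i][j] = X[src][j]
        total += mse(model, X_perm, y) - baseline
    return total / n_repeats
```

With a model such as `lambda row: 2.0 * row[0] + row[1]` and noiseless targets, a group containing only an unused column scores exactly 0. The group [0, 1] scores higher than [1] alone.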

8. Shojaie A, Fox EB. Granger Causality: A Review and Recent Advances. Annu Rev Stat Appl 2022; 9:289-319. [PMID: 37840549 PMCID: PMC10571505 DOI: 10.1146/annurev-statistics-040120-010930] [Indexed: 10/17/2023]
Abstract
Introduced more than a half-century ago, Granger causality has become a popular tool for analyzing time series data in many application domains, from economics and finance to genomics and neuroscience. Despite this popularity, the validity of this framework for inferring causal relationships among time series has remained the topic of continuous debate. Moreover, while the original definition was general, limitations in computational tools have constrained the applications of Granger causality to primarily simple bivariate vector autoregressive processes. Starting with a review of early developments and debates, this article discusses recent advances that address various shortcomings of the earlier approaches, from models for high-dimensional time series to more recent developments that account for nonlinear and non-Gaussian observations and allow for subsampled and mixed-frequency time series.
Affiliation(s)
- Ali Shojaie: Department of Biostatistics, University of Washington, Seattle, Washington 98195-4322, USA
- Emily B Fox: Department of Statistics, Stanford University, Stanford, California 94305-4020, USA

9. Deshpande A, Chu LF, Stewart R, Gitter A. Network inference with Granger causality ensembles on single-cell transcriptomics. Cell Rep 2022; 38:110333. [PMID: 35139376 PMCID: PMC9093087 DOI: 10.1016/j.celrep.2022.110333] [Received: 02/05/2019] [Revised: 02/19/2021] [Accepted: 01/12/2022] [Indexed: 12/20/2022]
Abstract
Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells' progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.
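One way to handle irregular pseudotimes in kernel-based Granger regression is to estimate each lagged regressor as a Gaussian-kernel weighted average of observations near the lagged time point. The sketch below assumes that form, with a hypothetical `bandwidth` parameter and a rule that only observations strictly before the target time are used. It is not SINGE's implementation, which also ensembles many such regressions.

```python
import math
from typing import Sequence


def kernel_lagged_value(times: Sequence[float], values: Sequence[float],
                        t: float, lag: float, bandwidth: float) -> float:
    """Gaussian-kernel estimate of a regulator's value at pseudotime t - lag,
    built from irregularly spaced observations made strictly before t."""
    centre = t - lag
    num = den = 0.0
    for s, v in zip(times, values):
        if s >= t:
            continue  # only the past may be used as a Granger predictor
        w = math.exp(-((s - centre) ** 2) / (2.0 * bandwidth ** 2))
        num += w * v
        den += w
    return num / den if den > 0.0 else float("nan")
```

Small bandwidths approach a nearest-neighbour lookup. Larger ones smooth over sparse or missing expression values at the cost of temporal precision.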
Affiliation(s)
- Atul Deshpande: Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA; Morgridge Institute for Research, Madison, WI 53715, USA
- Li-Fang Chu: Morgridge Institute for Research, Madison, WI 53715, USA
- Ron Stewart: Morgridge Institute for Research, Madison, WI 53715, USA
- Anthony Gitter: Morgridge Institute for Research, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA

10. Joint learning of multiple Granger causal networks via non-convex regularizations: Inference of group-level brain connectivity. Neural Netw 2022; 149:157-171. [DOI: 10.1016/j.neunet.2022.02.005] [Received: 05/19/2021] [Revised: 01/09/2022] [Accepted: 02/06/2022] [Indexed: 11/23/2022]

11. Tozzo V, Azencott CA, Fiorini S, Fava E, Trucco A, Barla A. Where Do We Stand in Regularization for Life Science Studies? J Comput Biol 2021; 29:213-232. [PMID: 33926217 PMCID: PMC8968832 DOI: 10.1089/cmb.2019.0371] [Indexed: 11/27/2022]
Abstract
More and more biologists and bioinformaticians turn to machine learning to analyze large amounts of data. In this context, it is crucial to understand which is the most suitable data analysis pipeline for achieving reliable results. This process may be challenging, due to a variety of factors, the most crucial ones being the data type and the general goal of the analysis (e.g., explorative or predictive). Life science data sets require further consideration as they often contain measures with a low signal-to-noise ratio, high-dimensional observations, and relatively few samples. In this complex setting, regularization, which can be defined as the introduction of additional information to solve an ill-posed problem, is the tool of choice to obtain robust models. Different regularization practices may be used depending both on characteristics of the data and of the question asked, and different choices may lead to different results. In this article, we provide a comprehensive description of the impact and importance of regularization techniques in life science studies. In particular, we provide an intuition of what regularization is and of the different ways it can be implemented and exploited. We propose four general life sciences problems in which regularization is fundamental and should be exploited for robustness. For each of these large families of problems, we enumerate different techniques as well as examples and case studies. Lastly, we provide a unified view of how to approach each data type with various regularization techniques.
Affiliation(s)
- Veronica Tozzo: Department of Informatics, Bioengineering, Robotics and System Engineering-DIBRIS, University of Genoa, Genoa, Italy
- Chloé-Agathe Azencott: Centre for Computational Biology-CBIO, MINES ParisTech, PSL Research University, Paris, France; Institut Curie, PSL Research University, Paris, France; INSERM, U900, Paris, France
- Emanuele Fava: Department of Electrical, Electronic, Telecommunications Engineering, and Naval Architecture (DITEN), University of Genoa, Genoa, Italy
- Andrea Trucco: Department of Electrical, Electronic, Telecommunications Engineering, and Naval Architecture (DITEN), University of Genoa, Genoa, Italy
- Annalisa Barla: Department of Informatics, Bioengineering, Robotics and System Engineering-DIBRIS, University of Genoa, Genoa, Italy

12. Lu J, Dumitrascu B, McDowell IC, Jo B, Barrera A, Hong LK, Leichter SM, Reddy TE, Engelhardt BE. Causal network inference from gene transcriptional time-series response to glucocorticoids. PLoS Comput Biol 2021; 17:e1008223. [PMID: 33513136 PMCID: PMC7875426 DOI: 10.1371/journal.pcbi.1008223] [Received: 07/04/2019] [Revised: 02/10/2021] [Accepted: 08/07/2020] [Indexed: 11/19/2022]
Abstract
Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: Overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.
Affiliation(s)
- Jonathan Lu: Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Bianca Dumitrascu: Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Ian C. McDowell: Element Genomics, A UCB Company, Durham, North Carolina, United States of America
- Brian Jo: Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Alejandro Barrera: Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America; Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
- Linda K. Hong: Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
- Sarah M. Leichter: Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
- Timothy E. Reddy: Department of Genome Sciences, Duke University, Durham, North Carolina, United States of America
- Barbara E. Engelhardt: Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America; Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey, United States of America

13.

14. Mainali K, Bewick S, Vecchio-Pagan B, Karig D, Fagan WF. Detecting interaction networks in the human microbiome with conditional Granger causality. PLoS Comput Biol 2019; 15:e1007037. [PMID: 31107866 PMCID: PMC6544333 DOI: 10.1371/journal.pcbi.1007037] [Received: 05/30/2018] [Revised: 05/31/2019] [Accepted: 04/22/2019] [Indexed: 12/29/2022]
Abstract
Human microbiome research is rife with studies attempting to deduce microbial correlation networks from sequencing data. Standard correlation and/or network analyses may be misleading when taken as an indication of taxon interactions because "correlation is neither necessary nor sufficient to establish causation"; environmental filtering can lead to correlation between non-interacting taxa. Unfortunately, microbial ecologists have generally used correlation as a proxy for causality, even though there is general consensus about what constitutes a causal relationship: causes both precede and predict effects. We apply one of the first causal models for detecting interactions in human microbiome samples. Specifically, we analyze a long duration, high resolution time series of the human microbiome to decipher the networks of correlation and causation of human-associated microbial genera. We show that correlation is not a good proxy for biological interaction; we observed a weak negative relationship between correlation and causality. Strong interspecific interactions are disproportionately positive, whereas almost all strong intraspecific interactions are negative. Interestingly, intraspecific interactions also appear to act on a short timescale, causing the vast majority of the effects within 1-3 days. We report how different taxa are involved in causal relationships with others, and show that strong interspecific interactions are rarely conserved across two body sites whereas strong intraspecific interactions are much more conserved, ranging from 33% between the gut and the right hand to 70% between the two hands. Therefore, in the absence of guiding assumptions about ecological interactions, Granger causality and related techniques may be particularly helpful for understanding the driving factors governing microbiome composition and structure.
Affiliation(s)
- Kumar Mainali: Department of Biology, University of Maryland, College Park, Maryland, United States of America
- Sharon Bewick: Department of Biology, University of Maryland, College Park, Maryland, United States of America
- Briana Vecchio-Pagan: Research and Exploratory Development Department, Johns Hopkins University Applied Physics Laboratory, Laurel, Maryland, United States of America
- David Karig: Research and Exploratory Development Department, Johns Hopkins University Applied Physics Laboratory, Laurel, Maryland, United States of America
- William F. Fagan: Department of Biology, University of Maryland, College Park, Maryland, United States of America

15. Nguyen P, Braun R. Time-lagged Ordered Lasso for network inference. BMC Bioinformatics 2018; 19:545. [PMID: 30594121 PMCID: PMC6311035 DOI: 10.1186/s12859-018-2558-7] [Received: 09/15/2018] [Accepted: 12/04/2018] [Indexed: 12/22/2022]
Abstract
BACKGROUND Accurate gene regulatory networks can be used to explain the emergence of different phenotypes, disease mechanisms, and other biological functions. Many methods have been proposed to infer networks from gene expression data but have been hampered by problems such as low sample size, inaccurate constraints, and incomplete characterizations of regulatory dynamics. Since expression regulation is dynamic, time-course data can be used to infer causality, but these datasets tend to be short or sparsely sampled. In addition, temporal methods typically assume that the expression of a gene at a time point depends on the expression of other genes at only the immediately preceding time point, while other methods include additional time points without any constraints to account for their temporal distance. These limitations can contribute to inaccurate networks with many missing and anomalous links. RESULTS We adapted the time-lagged Ordered Lasso, a regularized regression method with temporal monotonicity constraints, for de novo reconstruction. We also developed a semi-supervised method that embeds prior network information into the Ordered Lasso to discover novel regulatory dependencies in existing pathways. R code is available at https://github.com/pn51/laggedOrderedLassoNetwork . CONCLUSIONS We evaluated these approaches on simulated data for a repressilator, time-course data from past DREAM challenges, and a HeLa cell cycle dataset to show that they can produce accurate networks subject to the dynamics and assumptions of the time-lagged Ordered Lasso regression.
Affiliation(s)
- Phan Nguyen: Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL USA
- Rosemary Braun: Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL USA; Biostatistics Division, Feinberg School of Medicine, Northwestern University, Chicago, IL USA

16. Zhang Y, Topham DJ, Thakar J, Qiu X. FUNNEL-GSEA: FUNctioNal ELastic-net regression in time-course gene set enrichment analysis. Bioinformatics 2018; 33:1944-1952. [PMID: 28334094 PMCID: PMC5939227 DOI: 10.1093/bioinformatics/btx104] [Received: 09/26/2016] [Accepted: 02/17/2017] [Indexed: 01/26/2023]
Abstract
Motivation Gene set enrichment analyses (GSEAs) are widely used in genomic research to identify underlying biological mechanisms (defined by the gene sets), such as Gene Ontology terms and molecular pathways. There are two caveats in the currently available methods: (i) they are typically designed for group comparisons or regression analyses, which do not utilize temporal information efficiently in time series of transcriptomics measurements; and (ii) genes overlapping in multiple molecular pathways are counted multiple times in hypothesis testing. Results We propose an inferential framework for GSEA based on functional data analysis, which utilizes temporal information through functional principal component analysis and disentangles the effects of overlapping genes by a functional extension of elastic-net regression. Furthermore, hypothesis testing for the gene sets is performed by an extension of the Mann-Whitney U test based on weighted rank sums computed from correlated observations. Using both simulated datasets and a large-scale time-course gene expression dataset on human influenza infection, we demonstrate that our method has uniformly better receiver operating characteristic curves and identifies more pathways relevant to the immune response to human influenza infection than the competing approaches. Availability and Implementation The methods are implemented in the R package FUNNEL, freely and publicly available at: https://github.com/yunzhang813/FUNNEL-GSEA-R-Package. Supplementary information Supplementary data are available at Bioinformatics online.
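The classical rank-sum statistic underlying this kind of gene-set test can be sketched in a few lines. This computes only the unweighted Mann-Whitney U for two independent samples, with average ranks for ties; the weighted rank sums and the handling of correlated observations described above are not reproduced, and the function name is hypothetical.

```python
def mann_whitney_u(x, y):
    """Classical Mann-Whitney U for sample x versus sample y (ties receive average ranks).

    U counts the pairs (x_i, y_j) with x_i > y_j, with tied pairs counted as 0.5.
    """
    # Pool both samples, remembering group membership (0 for x, 1 for y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n_x = len(x)
    return rank_sum_x - n_x * (n_x + 1) / 2
```

A useful sanity check is that `mann_whitney_u(x, y) + mann_whitney_u(y, x)` always equals `len(x) * len(y)`.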
Affiliation(s)
- Yun Zhang
- Department of Biostatistics and Computational Biology
- David J Topham
- Department of Microbiology and Immunology, University of Rochester, Rochester, NY 14642, USA
- Juilee Thakar
- Department of Biostatistics and Computational Biology; Department of Microbiology and Immunology, University of Rochester, Rochester, NY 14642, USA
- Xing Qiu
- Department of Biostatistics and Computational Biology
17
Windowed Granger causal inference strategy improves discovery of gene regulatory networks. Proc Natl Acad Sci U S A 2018; 115:2252-2257. [PMID: 29440433 DOI: 10.1073/pnas.1710936115] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 11/18/2022]
Abstract
Accurate inference of regulatory networks from experimental data facilitates the rapid characterization and understanding of biological systems. High-throughput technologies can provide a wealth of time-series data to better interrogate the complex regulatory dynamics inherent to organisms, but many network inference strategies do not effectively use temporal information. We address this limitation by introducing Sliding Window Inference for Network Generation (SWING), a generalized framework that incorporates multivariate Granger causality to infer network structure from time-series data. SWING moves beyond existing Granger methods by generating windowed models that simultaneously evaluate multiple upstream regulators at several potential time delays. We demonstrate that SWING elucidates network structure with greater accuracy in both in silico and experimentally validated in vitro systems. We estimate the apparent time delays present in each system and demonstrate that SWING infers time-delayed, gene-gene interactions that are distinct from baseline methods. By providing a temporal framework to infer the underlying directed network topology, SWING generates testable hypotheses for gene-gene influences.
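A single pairwise Granger test, the building block that windowed approaches such as SWING extend, can be sketched with ordinary least squares. The sketch compares the residual sum of squares of a model that predicts `y` from its own lags with one that also includes lags of `x`, and reports the log ratio. SWING's sliding windows, multiple-regulator models, and delay selection are not reproduced; all names are hypothetical, and only the Python standard library is used.

```python
import math
import random


def solve(a, b):
    """Solve the small square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(rows, target):
    """Residual sum of squares of the least-squares fit of target on rows (normal equations)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, target))


def granger_index(x, y, lags):
    """log(RSS_restricted / RSS_full): how much the past of x improves prediction of y."""
    restricted, full, target = [], [], []
    for t in range(lags, len(y)):
        own = [y[t - k] for k in range(1, lags + 1)]
        other = [x[t - k] for k in range(1, lags + 1)]
        restricted.append([1.0] + own)  # intercept + lagged y
        full.append([1.0] + own + other)  # intercept + lagged y + lagged x
        target.append(y[t])
    return math.log(rss(restricted, target) / rss(full, target))


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(500)]
    y = [0.0] + [0.8 * x[t - 1] + 0.2 * rng.gauss(0, 1) for t in range(1, 500)]
    print(granger_index(x, y, 2), granger_index(y, x, 2))
```

With `x` white noise and `y` driven by `x` one step earlier, `granger_index(x, y, 2)` is expected to be large, while `granger_index(y, x, 2)` stays close to zero. Because the full model nests the restricted one, the index is never negative.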
18
Liang Y, Kelemen A. Computational dynamic approaches for temporal omics data with applications to systems medicine. BioData Min 2017. [PMID: 28638442 PMCID: PMC5473988 DOI: 10.1186/s13040-017-0140-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/13/2022]
Abstract
Modeling and predicting biological dynamic systems while simultaneously estimating their kinetic, structural, and functional parameters are extremely important in systems and computational biology. This is key to understanding the complexity of human health, drug response, disease susceptibility, and pathogenesis in systems medicine. Temporal omics data that measure dynamic biological systems are essential for discovering complex biological interactions, clinical mechanisms, and causal relationships. However, delineating the possible associations and causalities among genes, proteins, metabolites, cells, and other biological entities from high-throughput time-course omics data is challenging, and conventional experimental techniques are not suited to this task in the big omics era. In this paper, we present various recently developed dynamic trajectory and causal network approaches for temporal omics data, which are extremely useful for researchers who want to start working in this challenging research area. Moreover, applications to various biological systems, health conditions, and disease states, together with examples that summarize state-of-the-art performance for different specific mining tasks, are presented. We critically discuss the merits, drawbacks, and limitations of the approaches, and the main challenges for the years ahead. The most recent computing tools and software for analyzing specific problem types, associated platform resources, and other potential uses of the dynamic trajectory and interaction methods are also presented and discussed in detail.
Affiliation(s)
- Yulan Liang
- Department of Family and Community Health, University of Maryland, Baltimore, MD 21201 USA
- Arpad Kelemen
- Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201 USA
19
Yang G, Wang L, Wang X. Reconstruction of Complex Directional Networks with Group Lasso Nonlinear Conditional Granger Causality. Sci Rep 2017; 7:2991. [PMID: 28592807 PMCID: PMC5462833 DOI: 10.1038/s41598-017-02762-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/08/2017] [Accepted: 04/18/2017] [Indexed: 12/19/2022]
Abstract
Reconstruction of networks underlying complex systems is one of the most crucial problems in many areas of engineering and science. In this paper, rather than identifying parameters of complex systems governed by predefined models or taking some polynomial and rational functions as prior information for subsequent model selection, we put forward a general framework for nonlinear causal network reconstruction from time series with limited observations. By obtaining multi-source datasets with a data-fusion strategy, we propose a novel method to handle the nonlinearity and directionality of complex networked systems, namely group lasso nonlinear conditional Granger causality. Specifically, our method can exploit different sets of radial basis functions to approximate the nonlinear interactions between each pair of nodes and integrate sparsity into grouped variable selection. The performance of our approach is first assessed with two types of simulated datasets, from a nonlinear vector autoregressive model and from nonlinear dynamic models, and then verified on the benchmark datasets from DREAM3 Challenge 4. The effects of data size and noise intensity are also discussed. All of the results demonstrate that the proposed method performs better, achieving a higher area under the precision-recall curve.
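Group sparsity of this kind is commonly enforced with the group soft-thresholding operator, the proximal map of the group lasso penalty, which keeps or removes all basis-function coefficients of a candidate regulator together. The sketch shows only that operator; the grouping of radial-basis coefficients into `groups` is a hypothetical layout, and the authors' solver is not reproduced.

```python
import math


def group_soft_threshold(coefs, groups, lam):
    """Proximal map of lam * sum_g ||beta_g||_2: each group is shrunk, or zeroed, as one block.

    coefs: flat list of coefficients; groups: list of index lists that partition coefs
    (for example, all basis-function coefficients belonging to one candidate regulator).
    """
    out = list(coefs)
    for idx in groups:
        norm = math.sqrt(sum(coefs[i] ** 2 for i in idx))
        # Groups whose Euclidean norm is below lam are set exactly to zero.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in idx:
            out[i] = scale * coefs[i]
    return out
```

With `lam=1.0`, `group_soft_threshold([3.0, 4.0, 0.3, 0.4], [[0, 1], [2, 3]], 1.0)` scales the first group (norm 5) by 0.8 and removes the second group (norm 0.5) entirely, giving approximately `[2.4, 3.2, 0.0, 0.0]`.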
Affiliation(s)
- Guanxue Yang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
- Lin Wang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
- Xiaofan Wang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China.
20
Carrizosa E, Olivares-Nadal AV, Ramírez-Cobo P. A sparsity-controlled vector autoregressive model. Biostatistics 2017; 18:244-259. [PMID: 27655816 DOI: 10.1093/biostatistics/kxw042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2015] [Accepted: 08/07/2016] [Indexed: 11/13/2022]
Abstract
Vector autoregressive (VAR) models constitute a powerful and well-studied tool for analyzing multivariate time series. Since sparseness, which is crucial for identifying and visualizing joint dependencies and relevant causalities, is not expected to occur in the standard VAR model, several sparse variants have been introduced in the literature. However, in some cases it might be of interest to control certain dimensions of the sparsity, e.g., the number of causal features allowed in the prediction. To the authors' knowledge, none of the existing methods gives the user full control over the different aspects of the sparsity of the solution. In this article, we propose a versatile sparsity-controlled VAR model that enables proper visualization of potential causalities while allowing the user to control different dimensions of the sparsity if she has preferences regarding the sparsity of the outcome. The model coefficients are found as the solution to an optimization problem, solvable by standard numerical optimization routines. Tests performed on both simulated and real-life time series show that our approach may outperform a greedy algorithm and different Lasso approaches in terms of prediction errors and sparsity.
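For comparison, a plain Lasso-penalized VAR(1), the kind of baseline the abstract compares against, can be sketched with cyclic coordinate descent, fitting one equation per series. This does not implement the proposed sparsity-controlled model; it assumes centered series (no intercept), a single lag, and a fixed penalty `lam`, and all names are hypothetical.

```python
import random


def soft(z, g):
    """Scalar soft-thresholding: sign(z) * max(|z| - g, 0)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(rows, target, lam, sweeps=100):
    """Minimize (1/2n)||target - rows @ beta||^2 + lam * ||beta||_1 by cyclic coordinate descent."""
    n, p = len(rows), len(rows[0])
    beta = [0.0] * p
    resid = list(target)  # residual target - rows @ beta, with beta = 0 initially
    col_sq = [sum(r[j] ** 2 for r in rows) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that leaves beta_j out.
            rho = sum(r[j] * (e + r[j] * beta[j]) for r, e in zip(rows, resid)) / n
            new = soft(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for k, r in enumerate(rows):
                    resid[k] -= r[j] * delta
                beta[j] = new
    return beta


def sparse_var1(series, lam):
    """Lasso-penalized VAR(1); A[i][j] estimates the effect of series j at t-1 on series i at t."""
    # Assumes centered series, so no intercept is fitted.
    rows = [[s[t - 1] for s in series] for t in range(1, len(series[0]))]
    return [lasso_cd(rows, s[1:], lam) for s in series]


if __name__ == "__main__":
    rng = random.Random(1)
    x0 = [rng.gauss(0, 1) for _ in range(400)]
    x1 = [0.0] + [0.7 * x0[t - 1] + 0.5 * rng.gauss(0, 1) for t in range(1, 400)]
    x2 = [rng.gauss(0, 1) for _ in range(400)]
    for row in sparse_var1([x0, x1, x2], lam=0.1):
        print([round(a, 2) for a in row])
```

Entries shrunk exactly to zero are read as absent edges; the Lasso also shrinks the surviving coefficients, so the estimated effect of `x0` on `x1` is expected to fall somewhat below the true 0.7.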
21
Inference of Gene Regulatory Networks Using Bayesian Nonparametric Regression and Topology Information. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2017; 2017:8307530. [PMID: 28133490 PMCID: PMC5241943 DOI: 10.1155/2017/8307530] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 08/18/2016] [Accepted: 11/24/2016] [Indexed: 11/17/2022]
Abstract
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.
22
Fujii C, Kuwahara H, Yu G, Guo L, Gao X. Learning gene regulatory networks from gene expression data using weighted consensus. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.02.087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
23
24
Furqan MS, Siyal MY. Elastic-Net Copula Granger Causality for Inference of Biological Networks. PLoS One 2016; 11:e0165612. [PMID: 27792750 PMCID: PMC5085021 DOI: 10.1371/journal.pone.0165612] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 04/15/2016] [Accepted: 10/15/2016] [Indexed: 12/13/2022]
Abstract
AIM In bioinformatics, the inference of biological networks is one of the most active research areas. It involves decoding various complex biological networks that are responsible for performing diverse functions in the human body. Among these, most research focuses on understanding effective brain connectivity and gene networks in order to cure and prevent related diseases such as Alzheimer's disease and cancer, respectively. However, recent advances in data acquisition technology, such as DNA microarray analysis and fMRI, which can simultaneously process large amounts of data, yield high-dimensional datasets. Analyzing such high-dimensional datasets poses challenges for the analyst. BACKGROUND Traditional methods of Granger causality inference use ordinary least-squares methods for structure estimation, which face dimensionality issues when applied to high-dimensional data. Apart from dimensionality issues, most existing methods were designed to capture only linear relationships from time series data. METHOD AND CONCLUSION In this paper, we address the issues involved in assessing Granger causality for both linear and nonlinear high-dimensional data by proposing an elegant form of the existing LASSO-based method that we call "Elastic-Net Copula Granger causality". This method provides a more stable way to infer biological networks, which has been verified through rigorous experimentation. We have compared the proposed method with the existing method and demonstrated that this new strategy outperforms the existing method on all measures: precision, false detection rate, recall, and F1 score. We have also applied both methods to real HeLa cell data and StarPlus fMRI datasets and presented a comparison of the effectiveness of both methods.
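Copula-based Granger methods commonly replace each series by normal scores before the sparse regression, so that monotone nonlinear distortions of the marginals do not affect the fitted model. The sketch assumes this rank-based Gaussian copula (nonparanormal) transform; the abstract does not spell out the authors' exact transform, the elastic-net step is omitted, and breaking ties by position is a simplifying assumption.

```python
from statistics import NormalDist


def copula_transform(series):
    """Gaussian-copula (normal-score) transform of one series; ties are broken by position."""
    n = len(series)
    order = sorted(range(n), key=lambda i: series[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # rank / (n + 1) is an empirical CDF value strictly inside (0, 1), so inv_cdf is defined.
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]
```

Because only ranks are used, applying any strictly increasing function to a series (for example, exponentiating it) leaves the transformed output unchanged.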
Affiliation(s)
- Mohammad Shaheryar Furqan
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
- INFINITUS, Infocomm Centre of Excellence, Nanyang Technological University, Singapore, Singapore
- Mohammad Yakoob Siyal
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
25
Hira S, Deshpande PS. Mining precise cause and effect rules in large time series data of socio-economic indicators. SPRINGERPLUS 2016; 5:1625. [PMID: 27722044 PMCID: PMC5031588 DOI: 10.1186/s40064-016-3292-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/16/2016] [Accepted: 09/11/2016] [Indexed: 11/21/2022]
Abstract
Discovery of cause–effect relationships, particularly in large databases of time series, is challenging because of continuous data with different characteristics and complex lagged relationships. In this paper, we propose a novel approach to extract cause–effect relationships from a large time series dataset of socioeconomic indicators. The method extends the scope of relationship discovery to cause–effect relationships by identifying multiple causal structures such as binary, transitive, many-to-one, and cyclic. We use temporal association and the temporal odds ratio to exclude noncausal associations and to ensure high reliability of the discovered causal rules. We assess the method with both synthetic and real-world datasets. Our proposed method will help to build quantitative models for analyzing socioeconomic processes by generating precise cause–effect relationships between different economic indicators. The outcome shows that the proposed method can effectively discover existing causality structures in large time series databases.
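The abstract does not define the temporal odds ratio, so the following is only a sketch under an assumed definition: both indicators are binarized into event series, a 2x2 table is built from cause events at time t and effect events at time t + lag, and the odds ratio is computed with a 0.5 pseudo-count in every cell. The function name and the binarization are hypothetical.

```python
def temporal_odds_ratio(cause, effect, lag, pseudo=0.5):
    """Odds ratio of an effect event `lag` steps after a cause event, from a 2x2 table.

    `cause` and `effect` are equal-length sequences of 0/1 (or bool) event indicators.
    A pseudo-count is added to every cell so that empty cells do not cause division by zero.
    """
    a = b = c = d = 0  # a: cause and effect, b: cause only, c: effect only, d: neither
    for t in range(len(cause) - lag):
        x, y = cause[t], effect[t + lag]
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return ((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo))
```

An odds ratio well above 1 at a given lag suggests that the effect tends to follow the cause at that delay; values near 1 indicate no lagged association.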
Affiliation(s)
- Swati Hira
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, 440010 Nagpur, India
- P S Deshpande
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, 440010 Nagpur, India
26
Furqan MS, Siyal MY. Inference of biological networks using Bi-directional Random Forest Granger causality. SPRINGERPLUS 2016; 5:514. [PMID: 27186478 PMCID: PMC4844585 DOI: 10.1186/s40064-016-2156-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 02/08/2016] [Accepted: 04/13/2016] [Indexed: 11/10/2022]
Abstract
The standard ordinary least-squares-based Granger causality is one of the most widely used methods for detecting causal interactions between time series. However, the growing availability of high-dimensional data, driven by recent developments in technology, limits the utility of some existing implementations. In this paper, we propose a technique called Bi-directional Random Forest Granger causality. This technique uses random forest regularization together with the idea of reusing the time series data by reversing the time stamps to extract more causal information. We have demonstrated the effectiveness of our proposed method by applying it to simulated data and then to two real biological datasets, i.e., fMRI and HeLa cell data. The fMRI data were used to map the brain network involved in deductive reasoning, while the HeLa cell dataset was used to map the gene network involved in cancer.
Affiliation(s)
- Mohammad Shaheryar Furqan
- INFINITUS, Infocomm Centre of Excellence, Nanyang Technological University, Singapore, Singapore; School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore, Singapore
- Mohammad Yakoob Siyal
- School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore, Singapore
27
Zhang Y, Xiao Y, Zhou D, Cai D. Granger causality analysis with nonuniform sampling and its application to pulse-coupled nonlinear dynamics. Phys Rev E 2016; 93:042217. [PMID: 27176303 DOI: 10.1103/physreve.93.042217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2015] [Indexed: 11/07/2022]
Abstract
Granger causality (GC) analysis is an effective approach for inferring causal relations in time series. However, for data obtained by uniform sampling (i.e., with an equal sampling time interval), it is known that GC can yield unreliable causal inference due to aliasing if the sampling rate is not sufficiently high. To solve this issue, we consider a nonuniform sampling scheme, as it can mitigate aliasing. By developing an unbiased estimator of the power spectral density of nonuniformly sampled time series, we establish a framework for spectrum-based nonparametric GC analysis. Applying this framework to a general class of pulse-coupled nonlinear networks and utilizing a particular spectral structure possessed by these nonlinear network data, we demonstrate that, for such nonlinear networks with nonuniformly sampled data, reliable GC inference can be achieved at a low nonuniform mean sampling rate at which traditional uniform-sampling GC may lead to spurious causal inference.
Affiliation(s)
- Yaoyu Zhang
- Department of Mathematics, MOE-LSC, and Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
- Yanyang Xiao
- Department of Mathematics, MOE-LSC, and Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
- Douglas Zhou
- Department of Mathematics, MOE-LSC, and Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
- David Cai
- Department of Mathematics, MOE-LSC, and Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China; Courant Institute of Mathematical Sciences and Center for Neural Science, New York University, New York, New York 10012, USA; NYUAD Institute, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
28
Lee SG, Baek C. Adaptive lasso in sparse vector autoregressive models. KOREAN JOURNAL OF APPLIED STATISTICS 2016. [DOI: 10.5351/kjas.2016.29.1.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
29
Tran HM, Bukkapatnam ST. Inferring sparse networks for noisy transient processes. Sci Rep 2016; 6:21963. [PMID: 26916813 PMCID: PMC4768174 DOI: 10.1038/srep21963] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/08/2015] [Accepted: 02/03/2016] [Indexed: 12/29/2022]
Abstract
Inferring causal structures of real-world complex networks from measured time series signals remains an open issue. Current approaches are inadequate for discerning direct versus indirect influences (i.e., the presence or absence of a directed arc connecting two nodes) in the presence of noise, sparse interactions, and the nonlinear and transient dynamics of real-world processes. We report a sparse regression approach (referred to as l1-min) with theoretical bounds on the allowable perturbation, which recovers the network structure with guaranteed sparsity and robustness to noise. We also introduce averaging and perturbation procedures to further enhance prediction scores (i.e., reduce inference errors) and the numerical stability of the l1-min approach. Extensive investigations have been conducted with multiple benchmark simulated genetic regulatory networks and Michaelis-Menten dynamics, as well as real-world datasets from the DREAM5 challenge. These investigations suggest that our approach can significantly improve on previously reported methods for inferring the structure of dynamic networks, such as Bayesian networks, network deconvolution, silencing, and modular response analysis, oftentimes by 5 orders of magnitude, by optimizing for sparsity, transients, noise, and high dimensionality issues.
Affiliation(s)
- Hoang M. Tran
- Department of Industrial & Systems Engineering, Texas A&M University, College Station, TX 77840, USA
- School of Applied Mathematics & Informatics, Hanoi University of Science & Technology, Hanoi, Vietnam
- Satish T.S. Bukkapatnam
- Department of Industrial & Systems Engineering, Texas A&M University, College Station, TX 77840, USA
30
Quantitative Analysis of Global Proteome and Lysine Acetylome Reveal the Differential Impacts of VPA and SAHA on HL60 Cells. Sci Rep 2016; 6:19926. [PMID: 26822725 PMCID: PMC4731804 DOI: 10.1038/srep19926] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 09/07/2015] [Accepted: 12/21/2015] [Indexed: 01/08/2023]
Abstract
Valproic acid (VPA) and suberoylanilide hydroxamic acid (SAHA) are both HDAC inhibitors (HDACi). Previous studies indicated that both inhibitors show therapeutic effects on acute myeloid leukaemia (AML), while the differential impacts of the two HDACi on AML treatment remain elusive. In this study, using a 3-plex SILAC-based quantitative proteomics technique, anti-acetyllysine antibody-based affinity enrichment, high-resolution LC-MS/MS, and intensive bioinformatic analysis, the quantitative proteome and acetylome of SAHA- and VPA-treated AML HL60 cells were extensively studied. In total, 5,775 proteins and 1,124 lysine acetylation sites were successfully obtained in response to VPA and SAHA treatment. VPA and SAHA treatments induced different proteome and acetylome profiles in AML HL60 cells. This study revealed the differential impacts of VPA and SAHA on the proteome/acetylome in AML cells, deepening our understanding of HDAC inhibitor-mediated AML therapeutics.
31
Petralia F, Wang P, Yang J, Tu Z. Integrative random forest for gene regulatory network inference. Bioinformatics 2015; 31:i197-205. [PMID: 26072483 PMCID: PMC4542785 DOI: 10.1093/bioinformatics/btv268] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Indexed: 12/17/2022]
Abstract
Motivation: Gene regulatory network (GRN) inference based on genomic data is one of the most actively pursued computational biological problems. Because different types of biological data usually provide complementary information regarding the underlying GRN, a model that integrates big data of diverse types is expected to increase both the power and accuracy of GRN inference. Towards this goal, we propose a novel algorithm named iRafNet: integrative random forest for gene regulatory network inference. Results: iRafNet is a flexible, unified integrative framework that allows information from heterogeneous data, such as protein–protein interactions, transcription factor (TF)-DNA binding, and gene knock-down, to be jointly considered for GRN inference. Using test data from the DREAM4 and DREAM5 challenges, we demonstrate that iRafNet outperforms the original random forest based network inference algorithm (GENIE3) and is highly comparable to the community learning approach. We apply iRafNet to construct a GRN in Saccharomyces cerevisiae and demonstrate that it improves the performance in predicting TF-target gene regulations and provides additional functional insights into the predicted gene regulations. Availability and implementation: The R code of the iRafNet implementation and a tutorial are available at: http://research.mssm.edu/tulab/software/irafnet.html Contact: zhidong.tu@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Francesca Petralia
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Pei Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jialiang Yang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zhidong Tu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
32
Pourzanjani A, Herzog ED, Petzold LR. On the Inference of Functional Circadian Networks Using Granger Causality. PLoS One 2015; 10:e0137540. [PMID: 26413748 PMCID: PMC4586144 DOI: 10.1371/journal.pone.0137540] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/15/2014] [Accepted: 08/19/2015] [Indexed: 01/22/2023]
Abstract
Being able to infer one-way direct connections in an oscillatory network such as the suprachiasmatic nucleus (SCN) of the mammalian brain using time series data is difficult but crucial to understanding network dynamics. Although techniques have been developed for inferring networks from time series data, there have been no attempts to adapt these techniques to infer directional connections in oscillatory time series while accurately distinguishing between direct and indirect connections. In this paper, an adaptation of Granger causality called Adaptive Frequency Granger Causality (AFGC) is proposed that allows for the inference of circadian networks, and oscillatory networks in general. Additionally, an extension of this method, called LASSO AFGC, is proposed to infer networks with large numbers of cells. The method was validated using simulated data from several different networks. For the smaller networks, the method was able to identify all one-way direct connections without identifying connections that were not present. For larger networks of up to twenty cells, the method shows excellent performance in identifying true and false connections; this is quantified by an area under the curve (AUC) of 96.88%. We note that this method, like other Granger causality-based methods, is based on the detection of high-frequency signals propagating between cell traces. Thus, it requires a relatively high sampling rate and a network that can propagate high-frequency signals.
Affiliation(s)
- Arya Pourzanjani
- Department of Computer Science, University of California, Santa Barbara, Santa Barbara, California, United States of America
- Erik D. Herzog
- Department of Biology, Washington University, St. Louis, Missouri, United States of America
- Linda R. Petzold
- Department of Computer Science, University of California, Santa Barbara, Santa Barbara, California, United States of America
33
Yao S, Yoo S, Yu D. Prior knowledge driven Granger causality analysis on gene regulatory network discovery. BMC Bioinformatics 2015; 16:273. [PMID: 26316173 PMCID: PMC4551367 DOI: 10.1186/s12859-015-0710-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/20/2015] [Accepted: 08/17/2015] [Indexed: 12/20/2022]
Abstract
Background Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, the number of available time points (T) is usually much smaller than the number of target genes (n) in biological datasets. The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n>>T. Results In this study, we proposed a new method, viz., CGC-2SPR (CGC using two-step prior Ridge regularization), to resolve the problem by incorporating prior biological knowledge about a target gene data set. In our simulation experiments, the proposed methodology CGC-2SPR showed significant performance improvement in terms of accuracy over other widely used GC modeling (PGC, Ridge and Lasso) and MI-based (MRNET and ARACNE) methods. In addition, we applied CGC-2SPR to a real biological dataset, i.e., the yeast metabolic cycle, and discovered more true positive edges with CGC-2SPR than with the other existing methods. Conclusions In our research, we noticed a “1+1>2” effect when we combined prior knowledge and gene expression data to discover regulatory networks. Based on causality networks, we made a functional prediction that the Abm1 gene (whose functions were previously unknown) might be related to the yeast’s responses to different levels of glucose. Our research improves causality modeling by combining heterogeneous knowledge, which is well aligned with the future direction in systems biology. Furthermore, we proposed a method of Monte Carlo significance estimation (MCSE) to calculate edge significances, which provide statistical meaning to the discovered causality networks. All of our data and source code will be available under the link https://bitbucket.org/dtyu/granger-causality/wiki/Home.
Affiliation(s)
- Shun Yao
- Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, 11790, NY, USA; Computational Science Center, Brookhaven National Laboratory, Upton, 11793, NY, USA.
- Shinjae Yoo
- Computational Science Center, Brookhaven National Laboratory, Upton, 11793, NY, USA.
- Dantong Yu
- Computational Science Center, Brookhaven National Laboratory, Upton, 11793, NY, USA.
| |
Collapse
|
34
|
Ip EH, Zhang Q, Sowinski T, Simpson SL. Analysis of Feedback Mechanisms with Unknown Delay Using Sparse Multivariate Autoregressive Method. PLoS One 2015; 10:e0131371. [PMID: 26252637 PMCID: PMC4529169 DOI: 10.1371/journal.pone.0131371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2014] [Accepted: 06/01/2015] [Indexed: 11/19/2022]
Abstract
This paper discusses the study of two interacting processes in which a feedback mechanism exists between the processes. The study was motivated by problems such as the circadian oscillation of gene expression where two interacting protein transcriptions form both negative and positive feedback loops with long delays to equilibrium. Traditionally, data of this type could be examined using autoregressive analysis. However, in circadian oscillation the order of an autoregressive model cannot be determined a priori. We propose a sparse multivariate autoregressive method that incorporates mixed linear effects into regression analysis, and uses a forward-backward greedy search algorithm to select non-zero entries in the regression coefficients, the number of which is constrained not to exceed a pre-specified number. A small simulation study provides preliminary evidence of the validity of the method. Besides the circadian oscillation example, an additional example of blood pressure variations using data from an intervention study is used to illustrate the method and the interpretation of the results obtained from the sparse matrix method. These applications demonstrate how sparse representation can be used for handling high dimensional variables that feature dynamic, reciprocal relationships.
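A generic form of the cardinality-constrained estimation described above, written for a multivariate autoregression with an assumed maximum lag L, is sketched below. The notation (y_t, A_l, L, s) is assumed, and both the mixed linear effects and the forward-backward greedy search that the authors use to approximately solve the problem are omitted.

```latex
\mathbf{y}_t = \sum_{l=1}^{L} A_l\,\mathbf{y}_{t-l} + \boldsymbol\varepsilon_t,
\qquad
\min_{A_1,\dots,A_L} \sum_{t=L+1}^{T} \Big\lVert \mathbf{y}_t - \sum_{l=1}^{L} A_l\,\mathbf{y}_{t-l} \Big\rVert_2^2
\quad \text{subject to} \quad \sum_{l=1}^{L} \lVert A_l \rVert_0 \le s
```

Here the l0 "norm" counts the nonzero entries of each A_l, so s plays the role of the pre-specified bound on non-zero regression coefficients. Exact l0-constrained selection is combinatorial, which is why a greedy search is a natural approximation.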
Affiliation(s)
- Edward H. Ip
- Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Qiang Zhang
- Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Tomasz Sowinski
- School of Information Sciences, University of Pittsburgh, Pennsylvania, United States of America
- Sean L. Simpson
- Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
35
Inferring Broad Regulatory Biology from Time Course Data: Have We Reached an Upper Bound under Constraints Typical of In Vivo Studies? PLoS One 2015; 10:e0127364. [PMID: 25984725 PMCID: PMC4435750 DOI: 10.1371/journal.pone.0127364] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/08/2014] [Accepted: 04/13/2015] [Indexed: 12/21/2022]
Abstract
There is a growing appreciation for the network biology that regulates the coordinated expression of molecular and cellular markers; however, questions persist regarding the identifiability of these networks. Here we explore some of the issues relevant to recovering directed regulatory networks from time course data collected under experimental constraints typical of in vivo studies. NetSim simulations of sparsely connected biological networks were used to evaluate two simple feature selection techniques used in the construction of linear Ordinary Differential Equation (ODE) models, namely truncation of terms versus latent vector projection. Performance was compared with ODE-based Time Series Network Identification (TSNI) integral, and the information-theoretic Time-Delay ARACNE (TD-ARACNE). Projection-based techniques and TSNI integral outperformed truncation-based selection and TD-ARACNE on aggregate networks with edge densities of 10-30%, i.e. transcription factor, protein-protein cliques and immune signaling networks. All were more robust to noise than truncation-based feature selection. Performance was comparable on the in silico 10-node DREAM 3 network, a 5-node Yeast synthetic network designed for In vivo Reverse-engineering and Modeling Assessment (IRMA) and a 9-node human HeLa cell cycle network of similar size and edge density. Performance was more sensitive to the number of time courses than to sample frequency and extrapolated better to larger networks by grouping experiments. In all cases performance declined rapidly in larger networks with lower edge density. The limited recovery and high false positive rates obtained overall bring into question our ability to generate informative time course data rather than the design of any particular reverse engineering algorithm.
36
Liu ZP. Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data. Curr Genomics 2015; 16:3-22. [PMID: 25937810 PMCID: PMC4412962 DOI: 10.2174/1389202915666141110210634] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 03/07/2014] [Revised: 09/05/2014] [Accepted: 09/05/2014] [Indexed: 12/17/2022]
Abstract
Transcriptional regulation plays vital roles in many fundamental biological processes. Reverse engineering of genome-wide regulatory networks from high-throughput transcriptomic data provides a promising way to characterize the global scenario of regulatory relationships between regulators and their targets. In this review, we summarize and categorize the main frameworks and methods currently available for inferring transcriptional regulatory networks from microarray gene expression profiling data. We give an overview of each strategy and introduce representative methods for each. Their assumptions, advantages, shortcomings, and possible improvements and extensions are also clarified and discussed.
Affiliation(s)
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
37
Cassidy B, Rae C, Solo V. Brain activity: connectivity, sparsity, and mutual information. IEEE TRANSACTIONS ON MEDICAL IMAGING 2015; 34:846-860. [PMID: 25252277 DOI: 10.1109/tmi.2014.2358681] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 06/03/2023]
Abstract
We develop a new approach to functional brain connectivity analysis, which deals with four fundamental aspects of connectivity not previously jointly treated. These are: temporal correlation, spurious spatial correlation, sparsity, and network construction using trajectory (as opposed to marginal) Mutual Information. We call the new method Sparse Conditional Trajectory Mutual Information (SCoTMI). We demonstrate SCoTMI on simulated and real fMRI data, showing that SCoTMI gives more accurate and more repeatable detection of network links than competing network estimation methods.
38
Vinck M, Huurdeman L, Bosman CA, Fries P, Battaglia FP, Pennartz CM, Tiesinga PH. How to detect the Granger-causal flow direction in the presence of additive noise? Neuroimage 2015; 108:301-18. [DOI: 10.1016/j.neuroimage.2014.12.017] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 02/19/2014] [Revised: 11/19/2014] [Accepted: 12/05/2014] [Indexed: 10/24/2022]
39
Basu S, Shojaie A, Michailidis G. Network Granger Causality with Inherent Grouping Structure. JOURNAL OF MACHINE LEARNING RESEARCH : JMLR 2015; 16:417-453. [PMID: 34267606 PMCID: PMC8278320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
The problem of estimating high-dimensional network models arises naturally in the analysis of many biological and socio-economic systems. In this work, we aim to learn a network structure from temporal panel data, employing the framework of Granger causal models under the assumptions of sparsity of its edges and inherent grouping structure among its nodes. To that end, we introduce a group lasso regression regularization framework, and also examine a thresholded variant to address the issue of group misspecification. Further, the norm consistency and variable selection consistency of the estimates are established, the latter under the novel concept of direction consistency. The performance of the proposed methodology is assessed through an extensive set of simulation studies and comparisons with existing techniques. The study is illustrated on two motivating examples coming from functional genomics and financial econometrics.
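For reference, a generic group lasso objective for regressing one node i on the lagged values of all nodes is shown below. The split of the coefficient vector into blocks beta_(g), the unweighted penalty and the 1/2 scaling are assumptions for illustration; the paper's exact weighting and its thresholded variant are not reproduced here.

```latex
\hat{\boldsymbol\beta}_i = \arg\min_{\boldsymbol\beta}\;
\frac{1}{2}\,\big\lVert \mathbf{y}_i - X\boldsymbol\beta \big\rVert_2^2
+ \lambda \sum_{g=1}^{G} \big\lVert \boldsymbol\beta_{(g)} \big\rVert_2
```

Because the penalty applies the unsquared Euclidean norm to each block, whole groups of coefficients tend to be set to zero together, which is how a grouping structure among nodes can be reflected in the estimated Granger causal edges.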
Affiliation(s)
- Sumanta Basu
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1092, USA
- Ali Shojaie
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- George Michailidis
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1092, USA
40
CaSPIAN: a causal compressive sensing algorithm for discovering directed interactions in gene networks. PLoS One 2014; 9:e90781. [PMID: 24622336 PMCID: PMC3951243 DOI: 10.1371/journal.pone.0090781] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 12/04/2013] [Accepted: 02/05/2014] [Indexed: 11/21/2022]
Abstract
We introduce a novel algorithm for inference of causal gene interactions, termed CaSPIAN (Causal Subspace Pursuit for Inference and Analysis of Networks), which is based on coupling compressive sensing and Granger causality techniques. The core of the approach is to discover sparse linear dependencies between shifted time series of gene expressions using a sequential list-version of the subspace pursuit reconstruction algorithm and to estimate the direction of gene interactions via Granger-type elimination. The method is conceptually simple and computationally efficient, and it allows for dealing with noisy measurements. Its performance as a stand-alone platform without biological side-information was tested on simulated networks, on the synthetic IRMA network in Saccharomyces cerevisiae, and on data pertaining to the human HeLa cell network and the SOS network in E. coli. The results produced by CaSPIAN are compared to the results of several related algorithms, demonstrating significant improvements in inference accuracy of documented interactions. These findings highlight the importance of Granger causality techniques for reducing the number of false-positives, as well as the influence of noise and sampling period on the accuracy of the estimates. In addition, the performance of the method was tested in conjunction with biological side information in the form of sparse “scaffold networks”, to which new edges were added using available RNA-seq or microarray data. These biological priors aid in increasing the sensitivity and precision of the algorithm in the small sample regime.
41
Integrative genomics with mediation analysis in a survival context. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013; 2013:413783. [PMID: 24454535 PMCID: PMC3878392 DOI: 10.1155/2013/413783] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/28/2013] [Accepted: 09/23/2013] [Indexed: 12/25/2022]
Abstract
DNA copy number aberrations (DCNA) and subsequent altered gene expression profiles may have a major impact on tumor initiation, on development, and eventually on recurrence and cancer-specific mortality. However, most methods employed in integrative genomic analysis of the two biological levels, DNA and RNA, do not consider survival time. In the present note, we propose the adoption of a survival analysis-based framework for the integrative analysis of DCNA and mRNA levels to reveal their implications for patients' clinical outcome, with the prerequisite that the effect of DCNA on survival is mediated by mRNA levels. The specific aim of the paper is to offer a feasible framework to test the DCNA-mRNA-survival pathway. We provide statistical inference algorithms for mediation based on asymptotic results. Furthermore, we illustrate the applicability of the method in an integrative genomic analysis setting by using a breast cancer data set consisting of 141 invasive breast tumors. In addition, we provide an implementation in R.
42
Michailidis G, d'Alché-Buc F. Autoregressive models for gene regulatory network inference: sparsity, stability and causality issues. Math Biosci 2013; 246:326-34. [PMID: 24176667 DOI: 10.1016/j.mbs.2013.10.003] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 01/25/2013] [Revised: 10/09/2013] [Accepted: 10/14/2013] [Indexed: 10/26/2022]
Abstract
Reconstructing gene regulatory networks from high-throughput measurements represents a key problem in functional genomics. It also represents a canonical learning problem and thus has attracted a lot of attention in both the informatics and the statistical learning literature. Numerous approaches have been proposed, ranging from simple clustering to rather involved dynamic Bayesian network modeling, as well as hybrid ones that combine a number of modeling steps, such as employing ordinary differential equations coupled with genome annotation. These approaches are tailored to the type of data being employed. Available data sources include static steady state data and time course data obtained either for wild type phenotypes or from perturbation experiments. This review focuses on the class of autoregressive models using time course data for inferring gene regulatory networks. The central themes of sparsity, stability and causality are discussed as well as the ability to integrate prior knowledge for successful use of these models for the learning task at hand.
Affiliation(s)
- George Michailidis
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107, USA
43
Tam GHF, Chang C, Hung YS. Gene regulatory network discovery using pairwise Granger causality. IET Syst Biol 2013; 7:195-204. [PMID: 24067420 PMCID: PMC8687252 DOI: 10.1049/iet-syb.2012.0063] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/07/2012] [Revised: 05/16/2013] [Accepted: 06/19/2013] [Indexed: 08/12/2023]
Abstract
Discovery of gene regulatory networks from gene expression data can yield useful insights into drug development. Among the methods applied to time‐series data, Granger causality (GC) has emerged as a powerful tool with several merits. Because gene expression data usually contain many more genes than time points, a full model cannot be applied in a straightforward manner, so GC is often applied to genes pairwise. In this study, the authors first investigate with synthetic data how spurious causalities (false discoveries) may arise because of the use of pairwise rather than full‐model GC detection. Furthermore, spurious causalities may also arise if the order of the vector autoregressive model is not high enough. As a remedy, the authors demonstrate that model validation techniques can effectively reduce the number of false discoveries. Then, they apply pairwise GC with model validation to the real human HeLa cell‐cycle dataset. They find that the Akaike information criterion is generally the most suitable for determining model order, but caution should be taken with extremely short time series. With the authors' proposed implementation, degree distributions and network hubs are obtained and compared with existing results, giving a new observation that the hubs tend to act as sources rather than receivers of interactions.
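The claim that pairwise GC can report spurious links can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (which adds model validation and AIC-based order selection); it fits lag-p autoregressions by ordinary least squares and computes the standard nested-model F statistic. The helper names `ols_rss`, `granger_f` and `simulate`, the lag order p = 2, and the simulated common-driver system (z drives x at lag 1 and y at lag 2, with no direct x -> y link) are all hypothetical choices made for illustration.

```python
import random


def ols_rss(rows, target):
    """Residual sum of squares of an OLS fit with an intercept.

    rows   -- list of regressor vectors, one per observation
    target -- list of responses, same length as rows
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations (X^T X) beta = X^T y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * y for x, y in zip(X, target)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (b[i] - s) / A[i][i]
    return sum((y - sum(bj * xj for bj, xj in zip(beta, x))) ** 2
               for x, y in zip(X, target))


def granger_f(cause, effect, conditions=(), p=2):
    """F statistic for H0: lags 1..p of `cause` add nothing to predicting
    `effect` beyond the past of `effect` (and of any `conditions`)."""
    others = [effect] + list(conditions)
    restricted, full, target = [], [], []
    for t in range(p, len(effect)):
        past = [s[t - lag] for s in others for lag in range(1, p + 1)]
        extra = [cause[t - lag] for lag in range(1, p + 1)]
        restricted.append(past)
        full.append(past + extra)
        target.append(effect[t])
    rss_r = ols_rss(restricted, target)
    rss_f = ols_rss(full, target)
    df_den = len(target) - (len(full[0]) + 1)  # +1 for the intercept
    return ((rss_r - rss_f) / p) / (rss_f / df_den)


def simulate(n=500, seed=0):
    """Hypothetical common-driver system: z drives x at lag 1 and y at
    lag 2; there is no direct x -> y link."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.0] + [z[t - 1] + 0.5 * rng.gauss(0.0, 1.0) for t in range(1, n)]
    y = [0.0, 0.0] + [z[t - 2] + 0.5 * rng.gauss(0.0, 1.0) for t in range(2, n)]
    return x, y, z


x, y, z = simulate()
pairwise = granger_f(x, y)          # x's past looks predictive of y
conditional = granger_f(x, y, [z])  # conditioning on z's past removes it
print(f"pairwise F = {pairwise:.1f}, conditional F = {conditional:.2f}")
```

With the default seed, the pairwise statistic should land far above usual F critical values while the conditional one should not; exact numbers depend on the random draws. The normal equations are solved by hand only to keep the sketch free of dependencies; a real analysis would prefer a numerically stabler solver (e.g. QR) and a p-value from the F distribution.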
Affiliation(s)
- Gary Hak Fui Tam
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
- Chunqi Chang
- School of Electronic and Information Engineering, Soochow University, Suzhou, Jiangsu Province, China
- Yeung Sam Hung
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
44
ElBakry O, Ahmad MO, Swamy MNS. Inference of gene regulatory networks with variable time delay from time-series microarray data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:671-687. [PMID: 24091400 DOI: 10.1109/tcbb.2013.73] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 06/02/2023]
Abstract
Regulatory interactions among genes and gene products are dynamic processes and hence modeling these processes is of great interest. Since genes work in a cascade of networks, reconstruction of gene regulatory network (GRN) is a crucial process for a thorough understanding of the underlying biological interactions. We present here an approach based on pairwise correlations and lasso to infer the GRN, taking into account the variable time delays between various genes. The proposed method is applied to both synthetic and real data sets, and the results on synthetic data show that the proposed approach outperforms the current methods. Further, the results using real data are more consistent with the existing knowledge concerning the possible gene interactions.
45
Cai R, Zhang Z, Hao Z. Causal gene identification using combinatorial V-structure search. Neural Netw 2013; 43:63-71. [PMID: 23500501 DOI: 10.1016/j.neunet.2013.01.025] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 05/15/2012] [Revised: 01/23/2013] [Accepted: 01/31/2013] [Indexed: 10/27/2022]
Abstract
With the advances in biomedical techniques in the last decade, the costs of human genomic sequencing and genomic activity monitoring are coming down rapidly. To support the huge genome-based business in the near future, researchers are eager to find killer applications based on human genome information. Causal gene identification is one of the most promising applications, which may help potential patients to estimate the risk of certain genetic diseases and locate the target gene for further genetic therapy. Unfortunately, existing pattern recognition techniques, such as Bayesian networks, cannot be directly applied to find the accurate causal relationship between genes and diseases. This is mainly due to the insufficient number of samples and the extremely high dimensionality of the gene space. In this paper, we present the first practical solution to causal gene identification, utilizing a new combinatorial formulation over V-Structures commonly used in conventional Bayesian networks, by exploring the combinations of significant V-Structures. We prove the NP-hardness of the combinatorial search problem under general settings of the significance measure on the V-Structures, and present a greedy algorithm to find sub-optimal results. Extensive experiments show that our proposal is both scalable and effective, particularly with interesting findings on the causal genes over real human genome data.
Affiliation(s)
- Ruichu Cai
- Faculty of Computer Science, Guangdong University of Technology, Guangzhou, PR China.
46
47
Functional clustering of time series gene expression data by Granger causality. BMC SYSTEMS BIOLOGY 2012; 6:137. [PMID: 23107425 PMCID: PMC3573927 DOI: 10.1186/1752-0509-6-137] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 10/14/2011] [Accepted: 10/17/2012] [Indexed: 12/04/2022]
Abstract
Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
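The principle that a cause cannot come after its consequence is usually operationalized by comparing two autoregressions for the target series: one using only its own past, and one that also uses the past of the candidate cause. A common summary is Geweke's log-ratio measure, shown below with assumed notation (lag order p, coefficients a_l, a'_l, b_l); the clustering study itself may use a different GC statistic.

```latex
y_t = \sum_{l=1}^{p} a_l\, y_{t-l} + \varepsilon^{(r)}_t,
\qquad
y_t = \sum_{l=1}^{p} a'_l\, y_{t-l} + \sum_{l=1}^{p} b_l\, x_{t-l} + \varepsilon^{(f)}_t,
\qquad
F_{x \to y} = \ln \frac{\operatorname{var}\big(\varepsilon^{(r)}_t\big)}{\operatorname{var}\big(\varepsilon^{(f)}_t\big)} \ge 0
```

The measure is zero when the past of x adds no predictive information about y and grows as x's past reduces the prediction error. Only lagged values of x enter the full model, so influence is allowed to flow only forward in time.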
48
Curtis RE, Xiang J, Parikh A, Kinnaird P, Xing EP. Enabling dynamic network analysis through visualization in TVNViewer. BMC Bioinformatics 2012; 13:204. [PMID: 22897913 PMCID: PMC3447684 DOI: 10.1186/1471-2105-13-204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2012] [Accepted: 07/20/2012] [Indexed: 11/20/2022]
Abstract
Background Many biological processes are context-dependent or temporally specific. As a result, relationships between molecular constituents evolve across time and environments. While cutting-edge machine learning techniques can recover these networks, exploring and interpreting the rewiring behavior is challenging. Information visualization shines in this type of exploratory analysis, motivating the development of TVNViewer (http://sailing.cs.cmu.edu/tvnviewer), a visualization tool for dynamic network analysis. Results In this paper, we demonstrate visualization techniques for dynamic network analysis by using TVNViewer to analyze yeast cell cycle and breast cancer progression datasets. Conclusions TVNViewer is a powerful new visualization tool for the analysis of biological networks that change across time or space.
Affiliation(s)
- Ross E Curtis
- Joint Carnegie Mellon-University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA
49
Abstract
Estimating conditional dependence between two random variables given the knowledge of a third random variable is essential in neuroscientific applications to understand the causal architecture of a distributed network. However, existing methods of assessing conditional dependence, such as the conditional mutual information, are computationally expensive, involve free parameters, and are difficult to understand in the context of realizations. In this letter, we discuss a novel approach to this problem and develop a computationally simple and parameter-free estimator. The difference between the proposed approach and the existing ones is that the former expresses conditional dependence in terms of a finite set of realizations, whereas the latter use random variables, which are not available in practice. We call this approach conditional association, since it is based on a generalization of the concept of association to arbitrary metric spaces. We also discuss a novel and computationally efficient approach of generating surrogate data for evaluating the significance of the acquired association value.
Affiliation(s)
- Sohan Seth
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32608, USA.
50
Shojaie A, Basu S, Michailidis G. Adaptive Thresholding for Reconstructing Regulatory Networks from Time-Course Gene Expression Data. STATISTICS IN BIOSCIENCES 2011. [DOI: 10.1007/s12561-011-9050-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/14/2022]