1
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, and a variety of inference methods are available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, only one known method has been developed specifically, and it does not fully use the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix and to solve these dynamics as an optimization problem that determines the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods on synthetic and experimental data. Our results demonstrate that WENDY performs well across different data sets.
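Because each time point is measured on different cells, only the covariance matrix at each time point is available, not cell-by-cell trajectories. The sketch below shows the forward direction of covariance dynamics under an assumed linear discrete-time model, x1 = Aᵀx0 + e, where the hypothetical matrix A encodes regulation and the noise e is independent of x0. This model and the names `propagate_covariance`, `A`, `K0`, and `D` are illustrative assumptions, not WENDY's published formulation; WENDY works in the inverse direction, recovering regulation from observed covariances by optimization.

```python
import numpy as np

def propagate_covariance(K0, A, D):
    """Covariance of x1 = A.T @ x0 + e, given Cov(x0) = K0 and Cov(e) = D.

    Assumes a hypothetical linear one-step model in which A[i, j] is the effect
    of gene i on gene j and the noise e is independent of x0.
    """
    return A.T @ K0 @ A + D

rng = np.random.default_rng(0)
n_genes, n_cells = 3, 200_000
A = np.array([[0.9, 0.5, 0.0],
              [0.0, 0.8, -0.4],
              [0.0, 0.0, 0.7]])   # hypothetical regulation matrix
K0 = np.diag([1.0, 2.0, 0.5])    # covariance at the first time point
D = 0.1 * np.eye(n_genes)        # noise covariance

# Cells are rows, so x1 = A.T @ x0 for each cell becomes X1 = X0 @ A.
X0 = rng.multivariate_normal(np.zeros(n_genes), K0, size=n_cells)
X1 = X0 @ A + rng.multivariate_normal(np.zeros(n_genes), D, size=n_cells)

K1_predicted = propagate_covariance(K0, A, D)
K1_empirical = np.cov(X1, rowvar=False)
print(np.round(K1_predicted - K1_empirical, 3))  # entries near 0 (sampling error)
```

Only the two covariance matrices enter the prediction, which is why a covariance-based formulation suits data where the joint distribution across time points is unknown.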
Affiliation(s)
- Yue Wang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA.
- Peng Zheng
- Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
- Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
- Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
2
Ribeiro AH, Vidal MC, Sato JR, Fujita A. Granger Causality among Graphs and Application to Functional Brain Connectivity in Autism Spectrum Disorder. Entropy (Basel) 2021; 23:1204. [PMID: 34573829 PMCID: PMC8465687 DOI: 10.3390/e23091204] [Received: 07/29/2021] [Revised: 09/06/2021] [Accepted: 09/08/2021] [Indexed: 11/28/2022]
Abstract
Graphs/networks have become a powerful analytical approach for data modeling. Moreover, with advances in sensor technology, dynamic time-evolving data have become more common. In this context, one point of interest is a better understanding of the information flow within and between networks. Thus, we aim to infer Granger causality (G-causality) between networks' time series. In this case, the straightforward application of the well-established vector autoregressive model is not feasible, so a theoretical framework for modeling time-varying graphs is required. One possibility is to consider a mathematical graph model with time-varying parameters (assumed to be random variables) that generates the network. If G-causality can be identified between the graph models' parameters, it can then be used to define G-causality between graphs. Here, we show that even if the model is unknown, the spectral radius is a reasonable estimate of some random graph model parameters. We illustrate the application of our proposal by studying the relationship between the brain hemispheres of controls and of children diagnosed with Autism Spectrum Disorder (ASD). We show that the G-causality intensity from the right to the left hemisphere of the brain differs between ASD and controls.
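To make the spectral-radius claim concrete, the sketch below assumes an Erdős–Rényi model G(n, p), chosen here for illustration rather than taken from the paper. For dense G(n, p) graphs the largest adjacency eigenvalue is close to (n − 1)p, so dividing it by n − 1 recovers the edge probability. In the framework described above, one such estimate per graph per time point would presumably form the time series passed to a G-causality test (not shown). The helper names are hypothetical.

```python
import numpy as np

def erdos_renyi_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an undirected G(n, p) graph, no self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def spectral_radius(adj):
    """Largest absolute eigenvalue of a symmetric matrix."""
    return np.max(np.abs(np.linalg.eigvalsh(adj)))

rng = np.random.default_rng(1)
n = 300
estimates = {}
for p in (0.1, 0.3, 0.5):
    rho = spectral_radius(erdos_renyi_adjacency(n, p, rng))
    # For dense G(n, p), rho is close to (n - 1) * p, so rho / (n - 1) estimates p.
    estimates[p] = rho / (n - 1)
print({p: round(est, 3) for p, est in estimates.items()})
```

The estimate needs no knowledge of which edges were drawn, only the eigenvalues, which is what makes the spectral radius attractive when the generating model is unknown.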
Affiliation(s)
- Maciel Calebe Vidal
- Insper Institute of Education and Research, São Paulo 04546-042, SP, Brazil
- João Ricardo Sato
- Center of Mathematics, Computing and Cognition, Universidade Federal do ABC, Santo André 09210-580, SP, Brazil
- André Fujita
- Institute of Mathematics and Statistics, University of São Paulo, São Paulo 05508-090, SP, Brazil
3
Furqan MS, Siyal MY. Elastic-Net Copula Granger Causality for Inference of Biological Networks. PLoS One 2016; 11:e0165612. [PMID: 27792750 PMCID: PMC5085021 DOI: 10.1371/journal.pone.0165612] [Received: 04/15/2016] [Accepted: 10/15/2016] [Indexed: 12/13/2022]
Abstract
AIM In bioinformatics, the inference of biological networks is one of the most active research areas. It involves decoding various complex biological networks that perform diverse functions in the human body. Within this area, most research focuses on understanding effective brain connectivity and gene networks in order to cure and prevent related diseases, such as Alzheimer's disease and cancer, respectively. However, recent advances in data acquisition technologies, such as DNA microarrays and fMRI, which can capture large amounts of data simultaneously, yield high-dimensional data sets, and analyzing such data sets poses challenges for the analyst. BACKGROUND Traditional methods of Granger causality inference use ordinary least squares for structure estimation, which runs into dimensionality issues when applied to high-dimensional data. Apart from dimensionality issues, most existing methods were designed to capture only linear relationships in time series data. METHOD AND CONCLUSION In this paper, we address the issues involved in assessing Granger causality for both linear and nonlinear high-dimensional data by proposing an elegant variant of the existing LASSO-based method that we call "Elastic-Net Copula Granger causality". This method provides a more stable way to infer biological networks, which has been verified through rigorous experimentation. We have compared the proposed method with the existing method and demonstrated that the new strategy outperforms it on all measures: precision, false detection rate, recall, and F1 score. We have also applied both methods to real HeLa cell data and StarPlus fMRI datasets and presented a comparison of the effectiveness of both methods.
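The general idea of copula-based, elastic-net-penalized Granger inference can be sketched as follows. Each series is mapped to normal scores through its ranks (a Gaussian-copula transform that removes monotone marginal distortions), and each gene's present value is then regressed on all genes' lag-1 values with an elastic-net penalty fitted by coordinate descent. Nonzero coefficients suggest edges. This is a minimal sketch, not the authors' implementation: the lag order of 1, `alpha`, `l1_ratio`, the 0.15 edge threshold, and all function names are hypothetical choices.

```python
import numpy as np
from statistics import NormalDist

def copula_transform(series):
    """Map each column to normal scores via its ranks (Gaussian-copula transform)."""
    T = series.shape[0]
    ranks = series.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..T per column
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(ranks / (T + 1))

def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net(X, y, alpha, l1_ratio, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1
    + 0.5*alpha*(1 - l1_ratio)*||b||^2 (no intercept; inputs assumed centered)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * b[j]  # remove feature j's current contribution
            rho = X[:, j] @ residual / n
            b[j] = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            residual -= X[:, j] * b[j]  # add the updated contribution back
    return b

def granger_network(series, alpha=0.05, l1_ratio=0.5):
    """coefs[i, j] is the estimated lag-1 influence of gene j on gene i."""
    Z = copula_transform(series)
    past = Z[:-1] - Z[:-1].mean(axis=0)
    present = Z[1:] - Z[1:].mean(axis=0)
    return np.array([elastic_net(past, present[:, i], alpha, l1_ratio)
                     for i in range(Z.shape[1])])

rng = np.random.default_rng(2)
T, n_genes = 500, 4
x = np.zeros((T, n_genes))
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal(size=n_genes)  # self-regulation
    x[t, 1] += 0.8 * x[t - 1, 0]                        # true edge: gene 0 -> gene 1
observed = np.exp(x)  # monotone distortion of the marginals
coefs = granger_network(observed)
edges = np.abs(coefs) > 0.15
print(edges.astype(int))
```

Because the rank transform is unchanged by any increasing function, the exponential distortion applied to the simulated data does not hide the gene 0 → gene 1 edge.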
Affiliation(s)
- Mohammad Shaheryar Furqan
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
- INFINITUS, Infocomm Centre of Excellence, Nanyang Technological University, Singapore, Singapore
- Mohammad Yakoob Siyal
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
4
A tutorial to identify nonlinear associations in gene expression time series data. Methods Mol Biol 2014; 1164:87-95. [PMID: 24927837 DOI: 10.1007/978-1-4939-0805-9_8] [Indexed: 02/21/2023]
Abstract
The study of gene regulatory networks is the basis for understanding the biological complexity of several diseases and/or cell states, and it has become central to research in systems biology. Several mathematical methods have been developed in the last decade, especially for the analysis of time series gene expression data derived from microarrays and sequencing-based methods. Most of the models available in the literature assume linear associations among genes and either do not infer the directionality of these connections or use a priori biological knowledge to set it. However, in several cases, a priori biological information is not available. In this context, we describe a statistical method, the nonlinear vector autoregressive model, to estimate nonlinear relationships and to infer the directionality of the network's edges from the temporal information in time series gene expression data, without a priori biological information.
5
Functional clustering of time series gene expression data by Granger causality. BMC Syst Biol 2012; 6:137. [PMID: 23107425 PMCID: PMC3573927 DOI: 10.1186/1752-0509-6-137] [Received: 10/14/2011] [Accepted: 10/17/2012] [Indexed: 12/04/2022]
Abstract
Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
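The principle that a cause cannot come after its consequence translates into a regression comparison: x Granger-causes y if adding x's past to a model of y's own past significantly reduces prediction error. The sketch below is a standard pairwise, lag-1 F-test written for illustration; the paper works with sets of time series, and the function name and simulated coefficients are hypothetical. The returned statistic would be compared with an F(lag, T − lag − number of full-model columns) distribution.

```python
import numpy as np

def granger_f_stat(x, y, lag=1):
    """F statistic for 'x Granger-causes y' with `lag` past values.

    Compares an OLS model of y on its own past (restricted) with one that also
    includes the past of x (full). Only past values enter either model, so x can
    only 'cause' y by preceding it.
    """
    T = len(y)
    target = y[lag:]
    own_past = np.column_stack([y[lag - k:T - k] for k in range(1, lag + 1)])
    x_past = np.column_stack([x[lag - k:T - k] for k in range(1, lag + 1)])
    ones = np.ones((T - lag, 1))
    restricted = np.hstack([ones, own_past])
    full = np.hstack([ones, own_past, x_past])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    rss_r, rss_f = rss(restricted), rss(full)
    df_num, df_den = lag, (T - lag) - full.shape[1]
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)

rng = np.random.default_rng(3)
T = 300
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + rng.normal(scale=0.5)

f_xy = granger_f_stat(x, y)  # large: x's past improves prediction of y
f_yx = granger_f_stat(y, x)  # small: y's past does not help predict x
print(round(f_xy, 1), round(f_yx, 2))
```

The asymmetry between the two statistics is what gives Granger causality its direction, which a similarity measure such as correlation cannot provide.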
6
Kojima K, Imoto S, Yamaguchi R, Fujita A, Yamauchi M, Gotoh N, Miyano S. Identifying regulational alterations in gene regulatory networks by state space representation of vector autoregressive models and variational annealing. BMC Genomics 2012; 13 Suppl 1:S6. [PMID: 22369122 PMCID: PMC3587380 DOI: 10.1186/1471-2164-13-s1-s6] [Indexed: 11/24/2022]
Abstract
Background In the analysis of the effects of cell treatments such as drug dosing, a key task is identifying changes in gene network structure between normal and treated cells. One possible way to identify these changes is to compare the structures of networks estimated separately from data on normal and treated cells. However, this approach usually fails to estimate accurate gene networks because of the limited length of time series data and measurement noise. Thus, approaches are needed that identify changes in regulation by using time series data from both conditions efficiently. Methods We propose a new statistical approach, based on the state space representation of the vector autoregressive model, that estimates gene networks under two different conditions in order to identify changes in regulation between the conditions. In the mathematical model of our approach, hidden binary variables are newly introduced to indicate the presence of regulations under each condition. The hidden binary variables enable efficient use of the data: data from both conditions are used for regulations common to both, while only the corresponding data are used for condition-specific regulations. In addition, the similarity of the networks under the two conditions is taken into account automatically through the design of the potential function for the hidden binary variables. To estimate the hidden binary variables, we derive a new variational annealing method that searches for the configuration of the binary variables that maximizes the marginal likelihood. Results For performance evaluation, we use time series data from two topologically similar synthetic networks and confirm that our proposed approach estimates commonly existing regulations, as well as changes in regulation, with higher coverage and precision than other existing approaches in almost all experimental settings.
For a real data application, our proposed approach is applied to time series data from normal human lung cells and from human lung cells treated by stimulating EGF receptors and dosing with the anticancer drug gefitinib. In the treated lung cells, a cancer cell condition is simulated by stimulating EGF receptors, but this effect would be expected to be counteracted by the selective inhibition of EGF receptors by gefitinib. However, gene expression profiles actually differ between the conditions, and the genes related to the identified changes are considered possible off-targets of gefitinib. Conclusions On synthetically generated time series data, our proposed approach identifies changes in regulation more accurately than existing methods. Applying the proposed approach to time series data from normal and treated human lung cells yields candidate off-target genes of gefitinib. According to published clinical information, one of these genes may be related to a factor in interstitial pneumonia, a known side effect of gefitinib.
Affiliation(s)
- Kaname Kojima
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
7
Wong L. Brief introduction to some new results in gene expression analysis, systems biology modeling, motif identification, and (noncoding) RNA analysis. J Bioinform Comput Biol 2011. [DOI: 10.1142/s0219720010005026] [Indexed: 11/18/2022]
8
Wu GR, Chen F, Kang D, Zhang X, Marinazzo D, Chen H. Multiscale causal connectivity analysis by canonical correlation: theory and application to epileptic brain. IEEE Trans Biomed Eng 2011; 58:3088-96. [PMID: 21788178 DOI: 10.1109/tbme.2011.2162669] [Indexed: 11/10/2022]
Abstract
Multivariate Granger causality is a well-established approach for inferring information flow in complex systems, and it is increasingly applied to map brain connectivity. Traditional Granger causality is based on vector autoregressive (AR) or mixed autoregressive moving average (ARMA) models, which are potentially affected by errors in parameter estimation and may be contaminated by zero-lag correlation, notably when modeling neuroimaging data. To overcome this issue, we present an extended canonical correlation approach to measure multivariate Granger causal interactions among time series. The procedure includes a reduced-rank step for computing the canonical correlation analysis (CCA) and extends the definition of causality to include instantaneous effects, thus avoiding the potential estimation problems of AR (or ARMA) models. We tested this approach on simulated data and confirmed its practical utility by exploring local network connectivity at different scales in the epileptic brain, analyzing scalp and depth-EEG data recorded during an interictal period.
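Canonical correlation measures the strongest linear association between two groups of signals. The sketch below computes plain canonical correlations with the standard QR-plus-SVD method and applies them to the past of one group against the present of another, which shows a directional asymmetry. It deliberately omits what the paper adds (the reduced-rank step, conditioning on the target group's own past, and instantaneous effects), so it is an illustration of the ingredient, not of the published procedure. The simulated mixing matrix `M` is hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the column sets X and Y (QR + SVD method)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # Singular values of Qx^T Qy are the cosines of the principal angles between
    # the two column spaces, i.e. the canonical correlations, in decreasing order.
    return np.clip(np.linalg.svd(Qx.T @ Qy, compute_uv=False), 0.0, 1.0)

rng = np.random.default_rng(4)
T = 1000
A = rng.normal(size=(T, 2))                   # source group, two channels
M = np.array([[1.0, 0.5], [-0.5, 1.0]])       # hypothetical coupling
B = np.zeros((T, 2))
B[1:] = A[:-1] @ M.T + 0.3 * rng.normal(size=(T - 1, 2))  # B_t depends on A_{t-1}

forward = canonical_correlations(A[:-1], B[1:])   # past of A vs present of B
backward = canonical_correlations(B[:-1], A[1:])  # past of B vs present of A
print(np.round(forward, 3), np.round(backward, 3))
```

Working with whole groups of channels at once is what makes the approach multivariate: the first canonical correlation summarizes the strongest lagged coupling between the groups without fitting a full AR coefficient matrix.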
Affiliation(s)
- Guo Rong Wu
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China.
9
Fujita A, Kojima K, Patriota AG, Sato JR, Severino P, Miyano S. A fast and robust statistical test based on likelihood ratio with Bartlett correction to identify Granger causality between gene sets. Bioinformatics 2010; 26:2349-51. [PMID: 20660295 DOI: 10.1093/bioinformatics/btq427] [Indexed: 11/12/2022]
Abstract
We propose a likelihood ratio test (LRT) with Bartlett correction to identify Granger causality between sets of time series gene expression data. The performance of the proposed test is compared with that of a previously published bootstrap-based approach. The LRT is shown to be significantly faster and statistically powerful, even under non-normal distributions. An R package named gGranger, containing implementations of both Granger causality identification tests, is also provided. AVAILABILITY http://dnagarden.ims.u-tokyo.ac.jp/afujita/en/doku.php?id=ggranger.
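A generic likelihood ratio test for Granger causality between gene sets compares Gaussian multivariate regressions of the target set on its own past, with and without the source set's past. The statistic n·(log det Σ_restricted − log det Σ_full) is asymptotically chi-squared with (source size × target size) degrees of freedom at lag 1. This is a standard textbook form written as an assumption; the statistic in the paper and in gGranger (an R package) may be defined differently, and the Bartlett correction, a small-sample rescaling of the statistic, is omitted. A p-value could be obtained with, for example, `scipy.stats.chi2.sf(stat, df)`; the block stays NumPy-only. The function name and coupling matrix `C` are hypothetical.

```python
import numpy as np

def gene_set_lrt(X, Y):
    """Likelihood-ratio statistic for 'gene set X Granger-causes gene set Y' at lag 1.

    Compares Gaussian multivariate regressions of Y_t on (1, Y_{t-1}) and on
    (1, Y_{t-1}, X_{t-1}). Returns (statistic, degrees of freedom); the statistic is
    asymptotically chi-squared under the null. No Bartlett correction is applied.
    """
    target = Y[1:]
    n = target.shape[0]
    ones = np.ones((n, 1))
    restricted = np.hstack([ones, Y[:-1]])
    full = np.hstack([ones, Y[:-1], X[:-1]])

    def logdet_resid_cov(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        _, logdet = np.linalg.slogdet(resid.T @ resid / n)
        return logdet

    stat = n * (logdet_resid_cov(restricted) - logdet_resid_cov(full))
    df = X.shape[1] * Y.shape[1]  # number of X-lag coefficients set to zero
    return stat, df

rng = np.random.default_rng(5)
T = 200
X = rng.normal(size=(T, 3))                          # gene set X: 3 genes
C = np.array([[0.8, 0.0], [0.0, -0.6], [0.4, 0.4]])  # hypothetical X -> Y couplings
Y = np.zeros((T, 2))                                 # gene set Y: 2 genes
for t in range(1, T):
    Y[t] = 0.5 * Y[t - 1] + X[t - 1] @ C + 0.5 * rng.normal(size=2)

stat_xy, df_xy = gene_set_lrt(X, Y)  # large relative to chi2(6)
stat_yx, df_yx = gene_set_lrt(Y, X)  # typical of chi2(6)
print(round(stat_xy, 1), df_xy, round(stat_yx, 1), df_yx)
```

Unlike a bootstrap test, this statistic needs only two regressions and a determinant, which is consistent with the speed advantage the abstract reports for an LRT-based approach.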
Affiliation(s)
- André Fujita
- Computational Science Research Program, RIKEN, Wako, Saitama, Japan.
10
Granger Causality in Systems Biology: Modeling Gene Networks in Time Series Microarray Data Using Vector Autoregressive Models. Advances in Bioinformatics and Computational Biology 2010. [DOI: 10.1007/978-3-642-15060-9_2] [Indexed: 12/04/2022]