1. Gill JK, Chetty M, Lim S, Hallinan J. BioBERT based text mining for incorporating prior knowledge in the inference of genetic network models. Comput Biol Med 2025; 186:109623. [PMID: 39753024 DOI: 10.1016/j.compbiomed.2024.109623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 12/03/2024] [Accepted: 12/23/2024] [Indexed: 02/20/2025]
Abstract
Reconstruction of Gene Regulatory Networks (GRNs) is essential for understanding gene interactions, their impact on cellular processes, and the manifestation of diseases, with applications including drug discovery. Among the various mathematical and dynamic models used for GRN reconstruction, the S-system model, comprising non-linear differential equations, is widely utilised to capture the behaviour of complex biological systems with non-linear and time-dependent interactions. However, as the network size increases, the computational demand for network inference grows due to the greater number of parameters to be estimated, significantly impacting the performance of optimisation algorithms. Incorporating biologically relevant prior knowledge using advanced Natural Language Processing methods can effectively address this limitation by reducing the number of parameters that must be computed, thereby enhancing speed and accuracy. In this study, we introduce PRESS, an integrated Prior Knowledge Enhanced S-system model for accurate GRN reconstructions, which seamlessly automates the incorporation of prior knowledge obtained through systematic extraction from published literature. PRESS exploits our recently reported BioBERT-based Gene Interaction Extraction Framework with enhanced targeted genetic relation extraction and the prediction of regulatory genes. The effectiveness of the optimisation algorithm in learning model parameters is further enhanced through a novel fitness evaluation, which limits the maximum number of regulatory genes to mimic real GRNs. This integrated method, combining a robust relation extraction framework for automated prior knowledge with a GRN reconstruction model, is novel and has not been reported previously. Experimental results obtained using Escherichia coli subnetworks and the benchmark SOS dataset demonstrate substantial reductions in computational cost while simultaneously improving prediction accuracy.
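For orientation, the S-system formalism named in the abstract is conventionally written as the power-law system below. This is the standard textbook form rather than an equation taken from the paper itself; the symbols $X_i$ (expression level of gene $i$), $\alpha_i$ and $\beta_i$ (rate constants), $g_{ij}$ and $h_{ij}$ (kinetic orders), and $N$ (number of genes) are the usual notation and are assumptions here.

```latex
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{N} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{N} X_j^{h_{ij}}, \qquad i = 1, \dots, N
```

In this form an $N$-gene network has $2N(N+1)$ parameters ($2N$ rate constants and $2N^2$ kinetic orders), which illustrates why the estimation burden grows quickly with network size and why fixing some kinetic orders to zero from prior knowledge can help.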
Affiliation(s)
- Jaskaran Kaur Gill
  - Health Innovation and Transformation Centre, Federation University, Victoria, 3842, Australia
- Madhu Chetty
  - Health Innovation and Transformation Centre, Federation University, Victoria, 3842, Australia
- Suryani Lim
  - Health Innovation and Transformation Centre, Federation University, Victoria, 3842, Australia
- Jennifer Hallinan
  - Health Innovation and Transformation Centre, Federation University, Victoria, 3842, Australia; BioThink, Queensland, 4020, Australia
2. Yang G, Lei S, Yang G. Robust Model-Free Identification of the Causal Networks Underlying Complex Nonlinear Systems. ENTROPY (BASEL, SWITZERLAND) 2024; 26:1063. [PMID: 39766692 PMCID: PMC11675911 DOI: 10.3390/e26121063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2024] [Revised: 11/28/2024] [Accepted: 11/30/2024] [Indexed: 01/11/2025]
Abstract
Inferring causal networks from noisy observations is of vital importance in various fields. Due to the complexity of system modeling, the way in which universal and feasible inference algorithms are studied is a key challenge for network reconstruction. In this study, without any assumptions, we develop a novel model-free framework to uncover only the direct relationships in networked systems from observations of their nonlinear dynamics. Our proposed methods are termed multiple-order Polynomial Conditional Granger Causality (PCGC) and sparse PCGC (SPCGC). PCGC mainly adopts polynomial functions to approximate the whole system model, which can be used to judge the interactions among nodes through subsequent nonlinear Granger causality analysis. For SPCGC, Lasso optimization is first used for dimension reduction, and then PCGC is executed to obtain the final network. Specifically, the conditional variables are fused in this general, model-free framework regardless of their formulations in the system model, which could effectively reconcile the inference of direct interactions with an indirect influence. Based on many classical dynamical systems, the performances of PCGC and SPCGC are analyzed and verified. Generally, the proposed framework could be quite promising for the provision of certain guidance for data-driven modeling with an unknown model.
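The general idea of a polynomial conditional Granger causality test can be sketched as follows. This is a simplified illustration and not the authors' PCGC or SPCGC implementation: it uses per-variable powers only (no cross terms), ordinary least squares, and no Lasso step, and every function name (`lagged`, `poly_expand`, `conditional_gc`, `simulate`) is hypothetical.

```python
import numpy as np

def lagged(x, lag):
    """Columns x[t-1], ..., x[t-lag] for t = lag .. len(x)-1."""
    n = len(x)
    return np.column_stack([x[lag - k: n - k] for k in range(1, lag + 1)])

def poly_expand(cols, degree):
    """Element-wise powers 1..degree of each column (no cross terms: a simplification)."""
    return np.hstack([cols ** d for d in range(1, degree + 1)])

def residual_variance(design, target):
    """Variance of least-squares residuals, with an intercept added."""
    design = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return (target - design @ coef).var()

def conditional_gc(data, source, target, lag=1, degree=2):
    """Granger index log(var_restricted / var_unrestricted) for source -> target,
    conditioned on all other columns of data (shape: time x nodes)."""
    y = data[lag:, target]
    nodes = range(data.shape[1])
    full = poly_expand(np.hstack([lagged(data[:, j], lag) for j in nodes]), degree)
    restricted = poly_expand(
        np.hstack([lagged(data[:, j], lag) for j in nodes if j != source]), degree
    )
    return np.log(residual_variance(restricted, y) / residual_variance(full, y))

def simulate(n_steps, seed=0):
    """Toy system (an assumption for the demo): node 0 drives node 1 quadratically,
    node 2 is independent noise."""
    rng = np.random.default_rng(seed)
    data = np.zeros((n_steps, 3))
    for t in range(1, n_steps):
        data[t, 0] = 0.5 * data[t - 1, 0] + rng.normal()
        data[t, 1] = 0.3 * data[t - 1, 1] + 0.8 * data[t - 1, 0] ** 2 + 0.1 * rng.normal()
        data[t, 2] = rng.normal()
    return data

if __name__ == "__main__":
    data = simulate(500)
    print("0 -> 1:", conditional_gc(data, source=0, target=1))
    print("2 -> 1:", conditional_gc(data, source=2, target=1))
```

On the toy system the index for the true link (0 to 1) should come out much larger than for the absent link (2 to 1); in practice a significance test or threshold would be needed to turn the index into an edge decision.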
Affiliation(s)
- Guanxue Yang
  - School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Shimin Lei
  - School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Guanxiao Yang
  - College of Automation, Jiangsu University of Science and Technology, Zhenjiang 212100, China
3. Yang G, Hu W, He L, Dou L. Nonlinear causal network learning via Granger causality based on extreme support vector regression. CHAOS (WOODBURY, N.Y.) 2024; 34:023127. [PMID: 38377295 DOI: 10.1063/5.0183537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 01/22/2024] [Indexed: 02/22/2024]
Abstract
For complex networked systems, based on the consideration of nonlinearity and causality, a novel general method of nonlinear causal network learning, termed extreme support vector regression Granger causality (ESVRGC), is proposed. The nonuniform time-delayed influence of the driving nodes on the target node is particularly considered. Then, the restricted model and the unrestricted model of Granger causality are, respectively, formulated based on extreme support vector regression, which uses the selected time-delayed components of system variables as the inputs of kernel functions. The nonlinear conditional Granger causality index is finally calculated to confirm the strength of a causal interaction. Generally, based on the simulation of a nonlinear vector autoregressive model and nonlinear discrete time-delayed dynamic systems, ESVRGC demonstrates better performance than other popular methods. The validity and robustness of ESVRGC are also verified across different network types, sample sizes, noise intensities, and coupling strengths. Finally, the superiority of ESVRGC is successfully verified by an experimental study on real benchmark datasets.
Affiliation(s)
- Guanxue Yang
  - School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Weiwei Hu
  - School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Lidong He
  - School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
- Liya Dou
  - Department of Automation, Beijing University of Chemical Technology, Beijing 100029, China
4. Li F, Lin Q, Zhao X, Hu Z. Description length guided nonlinear unified Granger causality analysis. Netw Neurosci 2023; 7:1109-1128. [PMID: 37781142 PMCID: PMC10473308 DOI: 10.1162/netn_a_00316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 03/22/2023] [Indexed: 10/03/2023] Open
Abstract
Most Granger causality analysis (GCA) methods still remain a two-stage scheme guided by different mathematical theories; both stages can actually be viewed as the same generalized model selection issue. Adhering to Occam's razor, we present a unified GCA (uGCA) based on the minimum description length principle. In this research, considering the common existence of nonlinearity in functional brain networks, we incorporated the nonlinear modeling procedure into the proposed uGCA method, in which an approximate representation of Taylor's expansion was adopted. Through synthetic data experiments, we revealed that nonlinear uGCA was obviously superior to its linear representation and the conventional GCA. Meanwhile, the nonlinear characteristics of high-order terms and cross-terms would be successively drowned out as noise levels increased. Then, in real fMRI data involving mental arithmetic tasks, we further illustrated that these nonlinear characteristics in fMRI data may indeed be drowned out at a high noise level, and hence a linear causal analysis procedure may be sufficient. Next, on data from autism spectrum disorder patients, the network properties of causal connections obtained by uGCA methods appeared to be more consistent with clinical symptoms than those obtained by conventional GCA.
Affiliation(s)
- Fei Li
  - Key Laboratory of Quantum Precision Measurement, College of Science, Zhejiang University of Technology, Hangzhou, China
- Qiang Lin
  - Key Laboratory of Quantum Precision Measurement, College of Science, Zhejiang University of Technology, Hangzhou, China
- Xiaohu Zhao
  - Department of Radiology, Shanghai Fifth People’s Hospital, Fudan University, Shanghai, China
- Zhenghui Hu
  - Key Laboratory of Quantum Precision Measurement, College of Science, Zhejiang University of Technology, Hangzhou, China
5. Piecewise Causality Study between Power Load and Vibration in Hydro-Turbine Generator Unit for a Low-Carbon Era. ENERGIES 2022. [DOI: 10.3390/en15031207] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/16/2023]
Abstract
With the rapid development of wind and photovoltaic power generation, hydro-turbine generator units have to operate in a challenging way, resulting in obvious vibration problems. Because of the significant impact of vibration on safety and economical operation, it is of great significance to study the causal relationship between vibration and other variables. The complexity of the hydro-turbine generator unit makes it difficult to analyze the causality of the mechanism. This paper studied the correlation based on a data-driven method, then transformed the correlation into causality based on the mechanism. In terms of correlation, traditional research only judges whether there is a correlation between all data. When the data with correlation are interfered with by the data without correlation, the traditional methods cannot accurately identify the correlation. A piecewise correlation method based on change point detection was proposed to fill this research gap. The proposed method segmented time series pairs, then analyzed the correlation between subsequences. The causality between power load and vibration of a hydro-turbine generator unit was further analyzed. It indicated that when the power load is less than 200 MW, the causality is weak, and when the power load is greater than 375 MW, the causality is strong. The results show that the causality between vibration and power load is not fixed but piecewise. Furthermore, the piecewise correlation method compensated for the limitation of high variance of the maximum information coefficient.
6. Grimes T, Datta S. A novel probabilistic generator for large-scale gene association networks. PLoS One 2021; 16:e0259193. [PMID: 34767561 PMCID: PMC8589155 DOI: 10.1371/journal.pone.0259193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2021] [Accepted: 10/14/2021] [Indexed: 11/18/2022] Open
Abstract
MOTIVATION: Gene expression data provide an opportunity for reverse-engineering gene-gene associations using network inference methods. However, it is difficult to assess the performance of these methods because the true underlying network is unknown in real data. Current benchmarks address this problem by subsampling a known regulatory network to conduct simulations. But the topology of regulatory networks can vary greatly across organisms or tissues, and reference-based generators, such as GeneNetWeaver, are not designed to capture this heterogeneity. This means, for example, that benchmark results from the E. coli regulatory network will not carry over to other organisms or tissues. In contrast, probabilistic generators do not require a reference network, and they have the potential to capture a rich distribution of topologies. This makes probabilistic generators an ideal approach for obtaining a robust benchmarking of network inference methods. RESULTS: We propose a novel probabilistic network generator that (1) provides an alternative to address the inherent limitation of reference-based generators, (2) is able to create realistic gene association networks, and (3) captures the heterogeneity found across gold-standard networks better than existing generators used in practice. Eight organism-specific and 12 human tissue-specific gold-standard association networks are considered. Several measures of global topology are used to determine the similarity of generated networks to the gold standards. Along with demonstrating the variability of network structure across organisms and tissues, we show that the commonly used "scale-free" model is insufficient for replicating these structures. AVAILABILITY: This generator is implemented in the R package "SeqNet" and is available on CRAN (https://cran.r-project.org/web/packages/SeqNet/index.html).
Affiliation(s)
- Tyler Grimes
  - Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
- Somnath Datta
  - Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
7. Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics 2021; 37:2423-2431. [PMID: 33576787 DOI: 10.1093/bioinformatics/btab099] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/04/2020] [Revised: 01/18/2021] [Accepted: 02/10/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN methods have been developed, most have focused on verification with a specific data set. However, it is difficult to establish directed topological networks that are suitable for both time-series and non-time-series datasets due to the complexity and diversity of biological networks. RESULTS: Here, we proposed a novel method, GNIPLR (Gene networks inference based on projection and lagged regression), to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projected gene data twice using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation to produce a linear and monotonous pseudo-time series, and then determined the direction of regulation in combination with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. Moreover, we also applied the GNIPLR algorithm to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods. AVAILABILITY: The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuelei Zhang
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
- Xiao Chang
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
- Xiaoping Liu
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
8. Li Z, Li S, Yu T, Li X. Measuring the Coupling Direction between Neural Oscillations with Weighted Symbolic Transfer Entropy. ENTROPY (BASEL, SWITZERLAND) 2020; 22:e22121442. [PMID: 33371251 PMCID: PMC7767336 DOI: 10.3390/e22121442] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/09/2020] [Revised: 12/13/2020] [Accepted: 12/16/2020] [Indexed: 05/30/2023]
Abstract
Neural oscillations reflect rhythmic fluctuations in the synchronization of neuronal populations and play a significant role in neural processing. To further understand the dynamic interactions between different regions in the brain, it is necessary to estimate the coupling direction between neural oscillations. Here, we developed a novel method, termed weighted symbolic transfer entropy (WSTE), that combines symbolic transfer entropy (STE) and weighted probability distribution to measure the directionality between two neuronal populations. The traditional STE ignores the degree of difference between the amplitude values of a time series. In our proposed WSTE method, this information is picked up by utilizing a weighted probability distribution. The simulation analysis shows that the WSTE method can effectively estimate the coupling direction between two neural oscillations. In comparison with STE, the new method is more sensitive to the coupling strength and is more robust against noise. When applied to epileptic electrocorticography data, a significant coupling direction from the anterior nucleus of thalamus (ANT) to the seizure onset zone (SOZ) was detected during seizures. Considering the superiorities of the WSTE method, it is greatly advantageous to measure the coupling direction between neural oscillations and consequently characterize the information flow between different brain regions.
Affiliation(s)
- Zhaohui Li
  - School of Information Science and Engineering (School of Software), Yanshan University, Qinhuangdao 066004, China
  - Hebei Key Laboratory of Information Transmission and Signal Processing, Yanshan University, Qinhuangdao 066004, China
- Shuaifei Li
  - School of Information Science and Engineering (School of Software), Yanshan University, Qinhuangdao 066004, China
- Tao Yu
  - Beijing Institute of Functional Neurosurgery, Capital Medical University, Beijing 100053, China
- Xiaoli Li
  - State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China
9. Barraquand F, Picoche C, Detto M, Hartig F. Inferring species interactions using Granger causality and convergent cross mapping. THEOR ECOL-NETH 2020. [DOI: 10.1007/s12080-020-00482-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
10. Sipahi R, Porfiri M. Improving on transfer entropy-based network reconstruction using time-delays: Approach and validation. CHAOS (WOODBURY, N.Y.) 2020; 30:023125. [PMID: 32113235 DOI: 10.1063/1.5115510] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/18/2019] [Accepted: 01/23/2020] [Indexed: 06/10/2023]
Abstract
Transfer entropy constitutes a viable model-free tool to infer causal relationships between two dynamical systems from their time-series. In an information-theoretic sense, transfer entropy associates a cause-and-effect relationship with directed information transfer, such that one may improve the prediction of the future of a dynamical system from the history of another system. Recent studies have proposed the use of transfer entropy to reconstruct networks, but the inherent dyadic nature of this metric challenges the development of a robust approach that can discriminate direct from indirect interactions between nodes. In this paper, we seek to fill this methodological gap through the cogent integration of time-delays in the transfer entropy computation. By recognizing that information transfer in the network is bound by a finite speed, we relate the value of the time-delayed transfer entropy between two nodes to the number of walks between them. Upon this premise, we lay out the foundation of an alternative framework for network reconstruction, which we illustrate through closed-form results on three-node networks and numerically validate on larger networks, using examples of Boolean models and chaotic maps.
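As a rough illustration of the quantity discussed above, a plug-in estimate of time-delayed transfer entropy between two discrete series can be sketched as follows. This is a minimal sketch under stated assumptions (history length 1 for the target, a single source lag `delay`, simple frequency counts) and does not reproduce the paper's walk-counting framework; the function name `transfer_entropy` is hypothetical.

```python
import math
from collections import Counter

def transfer_entropy(source, target, delay=1):
    """Plug-in transfer entropy (bits) from source to target for discrete series.

    TE = sum over (y_next, y_prev, x_lag) of
         p(y_next, y_prev, x_lag) * log2[ p(y_next | y_prev, x_lag) / p(y_next | y_prev) ]
    where y_next = target[t], y_prev = target[t-1], x_lag = source[t-delay].
    """
    start = max(delay, 1)
    triples = [
        (target[t], target[t - 1], source[t - delay])
        for t in range(start, len(target))
    ]
    n = len(triples)
    joint = Counter(triples)                                   # counts of (y_next, y_prev, x_lag)
    prev_and_lag = Counter((yp, xl) for _, yp, xl in triples)  # counts of (y_prev, x_lag)
    next_and_prev = Counter((yn, yp) for yn, yp, _ in triples) # counts of (y_next, y_prev)
    prev_only = Counter(yp for _, yp, _ in triples)            # counts of y_prev

    te = 0.0
    for (yn, yp, xl), count in joint.items():
        p_given_both = count / prev_and_lag[(yp, xl)]
        p_given_self = next_and_prev[(yn, yp)] / prev_only[yp]
        te += (count / n) * math.log2(p_given_both / p_given_self)
    return te

if __name__ == "__main__":
    import random
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(5000)]
    y = [0, 0] + x[:-2]  # y copies x with a delay of two steps
    for d in (1, 2, 3):
        print(d, transfer_entropy(x, y, delay=d))
```

Scanning `delay` in this way shows where the information transfer peaks; for continuous data the series would first need to be discretised or symbolised, and the plug-in estimator is biased for short series.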
Affiliation(s)
- Rifat Sipahi
  - Department of Mechanical and Industrial Engineering, Northeastern University, Boston, Massachusetts 02115, USA
- Maurizio Porfiri
  - Department of Mechanical and Aerospace Engineering and Department of Biomedical Engineering, New York University Tandon School of Engineering, 6 MetroTech Center, Brooklyn, New York 11201, USA
11. Kořenek J, Hlinka J. Causal network discovery by iterative conditioning: Comparison of algorithms. CHAOS (WOODBURY, N.Y.) 2020; 30:013117. [PMID: 32013475 DOI: 10.1063/1.5115267] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/16/2019] [Accepted: 11/25/2019] [Indexed: 06/10/2023]
Abstract
Estimating causal interactions in complex dynamical systems is an important problem encountered in many fields of current science. While a theoretical solution for detecting the causal interactions has been previously formulated in the framework of prediction improvement, it generally requires the computation of high-dimensional information functionals, a situation invoking the curse of dimensionality with increasing network size. Recently, several methods have been proposed to alleviate this problem, based on iterative procedures for the assessment of conditional (in)dependences. In the current work, we present a comparison of several such prominent approaches. This is done both by theoretical comparison of the algorithms using a formulation in a common framework and by numerical simulations including realistic complex coupling patterns. The theoretical analysis highlights the key similarities and differences between the algorithms, hinting at their comparative strengths and weaknesses. The method assumptions and specific properties such as false positive control and order-dependence are discussed. Numerical simulations suggest that while the accuracy of most of the algorithms is almost indistinguishable, there are substantial differences in their computational demands, ranging theoretically from polynomial to exponential complexity and leading to substantial differences in computation time in realistic scenarios depending on the density and size of networks. Based on the analysis of the algorithms and numerical simulations, we propose a hybrid approach providing competitive accuracy with improved computational efficiency.
Affiliation(s)
- Jakub Kořenek
  - Institute of Computer Science of the Czech Academy of Sciences, Czech Academy of Sciences, Pod vodarenskou vezi 271/2, 182 07 Prague, Czech Republic
- Jaroslav Hlinka
  - Institute of Computer Science of the Czech Academy of Sciences, Czech Academy of Sciences, Pod vodarenskou vezi 271/2, 182 07 Prague, Czech Republic
12. Glymour C, Zhang K, Spirtes P. Review of Causal Discovery Methods Based on Graphical Models. Front Genet 2019; 10:524. [PMID: 31214249 PMCID: PMC6558187 DOI: 10.3389/fgene.2019.00524] [Citation(s) in RCA: 165] [Impact Index Per Article: 27.5] [Received: 08/07/2018] [Accepted: 05/13/2019] [Indexed: 12/11/2022] Open
Abstract
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give an introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Affiliation(s)
- Clark Glymour
  - Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Kun Zhang
  - Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Peter Spirtes
  - Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
13. Inferring a nonlinear biochemical network model from a heterogeneous single-cell time course data. Sci Rep 2018; 8:6790. [PMID: 29717206 PMCID: PMC5931614 DOI: 10.1038/s41598-018-25064-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/10/2017] [Accepted: 04/09/2018] [Indexed: 12/30/2022] Open
Abstract
Mathematical modeling and analysis of biochemical reaction networks are key routines in computational systems biology and biophysics; however, it remains difficult to choose the most valid model. Here, we propose a computational framework for data-driven and systematic inference of a nonlinear biochemical network model. The framework is based on the expectation-maximization algorithm combined with particle smoother and sparse regularization techniques. In this method, a “redundant” model consisting of an excessive number of nodes and regulatory paths is iteratively updated by eliminating unnecessary paths, resulting in an inference of the most likely model. Using artificial single-cell time-course data showing heterogeneous oscillatory behaviors, we demonstrated that this algorithm successfully inferred the true network without any prior knowledge of network topology or parameter values. Furthermore, we showed that both the regulatory paths among nodes and the optimal number of nodes in the network could be systematically determined. The method presented in this study provides a general framework for inferring a nonlinear biochemical network model from heterogeneous single-cell time-course data.
14. Baksi KD, Kuntal BK, Mande SS. 'TIME': A Web Application for Obtaining Insights into Microbial Ecology Using Longitudinal Microbiome Data. Front Microbiol 2018; 9:36. [PMID: 29416530 PMCID: PMC5787560 DOI: 10.3389/fmicb.2018.00036] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 11/15/2017] [Accepted: 01/09/2018] [Indexed: 12/21/2022] Open
Abstract
Realization of the importance of microbiome studies, coupled with the decreasing sequencing cost, has led to the exponential growth of microbiome data. A number of these microbiome studies have focused on understanding changes in the microbial community over time. Such longitudinal microbiome studies have the potential to offer unique insights pertaining to the microbial social networks as well as their responses to perturbations. In this communication, we introduce a web-based framework called 'TIME' (Temporal Insights into Microbial Ecology), developed specifically to obtain meaningful insights from microbiome time series data. The TIME web-server is designed to accept a wide range of popular formats as input with options to preprocess and filter the data. Multiple samples, defined by a series of longitudinal time points along with their metadata information, can be compared in order to interactively visualize the temporal variations. In addition to standard microbiome data analytics, the web server implements popular time series analysis methods like dynamic time warping, Granger causality, and the Dickey-Fuller test to generate interactive layouts for facilitating easy biological inferences. Apart from this, a new metric for comparing metagenomic time series data has been introduced to effectively visualize the similarities/differences in the trends of the resident microbial groups. The visualizations are augmented with stationarity information pertaining to the microbial groups, which is used to predict microbial competition as well as community structure. Additionally, the 'causality graph analysis' module incorporated in TIME allows predicting taxa that might have a higher influence on community structure in different conditions. TIME also allows users to easily identify potential taxonomic markers from a longitudinal microbiome analysis. We illustrate the utility of the web-server features on a few published time series microbiome datasets and demonstrate the ease with which it can be used to perform complex analysis.
Affiliation(s)
- Krishanu D. Baksi
  - Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., Pune, India
- Bhusan K. Kuntal
  - Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., Pune, India
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory (NCL), Pune, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Sharmila S. Mande
  - Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., Pune, India