1
Straube J, Huang BE, Lê Cao KA. DynOmics to identify delays and co-expression patterns across time course experiments. Sci Rep 2017; 7:40131. [PMID: 28065937 PMCID: PMC5220332 DOI: 10.1038/srep40131] [Received: 07/19/2016] [Accepted: 12/02/2016]
Abstract
Dynamic changes in biological systems can be captured by measuring molecular expression from different levels (e.g., genes and proteins) across time. Integration of such data aims to identify molecules that show similar expression changes over time; such molecules may be co-regulated and thus involved in similar biological processes. Combining data sources presents a systematic approach to study molecular behaviour. It can compensate for missing data in one source, and can reduce false positives when multiple sources highlight the same pathways. However, integrative approaches must accommodate the challenges inherent in ‘omics’ data, including high-dimensionality, noise, and timing differences in expression. As current methods for identification of co-expression cannot cope with this level of complexity, we developed a novel algorithm called DynOmics. DynOmics is based on the fast Fourier transform, from which the difference in expression initiation between trajectories can be estimated. This delay can then be used to realign the trajectories and identify those which show a high degree of correlation. Through extensive simulations, we demonstrate that DynOmics is efficient and accurate compared to existing approaches. We consider two case studies highlighting its application, identifying regulatory relationships across ‘omics’ data within an organism and for comparative gene expression analysis across organisms.
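The delay-estimation idea in this abstract can be illustrated with a minimal sketch. It is not the DynOmics implementation: it only assumes two equally sampled trajectories of the same length, computes their cross-correlation through a small radix-2 FFT, takes the lag with the highest cross-correlation as the delay, and then reports the Pearson correlation of the realigned trajectories. The function names `fft`, `estimate_delay` and `aligned_correlation` are hypothetical.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def estimate_delay(reference, query, max_lag):
    """Lag (in time points) at which query best matches reference.

    Assumes both trajectories have the same length n and max_lag < n.
    A positive lag means the query trajectory starts later than the reference.
    """
    n = len(reference)
    size = 1
    while size < 2 * n:  # zero-pad so that circular wrap-around cannot occur
        size *= 2
    mean_r = sum(reference) / n
    mean_q = sum(query) / n
    r = [x - mean_r for x in reference] + [0.0] * (size - n)
    q = [x - mean_q for x in query] + [0.0] * (size - n)
    spectrum = [a.conjugate() * b for a, b in zip(fft(r), fft(q))]
    corr = [c.real / size for c in fft(spectrum, inverse=True)]
    # corr[lag] holds sum_t r[t] * q[t + lag]; negative lags wrap to the end.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % size])


def aligned_correlation(reference, query, lag):
    """Pearson correlation after shifting query back by lag time points."""
    pairs = [(reference[t], query[t + lag])
             for t in range(len(reference)) if 0 <= t + lag < len(query)]
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Calling `estimate_delay(reference, query, max_lag)` and passing the result to `aligned_correlation` gives one delay and one correlation per pair of trajectories. Restricting the search to `max_lag` keeps the estimate within a plausible range for short time courses.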
Affiliation(s)
- Jasmin Straube
  - QFAB@QCIF Bioinformatics, Institute for Molecular Biosciences, The University of Queensland, Queensland Bioscience Precinct, St Lucia, QLD, Australia; The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Bevan Emma Huang
  - Janssen Research & Development, LLC, Discovery Sciences, Menlo Park, USA
- Kim-Anh Lê Cao
  - The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
2
Bulashevska S, Priest C, Speicher D, Zimmermann J, Westermann F, Cremers AB. SwitchFinder - a novel method and query facility for discovering dynamic gene expression patterns. BMC Bioinformatics 2016; 17:532. [PMID: 27978814 PMCID: PMC5160026 DOI: 10.1186/s12859-016-1391-0] [Received: 06/11/2016] [Accepted: 11/29/2016]
Abstract
Background: Biological systems and processes are highly dynamic. To gain insights into their functioning, time-resolved measurements are necessary. Time-resolved gene expression data capture the temporal behaviour of genes genome-wide under various biological conditions: in response to stimuli, during the cell cycle, differentiation or developmental programs. Dissecting dynamic gene expression patterns from these data may shed light on the functioning of the gene regulatory system. The present approach facilitates this discovery. The fundamental idea behind it is the following: there are change-points (switches) in gene behaviour that separate intervals of increasing and decreasing activity, and these intervals may have different durations. Elucidating the switch-points is important for identifying biologically meaningful features and patterns of gene dynamics. Results: We developed a statistical method, called SwitchFinder, for the analysis of time-series data, in particular gene expression data, based on a change-point model. Fitting the model to the gene expression time-courses indicates switch-points between increasing and decreasing activities of each gene. Two types of the model, based on linear and on generalized logistic functions, were used to capture the data between the switch-points. Model inference was performed in a Bayesian framework using Gibbs sampling, a Markov chain Monte Carlo (MCMC) technique. Further, we introduced features of the switch-points (growth, decay, spike and cleft) that reflect important dynamic aspects. With these, gene expression profiles are represented qualitatively, as sets of dynamic features at their onset times. We developed a Web application of the approach that lets users query the gene expression time-courses and deduce groups of genes with common dynamic patterns. SwitchFinder was applied to our original data, a gene expression time-series measured in a neuroblastoma cell line upon treatment with all-trans retinoic acid (ATRA). The analysis revealed eight patterns of gene expression responses to ATRA, indicating the induction of the BMP, WNT, Notch, FGF and NTRK-receptor signaling pathways involved in cell differentiation, as well as the repression of cell-cycle related genes. Conclusions: SwitchFinder is a novel approach to the analysis of biological time-series data, supporting inference and interactive exploration of its inherent dynamic patterns, hence facilitating the biological discovery process. SwitchFinder is freely available at https://newbioinformatics.eu/switchfinder. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1391-0) contains supplementary material, which is available to authorized users.
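To make the notion of a switch-point concrete, here is a deliberately simplified sketch. It is not SwitchFinder: instead of Bayesian inference with Gibbs sampling it uses an exhaustive least-squares search for a single switch between two linear segments, and the function names `fit_line` and `find_switch` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def find_switch(times, values, min_points=2):
    """Return (switch_index, slope_before, slope_after) minimising total SSE.

    The two linear segments share the observation at switch_index, and each
    segment contains at least min_points observations.
    """
    best = None
    for k in range(min_points - 1, len(times) - min_points + 1):
        left = fit_line(times[: k + 1], values[: k + 1])
        right = fit_line(times[k:], values[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, k, left[0], right[0])
    return best[1], best[2], best[3]
```

The signs of the two returned slopes indicate whether activity increases or decreases on either side of the switch. Handling several switches per gene, or logistic segments, would need a richer model than this sketch.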
Affiliation(s)
- Svetlana Bulashevska
  - B-IT Bonn-Aachen International Center for Information Technology, University of Bonn, Dahlmannstr. 2, Bonn, 53113, Germany
- Colin Priest
  - Sigma Plus Consulting Pty Ltd, Crows Nest, 2065, NSW, Australia
- Daniel Speicher
  - B-IT Bonn-Aachen International Center for Information Technology, University of Bonn, Dahlmannstr. 2, Bonn, 53113, Germany; Institute of Computer Science, University of Bonn, Roemerstr. 164, Bonn, 53117, Germany
- Jörg Zimmermann
  - B-IT Bonn-Aachen International Center for Information Technology, University of Bonn, Dahlmannstr. 2, Bonn, 53113, Germany; Institute of Computer Science, University of Bonn, Roemerstr. 164, Bonn, 53117, Germany
- Frank Westermann
  - Neuroblastoma Genomics Group, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, 69120, Germany
- Armin B Cremers
  - B-IT Bonn-Aachen International Center for Information Technology, University of Bonn, Dahlmannstr. 2, Bonn, 53113, Germany; Institute of Computer Science, University of Bonn, Roemerstr. 164, Bonn, 53117, Germany
3
Natural Cubic Spline Regression Modeling Followed by Dynamic Network Reconstruction for the Identification of Radiation-Sensitivity Gene Association Networks from Time-Course Transcriptome Data. PLoS One 2016; 11:e0160791. [PMID: 27505168 PMCID: PMC4978405 DOI: 10.1371/journal.pone.0160791] [Received: 04/26/2016] [Accepted: 06/14/2016]
Abstract
Gene expression time-course experiments make it possible to study the dynamics of transcriptomic changes in cells exposed to different stimuli. However, most approaches for the reconstruction of gene association networks (GANs) do not propose prior-selection approaches tailored to time-course transcriptome data. Here, we present a workflow for the identification of GANs from time-course data using prior selection of genes differentially expressed over time, identified by natural cubic spline regression modeling (NCSRM). The workflow comprises three major steps: 1) the identification of differentially expressed genes from time-course expression data by employing NCSRM, 2) the use of regularized dynamic partial correlation as implemented in GeneNet to infer GANs from differentially expressed genes and 3) the identification and functional characterization of the key nodes in the reconstructed networks. The approach was applied to a time-resolved transcriptome data set of radiation-perturbed cell culture models of non-tumor cells with normal and increased radiation sensitivity. NCSRM detected significantly more genes than another commonly used method for time-course transcriptome analysis (BETR). While most genes detected with BETR were also detected with NCSRM, the false-detection rate of NCSRM was low (3%). The GANs reconstructed from genes detected with NCSRM showed a better overlap with the interactome network Reactome than GANs derived from BETR-detected genes. After exposure to 1 Gy, the normal sensitive cells showed only a sparse response compared to cells with increased sensitivity, which exhibited a strong response mainly of genes related to the senescence pathway. After exposure to 10 Gy, the response of the normal sensitive cells was mainly associated with senescence and that of cells with increased sensitivity with apoptosis. We discuss these results in a clinical context and underline the impact of senescence-associated pathways in the acute radiation response of normal cells. The workflow of this novel approach is implemented in the open-source Bioconductor R-package splineTimeR.
4
Blomstedt P, Dutta R, Seth S, Brazma A, Kaski S. Modelling-based experiment retrieval: a case study with gene expression clustering. Bioinformatics 2016; 32:1388-94. [PMID: 26740526 DOI: 10.1093/bioinformatics/btv762] [Received: 05/15/2015] [Accepted: 12/28/2015]
Abstract
Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches from annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, requiring each experiment to be expressed as a single profile, typically of case versus control. A more general, recently suggested alternative is to retrieve experiments whose models are good for modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well. Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework, where each experiment is modelled separately and the retrieval is done by finding related models. For retrieval of gene expression experiments, we use a probabilistic model called product partition model, which induces a clustering of genes that show similar expression patterns across a number of samples. The suggested metric for retrieval using clusterings is the normalized information distance. Empirical results finally suggest that inference for the full probabilistic model can be approximated with good performance using computationally faster heuristic clustering approaches (e.g. k-means). The method is highly scalable and straightforward to apply to construct a general-purpose gene expression experiment retrieval method. Availability and implementation: The method can be implemented using standard clustering algorithms and normalized information distance, available in many statistical software packages. Contact: paul.blomstedt@aalto.fi or samuel.kaski@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online.
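As an illustration of comparing two clusterings of the same genes, the sketch below computes one common definition of the normalized information distance, 1 - I(U;V) / max(H(U), H(V)). Whether the paper uses exactly this normalisation is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a cluster-label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def normalized_information_distance(a, b):
    """0 for identical partitions (up to relabeling), 1 for independent ones."""
    h_max = max(entropy(a), entropy(b))
    if h_max == 0.0:
        return 0.0  # both clusterings put every item in a single cluster
    return 1.0 - mutual_information(a, b) / h_max
```

Because the distance depends only on the partitions and not on the label names, clusterings produced independently for different experiments can be compared directly, as long as they cover the same genes.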
Affiliation(s)
- Paul Blomstedt
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Ritabrata Dutta
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Sohan Seth
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Alvis Brazma
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, UK
- Samuel Kaski
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
5
Identifying genes relevant to specific biological conditions in time course microarray experiments. PLoS One 2013; 8:e76561. [PMID: 24146889 PMCID: PMC3795718 DOI: 10.1371/journal.pone.0076561] [Received: 06/14/2013] [Accepted: 08/28/2013]
Abstract
Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the skewed ratio of sample to variable poses a risk of overfitting. Thus, in this context, feature selection methods become crucial to select relevant genes and, hence, improve classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that in our setup, the addition of protein interaction information did not contribute to any significant improvement of the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call "relative Signal-to-Noise ratio" (rSNR). More precisely, the rSNR ranks genes based on their specificity to an experimental condition, by comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within an experimental condition of interest and high variation across experimental conditions are ranked higher, and help in improving classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset. We found that the rSNR performed generally better than the other methods.
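The abstract does not give the rSNR formula, so the following is only an illustrative stand-in for the idea of ranking genes by low within-condition variation and high across-condition variation. The ratio used here, the `eps` guard and the name `specificity_score` are all assumptions and should not be read as the published rSNR.

```python
from statistics import mean, variance


def specificity_score(expression_by_condition, condition, eps=1e-9):
    """Illustrative ratio of across-condition to within-condition variation.

    expression_by_condition maps a condition name to the replicate (or
    time-point) measurements of one gene under that condition; each condition
    needs at least two measurements.
    """
    within = variance(expression_by_condition[condition])
    condition_means = [mean(v) for v in expression_by_condition.values()]
    across = variance(condition_means)
    return across / (within + eps)  # eps avoids division by zero
```

Genes would then be ranked by this score for the condition of interest, with the highest-scoring genes kept as features for classification.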
6
Wang K, Ng SK, McLachlan GJ. Clustering of time-course gene expression profiles using normal mixture models with autoregressive random effects. BMC Bioinformatics 2012; 13:300. [PMID: 23151154 PMCID: PMC3574839 DOI: 10.1186/1471-2105-13-300] [Received: 01/30/2012] [Accepted: 11/07/2012]
Abstract
Background: Time-course gene expression data such as yeast cell cycle data may be periodically expressed. To cluster such data, currently used Fourier series approximations of periodic gene expressions have been found inadequate to model the complexity of the time-course data, partly because they ignore the dependence between the expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of available models in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models. Results: We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models, providing more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of coregulated genes are obtained, which are supported by gene-function annotation databases. Conclusions: Our new model under our extension of the EMMIX-WIRE procedure is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an autocorrelation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters that enable improved modelling and consequent clustering of time-course data.
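For readers unfamiliar with first-order autoregressive structure, the sketch below builds the covariance matrix of a stationary AR(1) process over equally spaced time points, where the covariance between time points i and j is sigma2 * rho^|i - j|. This is the textbook form; the exact parameterisation used in the paper's random effects is not stated in the abstract, and the name `ar1_covariance` is hypothetical.

```python
def ar1_covariance(n_times, rho, sigma2=1.0):
    """Covariance matrix of a stationary AR(1) process.

    rho is the correlation between adjacent time points (|rho| < 1) and
    sigma2 is the marginal variance at each time point.
    """
    return [
        [sigma2 * rho ** abs(i - j) for j in range(n_times)]
        for i in range(n_times)
    ]
```

For example, `ar1_covariance(3, 0.5)` has 1.0 on the diagonal, 0.5 between neighbouring time points and 0.25 between the first and third, which shows how dependence decays with the time gap.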
Affiliation(s)
- Kui Wang
  - Department of Mathematics, University of Queensland, Brisbane, QLD 4072, Australia
7
Bar-Joseph Z, Gitter A, Simon I. Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet 2012; 13:552-64. [PMID: 22805708 DOI: 10.1038/nrg3244]
Abstract
Biological processes are often dynamic, thus researchers must monitor their activity at multiple time points. The most abundant source of information regarding such dynamic activity is time-series gene expression data. These data are used to identify the complete set of activated genes in a biological process, to infer their rates of change, their order and their causal effects and to model dynamic systems in the cell. In this Review we discuss the basic patterns that have been observed in time-series experiments, how these patterns are combined to form expression programs, and the computational analysis, visualization and integration of these data to infer models of dynamic biological systems.
Affiliation(s)
- Ziv Bar-Joseph
  - Lane Center for Computational Biology and Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
8
Redestig H, Costa IG. Detection and interpretation of metabolite-transcript coresponses using combined profiling data. Bioinformatics 2011; 27:i357-65. [PMID: 21685093 PMCID: PMC3117345 DOI: 10.1093/bioinformatics/btr231]
Abstract
Motivation: Studying the interplay between gene expression and metabolite levels can yield important information on the physiology of stress responses and adaptation strategies. Performing transcriptomics and metabolomics in parallel during time-series experiments represents a systematic way to gain such information. Several combined profiling datasets have been added to the public domain and they form a valuable resource for hypothesis generating studies. Unfortunately, detecting coresponses between transcript levels and metabolite abundances is non-trivial: they cannot be assumed to overlap directly with underlying biochemical pathways and they may be subject to time delays and obscured by considerable noise. Results: Our aim was to predict pathway comemberships between metabolites and genes based on their coresponses to applied stress. We found that in the presence of strong noise and time-shifted responses, a hidden Markov model-based similarity outperforms the simpler Pearson correlation but performs comparably or worse in their absence. Therefore, we propose a supervised method that applies pathway information to summarize similarity statistics to a consensus statistic that is more informative than any of the single measures. Using four combined profiling datasets, we show that comembership between metabolites and genes can be predicted for numerous KEGG pathways; this opens opportunities for the detection of transcriptionally regulated pathways and novel metabolically related genes. Availability: A command-line software tool is available at http://www.cin.ufpe.br/~igcf/Metabolites. Contact: henning@psc.riken.jp; igcf@cin.ufpe.br Supplementary information: Supplementary data are available at Bioinformatics online.