1
Velten B, Stegle O. Principles and challenges of modeling temporal and spatial omics data. Nat Methods 2023; 20:1462-1474. [PMID: 37710019 DOI: 10.1038/s41592-023-01992-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Accepted: 07/31/2023] [Indexed: 09/16/2023]
Abstract
Studies with temporal or spatial resolution are crucial to understand the molecular dynamics and spatial dependencies underlying a biological process or system. With advances in high-throughput omic technologies, time- and space-resolved molecular measurements at scale are increasingly accessible, providing new opportunities to study the role of timing or structure in a wide range of biological questions. At the same time, analyses of the data being generated in the context of spatiotemporal studies entail new challenges that need to be considered, including the need to account for temporal and spatial dependencies and compare them across different scales, biological samples or conditions. In this Review, we provide an overview of common principles and challenges in the analysis of temporal and spatial omics data. We discuss statistical concepts to model temporal and spatial dependencies and highlight opportunities for adapting existing analysis methods to data with temporal and spatial dimensions.
Affiliation(s)
- Britta Velten
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Cellular Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge, UK.
- Centre for Organismal Studies (COS) and Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany.
- Oliver Stegle
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Cellular Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge, UK.
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
2
Zhang G, Zhang J, Yao Z, Shi Y, Xu C, Shao L, Jiang L, Li M, Tong Y, Wang Y. Time-series gene expression patterns and their characteristics of Beauveria bassiana in the process of infecting pest insects. J Basic Microbiol 2022; 62:1274-1286. [PMID: 35781725 DOI: 10.1002/jobm.202200155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/16/2022] [Revised: 05/27/2022] [Accepted: 06/11/2022] [Indexed: 11/06/2022]
Abstract
Beauveria bassiana has been widely used as an important biological control fungus for agricultural and forest pests, and clarifying the interaction mechanism between B. bassiana and its host will help to better exert the efficacy of the mycoinsecticide. Here, we proposed a novel pattern analysis (PA) method for analyzing time-series data and applied it to a transcriptomic data set of B. bassiana infecting Galleria mellonella. We screened out 14 patterns including 868 genes, which had some characteristics that were not inferior to differentially expressed genes (DEGs). Compared with the previous analysis of this data set, we had three novel discoveries during B. bassiana infection, including overall downregulation of gene expression, the more critical first 24 h, and enrichment of regulatory functions of downregulated genes. Our new PA method promises to be an important complement to DEGs analysis for time-series transcriptomic data, and our findings enrich our knowledge of molecular mechanisms of fungal-host interactions.
Affiliation(s)
- Guochao Zhang
- College of Plant Protection, Shandong Agricultural University, Tai'an, China; School of Biological Engineering/Institute of Digital Ecology and Health, Huainan Normal University, Huainan, Anhui, China; Shandong Tobacco Research Institute Co., Ltd., Jinan, China
- Jifeng Zhang
- School of Biological Engineering/Institute of Digital Ecology and Health, Huainan Normal University, Huainan, Anhui, China; Key Laboratory of Biology and Sustainable Management of Plant Diseases and Pests of Anhui Higher Education Institutes/School of Plant Protection, Anhui Agricultural University, Hefei, China; State Key Laboratory of Pollution Control and Resource Reuse, Nanjing, China; Key Laboratory of Industrial Dust Prevention and Control & Occupational Health and Safety, Ministry of Education, Huainan, China; Anhui Shanhe Pharmaceutical Excipients Co., Ltd., Huainan, China
- Zhuo Yao
- Jinan Agricultural Technology Extension Service Center, Jinan, China
- Yong Shi
- School of Computer Science/School of Electronic Engineering, Huainan Normal University, Huainan, China
- Chenxi Xu
- College of Food Science and Engineering, Northwest A&F University, Xianyang, China
- Lvyi Shao
- School of Biological Engineering/Institute of Digital Ecology and Health, Huainan Normal University, Huainan, Anhui, China
- Lei Jiang
- Key Laboratory of Biology and Sustainable Management of Plant Diseases and Pests of Anhui Higher Education Institutes/School of Plant Protection, Anhui Agricultural University, Hefei, China
- Maoye Li
- Key Laboratory of Biology and Sustainable Management of Plant Diseases and Pests of Anhui Higher Education Institutes/School of Plant Protection, Anhui Agricultural University, Hefei, China
- Yue Tong
- School of Computer Science/School of Electronic Engineering, Huainan Normal University, Huainan, China
- Yujun Wang
- College of Plant Protection, Shandong Agricultural University, Tai'an, China
3
Yang Q, Shang J, Chen Y, Tang D, Ouyang Y, Xiong B, Zhang X. Plasmonic Imaging of Dynamic Interactions between Membrane Receptor Clusters beyond the Diffraction Limit in Live Cells. Anal Chem 2021; 93:16571-16580. [PMID: 34847664 DOI: 10.1021/acs.analchem.1c03843] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
As a general mechanism, ligand-induced receptor clustering on cell membrane plays determinative roles in pattern recognition and transmembrane signaling. Nevertheless, probing the dynamic characteristics for the complicated interactions between receptor clusters remains difficult because of the lack of strategy for receptor cluster labeling and long-term monitoring in live cells. Herein, we proposed a data-mining-integrated plasmon coupling microscopy to study the dynamic cluster-cluster interactions on cell surface. The receptor clusters were activated and labeled with multivalent plasmonic nanoprobes, which enables the real-time monitoring of individual receptor clusters and the measurement of cluster-cluster interactions from the analysis of plasmonic coupling for the nanoprobe pairs beyond the diffraction limit. Using this method, we found that the protease-activated receptor 1 (PAR1) clusters would experience an initial contact and then form a weakly bound cluster-cluster complex, followed by cluster fusion to generate large-sized signaling complexes. The underlying state transitions for the cluster-cluster fusion process were uncovered using a data-mining technique named the K-means-based hidden Markov model with the scattering intensity of coupled nanoprobe pairs as observations. All of the findings from single-particle analysis and bulk measurements suggested that the allosteric inhibitors could suppress the dynamic transitions from the weakly bound cluster-cluster complexes to fused signaling complexes, leading to the subsequent downregulation of intracellular calcium signaling pathways. We believe that this strategy is promising for imaging and monitoring receptor clustering as well as protein phase separation on the cell surface in various biological and physiological processes.
Affiliation(s)
- Qian Yang
- Molecular Science and Biomedicine Laboratory, State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, 410082 Changsha, P. R. China
- Jinhui Shang
- Molecular Science and Biomedicine Laboratory, State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, 410082 Changsha, P. R. China
- Yancao Chen
- Molecular Science and Biomedicine Laboratory, State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, 410082 Changsha, P. R. China
- Decui Tang
- Molecular Science and Biomedicine Laboratory, State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, 410082 Changsha, P. R. China
- Yuzhi Ouyang
- Molecular Science and Biomedicine Laboratory, State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, 410082 Changsha, P. R. China
- Bin Xiong
- Molecular Science and Biomedicine Laboratory, State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, 410082 Changsha, P. R. China
- Xiaobing Zhang
- Molecular Science and Biomedicine Laboratory, State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, 410082 Changsha, P. R. China
4
Han M, Liu X, Zhang W, Wang M, Bu W, Chang C, Yu M, Li Y, Tian C, Yang X, Zhu Y, He F. TSMiner: a novel framework for generating time-specific gene regulatory networks from time-series expression profiles. Nucleic Acids Res 2021; 49:e108. [PMID: 34313778 PMCID: PMC8502000 DOI: 10.1093/nar/gkab629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/01/2020] [Revised: 06/30/2021] [Accepted: 07/09/2021] [Indexed: 12/03/2022]
Abstract
Time-series gene expression profiles are the primary source of information on complicated biological processes; however, capturing dynamic regulatory events from such data is challenging. Herein, we present a novel analytic tool, time-series miner (TSMiner), that can construct time-specific regulatory networks from time-series expression profiles using two groups of genes: (i) genes encoding transcription factors (TFs) that are activated or repressed at a specific time and (ii) genes associated with biological pathways showing significant mutual interactions with these TFs. Compared with existing methods, TSMiner demonstrated superior sensitivity and accuracy. Additionally, the application of TSMiner to a time-course RNA-seq dataset associated with mouse liver regeneration (LR) identified 389 transcriptional activators and 49 transcriptional repressors that were either activated or repressed across the LR process. TSMiner also predicted 109 and 47 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways significantly interacting with the transcriptional activators and repressors, respectively. These findings revealed the temporal dynamics of multiple critical LR-related biological processes, including cell proliferation, metabolism and the immune response. The series of evaluations and experiments demonstrated that TSMiner provides highly reliable predictions and increases the understanding of rapidly accumulating time-series omics data.
Affiliation(s)
- Mingfei Han
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
- Xian Liu
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
- Wen Zhang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China; Tianjin Key Laboratory of Food Science and Biotechnology, School of Biotechnology and Food Science, Tianjin University of Commerce, Tianjin 300134, China
- Mengnan Wang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
- Wenjing Bu
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
- Cheng Chang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
- Miao Yu
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
- Yingxing Li
- Central Research Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- Chunyan Tian
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
- Xiaoming Yang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
- Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
- Fuchu He
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences (Beijing), Beijing 102206, P.R. China
5
Cinar O, Iyigun C, Ilk O. An evaluation of a novel approach for clustering genes with dissimilar replicates. Commun Stat Simul Comput 2020. [DOI: 10.1080/03610918.2020.1839092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Ozan Cinar
- Department of Psychiatry and Neuropsychology, Maastricht University, Maastricht, The Netherlands
- Cem Iyigun
- Department of Industrial Engineering, Middle East Technical University, Ankara, Turkey
- Ozlem Ilk
- Department of Statistics, Middle East Technical University, Ankara, Turkey
6
Computational discovery and modeling of novel gene expression rules encoded in the mRNA. Biochem Soc Trans 2020; 48:1519-1528. [PMID: 32662820 DOI: 10.1042/bst20191048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/18/2020] [Revised: 06/15/2020] [Accepted: 06/17/2020] [Indexed: 11/17/2022]
Abstract
The transcript is populated with numerous overlapping codes that regulate all steps of gene expression. Deciphering these codes is very challenging due to the large number of variables involved, the non-modular nature of the codes, biases and limitations in current experimental approaches, our limited knowledge in gene expression regulation across the tree of life, and other factors. In recent years, it has been shown that computational modeling and algorithms can significantly accelerate the discovery of novel gene expression codes. Here, we briefly summarize the latest developments and different approaches in the field.
7
Wu J, Gupta M, Hussein AI, Gerstenfeld L. Bayesian modeling of factorial time-course data with applications to a bone aging gene expression study. J Appl Stat 2020; 48:1730-1754. [PMID: 34295011 DOI: 10.1080/02664763.2020.1772733] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/31/2023]
Abstract
Many scientific studies, especially in the biomedical sciences, generate data measured simultaneously over a multitude of units, over a period of time, and under different conditions or combinations of factors. Often, an important question of interest asked relates to which units behave similarly under different conditions, but measuring the variation over time complicates the analysis significantly. In this article we address such a problem arising from a gene expression study relating to bone aging, and develop a Bayesian statistical method that can simultaneously detect and uncover signals on three levels within such data: factorial, longitudinal, and transcriptional. Our model framework considers both cluster and time-point-specific parameters and these parameters uniquely determine the shapes of the temporal gene expression profiles, allowing the discovery and characterization of latent gene clusters based on similar underlying biological mechanisms. Our methodology was successfully applied to discover transcriptional networks in a microarray data set comparing the transcriptomic changes that occurred during bone aging in male and female mice expressing one or both copies of the bromodomain (Brd2) gene, a transcriptional regulator which exhibits an age-dependent sex-linked bone loss phenotype.
Affiliation(s)
- Joseph Wu
- Boston University School of Public Health, Boston, MA, U.S.A.; Pfizer, Inc., Groton, CT, U.S.A.
8
Temporal Pattern Analysis of Local Rainstorm Events in China During the Flood Season Based on Time Series Clustering. Water 2020. [DOI: 10.3390/w12030725] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
Like rainfall depth, duration and intensity, the temporal pattern is an important characteristic of rainstorm events. Studies have shown that temporal patterns influence runoff modelling, flash flood warning thresholds, and urban and infrastructure flood inundation simulations. In this study, a time series clustering method using dynamic time warping (DTW) as the similarity measure is proposed to analyze rainfall temporal patterns. Compared with existing approaches, it can better reflect the real rainfall processes. Based on this novel method, five representative temporal patterns were extracted from 13,299 rainstorm events during the flood season in China. Through the analysis of their statistical characteristics, the disaster-causing risks of each temporal pattern were compared. Furthermore, we found that for rainstorm events whose durations are less than 24 h, the rainfall is mainly concentrated in 3 to 6 h, which places stricter requirements on the design of flood control and drainage projects than design standards based on average intensities over 12 or 24 h. Finally, through regional analysis, we found that the rainfall depth, intensity and peak value are affected by the macroclimate. However, the temporal patterns are not strongly related to the macroclimate but are more likely to be affected by the local climate and topography, which needs further study at smaller scales.
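To make the DTW similarity measure named in this abstract concrete, the following is a minimal sketch of the textbook dynamic-programming DTW distance between two short series. It is not the authors' implementation: the absolute-difference local cost, the function name `dtw_distance` and the toy hyetographs are assumptions chosen for illustration only.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumption: the local cost is the absolute difference between samples;
    the abstract does not state which local cost the authors used.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical rainfall series (arbitrary units per time step).
early_peak = [0.0, 5.0, 9.0, 4.0, 1.0, 0.0]
shifted_peak = [0.0, 0.0, 5.0, 9.0, 4.0, 1.0]   # same shape, one step later
late_ramp = [0.0, 1.0, 2.0, 4.0, 7.0, 9.0]      # rainfall concentrated at the end

print(dtw_distance(early_peak, shifted_peak))
print(dtw_distance(early_peak, late_ramp))
```

Because warping lets a time-shifted copy of a pattern align with the original, the shifted series scores as much closer to `early_peak` than the late ramp does. A clustering step (for example hierarchical or k-medoids clustering on the pairwise distance matrix) would then operate on these distances.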
9
Libbrecht MW, Rodriguez OL, Weng Z, Bilmes JA, Hoffman MM, Noble WS. A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types. Genome Biol 2019; 20:180. [PMID: 31462275 PMCID: PMC6714098 DOI: 10.1186/s13059-019-1784-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/25/2018] [Accepted: 08/05/2019] [Indexed: 12/31/2022]
Abstract
Semi-automated genome annotation methods such as Segway take as input a set of genome-wide measurements such as of histone modification or DNA accessibility and output an annotation of genomic activity in the target cell type. Here we present annotations of 164 human cell types using 1615 data sets. To produce these annotations, we automated the label interpretation step to produce a fully automated annotation strategy. Using these annotations, we developed a measure of the importance of each genomic position called the “conservation-associated activity score.” We further combined all annotations into a single, cell type-agnostic encyclopedia that catalogs all human regulatory elements.
Affiliation(s)
- Oscar L Rodriguez
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Boston, USA
- Jeffrey A Bilmes
- Department of Electrical Engineering, University of Washington, Seattle, USA
- Michael M Hoffman
- Princess Margaret Cancer Centre, Toronto, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Canada; Department of Computer Science, University of Toronto, Toronto, Canada
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, USA; Department of Computer Science, University of Washington, Seattle, USA.
10
Vitali F, Li Q, Schissler AG, Berghout J, Kenost C, Lussier YA. Developing a 'personalome' for precision medicine: emerging methods that compute interpretable effect sizes from single-subject transcriptomes. Brief Bioinform 2019; 20:789-805. [PMID: 29272327 PMCID: PMC6585155 DOI: 10.1093/bib/bbx149] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/15/2017] [Revised: 10/06/2017] [Indexed: 12/13/2022]
Abstract
The development of computational methods capable of analyzing -omics data at the individual level is critical for the success of precision medicine. Although unprecedented opportunities now exist to gather data on an individual's -omics profile ('personalome'), methods for interpreting and extracting meaningful information from single-subject -omics remain underdeveloped, particularly for quantitative non-sequence measurements, including complete transcriptome or proteome expression and metabolite abundance. Conventional bioinformatics approaches have largely been designed for making population-level inferences about 'average' disease processes; thus, they may not adequately capture and describe individual variability. Novel approaches intended to exploit a variety of -omics data are required for identifying individualized signals for meaningful interpretation. In this review, intended for biomedical researchers, computational biologists and bioinformaticians, we survey emerging computational and translational informatics methods capable of constructing a single subject's 'personalome' for predicting clinical outcomes or therapeutic responses, with an emphasis on methods that provide interpretable readouts. Key points: (i) the single-subject analytics of the transcriptome shows the greatest development to date, and (ii) the methods were all validated in simulations, cross-validations or independent retrospective data sets. This survey uncovers a growing field that offers numerous opportunities for the development of novel validation methods and opens the door for future studies focusing on the interpretation of comprehensive 'personalomes' through the integration of multiple -omics, providing valuable insights into individual patient outcomes and treatments.
Affiliation(s)
- Qike Li
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
11
Peng H, Zheng Y, Blumenstein M, Tao D, Li J. CRISPR/Cas9 cleavage efficiency regression through boosting algorithms and Markov sequence profiling. Bioinformatics 2018; 34:3069-3077. [PMID: 29672669 DOI: 10.1093/bioinformatics/bty298] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 10/10/2017] [Accepted: 04/12/2018] [Indexed: 12/26/2022]
Abstract
Motivation: The CRISPR/Cas9 system is a widely used genome editing tool. A prediction problem of great interest for this system is how to select optimal single-guide RNAs (sgRNAs) such that cleavage efficiency is high while the off-target effect is low. Results: This work proposed a two-step averaging method (TSAM) for the regression of cleavage efficiencies of a set of sgRNAs by averaging the efficiency scores predicted by a boosting algorithm and those predicted by a support vector machine (SVM). We also proposed to use profiled Markov properties as novel features to capture the global characteristics of sgRNAs. These new features are combined with the outstanding features ranked by the boosting algorithm for the training of the SVM regressor. TSAM improved the mean Spearman correlation coefficients compared with the state-of-the-art performance on benchmark datasets containing thousands of human, mouse and zebrafish sgRNAs. Our method can also be converted to make binary distinctions between efficient and inefficient sgRNAs, with superior performance to the existing methods. The analysis reveals that highly efficient sgRNAs have a lower melting temperature at the middle of the spacer, cut at parts of the genome closer to the 5'-end, and contain more 'A' but less 'G' compared with inefficient ones. Comprehensive further analysis also demonstrates that our tool can predict an sgRNA's cutting efficiency with consistently good performance regardless of whether it is expressed from a U6 promoter in cells or from a T7 promoter in vitro. Availability and implementation: An online tool is available at http://www.aai-bioinfo.com/CRISPR/. Python and Matlab source code is freely available at https://github.com/penn-hui/TSAM. Supplementary information: Supplementary data are available at Bioinformatics online.
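The averaging step and the evaluation metric described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the two score lists stand in for the outputs of an already-trained boosting model and SVM regressor, all numbers are invented, and the helper names (`average_scores`, `spearman`, `rank`, `pearson`) are hypothetical rather than taken from the TSAM code.

```python
def rank(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def average_scores(scores_a, scores_b):
    """Element-wise mean of two models' predicted efficiency scores."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Invented example values: measured efficiencies and two models' predictions.
measured = [0.10, 0.40, 0.35, 0.80, 0.90]
boosting_pred = [0.20, 0.30, 0.50, 0.70, 0.60]   # stand-in for a boosting model
svm_pred = [0.30, 0.50, 0.20, 0.60, 0.90]        # stand-in for an SVM regressor

combined = average_scores(boosting_pred, svm_pred)
for name, pred in [("boosting", boosting_pred), ("svm", svm_pred), ("average", combined)]:
    print(name, round(spearman(measured, pred), 3))
```

Whether averaging helps depends on the data and on how the two models' errors are correlated; the toy numbers above are not evidence either way.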
Affiliation(s)
- Hui Peng
- Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia
- Yi Zheng
- Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia
- Michael Blumenstein
- Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia
- Dacheng Tao
- Faculty of Engineering and Information Technologies, School of Information Technologies, University of Sydney, Darlington, NSW, Australia
- Jinyan Li
- Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia
12
Jo K, Jung I, Moon JH, Kim S. Influence maximization in time bounded network identifies transcription factors regulating perturbed pathways. Bioinformatics 2017; 32:i128-i136. [PMID: 27307609 PMCID: PMC4908359 DOI: 10.1093/bioinformatics/btw275] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/23/2022]
Abstract
Motivation: To understand the dynamic nature of a biological process, it is crucial to identify perturbed pathways in an altered environment and also to infer regulators that trigger the response. Current time-series analysis methods, however, are not powerful enough to identify perturbed pathways and regulators simultaneously. Widely used methods determine gene sets such as differentially expressed genes or gene clusters, and these gene sets need to be further interpreted in terms of biological pathways using other tools. Most pathway analysis methods are not designed for time series data and do not consider gene-gene influence along the time dimension. Results: In this article, we propose a novel time-series analysis method, TimeTP, for determining transcription factors (TFs) regulating pathway perturbation, which narrows the focus to perturbed sub-pathways and utilizes the gene regulatory network and protein–protein interaction network to locate TFs triggering the perturbation. TimeTP first identifies perturbed sub-pathways that propagate the expression changes over time. Starting points of the perturbed sub-pathways are mapped into the network, and the most influential TFs are determined by an influence maximization technique. The analysis result is visually summarized in a TF-Pathway map in time clock. TimeTP was applied to a PIK3CA knock-in dataset and found significant sub-pathways and their regulators relevant to the PIP3 signaling pathway. Availability and Implementation: TimeTP is implemented in Python and available at http://biohealth.snu.ac.kr/software/TimeTP/. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: sunkim.bioinfo@snu.ac.kr
Affiliation(s)
- Kyuri Jo
- Department of Computer Science and Engineering
- Inuk Jung
- Interdisciplinary Program in Bioinformatics
- Sun Kim
- Department of Computer Science and Engineering, Interdisciplinary Program in Bioinformatics, Bioinformatics Institute, Seoul National University, Seoul 08826, Korea
13
Towle KM, Vederas JC. Structural features of many circular and leaderless bacteriocins are similar to those in saposins and saposin-like peptides. MedChemComm 2017; 8:276-285. [PMID: 30108744 PMCID: PMC6072434 DOI: 10.1039/c6md00607h] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/02/2016] [Accepted: 12/09/2016] [Indexed: 12/14/2022]
Abstract
Bacteriocins are potent antimicrobial peptides that are ribosomally produced and exported by bacteria, presumably to aid elimination of competing microorganisms. Many circular and linear leaderless bacteriocins have a recurring three-dimensional structural motif known as a saposin-like fold. Although the sizes and sequences of these bacteriocins are often quite different, and their mechanisms of action vary, this conserved motif of multiple helices appears critical for activity and may enable peptide-lipid and peptide-receptor interactions in target bacterial cell membranes. Comparisons between electrostatic surfaces and hydrophobic surface maps of different bacteriocins are discussed, emphasizing similarities and differences in the context of proposed modes of action.
Affiliation(s)
- K M Towle
- Department of Chemistry, University of Alberta, Edmonton, Alberta, T6G 2G2 Canada.
- J C Vederas
- Department of Chemistry, University of Alberta, Edmonton, Alberta, T6G 2G2 Canada.
14
Oh S, Song S. Differential Gene Expression (DEX) and Alternative Splicing Events (ASE) for Temporal Dynamic Processes Using HMMs and Hierarchical Bayesian Modeling Approaches. Methods Mol Biol 2017; 1552:165-176. [PMID: 28224498 DOI: 10.1007/978-1-4939-6753-7_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
In gene expression profile, data analysis pipeline is categorized into four levels, major downstream tasks, i.e., (1) identification of differential expression; (2) clustering co-expression patterns; (3) classification of subtypes of samples; and (4) detection of genetic regulatory networks, are performed posterior to preprocessing procedure such as normalization techniques. To be more specific, temporal dynamic gene expression data has its inherent feature, namely, two neighboring time points (previous and current state) are highly correlated with each other, compared to static expression data which samples are assumed as independent individuals. In this chapter, we demonstrate how HMMs and hierarchical Bayesian modeling methods capture the horizontal time dependency structures in time series expression profiles by focusing on the identification of differential expression. In addition, those differential expression genes and transcript variant isoforms over time detected in core prerequisite steps can be generally further applied in detection of genetic regulatory networks to comprehensively uncover dynamic repertoires in the aspects of system biology as the coupled framework.
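The idea that an HMM captures the dependency between neighboring time points can be illustrated with a small Viterbi decoder. The sketch below is not the chapter's model: the two hidden states ("same" and "diff" expression between conditions), the discretized observations ("small" or "large" fold change) and every probability are assumptions invented for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path of a discrete HMM (log-space Viterbi)."""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    backpointers = []
    for obs in observations[1:]:
        prev = scores[-1]
        current, pointer = {}, {}
        for s in states:
            # Best predecessor of state s at this time point.
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            current[s] = (prev[best] + math.log(trans_p[best][s])
                          + math.log(emit_p[s][obs]))
            pointer[s] = best
        scores.append(current)
        backpointers.append(pointer)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


# Hypothetical parameters: states tend to persist between neighboring time points.
states = ("same", "diff")
start_p = {"same": 0.6, "diff": 0.4}
trans_p = {"same": {"same": 0.9, "diff": 0.1},
           "diff": {"same": 0.3, "diff": 0.7}}
emit_p = {"same": {"small": 0.9, "large": 0.1},
          "diff": {"small": 0.2, "large": 0.8}}

# Discretized fold changes of one gene over seven time points (invented).
observed = ["small", "small", "large", "small", "large", "large", "large"]
print(viterbi(observed, states, start_p, trans_p, emit_p))
```

Because the transition probabilities favor staying in the same state, a single observation that disagrees with its neighbors does not necessarily flip the decoded state, which is the kind of temporal smoothing that an independent per-time-point test cannot provide.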
Affiliation(s)
- Sunghee Oh
- Department of Computer Science and Statistics, Jeju National University, Jeju City, 690-756, South Korea.
- Seongho Song
- Department of Mathematical Science, University of Cincinnati, Cincinnati, OH, 45221-0025, USA
15
Bulashevska S, Priest C, Speicher D, Zimmermann J, Westermann F, Cremers AB. SwitchFinder - a novel method and query facility for discovering dynamic gene expression patterns. BMC Bioinformatics 2016; 17:532. [PMID: 27978814 PMCID: PMC5160026 DOI: 10.1186/s12859-016-1391-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/11/2016] [Accepted: 11/29/2016] [Indexed: 12/20/2022] Open
Abstract
Background Biological systems and processes are highly dynamic. To gain insights into their functioning, time-resolved measurements are necessary. Time-resolved gene expression data capture the temporal behaviour of genes genome-wide under various biological conditions: in response to stimuli, during cell cycle, differentiation or developmental programs. Dissecting dynamic gene expression patterns from these data may shed light on the functioning of the gene regulatory system. The present approach facilitates this discovery. The fundamental idea behind it is the following: there are change-points (switches) in the gene behaviour separating intervals of increasing and decreasing activity, and these intervals may have different durations. Elucidating the switch-points is important for the identification of biologically meaningful features and patterns of the gene dynamics. Results We developed a statistical method, called SwitchFinder, for the analysis of time-series data, in particular gene expression data, based on a change-point model. Fitting the model to the gene expression time-courses indicates switch-points between increasing and decreasing activities of each gene. Two types of the model, based on a linear and on a generalized logistic function, were used to capture the data between the switch-points. Model inference used Bayesian methodology with the Markov chain Monte Carlo (MCMC) technique of Gibbs sampling. Further on, we introduced features of the switch-points: growth, decay, spike and cleft, which reflect important dynamic aspects. With this, the gene expression profiles are represented in a qualitative manner, as sets of the dynamic features at their onset-times. We developed a Web application of the approach that enables users to query the gene expression time-courses and to deduce groups of genes with common dynamic patterns.
SwitchFinder was applied to our original data - the gene expression time-series measured in neuroblastoma cell line upon treatment with all-trans retinoic acid (ATRA). The analysis revealed eight patterns of the gene expression responses to ATRA, indicating the induction of the BMP, WNT, Notch, FGF and NTRK-receptor signaling pathways involved in cell differentiation, as well as the repression of the cell-cycle related genes. Conclusions SwitchFinder is a novel approach to the analysis of biological time-series data, supporting inference and interactive exploration of its inherent dynamic patterns, hence facilitating biological discovery process. SwitchFinder is freely available at https://newbioinformatics.eu/switchfinder. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1391-0) contains supplementary material, which is available to authorized users.
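As a rough illustration of the switch-point idea described in this abstract, the following Python sketch locates a single switch in a toy time course by an exhaustive least-squares search over two straight-line segments. This is a deliberately simplified stand-in, not SwitchFinder itself: the published method uses Bayesian inference with Gibbs sampling and also supports a generalized logistic model, neither of which is reproduced here. The function names `segment_sse` and `best_switch_point` and the toy data are hypothetical.

```python
import numpy as np

def segment_sse(t, y):
    """Fit a straight line by least squares; return (slope, sum of squared errors)."""
    slope, intercept = np.polyfit(t, y, 1)
    residuals = y - (slope * t + intercept)
    return slope, float(np.sum(residuals ** 2))

def best_switch_point(t, y):
    """Hypothetical helper: try every interior index as the switch and keep
    the split whose two line fits have the smallest total squared error."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    # Both segments share the candidate point k and contain at least two points.
    for k in range(1, len(t) - 1):
        left_slope, left_sse = segment_sse(t[: k + 1], y[: k + 1])
        right_slope, right_sse = segment_sse(t[k:], y[k:])
        total = left_sse + right_sse
        if best is None or total < best[0]:
            best = (total, t[k], left_slope, right_slope)
    _, switch_time, left_slope, right_slope = best
    return switch_time, left_slope, right_slope

# Made-up profile: activity rises until t = 5, then decays.
t = np.arange(11)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], dtype=float)
switch_time, left_slope, right_slope = best_switch_point(t, y)
```

The signs of the two slopes indicate whether activity increases or decreases on either side of the detected switch. A point estimate like this carries no uncertainty, which is one reason a Bayesian treatment such as the one in the abstract may be preferable on noisy data.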
Affiliation(s)
- Svetlana Bulashevska
- B-IT Bonn-Aachen International Center for Information Technology, University of Bonn, Dahlmannstr. 2, Bonn, 53113, Germany.
- Colin Priest
- Sigma Plus Consulting Pty Ltd, Crows Nest, 2065, NSW, Australia
- Daniel Speicher
- B-IT Bonn-Aachen International Center for Information Technology, University of Bonn, Dahlmannstr. 2, Bonn, 53113, Germany; Institute of Computer Science, University of Bonn, Roemerstr. 164, Bonn, 53117, Germany
- Jörg Zimmermann
- B-IT Bonn-Aachen International Center for Information Technology, University of Bonn, Dahlmannstr. 2, Bonn, 53113, Germany; Institute of Computer Science, University of Bonn, Roemerstr. 164, Bonn, 53117, Germany
- Frank Westermann
- Neuroblastoma Genomics Group, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, 69120, Germany
- Armin B Cremers
- B-IT Bonn-Aachen International Center for Information Technology, University of Bonn, Dahlmannstr. 2, Bonn, 53113, Germany; Institute of Computer Science, University of Bonn, Roemerstr. 164, Bonn, 53117, Germany
16
Wang T, Zhuang J, Obara K, Tsuruoka H. Hidden Markov modelling of sparse time series from non-volcanic tremor observations. J R Stat Soc Ser C Appl Stat 2016. [DOI: 10.1111/rssc.12194] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Affiliation(s)
- Ting Wang
- University of Otago; Dunedin New Zealand
- Jiancang Zhuang
- Institute of Statistical Mathematics; Tokyo
- Graduate University for Advanced Studies; Tokyo Japan
17
Abstract
Background All biological processes are inherently dynamic. Biological systems evolve transiently or sustainably according to sequential time points after perturbation by environment insults, drugs and chemicals. Investigating the temporal behavior of molecular events has been an important subject to understand the underlying mechanisms governing the biological system in response to, such as, drug treatment. The intrinsic complexity of time series data requires appropriate computational algorithms for data interpretation. In this study, we propose, for the first time, the application of dynamic topic models (DTM) for analyzing time-series gene expression data. Results A large time-series toxicogenomics dataset was studied. It contains over 3144 microarrays of gene expression data corresponding to rat livers treated with 131 compounds (most are drugs) at two doses (control and high dose) in a repeated schedule containing four separate time points (4-, 8-, 15- and 29-day). We analyzed, with DTM, the topics (consisting of a set of genes) and their biological interpretations over these four time points. We identified hidden patterns embedded in this time-series gene expression profiles. From the topic distribution for compound-time condition, a number of drugs were successfully clustered by their shared mode-of-action such as PPARɑ agonists and COX inhibitors. The biological meaning underlying each topic was interpreted using diverse sources of information such as functional analysis of the pathways and therapeutic uses of the drugs. Additionally, we found that sample clusters produced by DTM are much more coherent in terms of functional categories when compared to traditional clustering algorithms. Conclusions We demonstrated that DTM, a text mining technique, can be a powerful computational approach for clustering time-series gene expression profiles with the probabilistic representation of their dynamic features along sequential time frames. 
The method offers an alternative way for uncovering hidden patterns embedded in time series gene expression profiles to gain enhanced understanding of dynamic behavior of gene regulation in the biological system. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1225-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mikyung Lee
- NIH/National Center for Advancing Translational Sciences, Rockville, MD, USA
- Zhichao Liu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, Jefferson, AR, USA
- Ruili Huang
- NIH/National Center for Advancing Translational Sciences, Rockville, MD, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, Jefferson, AR, USA.
18
Carey M, Wu S, Gan G, Wu H. Correlation-based iterative clustering methods for time course data: The identification of temporal gene response modules for influenza infection in humans. Infect Dis Model 2016; 1:28-39. [PMID: 29928719 PMCID: PMC5963321 DOI: 10.1016/j.idm.2016.07.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/09/2016] [Accepted: 07/08/2016] [Indexed: 12/25/2022] Open
Abstract
Many pragmatic clustering methods have been developed to group data vectors or objects into clusters so that the objects in one cluster are very similar and objects in different clusters are distinct based on some similarity measure. The availability of time course data has motivated researchers to develop methods, such as mixture and mixed-effects modelling approaches, that incorporate the temporal information contained in the shape of the trajectory of the data. However, there is still a need for the development of time-course clustering methods that can adequately deal with inhomogeneous clusters (some clusters are quite large and others are quite small). Here we propose two such methods, hierarchical clustering (IHC) and iterative pairwise-correlation clustering (IPC). We evaluate and compare the proposed methods to the Markov Cluster Algorithm (MCL) and the generalised mixed-effects model (GMM) using simulation studies and an application to a time course gene expression data set from a study containing human subjects who were challenged by a live influenza virus. We identify four types of temporal gene response modules to influenza infection in humans, i.e., single-gene modules (SGM), small-size modules (SSM), medium-size modules (MSM) and large-size modules (LSM). The LSM contain genes that perform various fundamental biological functions that are consistent across subjects. The SSM and SGM contain genes that perform either different or similar biological functions that have complex temporal responses to the virus and are unique to each subject. We show that the temporal response of the genes in the LSM have either simple patterns with a single peak or trough a consequence of the transient stimuli sustained or state-transitioning patterns pertaining to developmental cues and that these modules can differentiate the severity of disease outcomes. 
Additionally, the size of gene response modules follows a power-law distribution with a consistent exponent across all subjects, which reveals the presence of universality in the underlying biological principles that generated these modules.
Affiliation(s)
- Michelle Carey
- Department of Biostatistics and Computational Biology, Crittenden Blvd, Rochester, NY 14642, USA
- Department of Mathematics and Statistics, McGill University, 805 Sherbrooke Street West, Montreal, Canada
- Shuang Wu
- Department of Biostatistics and Computational Biology, Crittenden Blvd, Rochester, NY 14642, USA
- Biogen, 250 Binney Street, Cambridge, MA, USA
- Guojun Gan
- Department of Mathematics, University of Connecticut, 196 Auditorium Road U-3009, Storrs, USA
- Hulin Wu
- Department of Biostatistics and Computational Biology, Crittenden Blvd, Rochester, NY 14642, USA
- Department of Biostatistics, University of Texas Health Science Center School of Public Health at Houston, 1200 Pressler Street, Houston, USA
- Corresponding author. Department of Biostatistics, University of Texas Health Science Center School of Public Health at Houston, 1200 Pressler Street, Houston, USA.
19
Michalopoulos K, Zervakis M, Deiber MP, Bourbakis N. Classification of EEG Single Trial Microstates Using Local Global Graphs and Discrete Hidden Markov Models. Int J Neural Syst 2016; 26:1650036. [DOI: 10.1142/s0129065716500362] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
We present a novel synergistic methodology for the spatio-temporal analysis of single Electroencephalogram (EEG) trials. This new methodology is based on the novel synergy of Local Global Graph (LG graph) to characterize the structural features of the EEG topography as a global descriptor for robust comparison of dominant topographies (microstates) and Hidden Markov Models (HMM) to model the topographic sequence in a unique way. In particular, the LG graph descriptor defines similarity and distance measures that can be successfully used for the difficult comparison of the extracted LG graphs in the presence of noise. In addition, hidden states represent periods of stationary distribution of topographies that constitute the equivalent of the microstates in the model. The transitions between the different microstates and the formed syntactic patterns can reveal differences in the processing of the input stimulus between different pathologies. We train the HMM model to learn the transitions between the different microstates and express the syntactic patterns that appear in the single trials in a compact and efficient way. We applied this methodology to single trials from normal subjects and patients with Progressive Mild Cognitive Impairment (PMCI) to discriminate these two groups. The classification results show that this approach is capable of efficiently discriminating between control and Progressive MCI single trials. Results indicate that HMMs provide physiologically meaningful results that can be used in the syntactic analysis of Event Related Potentials.
Affiliation(s)
- Kostas Michalopoulos
- Center of Assistive Research Technologies, Wright State University, Dayton OH 45435, USA
- Marie-Pierre Deiber
- Faculty of Medicine, INSERM Unit 1039, La Tronche, France
- Biomarkers of Vulnerability Unit, Dep. of Psychiatry, University Hospitals, Geneva, Switzerland
- Nikolaos Bourbakis
- Center of Assistive Research Technologies, Wright State University, Dayton OH 45435, USA
20
Fidaner IB, Cankorur-Cetinkaya A, Dikicioglu D, Kirdar B, Cemgil AT, Oliver SG. CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data. Bioinformatics 2016; 32:388-97. [PMID: 26411869 PMCID: PMC4734040 DOI: 10.1093/bioinformatics/btv532] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/03/2015] [Accepted: 09/03/2015] [Indexed: 11/13/2022] Open
Abstract
Motivation: Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. Results: We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. Availability and implementation: The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems and user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. Contact: sgo24@cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ayca Cankorur-Cetinkaya
- Department of Chemical Engineering, Bogazici University, Istanbul, Turkey and Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, UK
- Duygu Dikicioglu
- Department of Chemical Engineering, Bogazici University, Istanbul, Turkey and Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, UK
- Betul Kirdar
- Department of Chemical Engineering, Bogazici University, Istanbul, Turkey and
- Stephen G Oliver
- Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, UK
21
Gao Q, Ostendorf E, Cruz JA, Jin R, Kramer DM, Chen J. Inter-functional analysis of high-throughput phenotype data by non-parametric clustering and its application to photosynthesis. Bioinformatics 2015; 32:67-76. [PMID: 26342101 DOI: 10.1093/bioinformatics/btv515] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/12/2015] [Accepted: 08/25/2015] [Indexed: 01/20/2023] Open
Abstract
MOTIVATION Phenomics is the study of the properties and behaviors of organisms (i.e. their phenotypes) on a high-throughput scale. New computational tools are needed to analyze complex phenomics data, which consists of multiple traits/behaviors that interact with each other and are dependent on external factors, such as genotype and environmental conditions, in a way that has not been well studied. RESULTS We deployed an efficient framework for partitioning complex and high dimensional phenotype data into distinct functional groups. To achieve this, we represented measured phenotype data from each genotype as a cloud-of-points, and developed a novel non-parametric clustering algorithm to cluster all the genotypes. When compared with conventional clustering approaches, the new method is advantageous in that it makes no assumption about the parametric form of the underlying data distribution and is thus particularly suitable for phenotype data analysis. We demonstrated the utility of the new clustering technique by distinguishing novel phenotypic patterns in both synthetic data and a high-throughput plant photosynthetic phenotype dataset. We biologically verified the clustering results using four Arabidopsis chloroplast mutant lines. AVAILABILITY AND IMPLEMENTATION Software is available at www.msu.edu/~jinchen/NPM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. CONTACT jinchen@msu.edu, kramerd8@cns.msu.edu or rongjin@cse.msu.edu.
Affiliation(s)
- Qiaozi Gao
- Department of Computer Science and Engineering
- Rong Jin
- Department of Computer Science and Engineering
- David M Kramer
- Department of Energy Plant Research Laboratory and Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Jin Chen
- Department of Computer Science and Engineering, Department of Energy Plant Research Laboratory and
22
Modeling and Classification of Kinetic Patterns of Dynamic Metabolic Biomarkers in Physical Activity. PLoS Comput Biol 2015; 11:e1004454. [PMID: 26317529 PMCID: PMC4552566 DOI: 10.1371/journal.pcbi.1004454] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/01/2015] [Accepted: 07/09/2015] [Indexed: 11/19/2022] Open
Abstract
The objectives of this work were the classification of dynamic metabolic biomarker candidates and the modeling and characterization of kinetic regulatory mechanisms in human metabolism with response to external perturbations by physical activity. Longitudinal metabolic concentration data of 47 individuals from 4 different groups were examined, obtained from a cycle ergometry cohort study. In total, 110 metabolites (within the classes of acylcarnitines, amino acids, and sugars) were measured through a targeted metabolomics approach, combining tandem mass spectrometry (MS/MS) with the concept of stable isotope dilution (SID) for metabolite quantitation. Biomarker candidates were selected by combined analysis of maximum fold changes (MFCs) in concentrations and P-values resulting from statistical hypothesis testing. Characteristic kinetic signatures were identified through a mathematical modeling approach utilizing polynomial fitting. Modeled kinetic signatures were analyzed for groups with similar behavior by applying hierarchical cluster analysis. Kinetic shape templates were characterized, defining different forms of basic kinetic response patterns, such as sustained, early, late, and other forms, that can be used for metabolite classification. Acetylcarnitine (C2), showing a late response pattern and having the highest values in MFC and statistical significance, was classified as late marker and ranked as strong predictor (MFC = 1.97, P < 0.001). In the class of amino acids, highest values were shown for alanine (MFC = 1.42, P < 0.001), classified as late marker and strong predictor. Glucose yields a delayed response pattern, similar to a hockey stick function, being classified as delayed marker and ranked as moderate predictor (MFC = 1.32, P < 0.001). These findings coincide with existing knowledge on central metabolic pathways affected in exercise physiology, such as β-oxidation of fatty acids, glycolysis, and glycogenolysis. 
The presented modeling approach demonstrates high potential for dynamic biomarker identification and the investigation of kinetic mechanisms in disease or pharmacodynamics studies using MS data from longitudinal cohort studies. Human metabolism is controlled through basic kinetic regulatory mechanisms, where the overall system aims to maintain a state of homeostasis. In response to external perturbations, such as environmental influences, nutrition or physical exercise, circulating metabolites show specific kinetic response patterns, which can be computationally modeled. In this work, we searched for dynamic metabolic biomarker candidates and analyzed specific kinetic mechanisms from longitudinal metabolic concentration data, obtained through a cycle ergometry stress test. In total, 110 metabolites measured from blood samples of 47 individuals were analyzed using tandem mass spectrometry (MS/MS). Dynamic biomarker candidates could be selected based on the amplitudes of changes in metabolite concentrations and the significance of statistical hypothesis testing. We were able to characterize specific kinetic patterns for groups of similarly behaving metabolites. Kinetic shape templates were identified, defining basic kinetic response patterns to physical exercise, such as sustained, early, late and other shape forms. The presented approach contributes to a better understanding of (patho)physiological biochemical mechanisms in human health, disease or during drug therapy, by offering tools for classifying dynamic biomarker candidates and for modeling and characterizing kinetic regulatory mechanisms from longitudinal experimental data.
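The two quantitative ingredients named in this abstract, a maximum fold change and a polynomial fit of the concentration time course, can be sketched in a few lines of Python. The definition of MFC used here (largest concentration relative to the first, baseline, measurement) is an assumption for illustration; the abstract does not state the exact formula. The function names and the concentrations below are made up and are not data from the study.

```python
import numpy as np
from numpy.polynomial import Polynomial

def max_fold_change(conc):
    """Assumed definition: largest concentration divided by the baseline (first) value."""
    conc = np.asarray(conc, dtype=float)
    return float(np.max(conc) / conc[0])

def fit_kinetic_curve(t, conc, degree=2):
    """Least-squares polynomial fit of a concentration time course."""
    return Polynomial.fit(t, conc, deg=degree)

# Made-up time course (arbitrary units), not data from the study.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
conc = np.array([10.0, 12.0, 15.0, 20.0, 18.0, 14.0])

mfc = max_fold_change(conc)
curve = fit_kinetic_curve(t, conc, degree=2)
fitted = curve(t)  # fitted values at the sampling times
peak_time = float(t[np.argmax(fitted)])  # sampling time with the highest fitted value
```

In a workflow like the one described, the fitted curves (rather than the raw measurements) would then be grouped, for example by hierarchical clustering, to identify shared response shapes.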
23
24
Oh S, Song S, Dasgupta N, Grabowski G. The analytical landscape of static and temporal dynamics in transcriptome data. Front Genet 2014; 5:35. [PMID: 24600473 PMCID: PMC3929947 DOI: 10.3389/fgene.2014.00035] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 09/11/2013] [Accepted: 01/30/2014] [Indexed: 12/16/2022] Open
Abstract
Interpreting gene expression profiles often involves statistical analysis of large numbers of differentially expressed genes, isoforms, and alternative splicing events in either static or dynamic settings. Reduced sequencing costs have made dense time-series analysis of gene expression using RNA-seq feasible; however, statistical methods in the context of temporal RNA-seq data are poorly developed. Here we review current methods for identifying temporal changes in gene expression using RNA-seq, which are limited to static pairwise comparisons of time points and which fail to account for temporal dependencies in gene expression patterns. We also review the very few recently developed methods that are specific to temporal dynamic RNA-seq data. The application and development of such RNA-seq-specific temporal dynamic methods are ongoing, yet the field is still in its infancy. We also fully cover microarray-specific temporal methods and transcriptome studies based on early digital technologies (e.g., SAGE) that fall between traditional microarrays and RNA-seq.
Affiliation(s)
- Sunghee Oh
- Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center Cincinnati, OH, USA
- Seongho Song
- Department of Mathematical Sciences, McMicken College of Arts and Sciences, University of Cincinnati Cincinnati, OH, USA
- Nupur Dasgupta
- Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center Cincinnati, OH, USA
- Gregory Grabowski
- Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center Cincinnati, OH, USA
25
Time-series RNA-seq analysis package (TRAP) and its application to the analysis of rice, Oryza sativa L. ssp. Japonica, upon drought stress. Methods 2014; 67:364-72. [PMID: 24518221 DOI: 10.1016/j.ymeth.2014.02.001] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 11/18/2013] [Revised: 01/17/2014] [Accepted: 02/01/2014] [Indexed: 12/19/2022] Open
Abstract
Measuring expression levels of genes at the whole genome level can be useful for many purposes, especially for revealing biological pathways underlying specific phenotype conditions. When gene expression is measured over a time period, we have opportunities to understand how organisms react to stress conditions over time. Thus many biologists routinely measure whole genome level gene expression at multiple time points. However, there are several technical difficulties in analyzing such whole genome expression data. In addition, gene expression is now often measured using RNA-sequencing rather than microarray technologies, and the analysis of expression data is then much more complicated, since the analysis process should start with mapping short reads and produce differentially activated pathways and possibly also interactions among pathways. In addition, many useful tools for analyzing microarray gene expression data are not applicable to RNA-seq data. Thus a comprehensive package for analyzing time series transcriptome data is much needed. In this article, we present a comprehensive package, Time-series RNA-seq Analysis Package (TRAP), integrating all necessary tasks such as mapping short reads, measuring gene expression levels, finding differentially expressed genes (DEGs), clustering and pathway analysis for time-series data in a single environment. In addition to implementing useful algorithms that are not available for RNA-seq data, we extended existing pathway analysis methods, ORA and SPIA, for time series analysis and estimate statistical values for the combined dataset using an advanced metric. TRAP also produces a visual summary of pathway interactions. Gene expression change labeling, a practical clustering method used in TRAP, enables more accurate interpretation of the data when combined with pathway analysis. We applied our methods to a real dataset for the analysis of rice (Oryza sativa L. Japonica nipponbare) upon drought stress.
The result showed that TRAP was able to detect pathways more accurately than several existing methods. TRAP is available at http://biohealth.snu.ac.kr/software/TRAP/.
26
Kim J, Ogden RT, Kim H. A method to identify differential expression profiles of time-course gene data with Fourier transformation. BMC Bioinformatics 2013; 14:310. [PMID: 24134721 PMCID: PMC4015127 DOI: 10.1186/1471-2105-14-310] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/15/2013] [Accepted: 10/10/2013] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. RESULTS This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. CONCLUSIONS Using this method, we identified a set of cell-cycle-regulated time-course yeast genes.
The proposed method is general and can be potentially used to identify genes which have the same patterns or biological processes, and help facing the present and forthcoming challenges of data analysis in functional genomics.
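To make the first step of this procedure concrete, the sketch below represents a time-course profile by least-squares coefficients on a truncated Fourier basis. It covers only the basis representation; the screening step with FDR control, the handling of autocorrelation, and the model-based clustering described in the abstract are not implemented. The function names, the assumed known period, and the toy profile are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fourier_design(t, period, n_harmonics):
    """Design matrix: an intercept plus cosine/sine pairs up to n_harmonics."""
    t = np.asarray(t, dtype=float)
    columns = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        omega = 2.0 * np.pi * k / period
        columns.append(np.cos(omega * t))
        columns.append(np.sin(omega * t))
    return np.column_stack(columns)

def fourier_coefficients(t, y, period, n_harmonics=2):
    """Least-squares Fourier coefficients of one expression profile."""
    X = fourier_design(t, period, n_harmonics)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

# Made-up profile: mean 1.0 plus a first-harmonic oscillation with period 60.
period = 60.0
t = np.arange(0.0, 120.0, 7.0)
y = 1.0 + 2.0 * np.cos(2 * np.pi * t / period) + 0.5 * np.sin(2 * np.pi * t / period)

coef = fourier_coefficients(t, y, period, n_harmonics=2)
# coef is ordered [intercept, cos1, sin1, cos2, sin2].
first_harmonic_amplitude = float(np.hypot(coef[1], coef[2]))
```

A profile with no temporal structure would have harmonic coefficients near zero, which is the intuition behind screening on the coefficients; how to set the threshold while accounting for autocorrelation is the part this sketch leaves out.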
Affiliation(s)
- Jaehee Kim
- Department of Statistics, Duksung Women’s University, Seoul, South Korea
| | | | - Haseong Kim
- Biochemicals and Synthetic Biology Research Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon, South Korea
| |
Collapse
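The screening steps named in the abstract above (Fourier representation of each profile, then FDR-controlled screening) can be illustrated with a minimal sketch. It is not the authors' procedure: the function names `fourier_coefficients` and `benjamini_hochberg` are hypothetical, the per-gene p-values are assumed to come from some test on the coefficients that is not shown here, and the Benjamini-Hochberg step-up rule is an assumed choice of FDR procedure, since the abstract does not say which one is used.

```python
import cmath
import math


def fourier_coefficients(profile, n_harmonics):
    """Return the DFT coefficients for harmonics 1..n_harmonics, scaled by 1/T.

    `profile` is one gene's expression values at T equally spaced time points.
    """
    T = len(profile)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / T) for t, x in enumerate(profile)) / T
        for k in range(1, n_harmonics + 1)
    ]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule (assumed FDR procedure, not from the paper).

    Returns a list of booleans, True where the null hypothesis is rejected.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value lies under the BH line alpha * rank / m.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Usage: a purely periodic profile concentrates its energy in the first harmonic.
profile = [math.cos(2 * math.pi * t / 8) for t in range(8)]
coeffs = fourier_coefficients(profile, n_harmonics=2)
keep = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```

Genes flagged `True` would be the ones passed on to the clustering step; how the p-values account for autocorrelation is the part of the method this sketch leaves out.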
|
27
|
Godsey B. Improved inference of gene regulatory networks through integrated Bayesian clustering and dynamic modeling of time-course expression data. PLoS One 2013; 8:e68358. [PMID: 23935862 PMCID: PMC3720774 DOI: 10.1371/journal.pone.0068358] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 03/26/2013] [Accepted: 06/03/2013] [Indexed: 12/23/2022] Open
Abstract
Inferring gene regulatory networks from expression data is difficult, but it is common and often useful. Most network problems are under-determined (there are more parameters than data points), and therefore data or parameter set reduction is often necessary. Correlation between variables in the model also confounds the inference of network coefficients. In this paper, we present an algorithm that uses integrated, probabilistic clustering to ease the problems of under-determination and correlated variables within a fully Bayesian framework. Specifically, ours is a dynamic Bayesian network with integrated Gaussian mixture clustering, which we fit using variational Bayesian methods. We show, using public, simulated time-course data sets from the DREAM4 Challenge, that our algorithm outperforms non-clustering methods in many cases (7 out of 25) with fewer samples, rarely underperforming (1 out of 25), and often selects a non-clustering model if it better describes the data. Source code (GNU Octave) for BAyesian Clustering Over Networks (BACON) and sample data are available at: http://code.google.com/p/bacon-for-genetic-networks.
Affiliation(s)
- Brian Godsey
- Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria.
|
28
|
Munger SC, Natarajan A, Looger LL, Ohler U, Capel B. Fine time course expression analysis identifies cascades of activation and repression and maps a putative regulator of mammalian sex determination. PLoS Genet 2013; 9:e1003630. [PMID: 23874228 PMCID: PMC3708841 DOI: 10.1371/journal.pgen.1003630] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Received: 03/04/2013] [Accepted: 05/28/2013] [Indexed: 02/07/2023] Open
Abstract
In vertebrates, primary sex determination refers to the decision within a bipotential organ precursor to differentiate as a testis or ovary. Bifurcation of organ fate begins between embryonic day (E) 11.0-E12.0 in mice and likely involves a dynamic transcription network that is poorly understood. To elucidate the first steps of sexual fate specification, we profiled the XX and XY gonad transcriptomes at fine granularity during this period and resolved cascades of gene activation and repression. C57BL/6J (B6) XY gonads showed a consistent ~5-hour delay in the activation of most male pathway genes and repression of female pathway genes relative to 129S1/SvImJ, which likely explains the sensitivity of the B6 strain to male-to-female sex reversal. Using this fine time course data, we predicted novel regulatory genes underlying expression QTLs (eQTLs) mapped in a previous study. To test predictions, we developed an in vitro gonad primary cell assay and optimized a lentivirus-based shRNA delivery method to silence candidate genes and quantify effects on putative targets. We provide strong evidence that Lmo4 (Lim-domain only 4) is a novel regulator of sex determination upstream of SF1 (Nr5a1), Sox9, Fgf9, and Col9a3. This approach can be readily applied to identify regulatory interactions in other systems.
Affiliation(s)
- Steven C. Munger
- Department of Cell Biology, Duke University, Durham, North Carolina, United States of America
- Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Anirudh Natarajan
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Loren L. Looger
- Howard Hughes Medical Institute, Janelia Farm Research Campus, Ashburn, Virginia, United States of America
- Uwe Ohler
- Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina, United States of America
- Department of Biostatistics & Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Blanche Capel
- Department of Cell Biology, Duke University, Durham, North Carolina, United States of America
|
29
|
Time series expression analyses using RNA-seq: a statistical approach. BioMed Research International 2013; 2013:203681. [PMID: 23586021 PMCID: PMC3622290 DOI: 10.1155/2013/203681] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 10/06/2012] [Revised: 01/10/2013] [Accepted: 01/15/2013] [Indexed: 11/29/2022]
Abstract
RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.
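As a rough illustration of what "accounting for dependencies across time points" means in the AR(1) case mentioned above, the following sketch fits a first-order autoregression to a single continuous-valued series by ordinary least squares. It is a generic textbook AR(1) fit, not the count-based model of the paper; the function name `fit_ar1` and the toy series are assumptions made for the example.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = intercept + phi * x[t-1] + noise.

    Returns (intercept, phi). Assumes the series has at least three points
    and that the lagged values are not all identical.
    """
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    # Slope of the regression of the current value on the previous value.
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    phi = sxy / sxx
    intercept = mean_curr - phi * mean_prev
    return intercept, phi


# Usage: a noise-free series generated by x[t] = 2 + 0.5 * x[t-1], starting at 1.
series = [1.0, 2.5, 3.25, 3.625, 3.8125]
intercept, phi = fit_ar1(series)
```

A `phi` far from zero indicates that consecutive time points carry shared information, which is the dependence that methods designed for static data ignore.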
|
30
|
Qian L, Zheng H, Zhou H, Qin R, Li J. Classification of time series gene expression in clinical studies via integration of biological network. PLoS One 2013; 8:e58383. [PMID: 23516469 PMCID: PMC3596388 DOI: 10.1371/journal.pone.0058383] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/18/2012] [Accepted: 02/04/2013] [Indexed: 12/24/2022] Open
Abstract
The increasing availability of time series expression datasets, although promising, raises a number of new computational challenges. Accordingly, the development of suitable classification methods to make reliable and sound predictions is becoming a pressing issue. We propose, here, a new method to classify time series gene expression via integration of biological networks. We evaluated our approach on 2 different datasets and showed that the use of a hidden Markov model/Gaussian mixture models hybrid explores the time-dependence of the expression data, thereby leading to better prediction results. We demonstrated that the biclustering procedure identifies function-related genes as a whole, giving rise to high accordance in prognosis prediction across independent time series datasets. In addition, we showed that integration of biological networks into our method significantly improves prediction performance. Moreover, we compared our approach with several state-of-the-art algorithms and found that our method outperformed previous approaches with regard to various criteria. Finally, our approach achieved better prediction results on early-stage data, implying the potential of our method for practical prediction.
Affiliation(s)
- Liwei Qian
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Haoran Zheng
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Hefei, People's Republic of China
- Department of Systems Biology, University of Science and Technology of China, Hefei, People's Republic of China
- Hong Zhou
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Ruibin Qin
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Jinlong Li
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
|
31
|
Manshaei R, Sobhe Bidari P, Aliyari Shoorehdeli M, Feizi A, Lohrasebi T, Malboobi MA, Kyan M, Alirezaie J. Hybrid-controlled neurofuzzy networks analysis resulting in genetic regulatory networks reconstruction. ISRN Bioinformatics 2012; 2012:419419. [PMID: 25969749 PMCID: PMC4393070 DOI: 10.5402/2012/419419] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/10/2012] [Accepted: 08/15/2012] [Indexed: 12/03/2022]
Abstract
Reverse engineering of gene regulatory networks (GRNs) is the process of estimating genetic interactions of a cellular system from gene expression data. In this paper, we propose a novel hybrid systematic algorithm based on a neurofuzzy network for reconstructing GRNs from observational gene expression data when only a medium-small number of measurements are available. The approach uses fuzzy logic to transform gene expression values into qualitative descriptors that can be evaluated by using a set of defined rules. The algorithm uses a neurofuzzy network to model genes' effects on other genes, followed by four stages of decision making to extract gene interactions. One of the main features of the proposed algorithm is that an optimal number of fuzzy rules can be easily and rapidly extracted without overparameterizing. Data analysis and simulation are conducted on microarray expression profiles of the S. cerevisiae cell cycle and demonstrate that the proposed algorithm not only selects the patterns of the time series gene expression data accurately, but also provides models with better reconstruction accuracy when compared with four published algorithms: DBNs, VBEM, time delay ARACNE, and PF subjected to LASSO. The accuracy of the proposed approach is evaluated in terms of recall and F-score for the network reconstruction task.
Affiliation(s)
- Roozbeh Manshaei
- Electrical and Computer Engineering Department, Ryerson University, Toronto, ON, Canada M5B 2K3
- Pooya Sobhe Bidari
- Electrical and Computer Engineering Department, Ryerson University, Toronto, ON, Canada M5B 2K3
- Mahdi Aliyari Shoorehdeli
- Electrical and Computer Engineering Department, K.N. Toosi University of Technology, Tehran 16315-1355, Iran
- Amir Feizi
- Department of Chemical and Biological Engineering, Systems and Synthetic Biology Group, Chalmers University, 41296 Gothenburg, Sweden
- Tahmineh Lohrasebi
- National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran 14965/161, Iran
- Mohammad Ali Malboobi
- National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran 14965/161, Iran
- Matthew Kyan
- Electrical and Computer Engineering Department, Ryerson University, Toronto, ON, Canada M5B 2K3
- Javad Alirezaie
- Electrical and Computer Engineering Department, Ryerson University, Toronto, ON, Canada M5B 2K3
|
32
|
Ayadi W, Elloumi M, Hao JK. BiMine+: An efficient algorithm for discovering relevant biclusters of DNA microarray data. Knowl Based Syst 2012. [DOI: 10.1016/j.knosys.2012.04.017] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
|
33
|
Bar-Joseph Z, Gitter A, Simon I. Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet 2012; 13:552-64. [PMID: 22805708 DOI: 10.1038/nrg3244] [Citation(s) in RCA: 291] [Impact Index Per Article: 24.3] [Indexed: 12/19/2022]
Abstract
Biological processes are often dynamic, thus researchers must monitor their activity at multiple time points. The most abundant source of information regarding such dynamic activity is time-series gene expression data. These data are used to identify the complete set of activated genes in a biological process, to infer their rates of change, their order and their causal effects and to model dynamic systems in the cell. In this Review we discuss the basic patterns that have been observed in time-series experiments, how these patterns are combined to form expression programs, and the computational analysis, visualization and integration of these data to infer models of dynamic biological systems.
Affiliation(s)
- Ziv Bar-Joseph
- Lane Center for Computational Biology and Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
|
34
|
Ayadi W, Elloumi M, Hao JK. Pattern-driven neighborhood search for biclustering of microarray data. BMC Bioinformatics 2012; 13 Suppl 7:S11. [PMID: 22594997 PMCID: PMC3348021 DOI: 10.1186/1471-2105-13-s7-s11] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biclustering aims at finding subgroups of genes that show highly correlated behaviors across a subgroup of conditions. Biclustering is a very useful tool for mining microarray data and has various practical applications. From a computational point of view, biclustering is a highly combinatorial search problem and can be solved with optimization methods. RESULTS We describe a stochastic pattern-driven neighborhood search algorithm for the biclustering problem. Starting from an initial bicluster, the proposed method progressively improves the quality of the bicluster by adjusting some genes and conditions. The adjustments are based on the quality of each gene and condition with respect to the bicluster and the initial data matrix. The performance of the method was evaluated on two well-known microarray datasets (Yeast cell cycle and Saccharomyces cerevisiae), showing that it is able to obtain statistically and biologically significant biclusters. The proposed method was also compared with six reference methods from the literature. CONCLUSIONS The proposed method is computationally fast and can be applied to discover significant biclusters. It can also be used to effectively improve the quality of existing biclusters provided by other biclustering methods.
Affiliation(s)
- Wassim Ayadi
- LERIA, Université d'Angers, 2 Boulevard Lavoisier, 49045 Angers Cedex 01, France
- LaTICE, Higher School of Sciences and Technologies of Tunis, 5 Avenue Taha Hussein, B. P. : 56, Bab Menara, 1008 Tunis, University of Tunis, Tunisia
- Mourad Elloumi
- LaTICE, Higher School of Sciences and Technologies of Tunis, 5 Avenue Taha Hussein, B. P. : 56, Bab Menara, 1008 Tunis, University of Tunis, Tunisia
- Jin-Kao Hao
- LERIA, Université d'Angers, 2 Boulevard Lavoisier, 49045 Angers Cedex 01, France
|
35
|
Castro-Melchor M, Le H, Hu WS. Transcriptome data analysis for cell culture processes. Advances in Biochemical Engineering/Biotechnology 2012; 127:27-70. [PMID: 22194060 DOI: 10.1007/10_2011_116] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
In the past decade, DNA microarrays have fundamentally changed the way we study complex biological systems. By measuring the expression levels of thousands of transcripts, the paradigm of studying organisms has shifted from focusing on the local phenomena of a few genes to surveying the whole genome. DNA microarrays are used in a variety of ways, from simple comparisons between two samples to more intricate time-series studies. With the large number of genes being studied, the dimensionality of the problem is inevitably high. The analysis of microarray data thus requires specific approaches. In the case of time-series microarray studies, data analysis is further complicated by the correlation between successive time points in a series. In this review, we survey the methodologies used in the analysis of static and time-series microarray data, covering data pre-processing, identification of differentially expressed genes, profile pattern recognition, pathway analysis, and network reconstruction. When available, examples of their use in mammalian cell cultures are presented.
|
36
|
Redestig H, Costa IG. Detection and interpretation of metabolite-transcript coresponses using combined profiling data. Bioinformatics 2011; 27:i357-65. [PMID: 21685093 PMCID: PMC3117345 DOI: 10.1093/bioinformatics/btr231] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 01/29/2023]
Abstract
Motivation: Studying the interplay between gene expression and metabolite levels can yield important information on the physiology of stress responses and adaptation strategies. Performing transcriptomics and metabolomics in parallel during time-series experiments represents a systematic way to gain such information. Several combined profiling datasets have been added to the public domain and they form a valuable resource for hypothesis generating studies. Unfortunately, detecting coresponses between transcript levels and metabolite abundances is non-trivial: they cannot be assumed to overlap directly with underlying biochemical pathways and they may be subject to time delays and obscured by considerable noise. Results: Our aim was to predict pathway comemberships between metabolites and genes based on their coresponses to applied stress. We found that in the presence of strong noise and time-shifted responses, a hidden Markov model-based similarity outperforms the simpler Pearson correlation but performs comparably or worse in their absence. Therefore, we propose a supervised method that applies pathway information to summarize similarity statistics to a consensus statistic that is more informative than any of the single measures. Using four combined profiling datasets, we show that comembership between metabolites and genes can be predicted for numerous KEGG pathways; this opens opportunities for the detection of transcriptionally regulated pathways and novel metabolically related genes. Availability: A command-line software tool is available at http://www.cin.ufpe.br/~igcf/Metabolites. Contact: henning@psc.riken.jp; igcf@cin.ufpe.br Supplementary information: Supplementary data are available at Bioinformatics online.
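To make the time-delay problem concrete, the sketch below computes the Pearson correlation between a transcript profile and a metabolite profile at several lags and reports the lag with the highest correlation. This is only the simple baseline the abstract refers to, extended with a lag scan as an illustrative assumption; it is not the hidden Markov model-based similarity or the supervised consensus statistic of the paper, and the names `pearson` and `best_lagged_pearson` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lagged_pearson(transcript, metabolite, max_lag):
    """Scan lags 0..max_lag, with the metabolite delayed relative to the transcript.

    Returns (lag, r) for the lag with the highest Pearson correlation.
    """
    best = None
    for lag in range(max_lag + 1):
        # Pair transcript[t] with metabolite[t + lag] over the overlapping window.
        x = transcript[: len(transcript) - lag]
        y = metabolite[lag:]
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Usage: the metabolite repeats the transcript pattern two time points later.
transcript = [0, 1, 3, 2, 0, 0, 1, 0]
metabolite = [0, 0, 0, 1, 3, 2, 0, 0]
lag, r = best_lagged_pearson(transcript, metabolite, max_lag=3)
```

At lag 0 the two profiles look only weakly related, while the scan recovers the shifted coresponse; with noisy data and few time points such a scan becomes unreliable, which is the situation the abstract addresses.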
|
37
|
Lee HJ, Kim PA, Park MR. Clustering of Time-Course Microarray Data Using Pharmacokinetic Parameter. Korean Journal of Applied Statistics 2011. [DOI: 10.5351/kjas.2011.24.4.623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
38
|
Dalton L, Ballarin V, Brun M. Clustering algorithms: on learning, validation, performance, and applications to genomics. Curr Genomics 2011; 10:430-45. [PMID: 20190957 PMCID: PMC2766793 DOI: 10.2174/138920209789177601] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Key Words] [Received: 03/03/2009] [Revised: 04/20/2009] [Accepted: 05/11/2009] [Indexed: 11/22/2022] Open
Abstract
The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. While data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique to analyze microarrays. To illustrate its application to genomics, clustering applied to genes from a set of microarray data groups together those genes whose expression levels exhibit similar behavior throughout the samples, and when applied to samples it offers the potential to discriminate pathologies based on their differential patterns of gene expression. Although clustering has now been used for many years in the context of gene expression microarrays, it has remained highly problematic. The choice of a clustering algorithm and validation index is not a trivial one, more so when applying them to high throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem versus computational power available. In some cases a very simple algorithm may be appropriate to tackle a problem, but many situations may require a more complex and powerful algorithm better suited for the job at hand. In this paper, we will cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.
Affiliation(s)
- Lori Dalton
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843-3128, USA
|
39
|
Njambere EN, Clarke BB, Zhang N. Dimeric oligonucleotide probes enhance diagnostic macroarray performance. J Microbiol Methods 2011; 86:52-61. [DOI: 10.1016/j.mimet.2011.03.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/16/2011] [Revised: 03/26/2011] [Accepted: 03/26/2011] [Indexed: 11/26/2022]
|
40
|
Schilling R, Costa IG, Schliep A. pGQL: A probabilistic graphical query language for gene expression time courses. BioData Min 2011; 4:9. [PMID: 21501515 PMCID: PMC3096586 DOI: 10.1186/1756-0381-4-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/12/2010] [Accepted: 04/18/2011] [Indexed: 11/24/2022] Open
Abstract
Background Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data. As queries can be very easily defined, an exploratory analysis of time course data is greatly facilitated. While timeboxes are effective, they have no provisions for dealing with noisy data or data with fluctuations along the time axis, which is very common in many applications. In particular, this is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. From a data mining point of view the robust handling of data through a sound statistical model is of great importance. Results We propose probabilistic timeboxes, which correspond to a specific class of Hidden Markov Models, that constitutes an established method in data mining. Since HMMs are a particular class of probabilistic graphical models we call our method Probabilistic Graphical Query Language. Its implementation was realized in the free software package pGQL. We evaluate its effectiveness in exploratory analysis on a yeast sporulation data set. Conclusions We introduce a new approach to define dynamic, statistical queries on time course data. It supports an interactive exploration of reasonably large amounts of data and enables users without expert knowledge to specify fairly complex statistical models with ease. The expressivity of our approach is by its statistical nature greater and more robust with respect to amplitude and frequency fluctuation than the prior, deterministic timeboxes.
Affiliation(s)
- Ruben Schilling
- Max Planck Institute for Molecular Genetics, Department of Computational Biology, Ihnestr. 63-73, 14195 Berlin, Germany.
|
41
|
Sun W, Wei Z. Multiple Testing for Pattern Identification, With Applications to Microarray Time-Course Experiments. J Am Stat Assoc 2011. [DOI: 10.1198/jasa.2011.ap09587] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/21/2022]
|
42
|
|
43
|
Zhang B, Chen B, Wu T, Xuan Z, Zhu X, Chen R. Estimating developmental states of tumors and normal tissues using a linear time-ordered model. BMC Bioinformatics 2011; 12:53. [PMID: 21310084 PMCID: PMC3223864 DOI: 10.1186/1471-2105-12-53] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/25/2010] [Accepted: 02/11/2011] [Indexed: 11/18/2022] Open
Abstract
Background Tumor cells are considered to have an aberrant cell state, and some evidence indicates that different developmental states appear during tumorigenesis. Embryonic development and stem cell differentiation are ordered processes in which the sequence of events over time is highly conserved. The "cancer attractor" concept integrates normal developmental processes and tumorigenesis into a high-dimensional "cell state space", and provides a reasonable explanation of the relationship between these two biological processes from a theoretical viewpoint. However, it is hard to describe such a relationship using existing experimental data; moreover, the measurement of different developmental states is also difficult. Results Here, by applying a novel time-ordered linear model based on a co-bisector which represents the joint direction of a series of vectors, we described the trajectories of the developmental process by a line and showed different developmental states of tumor cells from a developmental timescale perspective in a cell state space. This model was used to transform time-course developmental expression profiles of human ESCs, normal mouse liver, ovary and lung tissue into "cell developmental state lines". Then these cell state lines were applied to observe the developmental states of different tumors and their corresponding normal samples. Mouse liver and ovarian tumors showed different degrees of similarity to early developmental stages. Similarly, human glioma cells and ovarian tumors became developmentally "younger". Conclusions The time-ordered linear model captured linear projected developmental trajectories in a cell state space. It also reflected the tendency of gene expression to change over time from the developmental timescale perspective, and our findings indicated different developmental states during tumorigenesis in different tissues.
Affiliation(s)
- Bo Zhang
- Laboratory of Bioinformatics and Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, PR China
|
44
|
Hafemeister C, Costa IG, Schönhuth A, Schliep A. Classifying short gene expression time-courses with Bayesian estimation of piecewise constant functions. Bioinformatics 2011; 27:946-52. [PMID: 21266444 DOI: 10.1093/bioinformatics/btr037] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION Analyzing short time-courses is a frequent and relevant problem in molecular biology, as, for example, 90% of gene expression time-course experiments span at most nine time-points. The biological or clinical questions addressed are elucidating gene regulation by identification of co-expressed genes, predicting response to treatment in clinical, trial-like settings or classifying novel toxic compounds based on similarity of gene expression time-courses to those of known toxic compounds. The latter problem is characterized by irregular and infrequent sample times and a total lack of prior assumptions about the incoming query, which stands in stark contrast to clinical settings and requires implicitly performing a local, gapped alignment of time series. The current state-of-the-art method (SCOW) uses a variant of dynamic time warping and models time series as higher order polynomials (splines). RESULTS We suggest modeling time-courses that monitor response to toxins by piecewise constant functions, which are modeled as left-right Hidden Markov Models. A Bayesian approach to parameter estimation and inference helps to cope with the short, but highly multivariate time-courses. We improve prediction accuracy by 7% and 4%, respectively, when classifying toxicology and stress response data. We also reduce running times by at least a factor of 140; note that reasonable running times are crucial when classifying response to toxins. In conclusion, we have demonstrated that appropriate reduction of model complexity can result in substantial improvements both in classification performance and running time. AVAILABILITY A Python package implementing the methods described is freely available under the GPL from http://bioinformatics.rutgers.edu/Software/MVQueries/.
Affiliation(s)
- Christoph Hafemeister
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
|
45
|
Sinha A, Markatou M. A platform for processing expression of short time series (PESTS). BMC Bioinformatics 2011; 12:13. [PMID: 21223570 PMCID: PMC3027112 DOI: 10.1186/1471-2105-12-13] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 06/17/2010] [Accepted: 01/11/2011] [Indexed: 11/30/2022] Open
Abstract
Background Time course microarray profiles examine the expression of genes over a time domain. They are necessary in order to determine the complete set of genes that are dynamically expressed under given conditions, and to determine the interaction between these genes. Because of cost and resource issues, most time series datasets contain fewer than 9 points, and there are few tools available geared towards the analysis of this type of data. Results To this end, we introduce a platform for Processing Expression of Short Time Series (PESTS). It was designed with a focus on usability and interpretability of analyses for the researcher. As such, it implements several standard techniques for comparability as well as visualization functions. However, it is designed specifically for the unique methods we have developed for significance analysis, multiple test correction and clustering of short time series data. The central tenet of these methods is the use of biologically relevant features for analysis. Features summarize short gene expression profiles, inherently incorporate dependence across time, and allow for both full description of the examined curve and missing data points. Conclusions PESTS is fully generalizable to other types of time series analyses. PESTS implements novel methods as well as several standard techniques for comparability and visualization functions. These features and functionality make PESTS a valuable resource for a researcher's toolkit. PESTS is available to download for free to academic and non-profit users at http://www.mailman.columbia.edu/academic-departments/biostatistics/research-service/software-development.
Affiliation(s)
- Anshu Sinha
- Department of Biostatistics, Columbia University, New York, NY, USA
46
Ghandhi SA, Sinha A, Markatou M, Amundson SA. Time-series clustering of gene expression in irradiated and bystander fibroblasts: an application of FBPA clustering. BMC Genomics 2011; 12:2. [PMID: 21205307 PMCID: PMC3022823 DOI: 10.1186/1471-2164-12-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 05/05/2010] [Accepted: 01/04/2011] [Indexed: 11/22/2022]
Abstract
Background: The radiation bystander effect is an important component of the overall biological response of tissues and organisms to ionizing radiation, but the signaling mechanisms between irradiated and non-irradiated bystander cells are not fully understood. In this study, we measured a time-series of gene expression after α-particle irradiation and applied the Feature Based Partitioning around medoids Algorithm (FBPA), a new clustering method suitable for sparse time series, to identify signaling modules that act in concert in the response to direct irradiation and bystander signaling. We compared our results with those of an alternate clustering method, Short Time series Expression Miner (STEM). Results: While computational evaluations of both clustering results were similar, FBPA provided more biological insight. After irradiation, gene clusters were enriched for signal transduction, cell cycle/cell death and inflammation/immunity processes; but only FBPA separated clusters by function. In bystanders, gene clusters were enriched for cell communication/motility, signal transduction and inflammation processes; but biological functions did not separate as clearly with either clustering method as they did in irradiated samples. Network analysis confirmed p53 and NF-κB transcription factor-regulated gene clusters in irradiated and bystander cells and suggested novel regulators, such as KDM5B/JARID1B (lysine (K)-specific demethylase 5B) and HDACs (histone deacetylases), which could epigenetically coordinate gene expression after irradiation. Conclusions: In this study, we have shown that a new time series clustering method, FBPA, can provide new leads to the mechanisms regulating the dynamic cellular response to radiation. The findings implicate epigenetic control of gene expression in addition to transcription factor networks.
Affiliation(s)
- Shanaz A Ghandhi
- Center for Radiological Research, Columbia University, New York, NY 10032, USA
47
Kim J. Clustering change patterns using Fourier transformation with time-course gene expression data. Methods Mol Biol 2011; 734:201-220. [PMID: 21468991 DOI: 10.1007/978-1-61779-086-7_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a period of time because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is induced to clustering with the derivative Fourier coefficients. This work is aimed at discovering gene groups with similar change patterns which share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. We applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.
Affiliation(s)
- Jaehee Kim
- Department of Statistics, Duksung Women's University, Seoul, South Korea.
48
Shah M, Corbeil J. A general framework for analyzing data from two short time-series microarray experiments. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:14-26. [PMID: 21071793 DOI: 10.1109/tcbb.2009.51] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
We propose a general theoretical framework for analyzing differentially expressed genes and behavior patterns from two homogenous short time-course data. The framework generalizes the recently proposed Hilbert-Schmidt Independence Criterion (HSIC)-based framework adapting it to the time-series scenario by utilizing tensor analysis for data transformation. The proposed framework is effective in yielding criteria that can identify both the differentially expressed genes and time-course patterns of interest between two time-series experiments without requiring to explicitly cluster the data. The results, obtained by applying the proposed framework with a linear kernel formulation, on various data sets are found to be both biologically meaningful and consistent with published studies.
Affiliation(s)
- Mohak Shah
- Centre for Intelligent Machines, McGill University, McConnell Engineering Building, Room 444, 3480, University Street, Montreal, QC H3A 2A7, Canada.
49
Song JZ, Duan KM, Ware T, Surette M. The wavelet-based cluster analysis for temporal gene expression data. EURASIP Journal on Bioinformatics & Systems Biology 2010:39382. [PMID: 17713589 PMCID: PMC3171337 DOI: 10.1155/2007/39382] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 12/04/2005] [Revised: 10/01/2006] [Accepted: 03/04/2007] [Indexed: 11/17/2022]
Abstract
A variety of high-throughput methods have made it possible to generate detailed temporal expression data for a single gene or large numbers of genes. Common methods for analysis of these large data sets can be problematic. One challenge is the comparison of temporal expression data obtained from different growth conditions where the patterns of expression may be shifted in time. We propose the use of wavelet analysis to transform the data obtained under different growth conditions to permit comparison of expression patterns from experiments that have time shifts or delays. We demonstrate this approach using detailed temporal data for a single bacterial gene obtained under 72 different growth conditions. This general strategy can be applied in the analysis of data sets of thousands of genes under different conditions.
Affiliation(s)
- JZ Song
- Department of Animal and Avian Science, 2413 Animal Science Center, University of Maryland, College Park, MD 20742, USA
- KM Duan
- Department of Microbiology and Infectious Diseases, and Department of Biochemistry and Molecular Biology, Health Sciences Centre, University of Calgary, Calgary, AB T2N 4N1, Canada
- T Ware
- Department of Mathematics, University of Calgary, Calgary, AB T2N 4N1, Canada
- M Surette
- Department of Microbiology and Infectious Diseases, and Department of Biochemistry and Molecular Biology, Health Sciences Centre, University of Calgary, Calgary, AB T2N 4N1, Canada
50
Reverse engineering dynamic temporal models of biological processes and their relationships. Proc Natl Acad Sci U S A 2010; 107:12511-6. [PMID: 20571120 DOI: 10.1073/pnas.1006283107] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 01/29/2023]
Abstract
Biological processes such as circadian rhythms, cell division, metabolism, and development occur as ordered sequences of events. The synchronization of these coordinated events is essential for proper cell function, and hence the determination of critical time points in biological processes is an important component of all biological investigations. In particular, such critical time points establish logical ordering constraints on subprocesses, impose prerequisites on temporal regulation and spatial compartmentalization, and situate dynamic reorganization of functional elements in preparation for subsequent stages. Thus, building temporal phenomenological representations of biological processes from genome-wide datasets is relevant in formulating biological hypotheses on: how processes are mechanistically regulated; how the regulations vary on an evolutionary scale, and how their inadvertent disregulation leads to a diseased state or fatality. This paper presents a general framework (GOALIE) to reconstruct temporal models of cellular processes from time-course gene expression data. We mathematically formulate the problem as one of optimally segmenting datasets into a succession of "informative" windows such that time points within a window expose concerted clusters of gene action whereas time points straddling window boundaries constitute points of significant restructuring. We illustrate here how GOALIE successfully brings out the interplay between multiple yeast processes, inferred from combined experimental datasets for the cell cycle and the metabolic cycle.