1
Kuang N, Ma Q, Zheng X, Meng X, Zhai Z, Li Q, Pan J. GeTeSEPdb: A comprehensive database and online tool for the identification and analysis of gene profiles with temporal-specific expression patterns. Comput Struct Biotechnol J 2024; 23:2488-2496. [PMID: 38939556 PMCID: PMC11208770 DOI: 10.1016/j.csbj.2024.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 05/29/2024] [Accepted: 06/04/2024] [Indexed: 06/29/2024]
Abstract
Gene expression is dynamic and varies across the stages of biological processes. The identification of gene profiles with temporal-specific expression patterns can provide valuable insights into ongoing biological processes, such as the cell cycle, cell development, circadian rhythms, or responses to external stimuli such as drug treatments or viral infections. However, no database currently defines, identifies or archives gene profiles with temporal-specific expression patterns. Here, using a high-throughput regression analysis approach, eight linear and nonlinear parametric models were fitted to gene expression profiles from time-series experiments to identify eight types of gene profiles with temporal-specific expression patterns. We curated 2684 time-series transcriptome datasets and identified 2,644,370 gene profiles exhibiting temporal-specific expression patterns. The results were stored in the database GeTeSEPdb (gene profiles with temporal-specific expression patterns database, http://www.inbirg.com/GeTeSEPdb/). Moreover, we implemented an online tool to identify gene profiles with temporal-specific expression patterns from user-submitted data. In summary, GeTeSEPdb is a comprehensive web service that can be used to identify and analyse gene profiles with temporal-specific expression patterns. This approach facilitates the exploration of transcriptional changes and temporal patterns of responses. We firmly believe that GeTeSEPdb will become a valuable resource for biologists and bioinformaticians.
Affiliation(s)
- Ni Kuang
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Qinfeng Ma
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Xiao Zheng
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Xuehang Meng
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Zhaoyu Zhai
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Qiang Li
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Jianbo Pan
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Precision Medicine Center, The Second Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China
2
Pantazis LJ, Frechtel GD, Cerrone GE, García R, Iglesias Molli AE. Phenotype similarities in automatically grouped T2D patients by variation-based clustering of IL-1β gene expression. EJIFCC 2023; 34:228-244. [PMID: 37868088 PMCID: PMC10588079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2023]
Abstract
Background: Analyzing longitudinal gene expression data is extremely challenging due to limited prior information, high dimensionality, and heterogeneity. Similar difficulties arise in research on multifactorial diseases such as Type 2 Diabetes. Clustering methods can be applied to automatically group similar observations. Common clinical values within the resulting groups suggest potential associations. However, applying traditional clustering methods to gene expression over time fails to capture variations in the response. Therefore, shape-based clustering could be applied to identify patient groups by gene expression variation in a long-term metabolic compensatory intervention. Objectives: To search for clinical grouping patterns between subjects that showed similar structure in the variation of IL-1β gene expression over time. Methods: A new approach for shape-based clustering by IL-1β expression behavior was applied to a real longitudinal database of Type 2 Diabetes patients. To correctly capture variations in the response, we applied traditional clustering methods to slopes between measurements. Results: In this setting, the application of K-Medoids using the Manhattan distance yielded the best results for the corresponding database. Among the resulting groups, one of the clusters presented significant differences in many key clinical values regarding the metabolic syndrome in comparison to the rest of the data. Conclusions: The proposed method can be used to group patients according to variation patterns in gene expression (or other applications) and thus provide clinical insights even when there is no previous knowledge of the subjects' clinical profiles and there are few timepoints for each individual.
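Illustrative note: the core idea in the Methods, clustering on slopes between measurements with K-Medoids under the Manhattan distance, can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the simple alternating update, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def slopes(times, values):
    """Slopes between consecutive measurements, one row per subject."""
    return np.diff(values, axis=1) / np.diff(times)

def k_medoids_manhattan(X, k, n_iter=100, seed=0):
    """Simple alternating k-medoids using the Manhattan (L1) distance."""
    rng = np.random.default_rng(seed)
    # Pairwise L1 distances between all rows of X.
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return labels, medoids

# Hypothetical toy data: three rising and three falling trajectories
# measured at three timepoints (months 0, 6 and 12).
times = np.array([0.0, 6.0, 12.0])
values = np.array([
    [1.0, 2.0, 3.0],
    [5.0, 6.1, 7.3],
    [0.5, 1.4, 2.6],
    [3.0, 2.0, 1.0],
    [7.0, 6.0, 4.6],
    [2.5, 1.6, 0.4],
])
labels, medoids = k_medoids_manhattan(slopes(times, values), k=2)
```

Because the clustering operates on slopes rather than raw levels, subjects with different baselines but the same direction of change fall into the same group.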
Affiliation(s)
- Lucio José Pantazis
- Centro de Sistemas y Control, Instituto Tecnológico de Buenos Aires (ITBA), Lavardén 315 1437, Ciudad Autónoma de Buenos Aires, Argentina
- Gustavo Daniel Frechtel
- CONICET-Universidad de Buenos Aires. Instituto de Inmunología, Genética y Metabolismo (INIGEM). Laboratorio de Diabetes y Metabolismo. Avenida Córdoba 2351, Ciudad Autónoma de Buenos Aires, Argentina
- Universidad de Buenos Aires. Facultad de Medicina. Departamento de Medicina. Cátedra de Nutrición. Avenida Córdoba 2351, Ciudad Autónoma de Buenos Aires, Argentina
- Gloria Edith Cerrone
- CONICET-Universidad de Buenos Aires. Instituto de Inmunología, Genética y Metabolismo (INIGEM). Laboratorio de Diabetes y Metabolismo. Avenida Córdoba 2351, Ciudad Autónoma de Buenos Aires, Argentina
- Universidad de Buenos Aires. Facultad de Farmacia y Bioquímica. Departamento de Microbiología, Inmunología, Biotecnología y Genética. Cátedra de Genética. Avenida Córdoba 2351, Ciudad Autónoma de Buenos Aires, Argentina
- Rafael García
- Centro de Sistemas y Control, Instituto Tecnológico de Buenos Aires (ITBA), Lavardén 315 1437, Ciudad Autónoma de Buenos Aires, Argentina
- Andrea Elena Iglesias Molli
- CONICET-Universidad de Buenos Aires. Instituto de Inmunología, Genética y Metabolismo (INIGEM). Laboratorio de Diabetes y Metabolismo. Avenida Córdoba 2351, Ciudad Autónoma de Buenos Aires, Argentina
3
Pham TD. Time-frequency time-space LSTM for robust classification of physiological signals. Sci Rep 2021; 11:6936. [PMID: 33767352 PMCID: PMC7994826 DOI: 10.1038/s41598-021-86432-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/14/2020] [Accepted: 03/16/2021] [Indexed: 02/01/2023]
Abstract
Automated analysis of physiological time series is utilized for many clinical applications in medicine and life sciences. Long short-term memory (LSTM) is a deep recurrent neural network architecture used for classification of time-series data. Here time-frequency and time-space properties of time series are introduced as a robust tool for LSTM processing of long sequential data in physiology. Based on classification results obtained from two databases of sensor-induced physiological signals, the proposed approach has the potential for (1) achieving very high classification accuracy, (2) saving tremendous time for data learning, and (3) being cost-effective and user-comfortable for clinical trials by reducing multiple wearable sensors for data recording.
Affiliation(s)
- Tuan D. Pham
- Center for Artificial Intelligence, Prince Mohammad Bin Fahd University, Khobar, 31952 Saudi Arabia
4
Tan Q, Thomassen M, Burton M, Mose KF, Andersen KE, Hjelmborg J, Kruse T. Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data. J Integr Bioinform 2017; 14(2). [PMID: 28753536 PMCID: PMC6042830 DOI: 10.1515/jib-2017-0011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/14/2017] [Accepted: 04/04/2017] [Indexed: 11/15/2022]
Abstract
Modeling complex time-course patterns is a challenging issue in microarray studies due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature of the generalized correlation analysis could make it a useful and efficient tool for analyzing microarray time-course data and for exploring the complex relationships in omics data and their association with disease and health.
Affiliation(s)
- Qihua Tan
- Unit of Human Genetics, Department of Clinical Research, University of Southern Denmark, 5000 Odense C, Denmark
- Epidemiology, Biostatistics, and Biodemography, Department of Public Health, University of Southern Denmark, J.B. Winsløws Vej 9B, DK-5000, Odense C, Denmark
- Mads Thomassen
- Unit of Human Genetics, Department of Clinical Research, University of Southern Denmark, 5000 Odense C, Denmark
- Mark Burton
- Unit of Human Genetics, Department of Clinical Research, University of Southern Denmark, 5000 Odense C, Denmark
- Kristian Fredløv Mose
- Department of Dermatology and Allergy Centre, Odense University Hospital, University of Southern Denmark, 5000 Odense C, Denmark
- Klaus Ejner Andersen
- Department of Dermatology and Allergy Centre, Odense University Hospital, University of Southern Denmark, 5000 Odense C, Denmark
- Dermatological Investigations Scandinavia, J.B. Winsløwsvej 9, 5000 Odense C, Denmark
- Centre for Innovative Medical Technology, Institute of Clinical Research, University of Southern Denmark, 5000 Odense C, Denmark
- Jacob Hjelmborg
- Epidemiology, Biostatistics, and Biodemography, Department of Public Health, University of Southern Denmark, J.B. Winsløws Vej 9B, DK-5000, Odense C, Denmark
- Torben Kruse
- Unit of Human Genetics, Department of Clinical Research, University of Southern Denmark, 5000 Odense C, Denmark
5
Ma Q, Shen L, Chen W, Wang J, Wei J, Yu Z. Functional echo state network for time series classification. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.08.081] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Indexed: 11/27/2022]
6
Natural Cubic Spline Regression Modeling Followed by Dynamic Network Reconstruction for the Identification of Radiation-Sensitivity Gene Association Networks from Time-Course Transcriptome Data. PLoS One 2016; 11:e0160791. [PMID: 27505168 PMCID: PMC4978405 DOI: 10.1371/journal.pone.0160791] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 04/26/2016] [Accepted: 06/14/2016] [Indexed: 11/23/2022]
Abstract
Gene expression time-course experiments allow the study of the dynamics of transcriptomic changes in cells exposed to different stimuli. However, most approaches for the reconstruction of gene association networks (GANs) do not propose prior-selection approaches tailored to time-course transcriptome data. Here, we present a workflow for the identification of GANs from time-course data using prior selection of genes differentially expressed over time identified by natural cubic spline regression modeling (NCSRM). The workflow comprises three major steps: 1) the identification of differentially expressed genes from time-course expression data by employing NCSRM, 2) the use of regularized dynamic partial correlation as implemented in GeneNet to infer GANs from differentially expressed genes and 3) the identification and functional characterization of the key nodes in the reconstructed networks. The approach was applied to a time-resolved transcriptome data set of radiation-perturbed cell culture models of non-tumor cells with normal and increased radiation sensitivity. NCSRM detected significantly more genes than another commonly used method for time-course transcriptome analysis (BETR). While most genes detected with BETR were also detected with NCSRM, the false-detection rate of NCSRM was low (3%). The GANs reconstructed from genes detected with NCSRM showed a better overlap with the interactome network Reactome compared to GANs derived from BETR-detected genes. After exposure to 1 Gy the normal sensitive cells showed only a sparse response compared to cells with increased sensitivity, which exhibited a strong response mainly of genes related to the senescence pathway. After exposure to 10 Gy the response of the normal sensitive cells was mainly associated with senescence and that of cells with increased sensitivity with apoptosis. We discuss these results in a clinical context and underline the impact of senescence-associated pathways in acute radiation response of normal cells. The workflow of this novel approach is implemented in the open-source Bioconductor R-package splineTimeR.
7
Schleif FM, Hammer B, Monroy JG, Jimenez JG, Blanco-Claraco JL, Biehl M, Petkov N. Odor recognition in robotics applications by discriminative time-series modeling. Pattern Anal Appl 2015. [DOI: 10.1007/s10044-014-0442-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 10/24/2022]
8
Deng H, Runger G, Tuv E, Vladimir M. A time series forest for classification and feature extraction. Inf Sci (N Y) 2013. [DOI: 10.1016/j.ins.2013.02.030] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Indexed: 10/27/2022]
9
Qian L, Zheng H, Zhou H, Qin R, Li J. Classification of time series gene expression in clinical studies via integration of biological network. PLoS One 2013; 8:e58383. [PMID: 23516469 PMCID: PMC3596388 DOI: 10.1371/journal.pone.0058383] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/18/2012] [Accepted: 02/04/2013] [Indexed: 12/24/2022]
Abstract
The increasing availability of time series expression datasets, although promising, raises a number of new computational challenges. Accordingly, the development of suitable classification methods to make reliable and sound predictions is becoming a pressing issue. We propose, here, a new method to classify time series gene expression via integration of biological networks. We evaluated our approach on 2 different datasets and showed that the use of a hidden Markov model/Gaussian mixture models hybrid explores the time-dependence of the expression data, thereby leading to better prediction results. We demonstrated that the biclustering procedure identifies function-related genes as a whole, giving rise to high accordance in prognosis prediction across independent time series datasets. In addition, we showed that integration of biological networks into our method significantly improves prediction performance. Moreover, we compared our approach with several state-of-the-art algorithms and found that our method outperformed previous approaches with regard to various criteria. Finally, our approach achieved better prediction results on early-stage data, implying the potential of our method for practical prediction.
Affiliation(s)
- Liwei Qian
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Haoran Zheng
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Hefei, People's Republic of China
- Department of Systems Biology, University of Science and Technology of China, Hefei, People's Republic of China
- Hong Zhou
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Ruibin Qin
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Jinlong Li
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
10
11
Ghalwash MF, Obradovic Z. Early classification of multivariate temporal observations by extraction of interpretable shapelets. BMC Bioinformatics 2012; 13:195. [PMID: 22873729 PMCID: PMC3475011 DOI: 10.1186/1471-2105-13-195] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 03/26/2012] [Accepted: 07/23/2012] [Indexed: 11/14/2022]
Abstract
Background: Early classification of time series is beneficial for biomedical informatics problems including, but not limited to, disease change detection. Early classification can be of tremendous help by identifying the onset of a disease before it has time to fully take hold. In addition, extracting patterns from the original time series helps domain experts to gain insights into the classification results. This problem has been studied recently using time series segments called shapelets. In this paper, we present a method, which we call Multivariate Shapelets Detection (MSD), that allows for early and patient-specific classification of multivariate time series. The method extracts time series patterns, called multivariate shapelets, from all dimensions of the time series that distinctly manifest the target class locally. The time series were classified by searching for the earliest closest patterns. Results: The proposed early classification method for multivariate time series has been evaluated on eight gene expression datasets from viral infection and drug response studies in humans. In our experiments, the MSD method outperformed the baseline methods, achieving highly accurate classification by using as little as 40%-64% of the time series. The obtained results provide evidence that using conventional classification methods on short time series is not as accurate as using the proposed methods specialized for early classification. Conclusion: For the early classification task, we proposed a method called Multivariate Shapelets Detection (MSD), which extracts patterns from all dimensions of the time series. We showed that the MSD method can classify the time series early by using as little as 40%-64% of the time series’ length.
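Illustrative note: the general shapelet idea behind early classification can be sketched in a few lines. This is a simplified, univariate sketch under stated assumptions, not the MSD method itself: the shapelets, their distance thresholds, the class labels, and the function names are hypothetical, and MSD's extraction of multivariate shapelets from training data is not shown.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    return np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min()

def classify_early(series, shapelets):
    """Scan growing prefixes and stop at the first one that matches a shapelet.

    Returns (label, points_used), or (None, len(series)) if nothing matches.
    """
    for t in range(1, len(series) + 1):
        prefix = series[:t]
        for pattern, threshold, label in shapelets:
            if t >= len(pattern) and shapelet_distance(prefix, pattern) <= threshold:
                return label, t
    return None, len(series)

# Hypothetical shapelets: (pattern, distance threshold, class label).
shapelets = [
    (np.array([0.0, 1.0, 2.0]), 0.3, "responder"),
    (np.array([2.0, 1.0, 0.0]), 0.3, "non-responder"),
]
series = np.array([0.1, 0.0, 0.1, 1.0, 2.1, 2.0, 2.0, 2.0, 2.0, 2.0])
label, used = classify_early(series, shapelets)
```

In this toy series the rising pattern is first matched once five of the ten points have been observed, so a label is available before the series is complete.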
Affiliation(s)
- Mohamed F Ghalwash
- Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, USA.
12
Bar-Joseph Z, Gitter A, Simon I. Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet 2012; 13:552-64. [PMID: 22805708 DOI: 10.1038/nrg3244] [Citation(s) in RCA: 318] [Impact Index Per Article: 24.5] [Indexed: 12/19/2022]
Abstract
Biological processes are often dynamic, thus researchers must monitor their activity at multiple time points. The most abundant source of information regarding such dynamic activity is time-series gene expression data. These data are used to identify the complete set of activated genes in a biological process, to infer their rates of change, their order and their causal effects and to model dynamic systems in the cell. In this Review we discuss the basic patterns that have been observed in time-series experiments, how these patterns are combined to form expression programs, and the computational analysis, visualization and integration of these data to infer models of dynamic biological systems.
Affiliation(s)
- Ziv Bar-Joseph
- Lane Center for Computational Biology and Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
13
Redestig H, Costa IG. Detection and interpretation of metabolite-transcript coresponses using combined profiling data. Bioinformatics 2011; 27:i357-65. [PMID: 21685093 PMCID: PMC3117345 DOI: 10.1093/bioinformatics/btr231] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 01/29/2023]
Abstract
Motivation: Studying the interplay between gene expression and metabolite levels can yield important information on the physiology of stress responses and adaptation strategies. Performing transcriptomics and metabolomics in parallel during time-series experiments represents a systematic way to gain such information. Several combined profiling datasets have been added to the public domain and they form a valuable resource for hypothesis generating studies. Unfortunately, detecting coresponses between transcript levels and metabolite abundances is non-trivial: they cannot be assumed to overlap directly with underlying biochemical pathways and they may be subject to time delays and obscured by considerable noise. Results: Our aim was to predict pathway comemberships between metabolites and genes based on their coresponses to applied stress. We found that in the presence of strong noise and time-shifted responses, a hidden Markov model-based similarity outperforms the simpler Pearson correlation but performs comparably or worse in their absence. Therefore, we propose a supervised method that applies pathway information to summarize similarity statistics to a consensus statistic that is more informative than any of the single measures. Using four combined profiling datasets, we show that comembership between metabolites and genes can be predicted for numerous KEGG pathways; this opens opportunities for the detection of transcriptionally regulated pathways and novel metabolically related genes. Availability: A command-line software tool is available at http://www.cin.ufpe.br/~igcf/Metabolites. Contact: henning@psc.riken.jp; igcf@cin.ufpe.br. Supplementary information: Supplementary data are available at Bioinformatics online.
14
A growth curve model with fractional polynomials for analysing incomplete time-course data in microarray gene expression studies. Adv Bioinformatics 2011; 2011:261514. [PMID: 21966290 PMCID: PMC3182337 DOI: 10.1155/2011/261514] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/11/2011] [Revised: 06/07/2011] [Accepted: 08/02/2011] [Indexed: 11/20/2022]
Abstract
Identifying the various gene expression response patterns is a challenging issue in expression microarray time-course experiments. Due to heterogeneity in the regulatory reaction among thousands of genes tested, it is impossible to manually characterize a parametric form for each time-course pattern gene by gene. We introduce a growth curve model with fractional polynomials to automatically capture the various time-dependent expression patterns while efficiently handling missing values due to incomplete observations. For each gene, our procedure compares the performance of fractional polynomial models with power terms from a set of fixed values that offer a wide range of curve shapes and suggests a best-fitting model. After a limited simulation study, the model was applied to our human in vivo irritated epidermis data with missing observations to investigate time-dependent transcriptional responses to a chemical irritant. Our method was able to identify the various nonlinear time-course expression trajectories. The integration of growth curves with fractional polynomials provides a flexible way to model different time-course patterns together with model selection and significant gene identification strategies that can be applied in microarray-based time-course gene expression experiments with missing observations.
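Illustrative note: selecting a power term from a fixed set, as the abstract describes, can be sketched as below. This is a minimal sketch under stated assumptions rather than the paper's growth curve model: it fits a first-degree fractional polynomial per gene by ordinary least squares, assumes the conventional power set (with power 0 meaning log t), and simply drops missing time points. The function names and toy data are hypothetical.

```python
import numpy as np

# Conventional fractional polynomial power set; assumed here, since the
# abstract only says "a set of fixed values".
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_term(t, p):
    """Fractional polynomial transform of time; power 0 denotes log(t). Requires t > 0."""
    return np.log(t) if p == 0 else t ** p

def best_fp1(t, y, powers=POWERS):
    """Fit y = b0 + b1 * t^(p) for each candidate power; return the lowest-RSS fit."""
    mask = ~np.isnan(y)  # use only the observed time points
    best = None
    for p in powers:
        X = np.column_stack([np.ones_like(t), fp_term(t, p)])
        coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        rss = float(((y[mask] - X[mask] @ coef) ** 2).sum())
        if best is None or rss < best[2]:
            best = (p, coef, rss)
    return best

# Hypothetical gene: logarithmic response over time, one missing observation.
t = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 24.0])
y = 2.0 + 1.5 * np.log(t)
y[3] = np.nan
power, coef, rss = best_fp1(t, y)
```

Comparing residual sums of squares is enough here because every candidate has the same number of parameters; comparing models of different degree would need a penalized criterion or a formal test.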
15
Hafemeister C, Costa IG, Schönhuth A, Schliep A. Classifying short gene expression time-courses with Bayesian estimation of piecewise constant functions. Bioinformatics 2011; 27:946-52. [PMID: 21266444 DOI: 10.1093/bioinformatics/btr037] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Motivation: Analyzing short time-courses is a frequent and relevant problem in molecular biology, as, for example, 90% of gene expression time-course experiments span at most nine time-points. The biological or clinical questions addressed are elucidating gene regulation by identification of co-expressed genes, predicting response to treatment in clinical, trial-like settings or classifying novel toxic compounds based on similarity of gene expression time-courses to those of known toxic compounds. The latter problem is characterized by irregular and infrequent sample times and a total lack of prior assumptions about the incoming query, which stands in stark contrast to clinical settings and requires implicitly performing a local, gapped alignment of time series. The current state-of-the-art method (SCOW) uses a variant of dynamic time warping and models time series as higher order polynomials (splines). Results: We suggest modeling time-courses that monitor response to toxins by piecewise constant functions, which are modeled as left-right Hidden Markov Models. A Bayesian approach to parameter estimation and inference helps to cope with the short, but highly multivariate time-courses. We improve prediction accuracy by 7% and 4%, respectively, when classifying toxicology and stress response data. We also reduce running times by at least a factor of 140; note that reasonable running times are crucial when classifying response to toxins. In conclusion, we have demonstrated that appropriate reduction of model complexity can result in substantial improvements both in classification performance and running time. Availability: A Python package implementing the methods described is freely available under the GPL from http://bioinformatics.rutgers.edu/Software/MVQueries/.
Affiliation(s)
- Christoph Hafemeister
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
16
Carreiro AV, Anunciação O, Carriço JA, Madeira SC. Biclustering-Based Classification of Clinical Expression Time Series: A Case Study in Patients with Multiple Sclerosis. 5th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2011) 2011. [DOI: 10.1007/978-3-642-19914-1_31] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
17
Li Y, Ngom A. Classification of Clinical Gene-Sample-Time Microarray Expression Data via Tensor Decomposition Methods. Computational Intelligence Methods for Bioinformatics and Biostatistics 2011. [DOI: 10.1007/978-3-642-21946-7_22] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022]
18
Szczurek E, Biecek P, Tiuryn J, Vingron M. Introducing knowledge into differential expression analysis. J Comput Biol 2010; 17:953-67. [PMID: 20726790 DOI: 10.1089/cmb.2010.0034] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Gene expression measurements allow determining sets of up- or down-regulated, or unchanged genes in a particular experimental condition. Additional biological knowledge can suggest examples of genes from one of these sets. For instance, known target genes of a transcriptional activator are expected, but are not certain to go down after this activator is knocked out. Available differential expression analysis tools do not take such imprecise examples into account. Here we put forward a novel partially supervised mixture modeling methodology for differential expression analysis. Our approach, guided by imprecise examples, clusters expression data into differentially expressed and unchanged genes. The partially supervised methodology is implemented by two methods: a newly introduced belief-based mixture modeling, and soft-label mixture modeling, a method proved efficient in other applications. We investigate on synthetic data the input example settings favorable for each method. In our tests, both belief-based and soft-label methods prove their advantage over semi-supervised mixture modeling in correcting for erroneous examples. We also compare them to alternative differential expression analysis approaches, showing that incorporation of knowledge yields better performance. We present a broad range of knowledge sources and data to which our partially supervised methodology can be applied. First, we determine targets of Ste12 based on yeast knockout data, guided by a Ste12 DNA-binding experiment. Second, we distinguish miR-1 from miR-124 targets in human by clustering expression data under transfection experiments of both microRNAs, using their computationally predicted targets as examples. Finally, we utilize literature knowledge to improve clustering of time-course expression profiles.
Affiliation(s)
- Ewa Szczurek
- Max Planck Institute for Molecular Genetics, Berlin, Germany.
19
Aittokallio T. Dealing with missing values in large-scale studies: microarray data imputation and beyond. Brief Bioinform 2009; 11:253-64. [DOI: 10.1093/bib/bbp059] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.8] [Indexed: 11/13/2022]