1
Tian T, Cheng R, Wei Z. An empirical Bayes change-point model for transcriptome time-course data. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Tian Tian
- Department of Computer Science, New Jersey Institute of Technology
- Ruihua Cheng
- Big Data Statistics Research Center, Tianjin University of Finance and Economics
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology
2
Deregulated KLF4 Expression in Myeloid Leukemias Alters Cell Proliferation and Differentiation through MicroRNA and Gene Targets. Mol Cell Biol 2015; 36:559-73. [PMID: 26644403 DOI: 10.1128/mcb.00712-15] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 07/17/2015] [Accepted: 11/20/2015] [Indexed: 12/23/2022] Open
Abstract
Acute myeloid leukemia (AML) is characterized by increased proliferation and blocked differentiation of hematopoietic progenitors mediated, in part, by altered myeloid transcription factor expression. Decreased Krüppel-like factor 4 (KLF4) expression has been observed in AML, but how decreased KLF4 contributes to AML pathogenesis is largely unknown. We demonstrate decreased KLF4 expression in AML patient samples with various cytogenetic aberrations, confirm that KLF4 overexpression promotes myeloid differentiation and inhibits cell proliferation in AML cell lines, and identify new targets of KLF4. We have demonstrated that microRNA 150 (miR-150) expression is decreased in AML and that reintroducing miR-150 expression induces myeloid differentiation and inhibits proliferation of AML cells. We show that KLF family DNA binding sites are necessary for miR-150 promoter activity and that KLF2 or KLF4 overexpression induces miR-150 expression. miR-150 silencing, alone or in combination with silencing of CDKN1A, a well-described KLF4 target, did not fully reverse KLF4-mediated effects. Gene expression profiling and validation identified putative KLF4-regulated genes, including decreased MYC and downstream MYC-regulated gene expression in KLF4-overexpressing cells. Our findings indicate that decreased KLF4 expression mediates antileukemic effects through regulation of gene and microRNA networks, containing miR-150, CDKN1A, and MYC, and provide mechanistic support for therapeutic strategies increasing KLF4 expression.
3
Abu-Jamous B, Fa R, Roberts DJ, Nandi AK. UNCLES: method for the identification of genes differentially consistently co-expressed in a specific subset of datasets. BMC Bioinformatics 2015; 16:184. [PMID: 26040489 PMCID: PMC4453228 DOI: 10.1186/s12859-015-0614-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/16/2015] [Accepted: 05/16/2015] [Indexed: 12/13/2022] Open
Abstract
Background: Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets.
Results: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts. Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn.
Conclusions: The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0614-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Basel Abu-Jamous
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UB8 3PH, UK.
- Rui Fa
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UB8 3PH, UK.
- David J Roberts
- National Health Service Blood and Transplant, Oxford, OX3 9BQ, UK; Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DU, UK.
- Asoke K Nandi
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UB8 3PH, UK; Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylä, Finland.
4
Fa R, Nandi AK. Noise Resistant Generalized Parametric Validity Index of Clustering for Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:741-752. [PMID: 26356344 DOI: 10.1109/tcbb.2014.2312006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Validity indices have been investigated for decades. However, since there is no study of noise-resistance performance of these indices in the literature, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index which employs two tunable parameters α and β to control the proportions of objects being considered to calculate the dissimilarities. The greatest advantage of the proposed GPV index is its noise-resistance ability, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we evaluate the GPV index for assessing five clustering algorithms in two gene expression data simulation models with different noise levels and compare the ability of determining the number of clusters with eight existing indices. We also test the GPV in three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise-resistance ability and provides fairly accurate judgements.
5
Fa R, Roberts DJ, Nandi AK. SMART: unique splitting-while-merging framework for gene clustering. PLoS One 2014; 9:e94141. [PMID: 24714159 PMCID: PMC3979766 DOI: 10.1371/journal.pone.0094141] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/06/2013] [Accepted: 03/14/2014] [Indexed: 11/18/2022] Open
Abstract
Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms.
Affiliation(s)
- Rui Fa
- Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, United Kingdom
- David J. Roberts
- National Health Service Blood and Transplant, Oxford, United Kingdom
- The University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
- Asoke K. Nandi
- Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, United Kingdom
- Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylä, Finland
6
Qin LX, Breeden L, Self SG. Finding gene clusters for a replicated time course study. BMC Res Notes 2014; 7:60. [PMID: 24460656 PMCID: PMC3906880 DOI: 10.1186/1756-0500-7-60] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/25/2013] [Accepted: 01/15/2014] [Indexed: 11/12/2022] Open
Abstract
Background: Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies.
Findings: In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast.
Conclusions: The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism.
Affiliation(s)
- Li-Xuan Qin
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA.
7
Bellido M, Stirewalt DL, Zhao LP, Radich JP. Use of gene expression microarrays for the study of acute leukemia. Expert Rev Mol Diagn 2014; 6:733-47. [PMID: 17009907 DOI: 10.1586/14737159.6.5.733] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Genetic lesions found in acute leukemia drive the pathology of the disease in addition to forming reliable classifications of prognosis. However, there is still a reasonable heterogeneity of response among cases with the same genetic lesion. Moreover, many leukemia cases have no detectable genetic marker and these cases have marked heterogeneity of response. How can we learn more about the genes and pathways involved with leukemogenesis and response in the midst of such complexity? Gene expression microarrays are experimental platforms that allow for the simultaneous evaluation of the thousands of mRNA transcripts (the 'transcriptome'). This technology has revolutionized the study of leukemia, giving insight into genes and pathways involved in disease response and the biology involved in specific translocations.
Affiliation(s)
- Mar Bellido
- Fred Hutchinson Cancer Research Center, Clinical Research Division, Public Health Sciences Division, 1100 Fairview Ave N., Seattle, WA 98109, USA.
8
Shi J, Qin LX. CORM: An R Package Implementing the Clustering of Regression Models Method for Gene Clustering. Cancer Inform 2014; 13:11-3. [PMID: 25452684 PMCID: PMC4218679 DOI: 10.4137/cin.s13967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2014] [Revised: 07/21/2014] [Accepted: 07/21/2014] [Indexed: 11/05/2022] Open
Abstract
We report a new R package implementing the clustering of regression models (CORM) method for clustering genes using gene expression data and provide data examples illustrating each clustering function in the package. The CORM package is freely available at CRAN from http://cran.r-project.org.
Affiliation(s)
- Jiejun Shi
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Li-Xuan Qin
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
9
De Santis M, Rinaldi F, Falcone E, Lucidi S, Piaggio G, Gurtner A, Farina L. Combining optimization and machine learning techniques for genome-wide prediction of human cell cycle-regulated genes. Bioinformatics 2013; 30:228-33. [DOI: 10.1093/bioinformatics/btt671] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
10
Cheng C, Fu Y, Shen L, Gerstein M. Identification of yeast cell cycle regulated genes based on genomic features. BMC Systems Biology 2013; 7:70. [PMID: 23895232 PMCID: PMC3734186 DOI: 10.1186/1752-0509-7-70] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/14/2013] [Accepted: 07/09/2013] [Indexed: 11/13/2022]
Abstract
Background: Time-course microarray experiments have been widely used to identify cell cycle regulated genes. However, the method is not effective for lowly expressed genes and is sensitive to experimental conditions. To complement microarray experiments, we propose a computational method to predict cell cycle regulated genes based on their genomic features – transcription factor binding and motif profiles.
Results: Through integrating gene-expression data with ChIP-chip binding and putative binding sites of transcription factors, our method shows high accuracy in discriminating yeast cell cycle regulated genes from non-cell cycle regulated ones. We predict 211 novel cell cycle regulated genes. Our model rediscovers the main cell cycle transcription factors and provides new insights into the regulatory mechanisms. The model also reveals a regulatory circuit mediated by a number of key cell cycle regulators.
Conclusions: Our model suggests that the periodical pattern of cell cycle genes is largely coded in their promoter regions, which can be captured by motif and transcription factor binding data. Cell cycle is controlled by a relatively small number of master transcription factors. The concept of genomic feature based method can be readily extended to human cell cycle process and other transcriptionally regulated processes, such as tissue-specific expression.
Affiliation(s)
- Chao Cheng
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.
11
Abu-Jamous B, Fa R, Roberts DJ, Nandi AK. Paradigm of tunable clustering using Binarization of Consensus Partition Matrices (Bi-CoPaM) for gene discovery. PLoS One 2013; 8:e56432. [PMID: 23409186 PMCID: PMC3569426 DOI: 10.1371/journal.pone.0056432] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 08/22/2012] [Accepted: 01/10/2013] [Indexed: 11/19/2022] Open
Abstract
Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies.
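The aggregate-then-binarize idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the partition matrices from the individual runs are already aligned (cluster k means the same thing in every run), and it uses a single membership threshold as a stand-in for the tunable binarization techniques, which the abstract does not specify. The function names and toy data are hypothetical.

```python
def consensus_partition(partitions):
    """Average aligned partition matrices (clusters x genes) into a fuzzy CoPaM."""
    n_runs = len(partitions)
    n_clusters = len(partitions[0])
    n_genes = len(partitions[0][0])
    return [
        [sum(p[k][g] for p in partitions) / n_runs for g in range(n_genes)]
        for k in range(n_clusters)
    ]


def binarize(copam, threshold):
    """Assumed rule: gene g joins cluster k when its consensus membership >= threshold."""
    return [[1 if u >= threshold else 0 for u in row] for row in copam]


# Three hypothetical clustering runs over 3 genes and 2 clusters.
runs = [
    [[1, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 0, 1], [0, 1, 0]],
]
copam = consensus_partition(runs)

wide = binarize(copam, 0.3)       # low threshold: genes may sit in several clusters
exclusive = binarize(copam, 0.5)  # middle threshold: one cluster per gene here
tight = binarize(copam, 0.9)      # high threshold: ambiguous genes stay unassigned
```

In this toy example the low threshold places the two ambiguous genes in both clusters, while the high threshold leaves them unassigned, which mirrors the wide and tight clusters the abstract describes.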
Affiliation(s)
- Basel Abu-Jamous
- Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, United Kingdom
- Rui Fa
- Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, United Kingdom
- David J. Roberts
- National Health Service Blood and Transplant, Oxford, United Kingdom
- The University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
- Asoke K. Nandi
- Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, United Kingdom
- Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylä, Finland
12
Xu C, Wang P, Liu Y, Zhang Y, Fan W, Upton MP, Lohavanichbutr P, Houck JR, Doody DR, Futran ND, Zhao LP, Schwartz SM, Chen C, Méndez E. Integrative genomics in combination with RNA interference identifies prognostic and functionally relevant gene targets for oral squamous cell carcinoma. PLoS Genet 2013; 9:e1003169. [PMID: 23341773 PMCID: PMC3547824 DOI: 10.1371/journal.pgen.1003169] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 05/24/2012] [Accepted: 10/29/2012] [Indexed: 12/22/2022] Open
Abstract
In oral squamous cell carcinoma (OSCC), metastasis to lymph nodes is associated with a 50% reduction in 5-year survival. To identify a metastatic gene set based on DNA copy number abnormalities (CNAs) of differentially expressed genes, we compared DNA and RNA of OSCC cells laser-microdissected from non-metastatic primary tumors (n = 17) with those from lymph node metastases (n = 20), using Affymetrix 250K Nsp single-nucleotide polymorphism (SNP) arrays and U133 Plus 2.0 arrays, respectively. With a false discovery rate (FDR)<5%, 1988 transcripts were found to be differentially expressed between primary and metastatic OSCC. Of these, 114 were found to have a significant correlation between DNA copy number and gene expression (FDR<0.01). Among these 114 correlated transcripts, the corresponding genomic regions of each of 95 transcripts had CNA differences between primary and metastatic OSCC (FDR<0.01). Using an independent dataset of 133 patients, multivariable analysis showed that the OSCC–specific and overall mortality hazard ratios (HRs) for patients carrying the 95-transcript signature were 4.75 (95% CI: 2.03–11.11) and 3.45 (95% CI: 1.84–6.50), respectively. To determine the degree by which these genes impact cell survival, we compared the growth of five OSCC cell lines before and after knockdown of over-amplified transcripts via a high-throughput siRNA–mediated screen. The expression-knockdown of 18 of the 26 genes tested showed a growth suppression ≥30% in at least one cell line (P<0.01). In particular, cell lines derived from late-stage OSCC were more sensitive to the knockdown of G3BP1 than cell lines derived from early-stage OSCC, and the growth suppression was likely caused by increase in apoptosis. Further investigation is warranted to examine the biological role of these genes in OSCC progression and their therapeutic potentials.
Neck lymph node metastasis is the most important prognostic factor in oral squamous cell carcinoma (OSCC). To identify genes associated with this critical step of OSCC progression, we compared DNA copy number aberrations and gene expression differences between tumor cells found in metastatic lymph nodes versus those in non-metastatic primary tumors. We identified 95 transcripts (87 genes) with metastasis-specific genome abnormalities and gene expression. Tested in an independent cohort of 133 OSCC patients, the 95-gene signature was an independent risk factor of disease-specific and overall death, suggesting a disease progression phenotype. We knocked down the expression of over-amplified genes in five OSCC cell lines. Knockdown of 18 of the 26 tested genes suppressed the cell growth in at least one cell line. Interestingly, cell lines derived from late-stage OSCC were more sensitive to the knockdown of G3BP1 than cell lines derived from early-stage OSCC. The knockdown of G3BP1 increased programmed cell death in the p53-mutant but not wild-type OSCC cell lines. Taken together, we demonstrate that CNA–associated transcripts differentially expressed in carcinoma cells with an aggressive phenotype (i.e., metastatic to lymph nodes) can be biomarkers with both prognostic information and functional relevance. Moreover, results suggest that G3BP1 is a potential therapeutic target against late-stage p53-negative OSCC.
Affiliation(s)
- Chang Xu
- Department of Otolaryngology–Head and Neck Surgery, University of Washington, Seattle, Washington, United States of America
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Pei Wang
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Yan Liu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Yuzheng Zhang
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Wenhong Fan
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Melissa P. Upton
- Department of Pathology, University of Washington, Seattle, Washington, United States of America
- Pawadee Lohavanichbutr
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- John R. Houck
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- David R. Doody
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Neal D. Futran
- Department of Otolaryngology–Head and Neck Surgery, University of Washington, Seattle, Washington, United States of America
- Lue Ping Zhao
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Stephen M. Schwartz
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
- Chu Chen
- Department of Otolaryngology–Head and Neck Surgery, University of Washington, Seattle, Washington, United States of America
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
- Eduardo Méndez
- Department of Otolaryngology–Head and Neck Surgery, University of Washington, Seattle, Washington, United States of America
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Surgery and Perioperative Care Service, VA Puget Sound Health Care System, Seattle, Washington, United States of America
13
Abstract
Principal-oscillation-pattern (POP) analysis is a multivariate and systematic technique for identifying the dynamic characteristics of a system from time-series data. In this study, we demonstrate the first application of POP analysis to genome-wide time-series gene-expression data. We use POP analysis to infer oscillation patterns in gene expression. Typically, a genomic system matrix cannot be directly estimated because the number of genes is usually much larger than the number of time points in a genomic study. Thus, we first identify the POPs of the eigen-genomic system that consists of the first few significant eigengenes obtained by singular value decomposition. By using the linear relationship between eigengenes and genes, we then infer the POPs of the genes. Both simulation data and real-world data are used in this study to demonstrate the applicability of POP analysis to genomic data. We show that POP analysis not only compares favorably with experiments and existing computational methods, but that it also provides complementary information relative to other approaches.
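For orientation, the textbook form of POP analysis can be written as below. The notation is assumed rather than taken from the abstract: x(t) stands for the vector of the first r eigengenes at time point t, and the last line is one plausible reading of "the linear relationship between eigengenes and genes", using the truncated SVD of the expression matrix G.

```latex
\begin{aligned}
\mathbf{x}(t+1) &= A\,\mathbf{x}(t) + \boldsymbol{\varepsilon}(t), \\
\hat{A} &= C_1 C_0^{-1}, \qquad
C_1 = \big\langle \mathbf{x}(t+1)\,\mathbf{x}(t)^{\top} \big\rangle, \quad
C_0 = \big\langle \mathbf{x}(t)\,\mathbf{x}(t)^{\top} \big\rangle, \\
\hat{A}\,\mathbf{p}_j &= \lambda_j\,\mathbf{p}_j, \qquad
\lambda_j = \rho_j\,e^{i\omega_j}, \\
T_j &= \frac{2\pi}{|\omega_j|}, \qquad
\tau_j = -\frac{1}{\ln \rho_j}, \\
G &\approx U_r \Sigma_r V_r^{\top}, \qquad
\mathbf{q}_j = U_r \Sigma_r\,\mathbf{p}_j .
\end{aligned}
```

Here each eigenvector p_j is a POP of the reduced system, T_j is its oscillation period and τ_j its e-folding decay time (both in units of the sampling interval), and q_j is the corresponding gene-level pattern under the stated assumption.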
14
Guo X, Pan W. Using weighted permutation scores to detect differential gene expression with microarray data. J Bioinform Comput Biol 2011; 3:989-1006. [PMID: 16078371 DOI: 10.1142/s021972000500134x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/29/2004] [Revised: 01/25/2005] [Accepted: 01/26/2005] [Indexed: 11/18/2022]
Abstract
A class of nonparametric statistical methods, including a nonparametric empirical Bayes (EB) method, the Significance Analysis of Microarrays (SAM) and the mixture model method (MMM) have been proposed to detect differential gene expression for replicated microarray experiments. They all depend on constructing a test statistic, for example, a t-statistic, and then using permutation to draw inferences. However, due to special features of microarray data, using standard permutation scores may not estimate the null distribution of the test statistic well, leading to possibly too conservative inferences. We propose a new method of constructing weighted permutation scores to overcome the problem: posterior probabilities of having no differential expression from the EB method are used as weights for genes to better estimate the null distribution of the test statistic. We also propose a weighted method to estimate the false discovery rate (FDR) using the posterior probabilities. Using simulated data and real data for time-course microarray experiments, we show the improved performance of the proposed methods when implemented in MMM, EB and SAM.
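The weighting idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' estimator: the abstract says only that posterior probabilities of no differential expression serve as gene weights when estimating the null distribution, so the exact formula, the function name, and the toy numbers here are hypothetical.

```python
def weighted_null_pvalue(observed, perm_scores, weights):
    """Weighted permutation p-value for one observed statistic.

    perm_scores[g] holds the permuted statistics of gene g;
    weights[g] is an assumed posterior probability that gene g is NOT
    differentially expressed, so likely-null genes dominate the null estimate.
    """
    exceed = 0.0
    total = 0.0
    for scores, w in zip(perm_scores, weights):
        exceed += w * sum(1 for s in scores if abs(s) >= abs(observed))
        total += w * len(scores)
    return exceed / total


# Toy data: gene 0 looks differentially expressed, gene 1 looks null.
perm_scores = [[0.1, -0.5, 2.0, 3.0], [0.2, 0.3, -0.4, 0.1]]

p_unweighted = weighted_null_pvalue(1.0, perm_scores, [1.0, 1.0])
p_weighted = weighted_null_pvalue(1.0, perm_scores, [0.2, 0.8])
```

With equal weights this reduces to the standard pooled permutation p-value. Down-weighting the gene that is probably differentially expressed shrinks the tail of the estimated null, which is the mechanism the abstract credits with making inference less conservative.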
Affiliation(s)
- Xu Guo
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, MMC 303, Minneapolis, MN 55455-0378, USA
15
Wang H, Wang YH, Wu WS. Yeast cell cycle transcription factors identification by variable selection criteria. Gene 2011; 485:172-6. [DOI: 10.1016/j.gene.2011.06.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/01/2011] [Revised: 05/12/2011] [Accepted: 06/03/2011] [Indexed: 01/12/2023]
16
Wang Y, Xu M, Wang Z, Tao M, Zhu J, Wang L, Li R, Berceli SA, Wu R. How to cluster gene expression dynamics in response to environmental signals. Brief Bioinform 2011; 13:162-74. [PMID: 21746694 DOI: 10.1093/bib/bbr032] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 01/11/2023] Open
Abstract
Organisms usually cope with change in the environment by altering the dynamic trajectory of gene expression to adjust the complement of active proteins. The identification of particular sets of genes whose expression is adaptive in response to environmental changes helps to understand the mechanistic base of gene-environment interactions essential for organismic development. We describe a computational framework for clustering the dynamics of gene expression in distinct environments through Gaussian mixture fitting to the expression data measured at a set of discrete time points. We outline a number of quantitative testable hypotheses about the patterns of dynamic gene expression in changing environments and gene-environment interactions causing developmental differentiation. The future directions of gene clustering in terms of incorporations of the latest biological discoveries and statistical innovations are discussed. We provide a set of computational tools that are applicable to modeling and analysis of dynamic gene expression data measured in multiple environments.
Affiliation(s)
- Yaqun Wang
- Department of Statistics, Pennsylvania State University, Hershey, PA 17033, USA
|
17
|
Irigoien I, Vives S, Arenas C. Microarray time course experiments: finding profiles. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:464-475. [PMID: 21233526 DOI: 10.1109/tcbb.2009.79] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Time course studies with microarray techniques and experimental replicates are very useful in biomedical research. We present, in replicate experiments, an alternative approach to select and cluster genes according to a new measure for association between genes. First, the procedure normalizes and standardizes the expression profile of each gene, and then, identifies scaling parameters that will further minimize the distance between replicates of the same gene. Then, the procedure filters out genes with a flat profile, detects differences between replicates, and separates genes without significant differences from the rest. For this last group of genes, we define a mean profile for each gene and use it to compute the distance between two genes. Next, a hierarchical clustering procedure is proposed, a statistic is computed for each cluster to determine its compactness, and the total number of classes is determined. For the rest of the genes, those with significant differences between replicates, the procedure detects where the differences between replicates lie, and assigns each gene to the best fitting previously identified profile or defines a new profile. We illustrate this new procedure using simulated data and a representative data set arising from a microarray experiment with replication, and report interesting results.
Affiliation(s)
- Itziar Irigoien
- Department of Computation Science and Artificial Intelligence, University of the Basque Country, Manuel de Lardizabal Pasealekua 1, 20080 Donostia, Spain.
|
18
|
Wu FX, Huan J. Guest editorial: Special focus on bioinformatics and systems biology. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:292-293. [PMID: 21298823 DOI: 10.1109/tcbb.2011.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
|
19
|
Pogosova-Agadjanyan EL, Fan W, Georges GE, Schwartz JL, Kepler CM, Lee H, Suchanek AL, Cronk MR, Brumbaugh A, Engel JH, Yukawa M, Zhao LP, Heimfeld S, Stirewalt DL. Identification of radiation-induced expression changes in nonimmortalized human T cells. Radiat Res 2010; 175:172-84. [PMID: 21268710 DOI: 10.1667/rr1977.1] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
In the event of a radiation accident or attack, it will be imperative to quickly assess the amount of radiation exposure to accurately triage victims for appropriate care. RNA-based radiation dosimetry assays offer the potential to rapidly screen thousands of individuals in an efficient and cost-effective manner. However, prior to the development of these assays, it will be critical to identify those genes that will be most useful to delineate different radiation doses. Using global expression profiling, we examined expression changes in nonimmortalized T cells across a wide range of doses (0.15-12 Gy). Because many radiation responses are highly dependent on time, expression changes were examined at three different times (3, 8, and 24 h). Analyses identified 61, 512 and 1310 genes with significant linear dose-dependent expression changes at 3, 8 and 24 h, respectively. Using a stepwise regression procedure, a model was developed to estimate in vitro radiation exposures using the expression of three genes (CDKN1A, PSRC1 and TNFSF4) and validated in an independent test set with 86% accuracy. These findings suggest that RNA-based expression assays for a small subset of genes can be employed to develop clinical biodosimetry assays to be used in assessments of radiation exposure and toxicity.
Affiliation(s)
- Era L Pogosova-Agadjanyan
- Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N., Seattle, WA 98109, USA
|
20
|
Kim BR, McMurry T, Zhao W, Wu R, Berg A. Wavelet-based functional clustering for patterns of high-dimensional dynamic gene expression. J Comput Biol 2010; 17:1067-80. [PMID: 20726793 PMCID: PMC3133835 DOI: 10.1089/cmb.2009.0270] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Functional gene clustering is a statistical approach for identifying the temporal patterns of gene expression measured at a series of time points. By integrating wavelet transformations, a powerful dimension-reduction technique, noisy gene expression data is smoothed and clustered, allowing for new patterns of functional gene expression profiles to be identified. We implement the idea of wavelet dimension reduction into the mixture model for gene clustering, aimed to de-noise the data by transforming an inherently high-dimensional biological problem to its tractable low-dimensional representation. As a first attempt of its kind, we capitalize on the simplest Haar wavelet shrinkage technique to break an original signal down into its spectrum by taking its averages and differences and, subsequently, detect gene expression patterns that differ in the smooth coefficients extracted from noisy time series gene expression data. The method is shown to be effective on simulated data and on recent time course gene expression data. Supplementary Material is available at www.liebertonline.com.
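The "averages and differences" step mentioned in the abstract above can be illustrated with a minimal sketch. It uses the unnormalized Haar transform (division by 2 rather than by the square root of 2), which is an assumption made here for readability; the function names `haar_step` and `haar_smooth` are hypothetical and are not taken from the paper.

```python
def haar_step(signal):
    """One level of an unnormalized Haar transform.

    Returns (averages, differences) computed over consecutive pairs.
    The signal length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    averages = [(a + b) / 2 for a, b in pairs]      # smooth coefficients
    differences = [(a - b) / 2 for a, b in pairs]   # detail coefficients
    return averages, differences


def haar_smooth(signal, levels):
    """Apply `levels` Haar steps and keep only the smooth coefficients."""
    smooth = list(signal)
    for _level in range(levels):
        smooth, _details = haar_step(smooth)
    return smooth


if __name__ == "__main__":
    profile = [4, 2, 5, 7]           # toy expression profile, 4 time points
    print(haar_step(profile))        # ([3.0, 6.0], [1.0, -1.0])
    print(haar_smooth(profile, 2))   # [4.5]
```

Each level halves the number of coefficients, which is the dimension reduction the abstract refers to; clustering would then operate on the smooth coefficients rather than on the raw time series.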
Affiliation(s)
- Bong-Rae Kim
- Department of Dentistry, Seoul National University, Seoul, Republic of Korea
- Timothy McMurry
- Department of Mathematical Sciences, DePaul University, Chicago, Illinois
- Wei Zhao
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee
- Rongling Wu
- Department of Biostatistics, Pennsylvania State University, Hershey, Pennsylvania
- Arthur Berg
- Department of Biostatistics, Pennsylvania State University, Hershey, Pennsylvania
|
21
|
Ito T, Kwon HY, Zimdahl B, Congdon KL, Blum J, Lento WE, Zhao C, Lagoo A, Gerrard G, Foroni L, Goldman J, Goh H, Kim SH, Kim DW, Chuah C, Oehler VG, Radich JP, Jordan CT, Reya T. Regulation of myeloid leukaemia by the cell-fate determinant Musashi. Nature 2010; 466:765-8. [PMID: 20639863 PMCID: PMC2918284 DOI: 10.1038/nature09171] [Citation(s) in RCA: 272] [Impact Index Per Article: 19.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2008] [Accepted: 05/13/2010] [Indexed: 12/25/2022]
Abstract
Chronic myelogenous leukemia (CML) can progress from an indolent chronic phase to an aggressive blast crisis phase [1], but the molecular basis of this transition remains poorly understood. Here we have used mouse models of CML [2,3] to show that disease progression is regulated by the Musashi-Numb signaling axis [4,5]. Specifically, we find that chronic phase is marked by high and blast crisis phase by low levels of Numb expression, and that ectopic expression of Numb promotes differentiation and impairs advanced phase disease in vivo. As a possible explanation for the decreased levels of Numb in blast crisis, we show that NUP98-HOXA9, an oncogene associated with blast crisis CML [6,7], can trigger expression of the RNA binding protein Musashi2 (Msi2), which in turn represses Numb. Importantly, loss of Msi2 restores Numb expression and significantly impairs the development and propagation of blast crisis CML in vitro and in vivo. Finally, we show that Msi2 expression is not only highly upregulated during human CML progression but is also an early indicator of poorer prognosis. These data show that the Musashi-Numb pathway can control the differentiation of CML cells, and raise the possibility that targeting this pathway may provide a new strategy for therapy of aggressive leukemias.
Affiliation(s)
- Takahiro Ito
- Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, North Carolina 27710, USA
|
22
|
Ferrezuelo F, Colomina N, Futcher B, Aldea M. The transcriptional network activated by Cln3 cyclin at the G1-to-S transition of the yeast cell cycle. Genome Biol 2010; 11:R67. [PMID: 20573214 PMCID: PMC2911115 DOI: 10.1186/gb-2010-11-6-r67] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2010] [Accepted: 06/23/2010] [Indexed: 12/25/2022] Open
Abstract
Background The G1-to-S transition of the cell cycle in the yeast Saccharomyces cerevisiae involves an extensive transcriptional program driven by transcription factors SBF (Swi4-Swi6) and MBF (Mbp1-Swi6). Activation of these factors ultimately depends on the G1 cyclin Cln3. Results To determine the transcriptional targets of Cln3 and their dependence on SBF or MBF, we first have used DNA microarrays to interrogate gene expression upon Cln3 overexpression in synchronized cultures of strains lacking components of SBF and/or MBF. Secondly, we have integrated this expression dataset together with other heterogeneous data sources into a single probabilistic model based on Bayesian statistics. Our analysis has produced more than 200 transcription factor-target assignments, validated by ChIP assays and by functional enrichment. Our predictions show higher internal coherence and predictive power than previous classifications. Our results support a model whereby SBF and MBF may be differentially activated by Cln3. Conclusions Integration of heterogeneous genome-wide datasets is key to building accurate transcriptional networks. By such integration, we provide here a reliable transcriptional network at the G1-to-S transition in the budding yeast cell cycle. Our results suggest that to improve the reliability of predictions we need to feed our models with more informative experimental data.
Affiliation(s)
- Francisco Ferrezuelo
- Departament de Ciències Mèdiques Bàsiques, Institut de Recerca Biomèdica de Lleida, Universitat de Lleida, Montserrat Roig 2, 25008 Lleida, Spain.
|
23
|
Xu C, Liu Y, Wang P, Fan W, Rue TC, Upton MP, Houck JR, Lohavanichbutr P, Doody DR, Futran ND, Zhao LP, Schwartz SM, Chen C, Méndez E. Integrative analysis of DNA copy number and gene expression in metastatic oral squamous cell carcinoma identifies genes associated with poor survival. Mol Cancer 2010; 9:143. [PMID: 20537188 PMCID: PMC2893102 DOI: 10.1186/1476-4598-9-143] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2010] [Accepted: 06/11/2010] [Indexed: 01/01/2023] Open
Abstract
Background Lymphotropism in oral squamous cell carcinoma (OSCC) is one of the most important prognostic factors of 5-year survival. In an effort to identify genes that may be responsible for the initiation of OSCC lymphotropism, we examined DNA copy number gains and losses and corresponding gene expression changes from tumor cells in metastatic lymph nodes of patients with OSCC. Results We performed integrative analysis of DNA copy number alterations (CNA) and corresponding mRNA expression from OSCC cells isolated from metastatic lymph nodes of 20 patients using Affymetrix 250 K Nsp I SNP and U133 Plus 2.0 arrays, respectively. Overall, genome CNA accounted for expression changes in 31% of the transcripts studied. Genome region 11q13.2-11q13.3 shows the highest correlation between DNA CNA and expression. With a false discovery rate < 1%, 530 transcripts (461 genes) demonstrated a correlation between CNA and expression. Among these, we found two subsets that were significantly associated with OSCC (n = 122) when compared to controls, and with survival (n = 27), as tested using an independent dataset with genome-wide expression profiles for 148 primary OSCC and 45 normal oral mucosa. We fit Cox models to calculate a principal component analysis-derived risk-score for these two gene sets ('122-' or '27-transcript PC'). The models combining the 122- or 27-transcript PC with stage outperformed the model using stage alone in terms of the Area Under the Curve (AUC = 0.82 or 0.86 vs. 0.72, with p = 0.044 or 0.011, respectively). Conclusions Genes exhibiting CNA-correlated expression may have biological impact on carcinogenesis and cancer progression in OSCC. Determination of copy number-associated transcripts associated with clinical outcomes in tumor cells with an aggressive phenotype (i.e., cells metastasized to the lymph nodes) can help prioritize candidate transcripts from high-throughput data for further studies.
Affiliation(s)
- Chang Xu
- Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, WA 98195, USA
|
24
|
Fan X, Pyne S, Liu JS. Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle. Ann Appl Stat 2010. [DOI: 10.1214/09-aoas300] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
|
25
|
Bergemann TL. Use of signal quality measurements to gain efficiency in the analysis of cDNA microarray data. J Genet Genomics 2010; 37:265-279. [PMID: 20439103 DOI: 10.1016/s1673-8527(09)60045-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2009] [Revised: 01/19/2010] [Accepted: 02/18/2010] [Indexed: 05/29/2023]
Abstract
This research provides a new way to measure error in microarray data in order to improve gene expression analysis. Microarray data contains many sources of error. In order to glean information about mRNA expression levels, the true signal must first be segregated from noise. This research focuses on the variation that can be captured at the spot level in cDNA microarray images. Variation at other levels, due to differences at the array, dye, and block levels, can be corrected for by a variety of existing normalization procedures. Two signal quality estimates that capture the reliability of each spot printed on a microarray are described. A parametric estimate of within-spot variance, referred to here as σ²_spot, assumes that pixels follow a normal distribution and are spatially correlated. A non-parametric estimate of error, called the mean square prediction error (MSPE), assumes that spots of high quality possess pixels that are similar to their neighbors. This paper will provide a framework to use either spot quality measure in downstream analysis, specifically as weights in regression models. Using these spot quality estimates as weights can result in greater efficiency, in a statistical sense, when modeling microarray data.
Affiliation(s)
- Tracy L Bergemann
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA.
|
26
|
Krishna R, Li CT, Buchanan-Wollaston V. A temporal precedence based clustering method for gene expression microarray data. BMC Bioinformatics 2010; 11:68. [PMID: 20113513 PMCID: PMC2841598 DOI: 10.1186/1471-2105-11-68] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2009] [Accepted: 01/30/2010] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Time-course microarray experiments can produce useful data which can help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis where the data is grouped together according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures which may not be suitable for clustering of temporal microarray data where the sequential nature of time is important. We present a Granger causality based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time-series by statistically testing if one time-series can be used for forecasting the other time-series or not. RESULTS A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is further analyzed using a graph-theoretic technique to detect highly connected components representing interesting biological modules. We test our approach on synthesized datasets and real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results using the existing biological literature. We also report interesting structural properties of the association network commonly desired in any biological system. CONCLUSIONS Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple in implementation and is statistically traceable at each step. The method can produce sets of functionally related genes which can be further used for reverse-engineering of gene circuits.
Affiliation(s)
- Ritesh Krishna
- Department of Computer Science, Warwick University, Coventry CV4 7AL, UK
- Chang-Tsun Li
- Department of Computer Science, Warwick University, Coventry CV4 7AL, UK
|
27
|
Tchagang AB, Bui KV, McGinnis T, Benos PV. Extracting biologically significant patterns from short time series gene expression data. BMC Bioinformatics 2009; 10:255. [PMID: 19695084 PMCID: PMC2743670 DOI: 10.1186/1471-2105-10-255] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2009] [Accepted: 08/20/2009] [Indexed: 11/10/2022] Open
Abstract
Background Time series gene expression data analysis is used widely to study the dynamics of various cell processes. Most of the time series data available today consist of only a few time points, making the application of standard clustering techniques difficult. Results We developed two new algorithms that are capable of extracting biological patterns from short time point series gene expression data. The two algorithms, ASTRO and MiMeSR, are inspired by the rank order preserving framework and the minimum mean squared residue approach, respectively. However, ASTRO and MiMeSR differ from previous approaches in that they take advantage of the relatively small number of time points in order to reduce the problem from NP-hard to linear. Tested on well-defined short time expression data, we found that our approaches are robust to noise, as well as to random patterns, and that they can correctly detect the temporal expression profile of relevant functional categories. Evaluation of our methods was performed using Gene Ontology (GO) annotations and chromatin immunoprecipitation (ChIP-chip) data. Conclusion Our approaches generally outperform both standard clustering algorithms and algorithms designed specifically for clustering of short time series gene expression data. Both algorithms are available at .
Affiliation(s)
- Alain B Tchagang
- Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA.
|
28
|
Emmert-Streib F, Dehmer M. Predicting cell cycle regulated genes by causal interactions. PLoS One 2009; 4:e6633. [PMID: 19688096 PMCID: PMC2723924 DOI: 10.1371/journal.pone.0006633] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2009] [Accepted: 04/23/2009] [Indexed: 11/28/2022] Open
Abstract
The fundamental difference between classic and modern biology is that technological innovations allow the generation of high-throughput data that give insights into molecular interactions on a genomic scale. These high-throughput data can be used to infer gene networks, e.g., the transcriptional regulatory or signaling network, representing a blueprint of the current dynamical state of the cellular system. However, gene networks do not provide direct answers to biological questions; instead, they need to be analyzed to reveal functional information of molecular working mechanisms. In this paper we propose a new approach to analyze the transcriptional regulatory network of yeast to predict cell cycle regulated genes. The novelty of our approach is that, in contrast to all other approaches aiming to predict cell cycle regulated genes, we do not use time series data but base our analysis on the prior information of causal interactions among genes. The major purpose of the present paper is to predict cell cycle regulated genes in S. cerevisiae. Our analysis is based on the transcriptional regulatory network, representing causal interactions between genes, and a list of known periodic genes. No further data are used. Our approach utilizes the causal membership of genes and the hierarchical organization of the transcriptional regulatory network leading to two groups of periodic genes with a well defined direction of information flow. We predict genes as periodic if they appear on unique shortest paths connecting two periodic genes from different hierarchy levels. Our results demonstrate that a classical problem such as the prediction of cell cycle regulated genes can be seen in a new light if the concept of a causal membership of a gene is applied consistently. This also shows that there is a wealth of information buried in the transcriptional regulatory network whose unraveling may require more elaborate concepts than it might seem at first.
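The prediction rule stated in the abstract above (a gene is predicted as periodic if it lies on a unique shortest path between two known periodic genes from different hierarchy levels) can be sketched as follows. This is only an illustration under stated assumptions: the network is a directed adjacency mapping, the two hierarchy levels are supplied by the caller as `upper` and `lower` sets, and paths are followed from `upper` to `lower`. How the paper actually derives the hierarchy is not given in the abstract, and all function and variable names here are hypothetical.

```python
from collections import deque


def unique_shortest_path(graph, source, target):
    """Return the shortest path source -> target if exactly one exists, else None.

    `graph` maps each node to an iterable of distinct successor nodes.
    """
    dist = {source: 0}      # BFS distance from source
    count = {source: 1}     # number of distinct shortest paths from source
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                count[nxt] = count[node]
                parent[nxt] = node
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                count[nxt] += count[node]   # another shortest route into nxt
    if count.get(target) != 1:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def predict_periodic(graph, upper, lower):
    """Collect intermediate genes on unique shortest paths from `upper` to `lower`."""
    known = set(upper) | set(lower)
    predicted = set()
    for source in upper:
        for target in lower:
            path = unique_shortest_path(graph, source, target)
            if path is not None:
                predicted.update(path[1:-1])   # interior nodes only
    return predicted - known


if __name__ == "__main__":
    toy_network = {
        "A": ["X"], "X": ["B"],        # single route A -> X -> B
        "A2": ["Y", "Z"],              # two equally short routes to B2
        "Y": ["B2"], "Z": ["B2"],
    }
    print(predict_periodic(toy_network, {"A", "A2"}, {"B", "B2"}))  # {'X'}
```

In the toy network, `X` is predicted because it sits on the only shortest path from `A` to `B`, while `Y` and `Z` are not, since two equally short paths connect `A2` and `B2`.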
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning, Center for Cancer Research and Cell Biology, School of Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom.
|
29
|
Emmert-Streib F, Dehmer M. Hierarchical coordination of periodic genes in the cell cycle of Saccharomyces cerevisiae. BMC SYSTEMS BIOLOGY 2009; 3:76. [PMID: 19619302 PMCID: PMC2721836 DOI: 10.1186/1752-0509-3-76] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/12/2009] [Accepted: 07/20/2009] [Indexed: 11/25/2022]
Abstract
Background Gene networks are a representation of molecular interactions among genes or products thereof and, hence, form causal networks. Despite intense study in recent years, most investigations so far focus on inferential methods to reconstruct gene networks from experimental data or on their structural properties, e.g., degree distributions. Their structural analysis to gain functional insights into organizational principles of, e.g., pathways remains so far underappreciated. Results In the present paper we analyze cell cycle regulated genes in S. cerevisiae. Our analysis is based on the transcriptional regulatory network, representing causal interactions and not just associations or correlations between genes, and a list of known periodic genes. No further data are used. Partitioning the transcriptional regulatory network according to a graph theoretical property leads to a hierarchy in the network and, hence, in the information flow, allowing the identification of two groups of periodic genes. This reveals a novel conceptual interpretation of the working mechanism of the cell cycle and the genes regulated by this pathway. Conclusion Aside from the obtained results for the cell cycle of yeast, our approach could be exemplary for the analysis of general pathways by exploiting the rich causal structure of inferred and/or curated gene networks including protein or signaling networks.
Affiliation(s)
- Frank Emmert-Streib
- Center for Cancer Research and Cell Biology, Queen's University Belfast, UK.
|
30
|
Chechik G, Koller D. Timing of gene expression responses to environmental changes. J Comput Biol 2009; 16:279-90. [PMID: 19193146 DOI: 10.1089/cmb.2008.13tt] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open
Abstract
Cells respond to environmental perturbations with changes in their gene expression that are coordinated in magnitude and time. Timing information about individual genes, rather than clusters, provides a refined way to view and analyze responses, but it is hard to estimate accurately. To analyze response timing of individual genes, we developed a parametric model that captures the typical temporal responses: an abrupt early response followed by a second transition to a steady state. This impulse model explicitly represents natural temporal properties such as the onset and the offset time, and can be estimated robustly, as demonstrated by its superior ability to impute missing values in gene expression data. Using response time of individual genes, we identify relations between gene function and their response timing, showing, for example, how cytosolic ribosomal genes are only repressed after the mitochondrial ribosome is activated. We further demonstrate a strong relation between the binding affinity of a transcription factor and the activation timing of its targets, suggesting that graded binding affinities could be a widely used mechanism for controlling expression timing. See online Supplementary Material at (www.liebertonline.com).
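The response shape described in the abstract above (an abrupt early transition followed by a second transition to a steady state, with explicit onset and offset times) can be illustrated with a product of two sigmoids. This parameterization is an assumption chosen for illustration and is not confirmed by the abstract; the parameter names `h0`, `h1`, `h2`, `t_on`, `t_off`, and `slope` are hypothetical.

```python
import math


def sigmoid(t, slope, midpoint):
    """Logistic curve rising from 0 to 1 around `midpoint` (falling if slope < 0)."""
    return 1.0 / (1.0 + math.exp(-slope * (t - midpoint)))


def impulse(t, h0, h1, h2, t_on, t_off, slope):
    """Double-sigmoid impulse curve (illustrative form, h1 must be non-zero).

    h0: initial level, h1: peak level, h2: final steady-state level,
    t_on / t_off: onset and offset times, slope: steepness of both transitions.
    """
    rise = h0 + (h1 - h0) * sigmoid(t, slope, t_on)     # h0 -> h1 around t_on
    fall = h2 + (h1 - h2) * sigmoid(t, -slope, t_off)   # h1 -> h2 around t_off
    return rise * fall / h1


if __name__ == "__main__":
    for t in (0, 2, 5, 8, 12):
        value = impulse(t, h0=1.0, h1=4.0, h2=2.0, t_on=2.0, t_off=8.0, slope=4.0)
        print(t, round(value, 3))
```

With these toy values the curve starts near `h0`, climbs to about `h1` between the onset and offset times, and settles near `h2` afterwards; fitting such a curve per gene is what would yield gene-level onset and offset estimates.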
Affiliation(s)
- Gal Chechik
- Computer Science Department, Stanford University, Stanford, CA, USA.
|
31
|
|
32
|
Chen C, Méndez E, Houck J, Fan W, Lohavanichbutr P, Doody D, Yueh B, Futran ND, Upton M, Farwell DG, Schwartz SM, Zhao LP. Gene expression profiling identifies genes predictive of oral squamous cell carcinoma. Cancer Epidemiol Biomarkers Prev 2008; 17:2152-62. [PMID: 18669583 DOI: 10.1158/1055-9965.epi-07-2893] [Citation(s) in RCA: 191] [Impact Index Per Article: 11.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Oral squamous cell carcinoma (OSCC) is associated with substantial mortality and morbidity. To identify potential biomarkers for the early detection of invasive OSCC, we compared the gene expressions of incident primary OSCC, oral dysplasia, and clinically normal oral tissue from surgical patients without head and neck cancer or preneoplastic oral lesions (controls), using Affymetrix U133 2.0 Plus arrays. We identified 131 differentially expressed probe sets using a training set of 119 OSCC patients and 35 controls. Forward and stepwise logistic regression analyses identified 10 successive combinations of genes whose expression differentiated OSCC from controls. The best model included LAMC2, encoding laminin-gamma2 chain, and COL4A1, encoding collagen, type IV alpha1 chain. Subsequent modeling without these two markers showed that COL1A1, encoding collagen, type I alpha1 chain, and PADI1, encoding peptidyl arginine deiminase, type 1, could also distinguish OSCC from controls. We validated these two models using an internal independent testing set of 48 invasive OSCC and 10 controls and an external testing set of 42 head and neck squamous cell carcinoma cases and 14 controls (GEO GSE6791), with sensitivity and specificity above 95%. These two models were also able to distinguish dysplasia (n = 17) from control (n = 35) tissue. Differential expression of these four genes was confirmed by quantitative reverse transcription-PCR. If confirmed in larger studies, the proposed models may hold promise for monitoring local recurrence at surgical margins and the development of second primary oral cancer in patients with OSCC.
Affiliation(s)
- Chu Chen
- Program in Epidemiology, Fred Hutchinson Cancer Research Center, Department of Epidemiology, 1100 Fairview Avenue North, M5-C800 P.O. Box 19024, Seattle, WA 98109-1024, USA.
|
33
|
Cheng C, Li LM. Systematic identification of cell cycle regulated transcription factors from microarray time series data. BMC Genomics 2008; 9:116. [PMID: 18315882 PMCID: PMC2315658 DOI: 10.1186/1471-2164-9-116] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2007] [Accepted: 03/03/2008] [Indexed: 02/05/2023] Open
Abstract
Background The cell cycle has long been an important model to study genome-wide transcriptional regulation. Although several methods have been introduced to identify cell cycle regulated genes from microarray data, they cannot be directly used to investigate cell cycle regulated transcription factors (CCRTFs), because for many transcription factors (TFs) it is their activities instead of expressions that are periodically regulated across the cell cycle. To overcome this problem, it is useful to infer TF activities across the cell cycle by integrating microarray expression data with ChIP-chip data, and then examine the periodicity of the inferred activities. For most species, however, large-scale ChIP-chip data are still not available. Results We propose a two-step method to identify the CCRTFs by integrating microarray cell cycle data with ChIP-chip data or motif discovery data. In S. cerevisiae, we identify 42 CCRTFs, among which 23 have been verified experimentally. The cell cycle related behaviors (e.g. at which cell cycle phase a TF achieves the highest activity) predicted by our method are consistent with the well established knowledge about them. We also find that the periodical activity fluctuation of some TFs can be perturbed by the cell synchronization treatment. Moreover, by integrating expression data with in-silico motif discovery data, we identify 8 cell cycle associated regulatory motifs, among which 7 are binding sites for well-known cell cycle related TFs. Conclusion Our method is effective at identifying CCRTFs by integrating microarray cell cycle data with TF-gene binding information. In S. cerevisiae, the TF-gene binding information is provided by the systematic ChIP-chip experiments. In other species where systematic ChIP-chip data is not available, in-silico motif discovery and analysis provide us with an alternative method. Therefore, our method is ready to be applied to the microarray cell cycle data sets from different species.
The C++ program for AC score calculation is available for download from URL .
Affiliation(s)
- Chao Cheng
- Molecular and Computational biology program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089-2910, USA.
|
34
|
Futschik ME, Herzel H. Are we overestimating the number of cell-cycling genes? The impact of background models on time-series analysis. Bioinformatics 2008; 24:1063-9. [PMID: 18310054 DOI: 10.1093/bioinformatics/btn072] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION Periodic processes play fundamental roles in organisms. Prominent examples are the cell cycle and the circadian clock. Microarray technology has enabled us to screen complete sets of transcripts for possible association with such fundamental periodic processes on a system-wide level. Frequently, quite large numbers of genes have been detected as periodically expressed. However, the small overlap between genes identified in different studies has cast some doubt on the reliability of the periodic expression detected. RESULTS In this study, comparative analysis suggests that the lack of agreement between different cell-cycle studies might be due to inadequate background models for the determination of significance. We demonstrate that the choice of background model has considerable impact on the statistical significance of periodic expression. For illustration, we reanalyzed two microarray studies of the yeast cell cycle. Our evaluation strongly indicates that the results of previous analyses might have been overoptimistic and that the use of more suitable background models promises to give more realistic results. AVAILABILITY R scripts are available on request from the corresponding author.
Affiliation(s)
- Matthias E Futschik
- Institute for Theoretical Biology, Charité, Humboldt-Universität, Invalidenstrasse 43, 10115 Berlin, Germany.
|
35
|
Stirewalt DL, Meshinchi S, Kopecky KJ, Fan W, Pogosova-Agadjanyan EL, Engel JH, Cronk MR, Dorcy KS, McQuary AR, Hockenbery D, Wood B, Heimfeld S, Radich JP. Identification of genes with abnormal expression changes in acute myeloid leukemia. Genes Chromosomes Cancer 2008; 47:8-20. [PMID: 17910043 DOI: 10.1002/gcc.20500] [Citation(s) in RCA: 129] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open
Abstract
Acute myeloid leukemia (AML) is one of the most common and deadly forms of hematopoietic malignancies. We hypothesized that microarray studies could identify previously unrecognized expression changes that occur only in AML blasts. We were particularly interested in those genes with increased expression in AML, believing that these genes may be potential therapeutic targets. To test this hypothesis, we compared gene expression profiles between normal hematopoietic cells from 38 healthy donors and leukemic blasts from 26 AML patients. Normal hematopoietic samples included CD34+ selected cells (N = 18), unselected bone marrows (N = 10), and unselected peripheral bloods (N = 10). Twenty genes displayed AML-specific expression changes that were not found in the normal hematopoietic cells. Subsequent analyses using microarray data from 285 additional AML patients confirmed expression changes for 13 of the 20 genes. Seven genes (BIK, CCNA1, FUT4, IL3RA, HOMER3, JAG1, WT1) displayed increased expression in AML, while 6 genes (ALDHA1A, PELO, PLXNC1, PRUNE, SERPINB9, TRIB2) displayed decreased expression. Quantitative RT/PCR studies for the 7 over-expressed genes were performed in an independent set of 9 normal and 21 pediatric AML samples. All 7 over-expressed genes displayed an increased expression in the AML samples compared to normals. Three of the 7 over-expressed genes (WT1, CCNA1, and IL3RA) have already been linked to leukemogenesis and/or AML prognosis, while little is known about the role of the other 4 over-expressed genes in AML. Future studies will determine their potential role in leukemogenesis and their clinical significance.
Affiliation(s)
- Derek L Stirewalt
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
|
36
|
Testing the significance of cell-cycle patterns in time-course microarray data using nonparametric quadratic inference functions. Comput Stat Data Anal 2008. [DOI: 10.1016/j.csda.2007.03.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
37
|
Stirewalt DL, Mhyre AJ, Marcondes M, Pogosova-Agadjanyan E, Abbasi N, Radich JP, Deeg HJ. Tumour necrosis factor-induced gene expression in human marrow stroma: clues to the pathophysiology of MDS? Br J Haematol 2007; 140:444-53. [PMID: 18162123 DOI: 10.1111/j.1365-2141.2007.06923.x] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
Aberrant regulation of the tumour necrosis factor alpha gene (TNF) and stroma-derived signals are involved in the pathophysiology of myelodysplasia. Therefore, KG1a, a myeloid leukaemia cell line, was exposed to Tnf in the absence or presence of either HS-5 or HS-27a cells, two human stroma cell lines. While KG1a cells were resistant to Tnf-induced apoptosis in the absence of stroma cells, Tnf promoted apoptosis of KG1a cells in co-culture experiments with stroma cells. To investigate the Tnf-induced signals from the stroma cells, we examined expression changes in HS-5 and HS-27a cells after Tnf exposure. DNA microarray studies found both discordant and concordant Tnf-induced expression responses in the two stroma cell lines. Tnf promoted an increased mRNA expression of pro-inflammatory cytokines [e.g. interleukin (IL)6, IL8 and IL32]. At the same time, Tnf decreased the mRNA expression of anti-apoptotic genes (e.g. BCL2L1) and increased the mRNA expression of pro-apoptotic genes (e.g. BID). Overall, the results suggested that Tnf induced a complex set of pro-inflammatory and pro-apoptotic signals in stroma cells that promote apoptosis in malignant myeloid clones. Additional studies will be required to determine which of these signals are critical for the induction of apoptosis in the malignant clones. Those insights, in turn, may point the way to novel therapeutic approaches.
Affiliation(s)
- Derek L Stirewalt
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
|
38
|
Bonig H, Priestley GV, Oehler V, Papayannopoulou T. Hematopoietic progenitor cells (HPC) from mobilized peripheral blood display enhanced migration and marrow homing compared to steady-state bone marrow HPC. Exp Hematol 2007; 35:326-34. [PMID: 17258081 PMCID: PMC1847625 DOI: 10.1016/j.exphem.2006.09.017] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2006] [Revised: 08/21/2006] [Accepted: 09/25/2006] [Indexed: 12/29/2022]
Abstract
OBJECTIVE Faster engraftment of G-CSF-mobilized peripheral blood (MPB) transplants compared to steady-state bone marrow (ssBM) is well documented and clinically relevant. A number of different factors likely contribute to this outcome. In the present study we explored whether independent of cell number there are intrinsic differences in the efficiency of progenitor cell homing to marrow between MPB and ssBM. METHODS Mobilization was achieved by continuous infusion of G-CSF alone or in combination with other mobilizing agents. In vivo homing assays, in vitro migration assays, gene expression analysis, and flow cytometry were utilized to compare homing-related in vivo and in vitro properties of MPB and ssBM HPC. RESULTS Marrow homing of murine MPB HPC, generated by different mobilizing schemes, was reproducibly significantly superior to that of ssBM, in lethally irradiated as well as in nonirradiated hosts. This phenotype was independent of MMP9, selectins, and beta2- and alpha4-integrins. Superior homing was also observed for human MPB HPC transplanted into NOD/SCIDbeta2microglobulin(-/-) recipients. Inhibition of HPC migration abrogated the homing advantage of MPB but did not affect homing of ssBM HPC, whereas enhancement of motility by CD26 inhibition improved marrow homing only of ssBM HPC. Enhanced SDF-1-dependent chemotaxis and low CD26 expression on MPB HPC were identified as potential contributing factors. Significant contributions of the putative alternative SDF-1 receptor, RDC1, were unlikely based on gene expression data. CONCLUSION The data suggest increased motility as a converging endpoint of complex changes seen in MPB HPC which is likely responsible for their favorable homing.
Affiliation(s)
- Halvard Bonig
- Department of Medicine/Hematology, University of Washington, Seattle, WA 98195, USA.
|
39
|
Larsen P, Almasri E, Chen G, Dai Y. Correlated Discretized Expression score: a method for identifying gene interaction networks from time course microarray expression data. Conf Proc IEEE Eng Med Biol Soc 2007; 2006:5842-5. [PMID: 17946340 DOI: 10.1109/iembs.2006.259256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
One of the goals of genomic expression analysis is to construct gene interaction networks from microarray data. Time course microarray data is a common place to seek causal relationships between the expression of a regulator and its effect on the expression of its targets. By proposing gene expression patterns of regulator and target genes based on biological expectation of regulatory interactions, it is possible to propose a system to identify these patterns. This system is based on the Correlated Discretized Expression (CDE) score calculated from microarray time course data. The CDE-score is derived by discretizing microarray data to identify significant gene expression changes. The usefulness of this method is demonstrated using a set of hypothetical gene expression data and the analysis of S. cerevisiae cell cycle microarray data.
Affiliation(s)
- Peter Larsen
- Core Genomics Lab., Illinois Univ., Chicago, IL 60607-7058, USA.
|
40
|
Gauthier NP, Larsen ME, Wernersson R, de Lichtenberg U, Jensen LJ, Brunak S, Jensen TS. Cyclebase.org--a comprehensive multi-organism online database of cell-cycle experiments. Nucleic Acids Res 2007; 36:D854-9. [PMID: 17940094 PMCID: PMC2238932 DOI: 10.1093/nar/gkm729] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The past decade has seen the publication of a large number of cell-cycle microarray studies and many more are in the pipeline. However, data from these experiments are not easy to access, combine and evaluate. We have developed a centralized database with an easy-to-use interface, Cyclebase.org, for viewing and downloading these data. The user interface facilitates searches for genes of interest as well as downloads of genome-wide results. Individual genes are displayed with graphs of expression profiles throughout the cell cycle from all available experiments. These expression profiles are normalized to a common timescale to enable inspection of the combined experimental evidence. Furthermore, state-of-the-art computational analyses provide key information on both individual experiments and combined datasets such as whether or not a gene is periodically expressed and, if so, the time of peak expression. Cyclebase is available at http://www.cyclebase.org.
Affiliation(s)
- Nicholas Paul Gauthier
- Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, Building 208, DK-2800 Lyngby, Denmark
|
41
|
Rowicka M, Kudlicki A, Tu BP, Otwinowski Z. High-resolution timing of cell cycle-regulated gene expression. Proc Natl Acad Sci U S A 2007; 104:16892-7. [PMID: 17827275 PMCID: PMC2040468 DOI: 10.1073/pnas.0706022104] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023] Open
Abstract
The eukaryotic cell division cycle depends on an intricate sequence of transcriptional events. Using an algorithm based on maximum-entropy deconvolution, and expression data from a highly synchronized yeast culture, we have timed the peaks of expression of transcriptionally regulated cell cycle genes to an accuracy of 2 min (approximately equal to 1% of the cell cycle time). The set of 1,129 cell cycle-regulated genes was identified by a comprehensive analysis encompassing all available cell cycle yeast data sets. Our results reveal distinct subphases of the cell cycle undetectable by morphological observation, as well as the precise timeline of macromolecular complex assembly during key cell cycle events.
Affiliation(s)
- Maga Rowicka
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390
| | - Andrzej Kudlicki
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390
| | - Benjamin P. Tu
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390
| | - Zbyszek Otwinowski
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390
|
42
|
Méndez E, Fan W, Choi P, Agoff SN, Whipple M, Farwell DG, Futran ND, Weymuller EA, Zhao LP, Chen C. Tumor-specific genetic expression profile of metastatic oral squamous cell carcinoma. Head Neck 2007; 29:803-14. [PMID: 17573689 DOI: 10.1002/hed.20598] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022] Open
Abstract
BACKGROUND Metastasis is the most important predictor of survival in patients with oral squamous cell carcinoma (OSCC). We tested the hypothesis that there is a genetic expression profile associated with OSCC metastasis. METHODS We obtained samples from 6 OSCC node-positive primary tumors and their matched metastatic lymph nodes, and 5 OSCC node-negative primary tumors. Using laser capture microdissection, we isolated OSCC cells from metastatic lymph nodes and compared them with those from matched primary tumors and unmatched node-negative primary tumors using Affymetrix Human Genome Focus arrays. RESULTS Comparison of tumor cells from the lymph nodes with those from the unmatched, node-negative primary tumors revealed differential expression of 160 genes. Hierarchical clustering and principal component analysis using this 160-gene set showed that the node-negative samples were distinguishable from both node-positive primary tumors and tumors in the lymph nodes. Many of the expression changes found in the metastatic cells from the lymph nodes were also found in the node-positive primary tumors. Immunohistochemical analysis for transglutaminase-3 and keratin 16 confirmed the differential genetic expression for these genes. CONCLUSION These preliminary results suggest that there may be a metastatic gene expression profile present in node-positive primary OSCC.
Affiliation(s)
- Eduardo Méndez
- Department of Otolaryngology - Head and Neck Surgery, University of Washington, Seattle, Washington 98195, USA
|
43
|
Larsen P, Almasri E, Chen G, Dai Y. A statistical method to incorporate biological knowledge for generating testable novel gene regulatory interactions from microarray experiments. BMC Bioinformatics 2007; 8:317. [PMID: 17727721 PMCID: PMC2082045 DOI: 10.1186/1471-2105-8-317] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2006] [Accepted: 08/29/2007] [Indexed: 11/16/2022] Open
Abstract
Background The incorporation of prior biological knowledge in the analysis of microarray data has become important in the reconstruction of transcription regulatory networks in a cell. Most current research has focused on the integration of multiple sets of microarray data as well as curated databases for a genome-scale reconstruction. However, individual researchers are more interested in extracting the most useful information from the data of their hypothesis-driven microarray experiments. How to compile prior biological knowledge from the literature to facilitate new hypothesis generation from a microarray experiment is the focus of this work. We propose a novel method based on the statistical analysis of reported gene interactions in PubMed literature. Results Using Gene Ontology (GO) Molecular Function annotation for reported gene regulatory interactions in PubMed literature, a statistical analysis method was proposed for the derivation of a likelihood of interaction (LOI) score for a pair of genes. The LOI-score and the Pearson correlation coefficient of gene profiles were utilized to check if a pair of query genes would be in the above specified interaction. The method was validated in the analysis of two gene sets formed from the yeast Saccharomyces cerevisiae cell cycle microarray data. It was found that a high percentage of identified interactions share GO Biological Process annotations (39.5% for a 102 interaction enriched gene set and 23.0% for a larger 999 cyclically expressed gene set). Conclusion This method can uncover novel biologically relevant gene interactions. With stringent confidence levels, small interaction networks can be identified for further establishment of a hypothesis testable by biological experiment. This procedure is computationally inexpensive and can be used as a preprocessing procedure for screening potential biologically relevant gene pairs subject to the analysis with sophisticated statistical methods.
Affiliation(s)
- Peter Larsen
- Core Genomics Laboratory at University of Illinois at Chicago, 845 West Taylor Street Chicago, IL 60607, USA
| | - Eyad Almasri
- Department of Bioengineering (MC063), University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL 60607, USA
| | - Guanrao Chen
- Department of Computer Science, University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL 60607, USA
| | - Yang Dai
- Department of Bioengineering (MC063), University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL 60607, USA
|
44
|
Ahdesmäki M, Lähdesmäki H, Gracey A, Shmulevich I, Yli-Harja O. Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data. BMC Bioinformatics 2007; 8:233. [PMID: 17605777 PMCID: PMC1934414 DOI: 10.1186/1471-2105-8-233] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2006] [Accepted: 07/02/2007] [Indexed: 11/23/2022] Open
Abstract
Background In practice many biological time series measurements, including gene microarrays, are conducted at time points that seem to be interesting in the biologist's opinion and not necessarily at fixed time intervals. In many circumstances we are interested in finding targets that are expressed periodically. To tackle the problems of uneven sampling and unknown type of noise in periodicity detection, we propose to use robust regression. Methods The aim of this paper is to develop a general framework for robust periodicity detection and review and rank different approaches by means of simulations. We also show the results for some real measurement data. Results The simulation results clearly show that when the sampling of time series gets more and more uneven, the methods that assume even sampling become unusable. We find that M-estimation provides a good compromise between robustness and computational efficiency. Conclusion Since uneven sampling occurs often in biological measurements, the robust methods developed in this paper are expected to have many uses. The regression based formulation of the periodicity detection problem easily adapts to non-uniform sampling. Using robust regression helps to reject inconsistently behaving data points. Availability The implementations are currently available for Matlab and will be made available for the users of R as well. More information can be found in the web-supplement [1].
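The regression-based formulation described in this abstract can be illustrated with a minimal sketch. It fits a single sinusoid plus an offset by ordinary least squares at arbitrary, unevenly spaced time points. This is not the authors' robust M-estimator: the function names `fit_sinusoid` and `solve_linear`, the assumption of a known period, and the plain squared loss are all hypothetical choices made for illustration.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * solution[k] for k in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_sinusoid(times, values, period):
    """Least-squares fit of values ~ a*cos(w*t) + b*sin(w*t) + c.

    The design matrix is built from the actual sampling times, so the
    time points do not need to be evenly spaced.
    """
    w = 2.0 * math.pi / period
    rows = [(math.cos(w * t), math.sin(w * t), 1.0) for t in times]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    a, b, c = solve_linear(xtx, xty)
    return a, b, c


if __name__ == "__main__":
    # Hypothetical, unevenly spaced sampling times (arbitrary units).
    sample_times = [0.0, 0.7, 1.9, 3.1, 4.0, 6.2, 7.5, 9.9]
    w = 2.0 * math.pi / 5.0
    profile = [2.0 * math.cos(w * t) + math.sin(w * t) + 0.5 for t in sample_times]
    a, b, c = fit_sinusoid(sample_times, profile, period=5.0)
    print("amplitude:", math.hypot(a, b), "offset:", c)
```

The fitted amplitude `math.hypot(a, b)` can serve as a simple periodicity statistic. A robust variant along the lines the abstract describes would replace the squared loss with a bounded-influence loss, which this sketch does not attempt.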
Affiliation(s)
- Miika Ahdesmäki
- Institute of Signal Processing, Tampere University of Technology, P.O.Box 553, 33101 Tampere, Finland
| | - Harri Lähdesmäki
- Institute of Signal Processing, Tampere University of Technology, P.O.Box 553, 33101 Tampere, Finland
- Institute for Systems Biology, WA 98103, USA
| | - Andrew Gracey
- Marine Environmental Biology, University of Southern California, CA 90089, USA
| | | | - Olli Yli-Harja
- Institute of Signal Processing, Tampere University of Technology, P.O.Box 553, 33101 Tampere, Finland
|
45
|
Regenberg B, Grotkjær T, Winther O, Fausbøll A, Åkesson M, Bro C, Hansen LK, Brunak S, Nielsen J. Growth-rate regulated genes have profound impact on interpretation of transcriptome profiling in Saccharomyces cerevisiae. Genome Biol 2007; 7:R107. [PMID: 17105650 PMCID: PMC1794586 DOI: 10.1186/gb-2006-7-11-r107] [Citation(s) in RCA: 179] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2006] [Revised: 09/04/2006] [Accepted: 11/14/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Growth rate is central to the development of cells in all organisms. However, little is known about the impact of changing growth rates. We used continuous cultures to control growth rate and studied the transcriptional program of the model eukaryote Saccharomyces cerevisiae, with generation times varying between 2 and 35 hours. RESULTS A total of 5930 transcripts were identified at the different growth rates studied. Consensus clustering of these revealed that half of all yeast genes are affected by the specific growth rate, and that the changes are similar to those found when cells are exposed to different types of stress (>80% overlap). Genes with decreased transcript levels in response to faster growth are largely of unknown function (>50%) whereas genes with increased transcript levels are involved in macromolecular biosynthesis such as those that encode ribosomal proteins. This group also covers most targets of the transcriptional activator RAP1, which is also known to be involved in replication. A positive correlation between the location of replication origins and the location of growth-regulated genes suggests a role for replication in growth rate regulation. CONCLUSION Our data show that the cellular growth rate has great influence on transcriptional regulation. This, in turn, implies that one should be cautious when comparing mutants with different growth rates. Our findings also indicate that much of the regulation is coordinated via the chromosomal location of the affected genes, which may be valuable information for the control of heterologous gene expression in metabolic engineering.
Affiliation(s)
- Birgitte Regenberg
- Institut für Molekulare Biowissenschaften, Johann Wolfgang Goethe-Universität, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany
| | - Thomas Grotkjær
- Center for Microbial Biotechnology, BioCentrum-DTU, Building 223, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
| | - Ole Winther
- Informatics and Mathematical Modelling, Building 321, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
| | - Anders Fausbøll
- Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
| | - Mats Åkesson
- Center for Microbial Biotechnology, BioCentrum-DTU, Building 223, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
| | - Christoffer Bro
- Center for Microbial Biotechnology, BioCentrum-DTU, Building 223, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
| | - Lars Kai Hansen
- Informatics and Mathematical Modelling, Building 321, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
| | - Søren Brunak
- Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
| | - Jens Nielsen
- Center for Microbial Biotechnology, BioCentrum-DTU, Building 223, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
|
46
|
Kapil A, Gudi RD, Noronha SB. Gene expression profile analysis using discrimination and fuzzy classification methods. Asia-Pac J Chem Eng 2007. [DOI: 10.1002/apj.12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
47
|
Doyle FJ, Stelling J. Systems interface biology. J R Soc Interface 2006; 3:603-16. [PMID: 16971329 PMCID: PMC1664650 DOI: 10.1098/rsif.2006.0143] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2006] [Accepted: 07/03/2006] [Indexed: 02/03/2023] Open
Abstract
The field of systems biology has attracted the attention of biologists, engineers, mathematicians, physicists, chemists and others in an endeavour to create systems-level understanding of complex biological networks. In particular, systems engineering methods are finding unique opportunities in characterizing the rich behaviour exhibited by biological systems. In the same manner, these new classes of biological problems are motivating novel developments in theoretical systems approaches. Hence, the interface between systems and biology is of mutual benefit to both disciplines.
Affiliation(s)
- Francis J Doyle
- Department of Chemical Engineering, University of California, Santa Barbara, CA 93106, USA.
|
48
|
Abstract
Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to independently model the observations for each gene; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models over the collection of genes analyzed. Motivated by this common structure of per-gene models, we proposed a new model-based clustering method--the clustering of regression models method, which groups genes that share a similar relationship to the covariate(s). This method provides a unified approach for a family of clustering procedures and can be applied for data collected with various experimental designs. In addition, when combined with per-gene methods for assessing differential expression that employ the same regression modeling structure, an integrated framework for the analysis of microarray data is obtained. The proposed methodology was applied to two microarray data sets, one from a breast cancer study and the other from a yeast cell cycle study.
Affiliation(s)
- Li-Xuan Qin
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA.
|
49
|
Huang D, Wei P, Pan W. Combining gene annotations and gene expression data in model-based clustering: weighted method. OMICS 2006; 10:28-39. [PMID: 16584316 DOI: 10.1089/omi.2006.10.28] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
It has been increasingly recognized that incorporating prior knowledge into cluster analysis can result in more reliable and meaningful clusters. In contrast to standard model-based clustering with a global mixture model, which does not use any prior information, a stratified mixture model was recently proposed to incorporate gene functions or biological pathways as priors in model-based clustering of gene expression profiles: various gene functional groups form the strata in a stratified mixture model. Albeit useful, the stratified method may be less efficient than the global analysis if the strata are non-informative for clustering. We propose a weighted method that aims to strike a balance between a stratified analysis and a global analysis: it weights the clustering results of the stratified analysis against those of the global analysis, with the weight determined by the data. More generally, the weighted method can take advantage of the hierarchical structure of most existing gene functional annotation systems, such as MIPS and Gene Ontology (GO), and facilitate choosing appropriate gene functional groups as priors. We use simulated data and real data to demonstrate the feasibility and advantages of the proposed method.
Affiliation(s)
- Desheng Huang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, 55455, USA
|
50
|
Murthy KR, Hua LJ. Improved Fourier transform method for unsupervised cell-cycle regulated gene prediction. Proc IEEE Comput Syst Bioinform Conf 2006:194-203. [PMID: 16448013 DOI: 10.1109/csb.2004.1332433] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
MOTIVATION Cell-cycle regulated gene prediction using microarray time-course measurements of the mRNA expression levels of genes has been used by several researchers. A popular approach is the Fourier transform (FT) method in conjunction with the set of known cell-cycle regulated genes. In the absence of training data, the Fourier transform method is sensitive to noise, to an additive monotonic component arising from cell population growth, and to deviation from a strictly sinusoidal form of expression. Known cell-cycle regulated genes may not be available for certain organisms, or using them for training may bias the prediction. RESULTS In this paper we propose an Improved Fourier Transform (IFT) method which takes care of several factors such as the monotonic additive component of cell-cycle expression and irregular or partial-cycle sampling of gene expression. The proposed algorithm does not need any known cell-cycle regulated genes for prediction. Apart from alleviating the need for a training set, it also removes bias towards genes similar to the training set. We have evaluated the developed method on two publicly available datasets: yeast cell-cycle data and HeLa cell-cycle data. The proposed algorithm performed competitively with the supervised Fourier transform method on both datasets. It outperformed other unsupervised methods such as Partial Least Squares (PLS) and Single Pulse Modeling (SPM). This method is easy to comprehend and implement, and runs faster.
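The sensitivity of a plain Fourier score to an additive monotonic component, mentioned in this abstract, can be illustrated with a short sketch. It subtracts a fitted straight line before computing the Fourier magnitude at an assumed period. This is not the IFT algorithm of the paper, whose details are not given here: the function names, the linear form of the trend, and evenly spaced samples with the period expressed in sample units are all assumptions made for illustration.

```python
import math


def detrend_linear(values):
    """Subtract the least-squares straight line from an evenly sampled series."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(values)]


def fourier_score(values, period):
    """Magnitude of the Fourier component at the given period (in samples).

    Scaled by 2/n so that an undistorted sinusoid sampled over whole
    periods scores approximately its amplitude.
    """
    n = len(values)
    w = 2.0 * math.pi / period
    cos_sum = sum(y * math.cos(w * t) for t, y in enumerate(values))
    sin_sum = sum(y * math.sin(w * t) for t, y in enumerate(values))
    return 2.0 * math.hypot(cos_sum, sin_sum) / n


def detrended_fourier_score(values, period):
    """Fourier score computed after removing a fitted linear trend."""
    return fourier_score(detrend_linear(values), period)


if __name__ == "__main__":
    # Hypothetical profile: two cycles of a sinusoid riding on a linear drift.
    cycle = [math.sin(2.0 * math.pi * t / 10.0) for t in range(20)]
    drift = [0.3 * t for t in range(20)]
    profile = [c + d for c, d in zip(cycle, drift)]
    print("raw score:      ", fourier_score(profile, 10.0))
    print("detrended score:", detrended_fourier_score(profile, 10.0))
```

Because the detrending step is linear, a purely linear drift contributes nothing to the detrended score. Note that the straight-line fit also absorbs a small part of a genuine sinusoid, so the detrended score is slightly below the true amplitude.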
|