1
Abdelkhalek A, Qari SH, Abu-Saied MAAR, Khalil AM, Younes HA, Nehela Y, Behiry SI. Chitosan Nanoparticles Inactivate Alfalfa Mosaic Virus Replication and Boost Innate Immunity in Nicotiana glutinosa Plants. Plants (Basel, Switzerland) 2021; 10:2701. [PMID: 34961172 PMCID: PMC8703458 DOI: 10.3390/plants10122701] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/05/2021] [Revised: 11/26/2021] [Accepted: 12/03/2021] [Indexed: 06/01/2023]
Abstract
Plant viral infection is one of the most severe issues in food security globally, resulting in considerable crop production losses. Chitosan is a well-known biocontrol agent against a variety of plant infections. However, research on combatting viral infections is still in its early stages. The current study investigated the antiviral activities (protective, curative, and inactivation) of the prepared chitosan/dextran nanoparticles (CDNPs, 100 µg mL⁻¹) on Nicotiana glutinosa plants. Scanning electron microscopy (SEM) and dynamic light scattering analysis revealed that the synthesized CDNPs had uniform, regular spherical shapes ranging from 20 to 160 nm in diameter, with an average diameter of 91.68 nm. The inactivation treatment was the most effective, resulting in a 100% reduction in the alfalfa mosaic virus (AMV, Acc# OK413670) accumulation level. On the other hand, the foliar application of CDNPs decreased disease severity and significantly reduced viral accumulation levels by 70.43% and 61.65% in protective and curative treatments, respectively, under greenhouse conditions. Additionally, induction of systemic acquired resistance, increases in total carbohydrate and total phenolic contents, and elevated transcriptional levels of peroxidase, pathogen-related protein-1, and phenylalanine ammonia-lyase were observed. In light of the results, we propose that the potential application of CDNPs could be an eco-friendly approach to enhance yield and a more effective therapeutic elicitor for disease management in plants upon induction of defense systems.
Affiliation(s)
- Ahmed Abdelkhalek
- Plant Protection and Biomolecular Diagnosis Department, ALCRI, City of Scientific Research and Technological Applications, New Borg El-Arab City 21934, Alexandria, Egypt
- Sameer H. Qari
- Biology Department, Al-Jumum University College, Umm Al-Qura University, Mecca 25376, Saudi Arabia
- Mohamed Abd Al-Raheem Abu-Saied
- Polymeric Materials Research Department, Advanced Technology and New Materials Research Institute, City of Scientific Research and Technological Applications (SRTA-City), New Borg El-Arab City 21934, Alexandria, Egypt
- Abdallah Mohamed Khalil
- Plant Botany Department, Faculty of Science, Omar Al-Mukhtar University, Al Bayda 00218-84, Libya
- Hosny A. Younes
- Agricultural Botany Department, Faculty of Agriculture (Saba Basha), Alexandria University, Alexandria 21531, Egypt
- Yasser Nehela
- Department of Agricultural Botany, Faculty of Agriculture, Tanta University, Tanta 31511, Egypt
- Citrus Research and Education Center, Department of Plant Pathology, University of Florida, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
- Said I. Behiry
- Agricultural Botany Department, Faculty of Agriculture (Saba Basha), Alexandria University, Alexandria 21531, Egypt
2
Abstract
Neurodegenerative diseases (NDs) collectively afflict more than 40 million people worldwide. The majority of these diseases lack therapies to slow or stop progression due in large part to the challenge of disentangling the simultaneous presentation of broad, multifaceted pathophysiologic changes. Present technologies and computational capabilities suggest an optimistic future for deconvolving these changes to identify novel mechanisms driving ND onset and progression. In particular, integration of highly multi-dimensional omic analytical techniques (e.g., microarray, mass spectrometry) with computational systems biology approaches provides a systematic methodology to elucidate new mechanisms driving NDs. In this review, we begin by summarizing the complex pathophysiology of NDs associated with protein aggregation, emphasizing the shared complex dysregulation found in all of these diseases, and discuss available experimental ND models. Next, we provide an overview of technological and computational techniques used in systems biology that are applicable to studying NDs. We conclude by reviewing prior studies that have applied these approaches to NDs and comment on the necessity of combining analysis from both human tissues and model systems to identify driving mechanisms. We envision that the integration of computational approaches with multiple omic analyses of human tissues, and mouse and in vitro models, will enable the discovery of new therapeutic strategies for these devastating diseases.
Affiliation(s)
- Levi B Wood
- Cancer Research Institute, Beth Israel Deaconess Medical Center and Department of Medicine, Harvard Medical School, Boston, MA 02215, USA.
3
Martini P, Sales G, Calura E, Cagnin S, Chiogna M, Romualdi C. timeClip: pathway analysis for time course data without replicates. BMC Bioinformatics 2014; 15 Suppl 5:S3. [PMID: 25077979 PMCID: PMC4095003 DOI: 10.1186/1471-2105-15-s5-s3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023] Open
Abstract
Background Time-course gene expression experiments are useful tools for exploring biological processes. In this type of experiment, gene expression changes are monitored over time. Unfortunately, replication of time series is still costly, and long time courses usually do not have replicates. Many approaches have been proposed to deal with this data structure, but none of them in the field of pathway analysis. Pathway analyses have acquired great relevance for helping the interpretation of gene expression data. Several methods have been proposed to this aim: from the classical enrichment to the more complex topological analysis that gains power from the topology of the pathway. None of them were devised to identify temporal variations in time course data. Results Here we present timeClip, a topology-based pathway analysis specifically tailored to long time series without replicates. timeClip combines dimension reduction techniques and graph decomposition theory to explore and identify the portion of pathways that is most time-dependent. In the first step, timeClip selects the time-dependent pathways; in the second step, the most time-dependent portions of these pathways are highlighted. We used timeClip on simulated data and on a benchmark dataset from a mouse muscle regeneration model. Our approach shows good performance on different simulated settings. On the real dataset, we identify 76 time-dependent pathways, most of which are known to be involved in the regeneration process. Focusing on the 'mTOR signaling pathway' we highlight the timing of key processes of the muscle regeneration: from the early pathway activation through growth factor signals to the late burst of protein production needed for the fiber regeneration. Conclusions timeClip represents a new improvement in the field of time-dependent pathway analysis. It allows the isolation and dissection of pathways characterized by time-dependent components. Furthermore, using timeClip on a mouse muscle regeneration dataset we were able to characterize the process of muscle fiber regeneration with its correct timing.
4
Stirewalt DL, Pogosova-Agadjanyan EL, Tsuchiya K, Joaquin J, Meshinchi S. Copy-neutral loss of heterozygosity is prevalent and a late event in the pathogenesis of FLT3/ITD AML. Blood Cancer J 2014; 4:e208. [PMID: 24786392 PMCID: PMC4042297 DOI: 10.1038/bcj.2014.27] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 02/23/2014] [Accepted: 03/14/2014] [Indexed: 01/31/2023] Open
Abstract
Patients with high FLT3 internal tandem duplication allelic ratios (FLT3/ITD-ARs) have a poor prognosis. Single-nucleotide polymorphism/comparative genomic hybridization, single-cell PCR and colony-forming assays were used to evaluate genotypic evolution of high FLT3/ITD-ARs in 85 acute myeloid leukemia (AML) patients. Microarrays were used to examine molecular pathways disrupted in leukemic blasts with high FLT3/ITD-ARs. Copy-neutral loss of heterozygosity (CN-LOH) was identified at the FLT3 locus in diagnostic samples with high FLT3/ITD-ARs (N=11), but not in samples with low FLT3/ITD-ARs (N=24), FLT3-activating loop mutations (N=11) or wild-type FLT3 (N=39). Single-cell assays showed that homozygous FLT3/ITD genotype was present in subsets of leukemic blasts at diagnosis but became the dominant clone at relapse. Less differentiated CD34+/CD33− progenitor colonies were heterozygous for FLT3/ITD, whereas more differentiated CD34+/CD33+ progenitor colonies were homozygous for FLT3/ITD. Expression profiling revealed that samples harboring high FLT3/ITD-ARs aberrantly expressed genes within the recombination/DNA repair pathway. Thus, the development of CN-LOH at the FLT3 locus, which results in high FLT3/ITD-ARs, likely represents a late genomic event that occurs after the acquisition of the FLT3/ITD. Although the etiology underlying the development of CN-LOH remains to be clarified, the disruption in recombination/DNA repair pathway, which is present before the development of LOH, may have a role.
Affiliation(s)
- D L Stirewalt
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- K Tsuchiya
- [1] Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA [2] Department of Pathology, Seattle Children's Hospital, Seattle, WA, USA [3] Department of Laboratory Medicine, University of Washington Medical Center, Seattle, WA, USA
- J Joaquin
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- S Meshinchi
- [1] Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA [2] Department of Pediatrics, University of Washington Medical Center, Seattle, WA, USA [3] Children's Oncology Group, Arcadia, CA, USA [4] Department of Hematology-Oncology, Seattle Children's Hospital, Seattle, WA, USA
5
Oh S, Song S, Dasgupta N, Grabowski G. The analytical landscape of static and temporal dynamics in transcriptome data. Front Genet 2014; 5:35. [PMID: 24600473 PMCID: PMC3929947 DOI: 10.3389/fgene.2014.00035] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 09/11/2013] [Accepted: 01/30/2014] [Indexed: 12/16/2022] Open
Abstract
Interpreting gene expression profiles often involves statistical analysis of large numbers of differentially expressed genes, isoforms, and alternative splicing events in either static or dynamic settings. Reduced sequencing costs have made dense time-series analysis of gene expression using RNA-seq feasible; however, statistical methods for temporal RNA-seq data are poorly developed. Here we review current methods for identifying temporal changes in gene expression using RNA-seq, which are limited to static pairwise comparisons of time points and which fail to account for temporal dependencies in gene expression patterns. We also review the very few recently developed methods that are specific to temporal dynamic RNA-seq data. RNA-specific temporal dynamic methods are under continuous development and application, yet the field is still in its infancy. We fully cover microarray-specific temporal methods and transcriptome studies that used early digital technologies (e.g., SAGE), which came between traditional microarrays and the newer RNA-seq.
Affiliation(s)
- Sunghee Oh
- Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Seongho Song
- Department of Mathematical Sciences, McMicken College of Arts and Sciences, University of Cincinnati, Cincinnati, OH, USA
- Nupur Dasgupta
- Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Gregory Grabowski
- Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
6
Wu S, Wu H. More powerful significant testing for time course gene expression data using functional principal component analysis approaches. BMC Bioinformatics 2013; 14:6. [PMID: 23323795 PMCID: PMC3617096 DOI: 10.1186/1471-2105-14-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 08/07/2012] [Accepted: 11/07/2012] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND One of the fundamental problems in time course gene expression data analysis is to identify genes associated with a biological process or a particular stimulus of interest, like a treatment or virus infection. Most of the existing methods for this problem are designed for data with longitudinal replicates. But in reality, many time course gene experiments have no replicates or only have a small number of independent replicates. RESULTS We focus on the case without replicates and propose a new method for identifying differentially expressed genes by incorporating the functional principal component analysis (FPCA) into a hypothesis testing framework. The data-driven eigenfunctions allow a flexible and parsimonious representation of time course gene expression trajectories, leaving more degrees of freedom for the inference compared to that using a prespecified basis. Moreover, the information of all genes is borrowed for individual gene inferences. CONCLUSION The proposed approach turns out to be more powerful in identifying time course differentially expressed genes compared to the existing methods. The improved performance is demonstrated through simulation studies and a real data application to the Saccharomyces cerevisiae cell cycle data.
Affiliation(s)
- Shuang Wu
- Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Avenue, Rochester, NY, 14642, USA
- Hulin Wu
- Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Avenue, Rochester, NY, 14642, USA
7
Wang K, Ng SK, McLachlan GJ. Clustering of time-course gene expression profiles using normal mixture models with autoregressive random effects. BMC Bioinformatics 2012; 13:300. [PMID: 23151154 PMCID: PMC3574839 DOI: 10.1186/1471-2105-13-300] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 01/30/2012] [Accepted: 11/07/2012] [Indexed: 11/26/2022] Open
Abstract
Background Time-course gene expression data such as yeast cell cycle data may be periodically expressed. To cluster such data, currently used Fourier series approximations of periodic gene expressions have been found not to be sufficiently adequate to model the complexity of the time-course data, partly due to their ignoring the dependence between the expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of available models in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models. Results We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models to provide more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of coregulated genes are obtained, which are supported by gene-function annotation databases. Conclusions Our new model under our extension of the EMMIX-WIRE procedure is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an autocorrelation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters that enables improved modelling and consequent clustering of time-course data.
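To illustrate the "autocorrelation variance structure" of first order that the abstract mentions, the sketch below builds a standard stationary AR(1) covariance matrix, in which the covariance between time points i and j is sigma2 * rho^|i - j|. It is a generic textbook construction, not code from the authors' R package; the function name `ar1_covariance` and its parameters are hypothetical.

```python
def ar1_covariance(n_times, rho, sigma2=1.0):
    """Return an n_times x n_times AR(1) covariance matrix as nested lists.

    Entry (i, j) equals sigma2 * rho ** |i - j|, so measurements that are
    closer together in time are more strongly correlated.
    Assumption: equally spaced time points and a stationary process.
    """
    if n_times < 1:
        raise ValueError("n_times must be at least 1")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [
        [sigma2 * rho ** abs(i - j) for j in range(n_times)]
        for i in range(n_times)
    ]


# Example: four time points with lag-one correlation 0.5.
cov = ar1_covariance(4, 0.5)
for row in cov:
    print(row)
```

In a mixture model of the kind described, a matrix like this would presumably enter as the covariance of the gene-specific random effects within a cluster; how the published method parameterizes it is not stated in the abstract.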
Affiliation(s)
- Kui Wang
- Department of Mathematics, University of Queensland, Brisbane, QLD 4072, Australia
8
Guo X, Pan W. Using weighted permutation scores to detect differential gene expression with microarray data. J Bioinform Comput Biol 2011; 3:989-1006. [PMID: 16078371 DOI: 10.1142/s021972000500134x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/29/2004] [Revised: 01/25/2005] [Accepted: 01/26/2005] [Indexed: 11/18/2022]
Abstract
A class of nonparametric statistical methods, including a nonparametric empirical Bayes (EB) method, the Significance Analysis of Microarrays (SAM) and the mixture model method (MMM), have been proposed to detect differential gene expression for replicated microarray experiments. They all depend on constructing a test statistic, for example, a t-statistic, and then using permutation to draw inferences. However, due to special features of microarray data, using standard permutation scores may not estimate the null distribution of the test statistic well, possibly leading to overly conservative inferences. We propose a new method of constructing weighted permutation scores to overcome the problem: posterior probabilities of having no differential expression from the EB method are used as weights for genes to better estimate the null distribution of the test statistic. We also propose a weighted method to estimate the false discovery rate (FDR) using the posterior probabilities. Using simulated data and real data for time-course microarray experiments, we show the improved performance of the proposed methods when implemented in MMM, EB and SAM.
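The weighting idea in this abstract can be sketched as follows. This is only an illustration of the general principle, under the assumption that permuted statistics are pooled across genes and each gene's permuted scores are weighted by its posterior probability of no differential expression; it is not the paper's exact estimator, and the function name and argument layout are hypothetical.

```python
def weighted_permutation_pvalues(observed, permuted, weights):
    """Weighted pooled-permutation p-values (illustrative sketch).

    observed: list of G observed test statistics, one per gene.
    permuted: list of B lists, each holding G permuted statistics.
    weights:  list of G weights in [0, 1]; here assumed to be posterior
              probabilities that each gene is NOT differentially expressed.
    Genes that are likely differentially expressed get small weights and so
    contribute little to the estimated null distribution.
    """
    total_weight = len(permuted) * sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    pvalues = []
    for t in observed:
        # Weighted count of permuted scores at least as extreme as t.
        exceed = sum(
            w
            for perm in permuted
            for s, w in zip(perm, weights)
            if abs(s) >= abs(t)
        )
        pvalues.append(exceed / total_weight)
    return pvalues


# Toy input (made-up numbers): gene 0 is almost surely differentially
# expressed, so its permuted scores are given zero weight.
print(weighted_permutation_pvalues([3.0, 0.5], [[1.0, 0.2], [4.0, 0.6]], [0.0, 1.0]))
```

With all weights set to 1 the function reduces to the ordinary pooled permutation p-value, which makes the effect of the weighting easy to compare on the same input.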
Affiliation(s)
- Xu Guo
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, MMC 303, Minneapolis, MN 55455-0378, USA
9
Zhang B, Chen B, Wu T, Xuan Z, Zhu X, Chen R. Estimating developmental states of tumors and normal tissues using a linear time-ordered model. BMC Bioinformatics 2011; 12:53. [PMID: 21310084 PMCID: PMC3223864 DOI: 10.1186/1471-2105-12-53] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/25/2010] [Accepted: 02/11/2011] [Indexed: 11/18/2022] Open
Abstract
Background Tumor cells are considered to have an aberrant cell state, and some evidence indicates that different developmental states appear during tumorigenesis. Embryonic development and stem cell differentiation are ordered processes in which the sequence of events over time is highly conserved. The "cancer attractor" concept integrates normal developmental processes and tumorigenesis into a high-dimensional "cell state space", and provides a reasonable explanation of the relationship between these two biological processes from a theoretical viewpoint. However, it is hard to describe this relationship using existing experimental data; moreover, the measurement of different developmental states is also difficult. Results Here, by applying a novel time-ordered linear model based on a co-bisector, which represents the joint direction of a series of vectors, we described the trajectory of the developmental process as a line and showed the different developmental states of tumor cells from a developmental timescale perspective in a cell state space. This model was used to transform time-course developmental expression profiles of human ESCs, normal mouse liver, ovary and lung tissue into "cell developmental state lines". These cell state lines were then used to observe the developmental states of different tumors and their corresponding normal samples. Mouse liver and ovarian tumors showed different similarity to the early developmental stage. Similarly, human glioma cells and ovarian tumors became developmentally "younger". Conclusions The time-ordered linear model captured linearly projected developmental trajectories in a cell state space. It also reflected the tendency of gene expression to change over time from the developmental timescale perspective, and our findings indicated different developmental states during tumorigenesis in different tissues.
Affiliation(s)
- Bo Zhang
- Laboratory of Bioinformatics and Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, PR China
10
Méndez E, Lohavanichbutr P, Fan W, Houck JR, Rue TC, Doody DR, Futran ND, Upton MP, Yueh B, Zhao LP, Schwartz SM, Chen C. Can a metastatic gene expression profile outperform tumor size as a predictor of occult lymph node metastasis in oral cancer patients? Clin Cancer Res 2011; 17:2466-73. [PMID: 21300763 DOI: 10.1158/1078-0432.ccr-10-0175] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
PURPOSE To determine the differential gene expression between oral squamous cell carcinoma (OSCC) with and without metastasis to cervical lymph nodes and to assess prediction of nodal metastasis by using molecular features. EXPERIMENTAL DESIGN We used Affymetrix U133 2.0 plus arrays to compare the tumor genome-wide gene expression of 73 node-positive OSCCs with 40 node-negative OSCCs (≥ 18 months). Multivariate linear regression was used to estimate the association between gene expression and nodal metastasis. Stepwise logistic regression and receiver operating characteristics (ROC) analysis were used to generate predictive models and to compare these with models by using tumor size alone. RESULTS We identified five genes differentially expressed between node-positive and node-negative OSCCs after adjusting for tumor size and human papillomavirus status: REEP1, RNF145, CTONG2002744, MYO5A, and FBXO32. Stepwise regression identified a four-gene model (MYO5A, RNF145, FBXO32, and CTONG2002744) as the most predictive of nodal metastasis. A leave-one-out ROC analysis revealed that our model had a higher area under the curve (AUC) for identifying occult nodal metastasis compared with that of a model by tumor size alone (respective AUC: 0.85 and 0.61; P = 0.011). A model combining tumor size and gene expression did not further improve the prediction of occult metastasis. Independent validation using 31 metastatic and 13 nonmetastatic cases revealed a significant underexpression of CTONG2002744 (P = 0.0004). CONCLUSIONS These results suggest that our gene expression markers of OSCC metastasis hold promise for improving current clinical practice. Confirmation by others and functional studies of CTONG2002744 is warranted.
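For readers unfamiliar with the AUC values quoted in this abstract, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the scores are illustrative assumptions; this is not the study's data or its leave-one-out procedure.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    scores_pos: model scores for cases that truly have the outcome.
    scores_neg: model scores for cases that truly lack the outcome.
    Equivalent to the Mann-Whitney U statistic divided by the number of
    positive-negative pairs.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up scores for illustration only.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.85 and 0.61 should be read.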
Affiliation(s)
- Eduardo Méndez
- Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, Washington, USA
11
Abstract
Over the past 20 years, Omics technologies emerged as the consensual denomination of holistic molecular profiling. These techniques enable parallel measurements of biological -omes, or "all constituents considered collectively", and utilize the latest advancements in transcriptomics, proteomics, metabolomics, imaging, and bioinformatics. The technological accomplishments in increasing the sensitivity and throughput of the analytical devices, the standardization of the protocols and the widespread availability of reagents made the capturing of static molecular portraits of biological systems a routine task. The next generation of time course molecular profiling already allows for extensive molecular snapshots to be taken along the trajectory of time evolution of the investigated biological systems. Such datasets provide the basis for application of the inverse scientific approach. It consists in the inference of scientific hypotheses and theories about the structure and dynamics of the investigated biological system without any a priori knowledge, solely relying on data analysis to unveil the underlying patterns. However, most temporal Omics data still contain a limited number of time points, taken over arbitrary time intervals, through measurements on biological processes shifted in time. The analysis of the resulting short and noisy time series data sets is a challenge. Traditional statistical methods for the study of static Omics datasets are of limited relevance and new methods are required. This chapter discusses such algorithms which enable the application of the inverse analysis approach to short Omics time series.
12
Pogosova-Agadjanyan EL, Fan W, Georges GE, Schwartz JL, Kepler CM, Lee H, Suchanek AL, Cronk MR, Brumbaugh A, Engel JH, Yukawa M, Zhao LP, Heimfeld S, Stirewalt DL. Identification of radiation-induced expression changes in nonimmortalized human T cells. Radiat Res 2010; 175:172-84. [PMID: 21268710 DOI: 10.1667/rr1977.1] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 12/13/2022]
Abstract
In the event of a radiation accident or attack, it will be imperative to quickly assess the amount of radiation exposure to accurately triage victims for appropriate care. RNA-based radiation dosimetry assays offer the potential to rapidly screen thousands of individuals in an efficient and cost-effective manner. However, prior to the development of these assays, it will be critical to identify those genes that will be most useful to delineate different radiation doses. Using global expression profiling, we examined expression changes in nonimmortalized T cells across a wide range of doses (0.15-12 Gy). Because many radiation responses are highly dependent on time, expression changes were examined at three different times (3, 8, and 24 h). Analyses identified 61, 512 and 1310 genes with significant linear dose-dependent expression changes at 3, 8 and 24 h, respectively. Using a stepwise regression procedure, a model was developed to estimate in vitro radiation exposures using the expression of three genes (CDKN1A, PSRC1 and TNFSF4) and validated in an independent test set with 86% accuracy. These findings suggest that RNA-based expression assays for a small subset of genes can be employed to develop clinical biodosimetry assays to be used in assessments of radiation exposure and toxicity.
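The notion of a "linear dose-dependent expression change" can be made concrete with an ordinary least-squares line fitted to one gene's expression across doses, which can then be inverted to estimate an unknown dose. This is a single-gene simplification under stated assumptions; the study's model used three genes selected by stepwise regression. The function names and the numbers below are hypothetical.

```python
def fit_line(doses, expression):
    """Ordinary least-squares fit of expression = intercept + slope * dose."""
    n = len(doses)
    if n != len(expression) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(doses) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    if sxx == 0:
        raise ValueError("doses must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_dose(expr_value, slope, intercept):
    """Invert the fitted line to estimate the dose for a new measurement."""
    if slope == 0:
        raise ValueError("a flat line carries no dose information")
    return (expr_value - intercept) / slope


# Made-up calibration data (dose in Gy, expression in arbitrary units).
slope, intercept = fit_line([0.0, 2.0, 4.0, 8.0], [1.0, 2.0, 3.0, 5.0])
print(slope, intercept, estimate_dose(4.0, slope, intercept))
```

A real biodosimetry model would also need to account for time since exposure, since the abstract reports different gene sets at 3, 8 and 24 h.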
Affiliation(s)
- Era L Pogosova-Agadjanyan
- Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N., Seattle, WA 98109, USA
13
Aryee MJ, Gutiérrez-Pabello JA, Kramnik I, Maiti T, Quackenbush J. An improved empirical bayes approach to estimating differential gene expression in microarray time-course data: BETR (Bayesian Estimation of Temporal Regulation). BMC Bioinformatics 2009; 10:409. [PMID: 20003283 PMCID: PMC2801687 DOI: 10.1186/1471-2105-10-409] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Received: 06/10/2009] [Accepted: 12/10/2009] [Indexed: 12/02/2022] Open
Abstract
Background Microarray gene expression time-course experiments provide the opportunity to observe the evolution of transcriptional programs that cells use to respond to internal and external stimuli. Most commonly used methods for identifying differentially expressed genes treat each time point as independent and ignore important correlations, including those within samples and between sampling times. Therefore they do not make full use of the information intrinsic to the data, leading to a loss of power. Results We present a flexible random-effects model that takes such correlations into account, improving our ability to detect genes that have sustained differential expression over more than one time point. By modeling the joint distribution of the samples that have been profiled across all time points, we gain sensitivity compared to a marginal analysis that examines each time point in isolation. We assign each gene a probability of differential expression using an empirical Bayes approach that reduces the effective number of parameters to be estimated. Conclusions Based on results from theory, simulated data, and application to the genomic data presented here, we show that BETR has increased power to detect subtle differential expression in time-series data. The open-source R package betr is available through Bioconductor. BETR has also been incorporated in the freely-available, open-source MeV software tool available from http://www.tm4.org/mev.html.
Affiliation(s)
- Martin J Aryee
- Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115, USA.
14
Gene expression changes in normal haematopoietic cells. Best Pract Res Clin Haematol 2009; 22:249-69. [PMID: 19698932 DOI: 10.1016/j.beha.2009.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The complexity of the healthy haematopoietic system is immense, and as such, one must understand the biology driving normal haematopoietic expression profiles when designing experiments and interpreting expression data that involve normal cells. This article seeks to present an organised approach to the use and interpretation of gene profiling in normal haematopoiesis and broadly illustrates the challenges of selecting appropriate controls for high-throughput expression studies.
15
Nueda MJ, Sebastián P, Tarazona S, García-García F, Dopazo J, Ferrer A, Conesa A. Functional assessment of time course microarray data. BMC Bioinformatics 2009; 10 Suppl 6:S9. [PMID: 19534758 PMCID: PMC2697656 DOI: 10.1186/1471-2105-10-s6-s9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 11/27/2022] Open
Abstract
Motivation Time-course microarray experiments study the progress of gene expression over time across one or several experimental conditions. Most developed analysis methods focus on the clustering or the differential expression analysis of genes and do not integrate functional information. The assessment of the functional aspects of time-course transcriptomics data requires the use of approaches that exploit the activation dynamics of the functional categories to which genes are annotated. Methods We present three novel methodologies for the functional assessment of time-course microarray data. i) maSigFun derives from the maSigPro method, a regression-based strategy to model time-dependent expression patterns and identify genes with differences across series. maSigFun fits a regression model for groups of genes labeled by a functional class and selects those categories which have a significant model. ii) PCA-maSigFun fits a PCA model of each functional class-defined expression matrix to extract orthogonal patterns of expression change, which are then assessed for their fit to a time-dependent regression model. iii) ASCA-functional uses the ASCA model to rank genes according to their correlation to principal time expression patterns and assesses functional enrichment in a GSA fashion. We used simulated and experimental datasets to study these novel approaches. Results were compared to alternative methodologies. Results Synthetic and experimental data showed that the different methods are able to capture different aspects of the relationship between genes, functions and co-expression that are biologically meaningful. The methods should not be considered as competing; rather, they provide different insights into the molecular and functional dynamic events taking place within the biological system under study.
Affiliation(s)
- María José Nueda
- Department of Statistics and Operation Research, University of Alicante, Ctra, San Vicente del Raspeig, S/N 03690 Alicante, Spain.
|
16
|
|
17
|
Méndez E, Houck JR, Doody DR, Fan W, Lohavanichbutr P, Rue TC, Yueh B, Futran ND, Upton MP, Farwell DG, Heagerty PJ, Zhao LP, Schwartz SM, Chen C. A genetic expression profile associated with oral cancer identifies a group of patients at high risk of poor survival. Clin Cancer Res 2009; 15:1353-61. [PMID: 19228736 DOI: 10.1158/1078-0432.ccr-08-1816] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 01/11/2023]
Abstract
PURPOSE To determine if gene expression signature of invasive oral squamous cell carcinoma (OSCC) can subclassify OSCC based on survival. EXPERIMENTAL DESIGN We analyzed the expression of 131 genes in 119 OSCC, 35 normal, and 17 dysplastic mucosa to identify cluster-defined subgroups. Multivariate Cox regression was used to estimate the association between gene expression and survival. By stepwise Cox regression, the top predictive models of OSCC-specific survival were determined and compared by receiver operating characteristic analysis. RESULTS The 3-year overall mean+/-SE survival for a cluster of 45 OSCC patients was 38.7+/-0.09% compared with 69.1+/-0.08% for the remaining patients. Multivariate analysis adjusted for age, sex, and stage showed that the 45 OSCC patient cluster had worse overall and OSCC-specific survival (hazard ratio, 3.31; 95% confidence interval, 1.66-6.58 and hazard ratio, 5.43; 95% confidence interval, 2.32-12.73, respectively). Stepwise Cox regression on the 131 probe sets revealed that a model with a term for LAMC2 (laminin gamma2) gene expression best identified patients with worst OSCC-specific survival. We fit a Cox model with a term for a principal component analysis-derived risk score marker and two other models that combined stage with either LAMC2 or PCA. The area under the curve for models combining stage with either LAMC2 or PCA was 0.80 or 0.82, respectively, compared with 0.70 for stage alone (P=0.013 and 0.008, respectively). CONCLUSIONS Gene expression and stage combined predict survival of OSCC patients better than stage alone.
Affiliation(s)
- Eduardo Méndez
- Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, WA 98109-1024, USA
|
18
|
Lohavanichbutr P, Houck J, Fan W, Yueh B, Mendez E, Futran N, Doody DR, Upton MP, Farwell DG, Schwartz SM, Zhao LP, Chen C. Genomewide gene expression profiles of HPV-positive and HPV-negative oropharyngeal cancer: potential implications for treatment choices. Arch Otolaryngol Head Neck Surg 2009; 135:180-8. [PMID: 19221247 DOI: 10.1001/archoto.2008.540] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Indexed: 12/28/2022]
Abstract
OBJECTIVE To study the difference in gene expression between human papillomavirus (HPV)-positive and HPV-negative oral cavity and oropharyngeal squamous cell carcinoma (OSCC). DESIGN We used Affymetrix U133 plus 2.0 arrays to examine gene expression profiles of OSCC and normal oral tissue. The HPV DNA was detected using polymerase chain reaction followed by the Roche LINEAR ARRAY HPV Genotyping Test, and the differentially expressed genes were analyzed to examine their potential biological roles using the Ingenuity Pathway Analysis Software, version 5.0. SETTING Three medical centers affiliated with the University of Washington. PATIENTS A total of 119 patients with primary OSCC and 35 patients without cancer, all of whom were treated at the setting institutions, provided tissue samples for the study. RESULTS Human papillomavirus DNA was found in 41 of 119 tumors (34.5%) and 2 of 35 normal tissue samples (5.7%); 39 of the 43 HPV-positive specimens were HPV-16. A higher prevalence of HPV DNA was found in oropharyngeal cancer (23 of 31) than in oral cavity cancer (18 of 88). We found no significant difference in gene expression between HPV-positive and HPV-negative oral cavity cancer but found 446 probe sets (347 known genes) differentially expressed between HPV-positive and HPV-negative oropharyngeal cancer. The most prominent functions of these genes are DNA replication, DNA repair, and cell cycling. Some genes differentially expressed between HPV-positive and HPV-negative oropharyngeal cancer (eg, TYMS, STMN1, CCND1, and RBBP4) are involved in chemotherapy or radiation sensitivity. CONCLUSION These results suggest that differences in the biology of HPV-positive and HPV-negative oropharyngeal cancer may have implications for the management of patients with these different tumors.
Affiliation(s)
- Pawadee Lohavanichbutr
- Program in Epidemiology, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Mail Stop M5-C800, 1100 Fairview Ave N, Seattle, WA 98109-1024, USA
|
19
|
Towards functional phosphoproteomics by mapping differential phosphorylation events in signaling networks. Proteomics 2008; 8:4453-65. [DOI: 10.1002/pmic.200800175] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 01/06/2023]
|
20
|
Chen C, Méndez E, Houck J, Fan W, Lohavanichbutr P, Doody D, Yueh B, Futran ND, Upton M, Farwell DG, Schwartz SM, Zhao LP. Gene expression profiling identifies genes predictive of oral squamous cell carcinoma. Cancer Epidemiol Biomarkers Prev 2008; 17:2152-62. [PMID: 18669583 DOI: 10.1158/1055-9965.epi-07-2893] [Citation(s) in RCA: 191] [Impact Index Per Article: 11.9] [Indexed: 11/16/2022]
Abstract
Oral squamous cell carcinoma (OSCC) is associated with substantial mortality and morbidity. To identify potential biomarkers for the early detection of invasive OSCC, we compared the gene expression of incident primary OSCC, oral dysplasia, and clinically normal oral tissue from surgical patients without head and neck cancer or preneoplastic oral lesions (controls), using Affymetrix U133 2.0 Plus arrays. We identified 131 differentially expressed probe sets using a training set of 119 OSCC patients and 35 controls. Forward and stepwise logistic regression analyses identified 10 successive combinations of genes whose expression differentiated OSCC from controls. The best model included LAMC2, encoding laminin-gamma2 chain, and COL4A1, encoding collagen, type IV alpha1 chain. Subsequent modeling without these two markers showed that COL1A1, encoding collagen, type I alpha1 chain, and PADI1, encoding peptidyl arginine deiminase, type 1, could also distinguish OSCC from controls. We validated these two models using an internal independent testing set of 48 invasive OSCC and 10 controls and an external testing set of 42 head and neck squamous cell carcinoma cases and 14 controls (GEO GSE6791), with sensitivity and specificity above 95%. These two models were also able to distinguish dysplasia (n = 17) from control (n = 35) tissue. Differential expression of these four genes was confirmed by quantitative reverse transcription-PCR. If confirmed in larger studies, the proposed models may hold promise for monitoring local recurrence at surgical margins and the development of second primary oral cancer in patients with OSCC.
Affiliation(s)
- Chu Chen
- Program in Epidemiology, Fred Hutchinson Cancer Research Center, Department of Epidemiology, 1100 Fairview Avenue North, M5-C800 P.O. Box 19024, Seattle, WA 98109-1024, USA.
|
21
|
|
22
|
Wei P, Pan W. Incorporating gene functions into regression analysis of DNA-protein binding data and gene expression data to construct transcriptional networks. IEEE/ACM Trans Comput Biol Bioinform 2008; 5:401-415. [PMID: 18670043 DOI: 10.1109/tcbb.2007.1062] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/26/2023]
Abstract
Useful information on transcriptional networks has been extracted by regression analyses of gene expression data and DNA-protein binding data. However, a potential limitation of these approaches is their assumption on the common and constant activity level of a transcription factor (TF) on all the genes in any given experimental condition; for example, any TF is assumed to be either an activator or a repressor, but not both, while it is known that some TFs can be dual regulators. Rather than assuming a common linear regression model for all the genes, we propose using separate regression models for various gene groups; the genes can be grouped based on their functions or some clustering results. Furthermore, to take advantage of the hierarchical structure of many existing gene function annotation systems, such as Gene Ontology (GO), we propose a shrinkage method that borrows information from relevant gene groups. Applications to a yeast dataset and simulations lend support for our proposed methods. In particular, we find that the shrinkage method consistently works well under various scenarios. We recommend the use of the shrinkage method as a useful alternative to the existing methods.
Affiliation(s)
- Peng Wei
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, MMC 303, Minneapolis, MN 55455-0378, USA.
|
23
|
Stirewalt DL, Meshinchi S, Kopecky KJ, Fan W, Pogosova-Agadjanyan EL, Engel JH, Cronk MR, Dorcy KS, McQuary AR, Hockenbery D, Wood B, Heimfeld S, Radich JP. Identification of genes with abnormal expression changes in acute myeloid leukemia. Genes Chromosomes Cancer 2008; 47:8-20. [PMID: 17910043 DOI: 10.1002/gcc.20500] [Citation(s) in RCA: 129] [Impact Index Per Article: 8.1] [Indexed: 01/21/2023]
Abstract
Acute myeloid leukemia (AML) is one of the most common and deadly forms of hematopoietic malignancies. We hypothesized that microarray studies could identify previously unrecognized expression changes that occur only in AML blasts. We were particularly interested in those genes with increased expression in AML, believing that these genes may be potential therapeutic targets. To test this hypothesis, we compared gene expression profiles between normal hematopoietic cells from 38 healthy donors and leukemic blasts from 26 AML patients. Normal hematopoietic samples included CD34+ selected cells (N = 18), unselected bone marrows (N = 10), and unselected peripheral bloods (N = 10). Twenty genes displayed AML-specific expression changes that were not found in the normal hematopoietic cells. Subsequent analyses using microarray data from 285 additional AML patients confirmed expression changes for 13 of the 20 genes. Seven genes (BIK, CCNA1, FUT4, IL3RA, HOMER3, JAG1, WT1) displayed increased expression in AML, while 6 genes (ALDHA1A, PELO, PLXNC1, PRUNE, SERPINB9, TRIB2) displayed decreased expression. Quantitative RT/PCR studies for the 7 over-expressed genes were performed in an independent set of 9 normal and 21 pediatric AML samples. All 7 over-expressed genes displayed an increased expression in the AML samples compared to normals. Three of the 7 over-expressed genes (WT1, CCNA1, and IL3RA) have already been linked to leukemogenesis and/or AML prognosis, while little is known about the role of the other 4 over-expressed genes in AML. Future studies will determine their potential role in leukemogenesis and their clinical significance.
Affiliation(s)
- Derek L Stirewalt
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
|
24
|
Cortical and brainstem LTP-like plasticity in Huntington's disease. Brain Res Bull 2008; 75:107-14. [DOI: 10.1016/j.brainresbull.2007.07.029] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 06/04/2007] [Revised: 07/27/2007] [Accepted: 07/27/2007] [Indexed: 11/24/2022]
|
25
|
Stirewalt DL, Mhyre AJ, Marcondes M, Pogosova-Agadjanyan E, Abbasi N, Radich JP, Deeg HJ. Tumour necrosis factor-induced gene expression in human marrow stroma: clues to the pathophysiology of MDS? Br J Haematol 2007; 140:444-53. [PMID: 18162123 DOI: 10.1111/j.1365-2141.2007.06923.x] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 11/26/2022]
Abstract
Aberrant regulation of the tumour necrosis factor alpha gene (TNF) and stroma-derived signals are involved in the pathophysiology of myelodysplasia. Therefore, KG1a, a myeloid leukaemia cell line, was exposed to Tnf in the absence or presence of either HS-5 or HS-27a cells, two human stroma cell lines. While KG1a cells were resistant to Tnf-induced apoptosis in the absence of stroma cells, Tnf promoted apoptosis of KG1a cells in co-culture experiments with stroma cells. To investigate the Tnf-induced signals from the stroma cells, we examined expression changes in HS-5 and HS-27a cells after Tnf exposure. DNA microarray studies found both discordant and concordant Tnf-induced expression responses in the two stroma cell lines. Tnf promoted an increased mRNA expression of pro-inflammatory cytokines [e.g. interleukin (IL)6, IL8 and IL32]. At the same time, Tnf decreased the mRNA expression of anti-apoptotic genes (e.g. BCL2L1) and increased the mRNA expression of pro-apoptotic genes (e.g. BID). Overall, the results suggested that Tnf induced a complex set of pro-inflammatory and pro-apoptotic signals in stroma cells that promote apoptosis in malignant myeloid clones. Additional studies will be required to determine which of these signals are critical for the induction of apoptosis in the malignant clones. Those insights, in turn, may point the way to novel therapeutic approaches.
Affiliation(s)
- Derek L Stirewalt
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
|
26
|
Méndez E, Fan W, Choi P, Agoff SN, Whipple M, Farwell DG, Futran ND, Weymuller EA, Zhao LP, Chen C. Tumor-specific genetic expression profile of metastatic oral squamous cell carcinoma. Head Neck 2007; 29:803-14. [PMID: 17573689 DOI: 10.1002/hed.20598] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 11/05/2022]
Abstract
BACKGROUND Metastasis is the most important predictor of survival in patients with oral squamous cell carcinoma (OSCC). We tested the hypothesis that there is a genetic expression profile associated with OSCC metastasis. METHODS We obtained samples from 6 OSCC node-positive primary tumors and their matched metastatic lymph nodes, and 5 OSCC node-negative primary tumors. Using laser capture microdissection, we isolated OSCC cells from metastatic lymph nodes and compared them with those from matched primary tumors and unmatched node-negative primary tumors using Affymetrix Human Genome Focus arrays. RESULTS Comparison of tumor cells from the lymph nodes with those from the unmatched, node-negative primary tumors revealed differential expression of 160 genes. Hierarchical clustering and principal component analysis using this 160-gene set showed that the node-negative samples were distinguishable from both node-positive primary tumors and tumors in the lymph nodes. Many of the expression changes found in the metastatic cells from the lymph nodes were also found in the node-positive primary tumors. Immunohistochemical analysis for transglutaminase-3 and keratin 16 confirmed the differential genetic expression for these genes. CONCLUSION These preliminary results suggest that there may be a metastatic gene expression profile present in node-positive primary OSCC.
Affiliation(s)
- Eduardo Méndez
- Department of Otolaryngology - Head and Neck Surgery, University of Washington, Seattle, Washington 98195, USA
|
27
|
Han X, Sung WK, Feng L. Identifying differentially expressed genes in time-course microarray experiment without replicate. J Bioinform Comput Biol 2007; 5:281-96. [PMID: 17589962 DOI: 10.1142/s0219720007002655] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/02/2006] [Revised: 11/27/2006] [Accepted: 12/05/2006] [Indexed: 11/18/2022]
Abstract
Replication of time series in microarray experiments is costly. To analyze time series data with no replicate, many model-specific approaches have been proposed. However, they fail to identify the genes whose expression patterns do not fit the pre-defined models. Besides, modeling the temporal expression patterns is difficult when the dynamics of gene expression in the experiment is poorly understood. We propose a method called Partial Energy ratio for Microarray (PEM) for the analysis of time course microarray data. In the PEM method, we assume the gene expressions vary smoothly in the temporal domain. This assumption is comparatively weak and hence the method is general enough to identify genes expressed in unexpected patterns. To identify the differentially expressed genes, a new statistic is developed by comparing the energies of two convoluted profiles. We further improve the statistic for microarray analysis by introducing the concept of partial energy. The PEM statistic can be easily incorporated into the SAM framework for significance analysis. We evaluated the PEM method with an artificial dataset and two published time course cDNA microarray datasets on yeast. The experimental results show the robustness and the generality of the PEM method in identifying the genes of interest.
Affiliation(s)
- Xu Han
- Genome Institute of Singapore, 60, Biopolis Street, Singapore 138672, Singapore.
|
28
|
Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data. BMC Bioinformatics 2007; 8:240. [PMID: 17615071 PMCID: PMC1934919 DOI: 10.1186/1471-2105-8-240] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 04/13/2007] [Accepted: 07/05/2007] [Indexed: 11/28/2022]
Abstract
Background: Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not discriminate which of the multiple treatment groups differ from one another. Thus, post hoc tests, such as linear contrasts, template correlations, and pairwise comparisons are used. Linear contrasts and template correlations work extremely well, especially when the researcher has a priori information pointing to a particular pattern/template among the different treatment groups. Further, all pairwise comparisons can be used to identify particular, treatment group-dependent patterns of gene expression. However, these approaches are biased by the researcher's assumptions, and some treatment-based patterns may fail to be detected using these approaches. Finally, different patterns may have different probabilities of occurring by chance, importantly influencing researchers' conclusions about a pattern and its constituent genes. Results: We developed a four-step, post hoc pattern matching (PPM) algorithm to automate single channel gene expression pattern identification/significance. First, 1-Way Analysis of Variance (ANOVA), coupled with post hoc 'all pairwise' comparisons, is calculated for all genes. Second, for each ANOVA-significant gene, all pairwise contrast results are encoded to create unique pattern ID numbers. The number of genes found in each pattern in the data is identified as that pattern's 'actual' frequency. Third, using Monte Carlo simulations, those patterns' frequencies are estimated in random data ('random' gene pattern frequency). Fourth, a Z-score for overrepresentation of the pattern is calculated ('actual' against 'random' gene pattern frequencies). We wrote a Visual Basic program (StatiGen) that automates the PPM procedure, constructs an Excel workbook with standardized graphs of overrepresented patterns, and lists the genes comprising each pattern. The Visual Basic code, installation files for StatiGen, and sample data are available as supplementary material. Conclusion: The PPM procedure is designed to augment current microarray analysis procedures by allowing researchers to incorporate all of the information from post hoc tests to establish unique, overarching gene expression patterns in which there is no overlap in gene membership. In our hands, PPM works well for studies using from three to six treatment groups in which the researcher is interested in treatment-related patterns of gene expression. Hardware/software limitations and the extreme number of theoretical expression patterns limit utility for larger numbers of treatment groups. Applied to a published microarray experiment, the StatiGen program successfully flagged patterns that had been manually assigned in prior work, and further identified other gene expression patterns that may be of interest. Thus, over a moderate range of treatment groups, PPM appears to work well. It allows researchers to assign statistical probabilities to patterns of gene expression that fit a priori expectations/hypotheses, it preserves the data's ability to show the researcher interesting, yet unanticipated gene expression patterns, and it assigns the majority of ANOVA-significant genes to non-overlapping patterns.
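Steps two to four of the procedure described above can be sketched in a few lines of Python. This is an illustrative sketch, not the StatiGen code: the abstract does not say how contrast results are turned into pattern IDs, so the base-3 encoding of each pairwise outcome (-1 lower, 0 no difference, +1 higher) is an assumption, as are the function names. The Monte Carlo runs are passed in as precomputed lists of pattern IDs rather than simulated here.

```python
from collections import Counter
from statistics import mean, stdev


def pattern_id(contrasts):
    """Encode pairwise contrast outcomes (-1, 0, +1) as one base-3 integer.

    Assumed encoding: each outcome becomes a base-3 digit (0, 1, 2),
    so every distinct combination of outcomes gets a unique ID.
    """
    pid = 0
    for outcome in contrasts:
        pid = pid * 3 + (outcome + 1)
    return pid


def pattern_z_scores(actual_patterns, random_runs):
    """Z-score of each pattern's 'actual' frequency against 'random' runs.

    actual_patterns: pattern IDs of the ANOVA-significant genes.
    random_runs: list of runs, each a list of pattern IDs from random data
                 (at least two runs are needed for a standard deviation).
    """
    actual = Counter(actual_patterns)
    run_counts = [Counter(run) for run in random_runs]
    scores = {}
    for pid, count in actual.items():
        freqs = [counts[pid] for counts in run_counts]
        mu, sd = mean(freqs), stdev(freqs)
        if sd > 0:
            scores[pid] = (count - mu) / sd
        else:
            # No spread in the random data: the score is undefined unless
            # the actual count equals the constant random count.
            scores[pid] = 0.0 if count == mu else float("inf")
    return scores


# Three pairwise contrasts per gene (e.g. A-B, A-C, B-C for three groups).
up_pattern = pattern_id((-1, 0, 1))
scores = pattern_z_scores(
    actual_patterns=[up_pattern] * 4 + [pattern_id((1, 1, 1))],
    random_runs=[[5, 26], [5, 5, 26], [26, 26]],
)
```

With k treatment groups there are k(k-1)/2 pairwise contrasts and up to 3 to that power candidate IDs under this encoding, which is consistent with the abstract's remark that the number of theoretical patterns becomes extreme for larger designs.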
|
29
|
Xie Y, Pan W, Jeong KS, Khodursky A. Incorporating prior information via shrinkage: a combined analysis of genome-wide location data and gene expression data. Stat Med 2007; 26:2258-75. [PMID: 16958153 DOI: 10.1002/sim.2703] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Transcriptional control is a critical step in regulation of gene expression. Understanding such control at a genomic level involves deciphering the mechanisms and structures of regulatory programmes and networks. A difficulty arises due to the weak signal and high noise in various sources of data, while most current approaches are limited to analysis of a single source of data. A natural alternative is to improve statistical efficiency and power by a combined analysis of multiple sources of data. Here we propose a shrinkage method to combine genome-wide location data and gene expression data to detect the binding sites or target genes of a transcription factor. Specifically, a prior 'non-target' gene list is generated by analysing the expression data, and then this information is incorporated into the subsequent binding data analysis via a shrinkage method. There is a Bayesian justification for this shrinkage method. Both simulated and real data were used to evaluate the proposed method and compare it with analysing binding data alone. In simulation studies, the proposed method gives higher sensitivity and lower false discovery rate (FDR) in detecting the target genes. In a real data example, the proposed method can reduce the estimated FDR and increase the power to detect the previously known target genes of a broad transcription regulator, leucine responsive regulatory protein (Lrp) in Escherichia coli. This method can also be used to incorporate other information, such as gene ontology (GO), into microarray data analysis to detect differentially expressed genes.
Affiliation(s)
- Yang Xie
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
|
30
|
Fischer EA, Friedman MA, Markey MK. Empirical comparison of tests for differential expression on time-series microarray experiments. Genomics 2007; 89:460-70. [PMID: 17188839 DOI: 10.1016/j.ygeno.2006.10.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/13/2006] [Revised: 10/26/2006] [Accepted: 10/30/2006] [Indexed: 11/16/2022]
Abstract
Methods for identifying differentially expressed genes were compared on time-series microarray data simulated from artificial gene networks. Select methods were further analyzed on existing immune response data of Boldrick et al. (2002, Proc. Natl. Acad. Sci. USA 99, 972-977). Based on the simulations, we recommend the ANOVA variants of Cui and Churchill. Efron and Tibshirani's empirical Bayes Wilcoxon rank sum test is recommended when the background cannot be effectively corrected. Our proposed GSVD-based differential expression method was shown to detect subtle changes. ANOVA combined with GSVD was consistent on background-normalized simulation data. GSVD with empirical Bayes was consistent without background correction. Based on the Boldrick et al. data, ANOVA is best suited to detect changes in temporal data, while GSVD and empirical Bayes effectively detect individual spikes or overall shifts, respectively. For methods tested on simulation data, lowess after background correction improved results. On simulation data without background correction, lowess decreased performance compared to median centering.
Affiliation(s)
- Ernest A Fischer
- Department of Biomedical Engineering, University of Texas at Austin, Campus Code C0800, 1 University Station, Austin, TX 78712, USA
|
31
|
Di Camillo B, Toffolo G, Nair SK, Greenlund LJ, Cobelli C. Significance analysis of microarray transcript levels in time series experiments. BMC Bioinformatics 2007; 8 Suppl 1:S10. [PMID: 17430554 PMCID: PMC1885839 DOI: 10.1186/1471-2105-8-s1-s10] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 12/04/2022]
Abstract
Background: Microarray time series studies are essential to understand the dynamics of molecular events. In order to limit the analysis to those genes that change expression over time, a first necessary step is to select differentially expressed transcripts. A variety of methods have been proposed for this purpose; however, these methods are seldom applicable in practice since they require a large number of replicates, often available only for a limited number of samples. In this data-poor context, we evaluate the performance of three selection methods, using synthetic data, over a range of experimental conditions. Application to real data is also discussed. Results: Three methods are considered to assess differentially expressed genes in data-poor conditions. Method 1 uses a threshold on individual samples based on a model of the experimental error. Method 2 calculates the area of the region bounded by the time series expression profiles, and considers the gene differentially expressed if the area exceeds a threshold based on a model of the experimental error. These two methods are compared to Method 3, recently proposed in the literature, which exploits spline fits to compare time series profiles. Application of the three methods to synthetic data indicates that Method 2 outperforms the other two both in Precision and Recall when short time series are analyzed, while Method 3 outperforms the other two for long time series. Conclusion: These results help to guide the choice of the algorithm to be used in data-poor time series expression studies, depending on the length of the time series.
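The area criterion of Method 2 can be illustrated with a short sketch. It is a simplified reading of the abstract, not the authors' code: the area between two profiles is approximated with the trapezoidal rule applied to the absolute difference at the sampled time points (which slightly misestimates the area when the profiles cross between samples), and the `threshold` argument is a placeholder for the value the paper derives from its experimental error model.

```python
def bounded_area(times, profile_a, profile_b):
    """Approximate area between two expression profiles over time.

    Uses the trapezoidal rule on |a(t) - b(t)| at the sampled time points.
    """
    diffs = [abs(a - b) for a, b in zip(profile_a, profile_b)]
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (diffs[i - 1] + diffs[i]) * width
    return area


def is_differentially_expressed(times, profile_a, profile_b, threshold):
    """Flag a gene when the bounded area exceeds a threshold.

    `threshold` is an assumed input; the paper bases it on an error model
    that is not specified in the abstract.
    """
    return bounded_area(times, profile_a, profile_b) > threshold


# Unevenly spaced time points are handled through the interval widths.
times = [0, 1, 2, 4]
treated = [1.0, 2.0, 3.0, 3.0]
control = [1.0, 1.0, 1.0, 1.0]
area = bounded_area(times, treated, control)
```

Because the statistic integrates over the whole profile, a sustained moderate difference can pass the threshold even when no single time point would, which may be one reason the approach suits short, unreplicated series.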
Affiliation(s)
- Barbara Di Camillo
- Information Engineering Department, University of Padova, 35131 Padova, Italy
- Gianna Toffolo
- Information Engineering Department, University of Padova, 35131 Padova, Italy
- Laura J Greenlund
- Endocrinology Division, Mayo Clinic, Rochester, Minnesota 55905, USA
- Claudio Cobelli
- Information Engineering Department, University of Padova, 35131 Padova, Italy
|
32
|
|
33
|
Liu Y, Yokota H. Artificial ants deposit pheromone to search for regulatory DNA elements. BMC Genomics 2006; 7:221. [PMID: 16942615 PMCID: PMC1586019 DOI: 10.1186/1471-2164-7-221] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/07/2006] [Accepted: 08/30/2006] [Indexed: 01/31/2023]
Abstract
Background: Identification of transcription-factor binding motifs (DNA sequences) can be formulated as a combinatorial problem, where an efficient algorithm is indispensable to predict the role of multiple binding motifs. An ant algorithm is a biology-inspired computational technique, through which a combinatorial problem is solved by mimicking the behavior of social insects such as ants. We developed a unique version of ant algorithms to select a set of binding motifs by considering the potential contribution of every random DNA sequence 4 to 7 bp in length. Results: Human chondrogenesis was used as a model system. The results revealed that the ant algorithm was able to identify biologically known binding motifs in chondrogenesis such as AP-1, NFκB, and sox9. Some of the predicted motifs were identical to those previously derived with the genetic algorithm. Unlike the genetic algorithm, however, the ant algorithm was able to evaluate the contribution of individual binding motifs as a spectrum of distributed information and predict core consensus motifs from a wider DNA pool. Conclusion: The ant algorithm offers an efficient, reproducible procedure to predict the role of individual transcription-factor binding motifs using a unique definition of artificial ants.
Affiliation(s)
- Yunlong Liu
- Division of Biostatistics, Department of Medicine, Center for Computational Biology and Bioinformatics, Indiana University – Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Hiroki Yokota
- Department of Biomedical Engineering, Indiana University – Purdue University Indianapolis, Indianapolis, IN 46202, USA
|
34
|
Vinciotti V, Liu X, Turk R, de Meijer EJ, 't Hoen PAC. Exploiting the full power of temporal gene expression profiling through a new statistical test: application to the analysis of muscular dystrophy data. BMC Bioinformatics 2006; 7:183. [PMID: 16584545 PMCID: PMC1450310 DOI: 10.1186/1471-2105-7-183] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/11/2005] [Accepted: 04/03/2006] [Indexed: 11/21/2022]
Abstract
Background The identification of biologically interesting genes in a temporal expression profiling dataset is challenging and complicated by high levels of experimental noise. Most statistical methods used in the literature do not fully exploit the temporal ordering in the dataset and are not suited to the case where temporal profiles are measured for a number of different biological conditions. We present a statistical test that makes explicit use of the temporal order in the data by fitting polynomial functions to the temporal profile of each gene and for each biological condition. A Hotelling T2-statistic is derived to detect the genes for which the parameters of these polynomials are significantly different from each other. Results We validate the temporal Hotelling T2-test on muscular gene expression data from four mouse strains which were profiled at different ages: dystrophin-, beta-sarcoglycan and gamma-sarcoglycan deficient mice, and wild-type mice. The first three are animal models for different muscular dystrophies. Extensive biological validation shows that the method is capable of finding genes with temporal profiles significantly different across the four strains, as well as identifying potential biomarkers for each form of the disease. The added value of the temporal test compared to an identical test which does not make use of temporal ordering is demonstrated via a simulation study, and through confirmation of the expression profiles from selected genes by quantitative PCR experiments. The proposed method maximises the detection of the biologically interesting genes, whilst minimising false detections. Conclusion The temporal Hotelling T2-test is capable of finding relatively small and robust sets of genes that display different temporal profiles between the conditions of interest. 
The test is simple, it can be used on gene expression data generated from any experimental design and for any number of conditions, and it allows fast interpretation of the temporal behaviour of genes. The R code is available from V.V. The microarray data have been submitted to GEO under series GSE1574 and GSE3523.
Affiliation(s)
- Veronica Vinciotti
- Department of Information Systems and Computing, Brunel University, Uxbridge UB8 3PH, UK
- Xiaohui Liu
- Department of Information Systems and Computing, Brunel University, Uxbridge UB8 3PH, UK
- Leiden Institute of Advanced Computer Science, Leiden University, PO Box 9512, 2300 RA Leiden, Netherlands
- Rolf Turk
- Center for Human and Clinical Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, Netherlands
- Present affiliation: Howard Hughes Medical Institute, Department of Physiology and Biophysics, Iowa City, Iowa, USA
- Emile J de Meijer
- Center for Human and Clinical Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, Netherlands
- Peter AC 't Hoen
- Center for Human and Clinical Genetics, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, Netherlands
|
35
|
Conesa A, Nueda MJ, Ferrer A, Talón M. maSigPro: a method to identify significantly differential expression profiles in time-course microarray experiments. Bioinformatics 2006; 22:1096-102. [PMID: 16481333 DOI: 10.1093/bioinformatics/btl056] [Citation(s) in RCA: 292] [Impact Index Per Article: 16.2] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Multi-series time-course microarray experiments are useful approaches for exploring biological processes. In this type of experiment, the researcher is frequently interested in studying gene expression changes over time and in evaluating trend differences between the various experimental groups. The large amount of data, the multiplicity of experimental conditions and the dynamic nature of the experiments pose great challenges to data analysis. RESULTS In this work, we propose a statistical procedure to identify genes that show different gene expression profiles across analytical groups in time-course experiments. The method is a two-step regression approach in which the experimental groups are identified by dummy variables. The procedure first fits a global regression model with all the defined variables to identify differentially expressed genes; second, a variable selection strategy is applied to study differences between groups and to find statistically significant different profiles. The methodology is illustrated on both a real and a simulated microarray dataset.
Affiliation(s)
- Ana Conesa
- Centro de Genómica. Instituto Valenciano de Investigaciones Agrarias, Apartado Oficial 46113 Moncada, Valencia, Spain.
|
36
|
Hodges A, Strand AD, Aragaki AK, Kuhn A, Sengstag T, Hughes G, Elliston LA, Hartog C, Goldstein DR, Thu D, Hollingsworth ZR, Collin F, Synek B, Holmans PA, Young AB, Wexler NS, Delorenzi M, Kooperberg C, Augood SJ, Faull RLM, Olson JM, Jones L, Luthi-Carter R. Regional and cellular gene expression changes in human Huntington's disease brain. Hum Mol Genet 2006; 15:965-77. [PMID: 16467349 DOI: 10.1093/hmg/ddl013] [Citation(s) in RCA: 563] [Impact Index Per Article: 31.3] [Indexed: 11/13/2022] Open
Abstract
Huntington's disease (HD) pathology is well understood at a histological level but a comprehensive molecular analysis of the effect of the disease in the human brain has not previously been available. To elucidate the molecular phenotype of HD on a genome-wide scale, we compared mRNA profiles from 44 human HD brains with those from 36 unaffected controls using microarray analysis. Four brain regions were analyzed: caudate nucleus, cerebellum, prefrontal association cortex [Brodmann's area 9 (BA9)] and motor cortex [Brodmann's area 4 (BA4)]. The greatest number and magnitude of differentially expressed mRNAs were detected in the caudate nucleus, followed by motor cortex, then cerebellum. Thus, the molecular phenotype of HD generally parallels established neuropathology. Surprisingly, no mRNA changes were detected in prefrontal association cortex, thereby revealing subtleties of pathology not previously disclosed by histological methods. To establish that the observed changes were not simply the result of cell loss, we examined mRNA levels in laser-capture microdissected neurons from Grade 1 HD caudate compared to control. These analyses confirmed changes in expression seen in tissue homogenates; we thus conclude that mRNA changes are not attributable to cell loss alone. These data from bona fide HD brains comprise an important reference for hypotheses related to HD and other neurodegenerative diseases.
Affiliation(s)
- Angela Hodges
- Department of Psychological Medicine, Wales College of Medicine and School of Biosciences, Cardiff University, Heath Park, Cardiff CF14 4XN, Wales, UK
|
37
|
Guo Y, Breeden LL, Fan W, Zhao LP, Eaton DL, Zarbl H. Analysis of cellular responses to aflatoxin B(1) in yeast expressing human cytochrome P450 1A2 using cDNA microarrays. Mutat Res 2006; 593:121-42. [PMID: 16122766 DOI: 10.1016/j.mrfmmm.2005.07.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 03/17/2005] [Revised: 06/22/2005] [Accepted: 07/01/2005] [Indexed: 05/04/2023]
Abstract
Aflatoxin B1 (AFB(1)) is a potent human hepatotoxin and hepatocarcinogen produced by the mold Aspergillus flavus. In human, AFB(1) is bioactivated by cytochrome P450 (CYP450) enzymes, primarily CYP1A2, to the genotoxic epoxide that forms N(7)-guanine DNA adducts. To characterize the transcriptional responses to genotoxic insults from AFB(1), a strain of Saccharomyces cerevisiae engineered to express human CYP1A2 was exposed to doses of AFB(1) that resulted in minimal lethality, but substantial genotoxicity. Flow cytometric analysis demonstrated a dose and time dependent S phase delay under the same treatment conditions, indicating a checkpoint response to DNA damage. Replicate cDNA microarray analyses of AFB(1) treated cells showed that about 200 genes were significantly affected by the exposure. The genes activated by AFB(1)-treatment included RAD51, DUN1 and other members of the DNA damage response signature reported in a previous study with methylmethane sulfonate and ionizing radiation [A.P. Gasch, M. Huang, S. Metzner, D. Botstein, S.J. Elledge, P.O. Brown, Genomic expression responses to DNA-damaging agents and the regulatory role of the yeast ATR homolog Mec1p, Mol. Biol. Cell 12 (2001) 2987-3003]. However, unlike previous studies using highly cytotoxic doses, environmental stress response genes [A.P. Gasch, P.T. Spellman, C.M. Kao, O. Carmel-Harel, M.B. Eisen, G. Storz, D. Botstein, P.O. Brown, Genomic expression programs in the response of yeast cells to environmental changes, Mol. Biol. Cell 11 (2000) 4241-4257] were largely unaffected by our dosing regimen. About half of the transcripts affected are also known to be cell cycle regulated. The most strongly repressed transcripts were those encoding the histone genes and a group of genes that are cell cycle regulated and peak in M phase and early G1. These include most of the known daughter-specific genes. 
The rapid and coordinated repression of histones and M/G1-specific transcripts cannot be explained by cell cycle arrest, and suggested that there are additional signaling pathways that directly repress these genes in cells under genotoxic stress.
Affiliation(s)
- Yingying Guo
- Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA
|
38
|
Hong F, Li H. Functional Hierarchical Models for Identifying Genes with Different Time-Course Expression Profiles. Biometrics 2006; 62:534-44. [PMID: 16918918 DOI: 10.1111/j.1541-0420.2005.00505.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 01/26/2023]
Abstract
Time-course studies of gene expression are essential in biomedical research to understand biological phenomena that evolve in a temporal fashion. We introduce a functional hierarchical model for detecting temporally differentially expressed (TDE) genes between two experimental conditions for cross-sectional designs, where the gene expression profiles are treated as functional data and modeled by basis function expansions. A Monte Carlo EM algorithm was developed for estimating both the gene-specific parameters and the hyperparameters in the second level of modeling. We use a direct posterior probability approach to bound the rate of false discovery at a pre-specified level and evaluate the methods by simulations and application to microarray time-course gene expression data on Caenorhabditis elegans developmental processes. Simulation results suggested that the procedure performs better than the two-way ANOVA in identifying TDE genes, resulting in both higher sensitivity and specificity. Genes identified from the C. elegans developmental data set show clear patterns of changes between the two experimental conditions.
Affiliation(s)
- F Hong
- Plant Biology Laboratory, The Salk Institute, La Jolla, California 92037, USA
|
39
|
Xie Y, Pan W, Khodursky AB. A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data. Bioinformatics 2005; 21:4280-8. [PMID: 16188930 DOI: 10.1093/bioinformatics/bti685] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.6] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods under the condition that the estimated FDR approximates the true FDR well, or at least, it does not improperly favor or disfavor any particular method. Permutation methods have become popular to estimate FDR in genomic studies. The purpose of this paper is 2-fold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased, and if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation to yield a better FDR estimator, which can in turn serve as a more fair criterion to evaluate various statistical methods. RESULTS Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics, the sample mean, SAM statistic and Student's t-statistic, are considered. The results show that the standard permutation method overestimates FDR. The overestimation is the most severe for the sample mean statistic while the least for the t-statistic with the SAM-statistic lying between the two extremes, suggesting that one has to be cautious when using the standard permutation-based FDR estimates to evaluate various statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.
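The standard permutation-based FDR estimate discussed in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the sample mean difference as the test statistic, a fixed cutoff on its absolute value, and made-up toy data. The function names, the cutoff, and the number of permutations are all assumptions chosen for the example. It implements only the standard estimator (the one the abstract reports as overestimating FDR), not the modified estimator the paper proposes.

```python
import random
from statistics import mean


def mean_diff(values, labels):
    """Difference in group means (group 1 minus group 0) for one gene."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(g1) - mean(g0)


def permutation_fdr(data, labels, threshold, n_perm=200, seed=0):
    """Standard permutation estimate of FDR at a fixed cutoff on |statistic|.

    data:   list of per-gene expression lists (one value per sample)
    labels: 0/1 group label per sample (both groups must be present)
    """
    rng = random.Random(seed)
    # Number of genes called significant on the real labels.
    observed = sum(abs(mean_diff(g, labels)) >= threshold for g in data)
    if observed == 0:
        return 0.0
    # Average number of genes called significant under shuffled labels.
    null_counts = []
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        null_counts.append(
            sum(abs(mean_diff(g, perm)) >= threshold for g in data)
        )
    # Estimated false positives divided by claimed positives, capped at 1.
    return min(1.0, mean(null_counts) / observed)


# Hypothetical toy data: 3 genes, 6 samples (3 per group).
toy_labels = [0, 0, 0, 1, 1, 1]
toy_data = [
    [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],  # clear group difference
    [2.0, 3.0, 2.0, 3.0, 2.0, 3.0],  # little difference
    [4.0, 4.5, 3.5, 4.0, 4.2, 3.8],  # little difference
]
fdr_estimate = permutation_fdr(toy_data, toy_labels, threshold=3.0)
```

Because the shuffled labels still include the truly differential genes, their large permuted statistics inflate the null counts, which is one intuitive reading of the overestimation the abstract describes.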
Affiliation(s)
- Yang Xie
- Division of Biostatistics, School of Public Health, University of Minnesota Minneapolis, MN 55455, USA.
|
40
|
Zhang JJ, Yi T, Zhao LP. Evaluation of nine strategies for analyzing a cDNA toxicology microarray data set. J Biopharm Stat 2005; 15:403-18. [PMID: 15920888 DOI: 10.1081/bip-200056518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Microarray technology with two-color-based cDNA is commonly used for drug development, as well as for a much broader range of biomedical research. Among all the applications, two-group design is probably most commonly used for comparing, e.g., normal and abnormal tissue samples, tissues treated and untreated, or individuals responded and not responded to a drug. Despite the apparent simplicity, there are numerous methods for analyzing such data in a statistically rigorous manner. Here, we discuss nine different analytical strategies, each of which is derived under a set of "reasonable" assumptions. Some of them resemble methods developed for different contexts. In the absence of the truth, investigators should consider underlying assumptions before taking one or more of these strategies for analyzing data from a particular experiment. The issue here is what are the similarities and differences between these analytical strategies. We present these strategies in the context of an actual microarray experiment performed at the U.S. Food and Drug Administration.
Affiliation(s)
- Juan Joanne Zhang
- Office of Biostatistics, Center for Drug Evaluation and Research, US Food and Drug Administration, Rockville, Maryland 20857, USA.
|
41
|
Liu H, Tarima S, Borders AS, Getchell TV, Getchell ML, Stromberg AJ. Quadratic regression analysis for gene discovery and pattern recognition for non-cyclic short time-course microarray experiments. BMC Bioinformatics 2005; 6:106. [PMID: 15850479 PMCID: PMC1127068 DOI: 10.1186/1471-2105-6-106] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Received: 08/25/2004] [Accepted: 04/25/2005] [Indexed: 02/02/2023] Open
Abstract
Background Cluster analyses are used to analyze microarray time-course data for gene discovery and pattern recognition. However, in general, these methods do not take advantage of the fact that time is a continuous variable, and existing clustering methods often group biologically unrelated genes together. Results We propose a quadratic regression method for identification of differentially expressed genes and classification of genes based on their temporal expression profiles for non-cyclic short time-course microarray data. This method treats time as a continuous variable, therefore preserves actual time information. We applied this method to a microarray time-course study of gene expression at short time intervals following deafferentation of olfactory receptor neurons. Nine regression patterns have been identified and shown to fit gene expression profiles better than k-means clusters. EASE analysis identified over-represented functional groups in each regression pattern and each k-means cluster, which further demonstrated that the regression method provided more biologically meaningful classifications of gene expression profiles than the k-means clustering method. Comparison with Peddada et al.'s order-restricted inference method showed that our method provides a different perspective on the temporal gene profiles. Reliability study indicates that regression patterns have the highest reliabilities. Conclusion Our results demonstrate that the proposed quadratic regression method improves gene discovery and pattern recognition for non-cyclic short time-course microarray data. With a freely accessible Excel macro, investigators can readily apply this method to their microarray data.
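As a rough illustration of treating time as a continuous variable, the sketch below fits a quadratic curve to one gene's time-course values by ordinary least squares, using the actual sampling times. It is an assumption-laden example rather than the authors' Excel macro: the function name and the toy time points are hypothetical, and the abstract does not specify how the nine regression patterns are assigned, so no classification step is shown.

```python
def fit_quadratic(times, values):
    """Least-squares fit of values ~ b0 + b1*t + b2*t**2 via the normal equations.

    Requires at least three distinct time points; returns [b0, b1, b2].
    """
    # Design rows [1, t, t^2]; build X^T X (3x3) and X^T y (length 3).
    rows = [[1.0, t, t * t] for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(aug[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (aug[i][3] - rest) / aug[i][i]
    return coef


# Hypothetical profile generated exactly from 1 + 2t + 3t^2.
toy_times = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_values = [1.0, 6.0, 17.0, 34.0, 57.0]
b0, b1, b2 = fit_quadratic(toy_times, toy_values)
```

Unevenly spaced sampling times can be passed directly, which is the practical point of preserving the actual time information instead of treating time points as unordered cluster dimensions.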
Affiliation(s)
- Hua Liu
- Department of Statistics, University of Kentucky, Lexington, KY 40506, USA
- Sergey Tarima
- Department of Statistics, University of Kentucky, Lexington, KY 40506, USA
- Aaron S Borders
- Department of Physiology, University of Kentucky, Lexington, KY 40536, USA
- Thomas V Getchell
- Department of Physiology, University of Kentucky, Lexington, KY 40536, USA
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY 40536, USA
- Marilyn L Getchell
- Department of Anatomy and Neurobiology, University of Kentucky, Lexington, KY 40536, USA
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY 40536, USA
- Arnold J Stromberg
- Department of Statistics, University of Kentucky, Lexington, KY 40506, USA
|
42
|
Tan Q, Brusgaard K, Kruse TA, Oakeley E, Hemmings B, Beck-Nielsen H, Hansen L, Gaster M. Correspondence analysis of microarray time-course data in case-control design. J Biomed Inform 2005; 37:358-65. [PMID: 15488749 DOI: 10.1016/j.jbi.2004.06.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 03/01/2004] [Indexed: 11/25/2022]
Abstract
Although different statistical approaches have been proposed for analyzing microarray time-course data, no method has been proposed for analyzing such data collected using the popular case-control design in clinical investigations, perhaps because of the increased complexity this poses for existing parametric or non-parametric approaches. In this paper, we introduce a new multivariate data analysis technique, correspondence analysis, to analyze high-dimensional microarray time-course data in a case-control design. We show, through an example on type 2 diabetes, how the features of correspondence analysis can be used to explore the various time-course gene expression profiles that exist in the data. By coordinating and examining the projections on the reduced dimensions by both the genes and the time-course experiments, we are able to identify important genes and time-course patterns and make inferences on their biological relevance. Using the sample replicates, we propose a bootstrap procedure for inferring the significance of contributions on the leading dimensions by both the time-course experiments and the genes. Striking differences in the time-course patterns between the normal controls and the diabetes patients have been revealed. In addition, the method identifies genes that display similar or comparable time-course expression patterns shared by both the cases and the controls. We conclude that our correspondence analysis based approach can be a useful tool for analyzing high-dimensional microarray data collected in clinical investigations.
Affiliation(s)
- Qihua Tan
- Odense University Hospital, Odense, Denmark.
|
43
|
Luo W, Fan W, Xie H, Jing L, Ricicki E, Vouros P, Zhao LP, Zarbl H. Phenotypic Anchoring of Global Gene Expression Profiles Induced by N-Hydroxy-4-acetylaminobiphenyl and Benzo[a]pyrene Diol Epoxide Reveals Correlations between Expression Profiles and Mechanism of Toxicity. Chem Res Toxicol 2005; 18:619-29. [PMID: 15833022 DOI: 10.1021/tx049828f] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 01/04/2023]
Abstract
The goal of this study was to compare changes in gene expression induced by exposure to different carcinogens and to anchor these changes to the induced levels of toxicity and mutagenesis. The human TK6 lymphoblastoid cell line was used as an in vitro model system, and reactive metabolites of two human carcinogens, benzo[a]pyrene and 4-aminobiphenyl, were used as model compounds. We first determined the toxicity of the model compounds N-hydroxy-4-acetylaminobiphenyl (N-OH-AABP) and benzo[a]pyrene diol epoxide (BPDE) in TK6 cells. BPDE was about 1000-fold more toxic and mutagenic than N-OH-AABP in TK6 cells on a molar basis. We next treated cells with three doses of each compound that resulted in low, medium, and high toxicities (5, 15, and 40%) and harvested cells at different times after exposure. Using comparable levels of toxicity as the phenotypic anchor, we compared the patterns of gene expression induced by each reactive metabolite using printed cDNA microarrays comprising approximately 18,000 human gene/EST sequences. The microarray data from the N-OH-AABP and BPDE treatment groups were compared using self-organizing map clustering algorithms, as well as a statistical regression modeling approach. While subsets of genes indicative of a generalized stress response [Hsp 40 homologue (DNAJ), Hsp70, Hsp105, and Hsp 125] were detected after exposure to both compounds at all concentrations, there were also many differentially regulated genes, including phase I xenobiotic metabolism [e.g., glutathione transferase omega (GSTTLp28) and antioxidant enzymes (Apxl)]. Other differentially regulated genes included those encoding proteins involved in all major DNA repair pathways, including excision repair (e.g., ERCC5), mismatch repair (e.g., MLH3), damage specific DNA binding protein (e.g., DDB2), and cisplatin resistance-associated overexpressed protein (LUC7A, CRA). 
Differences in the transcriptional response of TK6 cells to N-OH-AABP or BPDE exposure may explain the dramatic differences in the toxicity and mutagenicity of these human carcinogens.
Affiliation(s)
- Wen Luo
- Division of Human Biology, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, Washington 98109, USA
|
44
|
Abstract
Clustering of gene expression data and gene network inference from such data have been major research topics in recent years. In clustering, pairwise measurements are performed when calculating the distance matrix upon which the clustering is based. Pairwise measurements can also be used for gene network inference, by deriving potential interactions above a certain correlation or distance threshold. Our experiments show that interaction networks derived by this simple approach exhibit low, but significant, sensitivity and specificity. We also explore the effects that normalization and prefiltering have on the results of methods for identifying interactions from expression data. Before derivation of interactions or clustering, preprocessing is often performed by applying normalization to rescale the expression profiles and prefiltering to remove genes that do not appear to contribute to regulation. In this paper, different ways of normalizing in combination with different distance measurements are tested on both unfiltered and prefiltered data, and different prefiltering criteria are considered.
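The thresholding idea in this abstract can be illustrated with a small sketch: compute a pairwise measure between expression profiles and keep the pairs that exceed a cutoff as candidate interactions. The example assumes Pearson correlation as the pairwise measure and an arbitrary cutoff of 0.9; the abstract does not say which measures or thresholds were used, and the gene names and values below are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def infer_edges(profiles, threshold=0.9):
    """Return gene pairs whose absolute correlation meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            edges.append((g1, g2))
    return edges


# Hypothetical expression profiles over four conditions.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],  # nearly proportional to geneA
    "geneC": [3.0, 1.0, 4.0, 1.0],  # unrelated pattern
}
toy_edges = infer_edges(toy_profiles)
```

Using the absolute value treats strong negative correlation as a candidate interaction too; whether that is appropriate depends on the analysis, and normalization or prefiltering applied beforehand would change which pairs pass the cutoff.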
Affiliation(s)
- Angelica Lindlöf
- Department of Computer Science, University of Skövde, 54128 Skövde, Sweden.
|
45
|
Bar-Joseph Z, Gerber G, Simon I, Gifford DK, Jaakkola TS. Comparing the continuous representation of time-series expression profiles to identify differentially expressed genes. Proc Natl Acad Sci U S A 2003; 100:10146-51. [PMID: 12934016 PMCID: PMC193530 DOI: 10.1073/pnas.1732547100] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.2] [Received: 04/29/2003] [Indexed: 11/18/2022] Open
Abstract
We present a general algorithm to detect genes differentially expressed between two nonhomogeneous time-series data sets. As increasing amounts of high-throughput biological data become available, a major challenge in genomic and computational biology is to develop methods for comparing data from different experimental sources. Time-series whole-genome expression data are a particularly valuable source of information because they can describe an unfolding biological process such as the cell cycle or immune response. However, comparisons of time-series expression data sets are hindered by biological and experimental inconsistencies such as differences in sampling rate, variations in the timing of biological processes, and the lack of repeats. Our algorithm overcomes these difficulties by using a continuous representation for time-series data and combining a noise model for individual samples with a global difference measure. We introduce a corresponding statistical method for computing the significance of this differential expression measure. We used our algorithm to compare cell-cycle-dependent gene expression in wild-type and knockout yeast strains. Our algorithm identified a set of 56 differentially expressed genes, and these results were validated by using independent protein-DNA-binding data. Unlike previous methods, our algorithm was also able to identify 22 non-cell-cycle-regulated genes as differentially expressed. This set of genes is significantly correlated in a set of independent expression experiments, suggesting additional roles for the transcription factors Fkh1 and Fkh2 in controlling cellular activity in yeast.
Affiliation(s)
- Ziv Bar-Joseph
- Laboratory for Computer Science, Massachusetts Institute of Technology, 200 Technology Square, Cambridge, MA 02139, USA.
|
46
|
Gadbury GL, Page GP, Heo M, Mountz JD, Allison DB. Randomization tests for small samples: an application for genetic expression data. J R Stat Soc Ser C Appl Stat 2003. [DOI: 10.1111/1467-9876.00410] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/27/2022]
|
47
|
Serikawa KA, Xu XL, MacKay VL, Law GL, Zong Q, Zhao LP, Bumgarner R, Morris DR. The transcriptome and its translation during recovery from cell cycle arrest in Saccharomyces cerevisiae. Mol Cell Proteomics 2003; 2:191-204. [PMID: 12684541 DOI: 10.1074/mcp.d200002-mcp200] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Indexed: 11/06/2022] Open
Abstract
Complete genome sequences together with high throughput technologies have made comprehensive characterizations of gene expression patterns possible. While genome-wide measurement of mRNA levels was one of the first applications of these advances, other important aspects of gene expression are also amenable to a genomic approach, for example, the translation of message into protein. Earlier we reported a high throughput technology for simultaneously studying mRNA level and translation, which we termed translation state array analysis, or TSAA. The current studies test the proposition that TSAA can identify novel instances of translation regulation at the genome-wide level. As a biological model, cultures of Saccharomyces cerevisiae were cell cycle-arrested using either alpha-factor or the temperature-sensitive cdc15-2 allele. Forty-eight mRNAs were found to change significantly in translation state following release from alpha-factor arrest, including genes involved in pheromone response and cell cycle arrest such as BAR1, SST2, and FAR1. After the shift of the cdc15-2 strain from 37 degrees C to 25 degrees C, 54 mRNAs were altered in translation state, including the products of the stress genes HSP82, HSC82, and SSA2. Thus, regulation at the translational level seems to play a significant role in the response of yeast cells to external physical or biological cues. In contrast, surprisingly few genes were found to be translationally controlled as cells progressed through the cell cycle. Additional refinements of TSAA should allow characterization of both transcriptional and translational regulatory networks on a genomic scale, providing an additional layer of information that can be integrated into models of system biology and function.
Affiliation(s)
- Kyle A Serikawa
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
|