1
An atlas of causal and mechanistic drivers of interpatient heterogeneity in glioma. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.05.24305380. [PMID: 38633778 PMCID: PMC11023657 DOI: 10.1101/2024.04.05.24305380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Grade IV glioma, formerly known as glioblastoma multiforme (GBM), is the most aggressive and lethal type of brain tumor, and its treatment remains challenging in part due to extensive interpatient heterogeneity in disease-driving mechanisms and a lack of prognostic and predictive biomarkers. Using mechanistic inference of node-edge relationship (MINER), we have analyzed multiomics profiles from 516 patients and constructed an atlas of causal and mechanistic drivers of interpatient heterogeneity in GBM (gbmMINER). The atlas has delineated how 30 driver mutations act in a combinatorial scheme to causally influence a network of regulators (306 transcription factors and 73 miRNAs) of 179 transcriptional "programs", influencing disease progression in patients across 23 disease states. Through extensive testing on independent patient cohorts, we share evidence that a machine learning model trained on activity profiles of programs within gbmMINER significantly augments risk stratification, identifying patients who are super-responders to standard of care and those who would benefit from 2nd-line treatments. In addition to providing mechanistic hypotheses regarding disease prognosis, the activity of programs containing targets of 2nd-line treatments accurately predicted efficacy of 28 drugs in killing glioma stem-like cells from 43 patients. Our findings demonstrate that interpatient heterogeneity manifests from differential activities of transcriptional programs, providing actionable strategies for mechanistically characterizing GBM from a systems perspective and developing better prognostic and predictive biomarkers for personalized medicine.
2
Graded Coexpression of Ion Channel, Neurofilament, and Synaptic Genes in Fast-Spiking Vestibular Nucleus Neurons. J Neurosci 2019; 40:496-508. [PMID: 31719168 DOI: 10.1523/jneurosci.1500-19.2019] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/26/2019] [Revised: 10/11/2019] [Accepted: 10/25/2019] [Indexed: 11/21/2022]
Abstract
Computations that require speed and temporal precision are implemented throughout the nervous system by neurons capable of firing at very high rates, rapidly encoding and transmitting a rich amount of information, but with substantial metabolic and physical costs. For economical fast spiking and high throughput information processing, neurons need to optimize multiple biophysical properties in parallel, but the mechanisms of this coordination remain unknown. We hypothesized that coordinated gene expression may underlie the coordinated tuning of the biophysical properties required for rapid firing and signal transmission. Taking advantage of the diversity of fast-spiking cell types in the medial vestibular nucleus of mice of both sexes, we examined the relationship between gene expression, ionic currents, and neuronal firing capacity. Across excitatory and inhibitory cell types, genes encoding voltage-gated ion channels responsible for depolarizing and repolarizing the action potential were tightly coexpressed, and their absolute expression levels increased with maximal firing rate. Remarkably, this coordinated gene expression extended to neurofilaments and specific presynaptic molecules, providing a mechanism for coregulating axon caliber and transmitter release to match firing capacity. These findings suggest the presence of a module of genes, which is coexpressed in a graded manner and jointly tunes multiple biophysical properties for economical differentiation of firing capacity. 
The graded tuning of fast-spiking capacity by the absolute expression levels of specific ion channels provides a counterexample to the widely held assumption that cell-type-specific firing patterns can be achieved via a vast combination of different ion channels.
SIGNIFICANCE STATEMENT: Although essential roles of fast-spiking neurons in various neural circuits have been widely recognized, it remains unclear how neurons efficiently coordinate the multiple biophysical properties required to maintain high rates of action potential firing and transmitter release. Taking advantage of diverse fast-firing capacities among medial vestibular nucleus neurons of mice, we identify a group of ion channel, synaptic, and structural genes that exhibit mutually correlated expression levels, which covary with firing capacity. Coexpression of this fast-spiking gene module may be a basic strategy for neurons to efficiently and coordinately tune the speed of action potential generation and propagation and transmitter release at presynaptic terminals.
3
Variation among intact tissue samples reveals the core transcriptional features of human CNS cell classes. Nat Neurosci 2018; 21:1171-1184. [PMID: 30154505 PMCID: PMC6192711 DOI: 10.1038/s41593-018-0216-z] [Citation(s) in RCA: 114] [Impact Index Per Article: 19.0] [Received: 02/14/2018] [Accepted: 07/10/2018] [Indexed: 02/08/2023]
Abstract
It is widely assumed that cells must be physically isolated to study their molecular profiles. However, intact tissue samples naturally exhibit variation in cellular composition, which drives covariation of cell-class-specific molecular features. By analyzing transcriptional covariation in 7221 intact CNS samples from 840 neurotypical individuals representing billions of cells, we reveal the core transcriptional identities of major CNS cell classes in humans. By modeling intact CNS transcriptomes as a function of variation in cellular composition, we identify cell-class-specific transcriptional differences in Alzheimer’s disease, among brain regions, and between species. Among these, we show that PMP2 is expressed by human but not mouse astrocytes and significantly increases mouse astrocyte size upon ectopic expression in vivo, causing them to more closely resemble their human counterparts. Our work is available as an online resource (http://oldhamlab.ctec.ucsf.edu/) and provides a generalizable strategy for determining the core molecular features of cellular identity in intact biological systems.
4
Differential Network Analysis Reveals Evolutionary Complexity in Secondary Metabolism of Rauvolfia serpentina over Catharanthus roseus. FRONTIERS IN PLANT SCIENCE 2016; 7:1229. [PMID: 27588023 PMCID: PMC4988974 DOI: 10.3389/fpls.2016.01229] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/09/2016] [Accepted: 08/02/2016] [Indexed: 05/07/2023]
Abstract
Comparative co-expression analysis of multiple species using high-throughput data is an integrative approach to determine the uniformity as well as diversification in biological processes. Rauvolfia serpentina and Catharanthus roseus, both members of the Apocynaceae family, are reported to have remedial properties against multiple diseases. Despite sharing the upstream part of the terpenoid indole alkaloid pathway, there is significant diversity in tissue-specific synthesis and accumulation of specialized metabolites in these plants. This led us to implement comparative co-expression network analysis to investigate the modules and genes responsible for differential tissue-specific expression as well as species-specific synthesis of metabolites. Toward these goals, differential network analysis was implemented to identify candidate genes responsible for diversification of metabolite profiles. Three genes were identified with significant difference in connectivity leading to differential regulatory behavior between these plants. These genes may be responsible for diversification of secondary metabolism, and thereby for species-specific metabolite synthesis. The network robustness of R. serpentina, determined based on topological properties, was also complemented by comparison of gene-metabolite networks of both plants, and may have evolved to have complex metabolic mechanisms as compared to C. roseus under the influence of various stimuli. This study reveals evolution of complexity in secondary metabolism of R. serpentina, and key genes that contribute toward diversification of specific metabolites.
5
Analysis of discordant Affymetrix probesets casts serious doubt on idea of microarray data reutilization. BMC Genomics 2014; 15 Suppl 12:S8. [PMID: 25563078 PMCID: PMC4303952 DOI: 10.1186/1471-2164-15-s12-s8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Abstract
Background: Affymetrix microarray technology allows one to investigate expression of thousands of genes simultaneously under a variety of conditions. In the popular U133A microarray platform, the expression of 37% of genes is measured by more than one probeset. The discordant expression observed for two different probesets that match the same gene is a widespread phenomenon which is usually underestimated, ignored or disregarded. Results: Here we evaluate the prevalence of discordant expression in data collected using the Affymetrix HG-U133A microarray platform. In U133A, about 30% of genes annotated by two different probesets demonstrate a substantial correlation between independently measured expression values. To our surprise, sorting the probesets according to the nature of the discrepancy in their expression levels allowed the classification of the respective genes according to their fundamental functional properties, including observed enrichment by tissue-specific transcripts and alternatively spliced variants. On the other hand, an absence of discrepancies in probesets that simultaneously match several different genes allowed us to pinpoint non-expressed pseudogenes and gene groups with highly correlated expression patterns. Nevertheless, in many cases, the nature of discordant expression of two probesets that match the same transcript remains unexplained. It is possible that these probesets report differently regulated sets of transcripts, or, in the best-case scenario, two different sets of transcripts that represent the same gene. Conclusion: The majority of absolute gene expression values collected using Affymetrix microarrays may not be suitable for typical interpretative downstream analysis.
6
Systems-based analyses of brain regions functionally impacted in Parkinson's disease reveals underlying causal mechanisms. PLoS One 2014; 9:e102909. [PMID: 25170892 PMCID: PMC4149353 DOI: 10.1371/journal.pone.0102909] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 11/07/2013] [Accepted: 06/25/2014] [Indexed: 12/20/2022]
Abstract
Detailed analysis of disease-affected tissue provides insight into molecular mechanisms contributing to pathogenesis. Substantia nigra, striatum, and cortex are functionally connected with increasing degrees of alpha-synuclein pathology in Parkinson's disease. We undertook functional and causal pathway analysis of gene expression and proteomic alterations in these three regions, and the data revealed pathways that correlated with disease progression. In addition, microarray and RNAseq experiments revealed previously unidentified causal changes related to oligodendrocyte function and synaptic vesicle release, and these and other changes were reflected across all brain regions. Importantly, subsets of these changes were replicated in Parkinson's disease blood, suggesting peripheral tissue may provide important avenues for understanding and measuring disease status and progression. Proteomic assessment revealed alterations in mitochondria and vesicular transport proteins that preceded gene expression changes, indicating defects in translation and/or protein turnover. Our combined approach of proteomics, RNAseq and microarray analyses provides a comprehensive view of the molecular changes that accompany functional loss and alpha-synuclein pathology in Parkinson's disease, and may be instrumental in understanding, diagnosing and following Parkinson's disease progression.
7
Network methods for describing sample relationships in genomic datasets: application to Huntington's disease. BMC SYSTEMS BIOLOGY 2012; 6:63. [PMID: 22691535 PMCID: PMC3441531 DOI: 10.1186/1752-0509-6-63] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Received: 03/01/2012] [Accepted: 05/03/2012] [Indexed: 01/08/2023]
Abstract
BACKGROUND Genomic datasets generated by new technologies are increasingly prevalent in disparate areas of biological research. While many studies have sought to characterize relationships among genomic features, commensurate efforts to characterize relationships among biological samples have been less common. Consequently, the full extent of sample variation in genomic studies is often under-appreciated, complicating downstream analytical tasks such as gene co-expression network analysis. RESULTS Here we demonstrate the use of network methods for characterizing sample relationships in microarray data generated from human brain tissue. We describe an approach for identifying outlying samples that does not depend on the choice or use of clustering algorithms. We introduce a battery of measures for quantifying the consistency and integrity of sample relationships, which can be compared across disparate studies, technology platforms, and biological systems. Among these measures, we provide evidence that the correlation between the connectivity and the clustering coefficient (two important network concepts) is a sensitive indicator of homogeneity among biological samples. We also show that this measure, which we refer to as cor(K,C), can distinguish biologically meaningful relationships among subgroups of samples. Specifically, we find that cor(K,C) reveals the profound effect of Huntington's disease on samples from the caudate nucleus relative to other brain regions. Furthermore, we find that this effect is concentrated in specific modules of genes that are naturally co-expressed in human caudate nucleus, highlighting a new strategy for exploring the effects of disease on sets of genes. CONCLUSIONS These results underscore the importance of systematically exploring sample relationships in large genomic datasets before seeking to analyze genomic feature activity. 
We introduce a standardized platform for this purpose using freely available R software that has been designed to enable iterative and interactive exploration of sample networks.
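Illustrative note (not part of the abstract): the cor(K,C) measure described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' R implementation. The function name `cor_k_c`, the signed adjacency form `((1 + r) / 2) ** beta`, the default `beta=2`, and the weighted clustering coefficient formula are assumptions chosen for illustration; the abstract specifies only that connectivity (K) and the clustering coefficient (C) are correlated across samples.

```python
import numpy as np

def cor_k_c(expr, beta=2):
    """Return (K, C, cor(K, C)) for a sample network.

    expr: 2-D array, rows = genomic features, columns = samples.
    beta: soft-threshold power (assumed value, not from the abstract).
    """
    r = np.corrcoef(expr.T)              # sample-by-sample Pearson correlation
    a = ((1.0 + r) / 2.0) ** beta        # assumed signed, weighted adjacency
    np.fill_diagonal(a, 0.0)             # no self-connections
    k = a.sum(axis=1)                    # connectivity of each sample
    # Weighted clustering coefficient: closed triangles over possible ones.
    num = np.diag(a @ a @ a)
    den = k ** 2 - (a ** 2).sum(axis=1)
    c = num / den
    return k, c, np.corrcoef(k, c)[0, 1]

# Usage on synthetic data (200 features, 12 samples).
rng = np.random.default_rng(0)
k, c, r_kc = cor_k_c(rng.normal(size=(200, 12)))
```

Per the abstract, cor(K,C) serves as an indicator of sample homogeneity; how a given value should be interpreted depends on details in the full paper that this sketch does not reproduce.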
8
Exploring the transcriptome of ciliated cells using in silico dissection of human tissues. PLoS One 2012; 7:e35618. [PMID: 22558177 PMCID: PMC3338421 DOI: 10.1371/journal.pone.0035618] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Received: 07/07/2011] [Accepted: 03/21/2012] [Indexed: 01/11/2023]
Abstract
Cilia are cell organelles that play important roles in cell motility, sensory and developmental functions and are involved in a range of human diseases, known as ciliopathies. Here, we search for novel human genes related to cilia using a strategy that exploits the previously reported tendency of cell type-specific genes to be coexpressed in the transcriptome of complex tissues. Gene coexpression networks were constructed using the noise-resistant WGCNA algorithm in 12 publicly available microarray datasets from human tissues rich in motile cilia: airways, fallopian tubes and brain. A cilia-related coexpression module was detected in 10 out of the 12 datasets. A consensus analysis of this module's gene composition recapitulated 297 known and predicted 74 novel cilia-related genes. 82% of the novel candidates were supported by tissue-specificity expression data from GEO and/or proteomic data from the Human Protein Atlas. The novel findings included a set of genes (DCDC2, DYX1C1, KIAA0319) related to the neurological disorder dyslexia, suggesting their potential involvement in ciliary functions. Furthermore, we searched for differences in gene composition of the ciliary module between the tissues. A multidrug-and-toxin extrusion transporter, MATE2 (SLC47A2), was found to be a brain-specific central gene in the ciliary module. We confirm the localization of MATE2 in cilia by immunofluorescence staining using MDCK cells as a model. While MATE2 has previously gained attention as a pharmacologically relevant transporter, its potential relation to cilia is suggested for the first time. Taken together, our large-scale analysis of gene coexpression networks identifies novel genes related to human cell cilia.
9
Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network. Nucleic Acids Res 2011; 39:3864-78. [PMID: 21247874 PMCID: PMC3089475 DOI: 10.1093/nar/gkq1348] [Citation(s) in RCA: 440] [Impact Index Per Article: 33.8] [Indexed: 12/19/2022]
Abstract
Although accumulating evidence has provided insight into the various functions of long-non-coding RNAs (lncRNAs), the exact functions of the majority of such transcripts are still unknown. Here, we report the first computational annotation of lncRNA functions based on public microarray expression profiles. A coding–non-coding gene co-expression (CNC) network was constructed from re-annotated Affymetrix Mouse Genome Array data. Probable functions for altogether 340 lncRNAs were predicted based on topological or other network characteristics, such as module sharing, association with network hubs and combinations of co-expression and genomic adjacency. The functions annotated to the lncRNAs mainly involve organ or tissue development (e.g. neuron, eye and muscle development), cellular transport (e.g. neuronal transport and sodium ion, acid or lipid transport) or metabolic processes (e.g. involving macromolecules, phosphocreatine and tyrosine).
10
Probabilistic analysis of probe reliability in differential gene expression studies with short oligonucleotide arrays. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:217-225. [PMID: 21071809 DOI: 10.1109/tcbb.2009.38] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 05/30/2023]
Abstract
Probe defects are a major source of noise in gene expression studies. While existing approaches detect noisy probes based on external information such as genomic alignments, we introduce and validate a targeted probabilistic method for analyzing probe reliability directly from expression data and independently of the noise source. This provides insights into the various sources of probe-level noise and gives tools to guide probe design.
11
An approach to evaluate the reliability of hybridization-based and sequencing-based gene expression profiling technologies. Biotechnol Prog 2010; 26:1230-9. [DOI: 10.1002/btpr.459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
12
Systems genetics analysis of gene-by-environment interactions in human cells. Am J Hum Genet 2010; 86:399-410. [PMID: 20170901 DOI: 10.1016/j.ajhg.2010.02.002] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Received: 10/07/2009] [Revised: 01/21/2010] [Accepted: 02/02/2010] [Indexed: 01/15/2023]
Abstract
Gene by environment (GxE) interactions are clearly important in many human diseases, but they have proven to be difficult to study on a molecular level. We report genetic analysis of thousands of transcript abundance traits in human primary endothelial cell (EC) lines in response to proinflammatory oxidized phospholipids implicated in cardiovascular disease. Of the 59 most regulated transcripts, approximately one-third showed evidence of GxE interactions. The interactions resulted primarily from effects of distal-, trans-acting loci, but a striking example of a local-GxE interaction was also observed for FGD6. Some of the distal interactions were validated by siRNA knockdown experiments, including a locus involved in the regulation of multiple transcripts involved in the ER stress pathway. Our findings add to the understanding of the overall architecture of complex human traits and are consistent with the possibility that GxE interactions are responsible, in part, for the failure of association studies to more fully explain common disease variation.
13
Optimization of the BLASTN substitution matrix for prediction of non-specific DNA microarray hybridization. Nucleic Acids Res 2009; 38:e27. [PMID: 19969549 PMCID: PMC2831327 DOI: 10.1093/nar/gkp1116] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
Abstract
DNA microarray measurements are susceptible to error caused by non-specific hybridization between a probe and a target (cross-hybridization), or between two targets (bulk-hybridization). Search algorithms such as BLASTN can quickly identify potentially hybridizing sequences. We set out to improve BLASTN accuracy by modifying the substitution matrix and gap penalties. We generated gene expression microarray data for samples in which 1 or 10% of the target mass was an exogenous spike of known sequence. We found that the 10% spike induced 2-fold intensity changes in 3% of the probes, two-thirds of which were decreases in intensity likely caused by bulk-hybridization. These changes were correlated with similarity between the spike and probe sequences. Interestingly, even very weak similarities tended to induce a change in probe intensity with the 10% spike. Using these data, we optimized the BLASTN substitution matrix to more accurately identify probes susceptible to non-specific hybridization with the spike. Relative to the default substitution matrix, the optimized matrix features a decreased score for A–T base pairs relative to G–C base pairs, resulting in a 5–15% increase in area under the ROC curve for identifying affected probes. This optimized matrix may be useful in the design of microarray probes, and in other BLASTN-based searches for hybridization partners.
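Illustrative note (not part of the abstract): the idea of scoring A–T pairs lower than G–C pairs can be sketched as a toy ungapped scorer. This is only a sketch under stated assumptions. The score values, the names `MATCH_SCORE`, `MISMATCH_SCORE`, and `ungapped_score` are hypothetical, and the paper's optimized matrix and gap penalties are not reproduced here. Sequences are compared on the same strand, so a matching A or T position stands for an A–T base pair in the hybridized duplex and a matching G or C position for a G–C pair.

```python
# Hypothetical scores: the abstract states only that A-T pairs score lower
# than G-C pairs in the optimized matrix; these numbers are placeholders.
MATCH_SCORE = {"A": 1, "T": 1, "G": 2, "C": 2}
MISMATCH_SCORE = -3

def ungapped_score(probe, target):
    """Score two equal-length sequences position by position (no gaps)."""
    if len(probe) != len(target):
        raise ValueError("sequences must have equal length")
    return sum(
        MATCH_SCORE[p] if p == t else MISMATCH_SCORE
        for p, t in zip(probe, target)
    )

# A GC-rich perfect match outscores an AT-rich one of the same length.
gc_rich = ungapped_score("GGCC", "GGCC")   # 8
at_rich = ungapped_score("AATT", "AATT")   # 4
```

With such a weighting, GC-rich similarities rank above AT-rich ones of equal length, which is consistent with the direction of the change the abstract reports.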
14
Abstract
Standard Affymetrix technology evaluates gene expression by measuring the intensity of mRNA hybridization with a panel of 25-mer oligonucleotide probes, and summarizing the probe signal intensities by a robust average method. However, in many cases, the signal intensity of a probe does not correlate with gene expression. This could be due to hybridization of the probe to a transcript of another gene, mapping of the probe to an intron, alternative splicing, single nucleotide polymorphisms, or other reasons. We have developed a database, PLANdbAffy (available at http://affymetrix2.bioinf.fbb.msu.ru), that contains the results of the alignment of probe sequences from five Affymetrix expression microarrays to the human genome. We have determined the probes matching the transcript-coding regions in the correct orientation. For each such probe alignment region, we determined the mRNA and EST sequences that contain the probe sequence. In the textual part of the database interface we summarize the data on the sequences that cover the probe alignment region and the SNPs that are located inside it. The graphical part of our database interface is implemented as custom tracks for the UCSC genome browser, which allows one to utilize all the data offered by the UCSC browser.
15
A systems genetics approach implicates USF1, FADS3, and other causal candidate genes for familial combined hyperlipidemia. PLoS Genet 2009; 5:e1000642. [PMID: 19750004 PMCID: PMC2730565 DOI: 10.1371/journal.pgen.1000642] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.9] [Received: 04/09/2009] [Accepted: 08/12/2009] [Indexed: 01/08/2023]
Abstract
We hypothesized that a common SNP in the 3' untranslated region of the upstream transcription factor 1 (USF1), rs3737787, may affect lipid traits by influencing gene expression levels, and we investigated this possibility utilizing the Mexican population, which has a high predisposition to dyslipidemia. We first associated rs3737787 genotypes in Mexican Familial Combined Hyperlipidemia (FCHL) case/control fat biopsies, with global expression patterns. To identify sets of co-expressed genes co-regulated by similar factors such as transcription factors, genetic variants, or environmental effects, we utilized weighted gene co-expression network analysis (WGCNA). Through WGCNA in the Mexican FCHL fat biopsies we identified two significant Triglyceride (TG)-associated co-expression modules. One of these modules was also associated with FCHL, the other FCHL component traits, and rs3737787 genotypes. This USF1-regulated FCHL-associated (URFA) module was enriched for genes involved in lipid metabolic processes. Using systems genetics procedures we identified 18 causal candidate genes in the URFA module. The FCHL causal candidate gene fatty acid desaturase 3 (FADS3) was associated with TGs in a recent Caucasian genome-wide significant association study and we replicated this association in Mexican FCHL families. Based on a USF1-regulated FCHL-associated co-expression module and SNP rs3737787, we identify a set of causal candidate genes for FCHL-related traits. We then provide evidence from two independent datasets supporting FADS3 as a causal gene for FCHL and elevated TGs in Mexicans. By integrating a genetic polymorphism with genome-wide gene expression levels, we were able to attribute function to a genetic polymorphism in the USF1 gene. The USF1 gene has previously been associated with a common dyslipidemia, FCHL. FCHL is characterized by elevated levels of total cholesterol, triglycerides, or both. 
We demonstrate that this genetic polymorphism in USF1 contributes to FCHL disease risk by modulating the expression of a group of genes functionally related to lipid metabolism, and that this modulation is mediated by USF1. One of the genes whose expression is modulated by USF1 is FADS3, which was also implicated in a recent genome-wide association study for lipid traits. We demonstrated that a genetic polymorphism from the FADS3 region, which was associated with triglycerides in a GWAS study of Caucasians, was also associated with triglycerides in Mexican FCHL families. Our analysis provides novel insight into the gene expression profile contributing to FCHL disease risk, and identifies FADS3 as a new gene for FCHL in Mexicans.
16
Evaluation of cDNA microarray data by multiple clones mapping to the same transcript. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2009; 13:493-9. [PMID: 19715395 DOI: 10.1089/omi.2009.0077] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
Abstract
Although novel technologies are rapidly emerging, the cDNA microarray data accumulated to date are, and will remain, an important source for bioinformatics and biological studies. Thus, the reliability and applicability of cDNA microarray data warrant further evaluation. In cDNA microarrays, multiple clones are measured for a transcript, which can be exploited to evaluate the consistency of microarray data. We show that even for pairs of RCs, the average Pearson correlation coefficient of their measurements is not high. However, this low consistency could largely be explained by random noise signals for a fraction of unexpressed genes and/or low signal-to-noise ratios for low-abundance transcripts. Encouragingly, a large fraction of inconsistent data will be filtered out in the procedure of selecting differentially expressed genes (DEGs). Therefore, although cDNA microarray data are of low consistency, applications based on DEG selection could still reach correct biological results, especially at the level of functional modules.
17
A longitudinal study of gene expression in healthy individuals. BMC Med Genomics 2009; 2:33. [PMID: 19500411 PMCID: PMC2713969 DOI: 10.1186/1755-8794-2-33] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 06/20/2008] [Accepted: 06/07/2009] [Indexed: 12/12/2022]
Abstract
Background: The use of gene expression in venous blood either as a pharmacodynamic marker in clinical trials of drugs or as a diagnostic test requires knowledge of the variability in expression over time in healthy volunteers. Here we defined a normal range of gene expression over 6 months in the blood of four cohorts of healthy men and women who were stratified by age (22–55 years and > 55 years) and gender. Methods: Eleven immunomodulatory genes likely to play important roles in inflammatory conditions such as rheumatoid arthritis and infection, in addition to four genes typically used as reference genes, were examined by quantitative reverse transcription-polymerase chain reaction (qRT-PCR), as well as the full genome as represented by Affymetrix HG U133 Plus 2.0 microarrays. Results: Gene expression levels as assessed by qRT-PCR and microarray were relatively stable over time, with ~2% of genes as measured by microarray showing intra-subject differences over time periods longer than one month. Fifteen genes varied by gender. The eleven genes examined by qRT-PCR remained within a limited dynamic range for all individuals. Specifically, for the seven most stably expressed genes (CXCL1, HMOX1, IL1RN, IL1B, IL6R, PTGS2, and TNF), 95% of all samples profiled fell within 1.5–2.5 Ct, the equivalent of a 4- to 6-fold dynamic range. Two subjects who experienced severe adverse events of cancer and anemia had microarray gene expression profiles that were distinct from normal, while subjects who experienced an infection had only slightly elevated levels of inflammatory markers. Conclusion: This study defines the range and variability of gene expression in healthy men and women over a six-month period. These parameters can be used to estimate the number of subjects needed to observe significant differences from normal gene expression in clinical studies. 
A set of genes that varied by gender was also identified as were a set of genes with elevated expression in a subject with iron deficiency anemia and another subject being treated for lung cancer.
|
18
|
Characterizing multiple exogenous and endogenous small RNA populations in parallel with subfemtomolar sensitivity using a streptavidin gel-shift assay. RNA (NEW YORK, N.Y.) 2009; 15:724-31. [PMID: 19237462 PMCID: PMC2661842 DOI: 10.1261/rna.1235109] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/08/2023]
Abstract
Here we present a simple and inexpensive gel-shift assay for the detection and quantification of small RNAs. The assay is at least 5-10 times more sensitive than a conventional Northern, and is highly scalable. Total RNA is first size purified to enrich the desired size range, phosphatase treated, and then radiolabeled to high specific activity using polynucleotide kinase. The resulting RNA stock is then hybridized to an excess of biotinylated DNA probe oligonucleotide, prior to mixing with streptavidin and loading on a native gel. The amount of supershifted material was proportional to the amount of labeled target RNA in the sample. We applied this method to verify sequencing data originally obtained from a four-point comparison study on the effect of endogenous expression of HC-Pro on Y-satellite/cucumber mosaic virus infection in tobacco plants. The results of the streptavidin gel-shift assay were consistent with the concentrations of small RNAs in infected plants inferred by our original cloning data, and rapidly provided information about the relative concentration of a number of viral and endogenous small RNAs. Further straightforward improvements to this simple methodology might be expected to improve the method's sensitivity by as much as another 10-fold.
|
19
|
Consistency analysis of redundant probe sets on Affymetrix three-prime expression arrays and applications to differential mRNA processing. PLoS One 2009; 4:e4229. [PMID: 19165320 PMCID: PMC2621337 DOI: 10.1371/journal.pone.0004229] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 06/23/2008] [Accepted: 11/11/2008] [Indexed: 11/19/2022]
Abstract
Affymetrix three-prime expression microarrays contain thousands of redundant probe sets that interrogate different regions of the same gene. Differential expression analysis methods rarely consider probe redundancy, which can lead to inaccurate inference about overall gene expression or cause investigators to overlook potentially valuable information about differential regulation of variant mRNA products. We investigated the behaviour and consistency of redundant probe sets in a publicly-available data set containing samples from mouse brain amygdala and hippocampus and asked how applying filtering methods to the data affected consistency of results obtained from redundant probe sets. A genome-based filter that screens and groups probe sets according to their overlapping genomic alignments significantly improved redundant probe set consistency. Screening based on qualitative Present-Absent calls from MAS5 also improved consistency. However, even after applying these filters, many redundant probe sets showed significant fold-change differences relative to each other, suggesting differential regulation of alternative transcript production. Visual inspection of these loci using an interactive genome visualization tool (igb.bioviz.org) exposed thirty putative examples of differential regulation of alternative splicing or polyadenylation across brain regions in mouse. This work demonstrates how P/A-call and genome-based filtering can improve consistency among redundant probe sets while at the same time exposing possible differential regulation of RNA processing pathways across sample types.
|
20
|
Reproducible chemical-induced changes in gene expression profiles in human hepatoma HepaRG cells under various experimental conditions. Toxicol In Vitro 2008; 23:466-75. [PMID: 19159669 DOI: 10.1016/j.tiv.2008.12.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 08/22/2008] [Revised: 12/11/2008] [Accepted: 12/22/2008] [Indexed: 11/19/2022]
Abstract
The use of in vitro human liver cell models is an attractive approach in toxicogenomic studies designed to analyze gene expression changes induced by a toxic chemical. However, in such studies, reliability, reproducibility and interlaboratory concordance of microarrays, as well as the choice of the most suitable cell model, remain a matter of debate. This work was aimed at evaluating the robustness of microarray technologies and the suitability of the highly differentiated human HepaRG cell line in the investigation of gene expression changes induced by a toxic compound in human liver. The influence of various experimental conditions, including cell cultures grown at different test sites, different generations of microarrays, RNA analysis platforms and software, was tested on gene expression profiles induced by a 20 h treatment with an 8 mM concentration of phenobarbital as the toxic compound. As many as 1099 genes (p-value < 0.01 and 1.5-fold change), representing 74% and 30% of the signature genes detected with Agilent 22K and 44K pangenomic microarrays, respectively, were shown to be modulated in common in six independently performed experiments. The most modulated genes included both those known to be regulated by phenobarbital, such as cytochromes P450 and membrane transporters, and those involved in oxidative stress, inflammation and apoptosis, typifying a toxic insult. These data provide strong support for the use of a toxicogenomic approach for the in vitro prediction of chemical toxicity, and for the choice of human HepaRG cells as a promising model system for human hepatotoxicity testing.
|
21
|
Differential regulation of Listeria monocytogenes internalin and internalin-like genes by sigmaB and PrfA as revealed by subgenomic microarray analyses. Foodborne Pathog Dis 2008; 5:417-35. [PMID: 18713061 DOI: 10.1089/fpd.2008.0085] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 01/05/2023]
Abstract
The Listeria monocytogenes genome contains more than 20 genes that encode cell surface-associated internalins. To determine the contributions of the alternative sigma factor sigma(B) and the virulence gene regulator PrfA to internalin gene expression, a subgenomic microarray was designed to contain two probes for each of 24 internalin-like genes identified in the L. monocytogenes 10403S genome. Competitive microarray hybridization was performed on RNA extracted from (i) the 10403S parent strain and an isogenic Delta sigB strain; (ii) 10403S and an isogenic Delta prfA strain; (iii) a (G155S) 10403S derivative that expresses the constitutively active PrfA (PrfA*) and the Delta prfA strain; and (iv) 10403S and an isogenic Delta sigB Delta prfA strain. Sigma(B)- and PrfA-dependent transcription of selected genes was further confirmed by quantitative reverse-transcriptase polymerase chain reaction. For the 24 internalin-like genes examined, (i) both sigma(B) and PrfA contributed to transcription of inlA and inlB, (ii) only sigma(B) contributed to transcription of inlC2, inlD, lmo0331, and lmo0610; (iii) only PrfA contributed to transcription of inlC and lmo2445; and (iv) neither sigma(B) nor PrfA contributed to transcription of the remaining 16 internalin-like genes under the conditions tested.
|
22
|
Functional organization of the transcriptome in human brain. Nat Neurosci 2008; 11:1271-82. [PMID: 18849986 DOI: 10.1038/nn.2207] [Citation(s) in RCA: 556] [Impact Index Per Article: 34.8] [Received: 05/05/2008] [Accepted: 09/09/2008] [Indexed: 01/19/2023]
Abstract
The enormous complexity of the human brain ultimately derives from a finite set of molecular instructions encoded in the human genome. These instructions can be directly studied by exploring the organization of the brain's transcriptome through systematic analysis of gene coexpression relationships. We analyzed gene coexpression relationships in microarray data generated from specific human brain regions and identified modules of coexpressed genes that correspond to neurons, oligodendrocytes, astrocytes and microglia. These modules provide an initial description of the transcriptional programs that distinguish the major cell classes of the human brain and indicate that cell type-specific information can be obtained from whole brain tissue without isolating homogeneous populations of cells. Other modules corresponded to additional cell types, organelles, synaptic function, gender differences and the subventricular neurogenic niche. We found that subventricular zone astrocytes, which are thought to function as neural stem cells in adults, have a distinct gene expression pattern relative to protoplasmic astrocytes. Our findings provide a new foundation for neurogenetic inquiries by revealing a robust and previously unrecognized organization to the human brain transcriptome.
|
23
|
Abstract
DNA microarrays serve to monitor a wide range of molecular events, but emerging applications like measurements of weakly expressed genes or of proteins and their interaction patterns will require enhanced performance to improve specificity of detection and dynamic range. To further extend the utility of DNA microarray-based approaches we present a high-performance tag microarray procedure that enables probe-based analysis of as few as 100 target cDNA molecules, with a linear dynamic range close to 10^5. Furthermore, the protocol radically decreases the risk of cross-hybridization on microarrays compared to current approaches, and it also allows for quantification by single-molecule analysis and real-time on-chip monitoring of rolling-circle amplification. We provide proof of concept for microarray-based measurement of both mRNA molecules and of proteins, converted to tag DNA sequences by padlock and proximity probe ligation, respectively.
|
24
|
Specificity of DNA microarray hybridization: characterization, effectors and approaches for data correction. Nucleic Acids Res 2008; 36:2395-405. [PMID: 18299281 PMCID: PMC2367720 DOI: 10.1093/nar/gkn087] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 01/26/2023]
Abstract
Microarray-hybridization specificity is one of the main effectors of microarray result quality. In the present review, we suggest a definition for specificity that spans four hybridization levels, from the single probe to the microarray platform. For increased hybridization specificity, it is important to quantify the extent of the specificity at each of these levels, and correct the data accordingly. We outline possible effects of low hybridization specificity on the obtained results and list possible effectors of hybridization specificity. In addition, we discuss several studies in which theoretical approaches, empirical means or data filtration were used to identify specificity effectors, and increase the specificity of the hybridization results. However, these various approaches may not yet provide an ultimate solution; rather, further tool development is needed to enhance microarray-hybridization specificity.
|
25
|
A highly sensitive and specific system for large-scale gene expression profiling. BMC Genomics 2008; 9:9. [PMID: 18186939 PMCID: PMC2267712 DOI: 10.1186/1471-2164-9-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/18/2007] [Accepted: 01/10/2008] [Indexed: 12/02/2022]
Abstract
Background Rapid progress in the field of gene expression-based molecular network integration has generated strong demand for enhancing the sensitivity and data accuracy of experimental systems. To meet this need, a high-throughput gene profiling system of high specificity and sensitivity has been developed. Results By using specially designed primers, the new system amplifies sequences in neighboring exons separated by large introns so that mRNA sequences may be effectively discriminated from other highly related sequences including their genes, unprocessed transcripts, pseudogenes and pseudogene transcripts. Probes used for microarray detection consist of sequences in the two neighboring exons amplified by the primers. In conjunction with a newly developed high-throughput multiplex amplification system and highly simplified experimental procedures, the system can be used to analyze >1,000 mRNA species in a single assay. It may also be used for gene expression profiling of very few (n = 100) or single cells. Highly reproducible results were obtained from duplicate samples with the same number of cells, and from those with a small number (100) and a large number (10,000) of cells. The specificity of the system was demonstrated by comparing results from a breast cancer cell line, MCF-7, and an ovarian cancer cell line, NCI/ADR-RES, and by using genomic DNA as starting material. Conclusion Our approach may greatly facilitate the analysis of combinatorial expression of known genes in many important applications, especially when the amount of RNA is limited.
|
26
|
In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation. BMC Bioinformatics 2007; 8:461. [PMID: 18039370 PMCID: PMC2213692 DOI: 10.1186/1471-2105-8-461] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Received: 09/05/2007] [Accepted: 11/26/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Microarray co-expression signatures are an important tool for studying gene function and relations between genes. In addition to genuine biological co-expression, correlated signals can result from technical deficiencies like hybridization of reporters with off-target transcripts. An approach that is able to distinguish these factors permits the detection of more biologically relevant co-expression signatures. RESULTS We demonstrate a positive relation between off-target reporter alignment strength and expression correlation in data from oligonucleotide genechips. Furthermore, we describe a method that allows the identification, from their expression data, of individual probe sets affected by off-target hybridization. CONCLUSION The effects of off-target hybridization on expression correlation coefficients can be substantial, and can be alleviated by more accurate mapping between microarray reporters and the target transcriptome. We recommend attention to the mapping for any microarray analysis of gene expression patterns.
|
27
|
Novel definition files for human GeneChips based on GeneAnnot. BMC Bioinformatics 2007; 8:446. [PMID: 18005434 PMCID: PMC2216044 DOI: 10.1186/1471-2105-8-446] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Received: 06/25/2007] [Accepted: 11/15/2007] [Indexed: 10/29/2022]
Abstract
BACKGROUND Improvements in genome sequence annotation revealed discrepancies in the original probeset/gene assignment in Affymetrix microarray and the existence of differences between annotations and effective alignments of probes and transcription products. In the current generation of Affymetrix human GeneChips, most probesets include probes matching transcripts from more than one gene and probes which do not match any transcribed sequence. RESULTS We developed a novel set of custom Chip Definition Files (CDF) and the corresponding Bioconductor libraries for Affymetrix human GeneChips, based on the information contained in the GeneAnnot database. GeneAnnot-based CDFs are composed of unique custom-probesets, including only probes matching a single gene. CONCLUSION GeneAnnot-based custom CDFs solve the problem of a reliable reconstruction of expression levels and eliminate the existence of more than one probeset per gene, which often leads to discordant expression signals for the same transcript when gene differential expression is the focus of the analysis. GeneAnnot CDFs are freely distributed and fully compliant with Affymetrix standards and all available software for gene expression analysis. The CDF libraries are available from http://www.xlab.unimo.it/GA_CDF, along with supplementary information (CDF libraries, installation guidelines and R code, CDF statistics, and analysis results).
|
28
|
Construction of an open-access database that integrates cross-reference information from the transcriptome and proteome of immune cells. Bioinformatics 2007; 23:2934-41. [PMID: 17893089 DOI: 10.1093/bioinformatics/btm430] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION Although a huge amount of mammalian genomic data has become publicly available, there are still hurdles for biologists to overcome before such data can be fully exploited. One of the challenges for gaining biological insight from genomic data has been the inability to cross-reference transcriptomic and proteomic data using a single informational platform. To address this, we constructed an open-access database that enabled us to cross-reference transcriptomic and proteomic data obtained from immune cells. RESULTS The database, named RefDIC (Reference genomics Database of Immune Cells), currently contains: (i) quantitative mRNA profiles for human and mouse immune cells/tissues obtained using Affymetrix GeneChip technology; (ii) quantitative protein profiles for mouse immune cells obtained using two-dimensional gel electrophoresis (2-DE) followed by image analysis and mass spectrometry; and (iii) various visualization tools to cross-reference the mRNA and protein profiles of immune cells. RefDIC is the first open-access database for immunogenomics and serves as an important information-sharing platform, enabling a focused genomic approach in immunology. AVAILABILITY All raw data and information can be accessed from http://refdic.rcai.riken.jp/. The microarray data is also available at http://cibex.nig.ac.jp/ under CIBEX accession no. CBX19, and http://www.ebi.ac.uk/pride/ under PRIDE accession numbers 2354-2378 and 2414.
|
29
|
AffyMAPSDetector: a software tool to characterize Affymetrix GeneChip expression arrays with respect to SNPs. BMC Bioinformatics 2007; 8:276. [PMID: 17663786 PMCID: PMC1959249 DOI: 10.1186/1471-2105-8-276] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 01/24/2007] [Accepted: 07/30/2007] [Indexed: 12/02/2022]
Abstract
Background Affymetrix gene expression arrays incorporate paired perfect match (PM) and mismatch (MM) probes to distinguish true signals from those arising from cross-hybridization events. A MM signal often shows greater intensity than a PM signal; we propose that one underlying cause is the presence of allelic variants arising from single nucleotide polymorphisms (SNPs). To annotate and characterize SNP contributions to anomalous probe binding behavior we have developed a software tool called AffyMAPSDetector. Results AffyMAPSDetector can be used to describe any Affymetrix expression GeneChip™ with respect to SNPs. When AffyMAPSDetector was run on GeneChip™ HG-U95Av2 against dbSNP-build-123, we found 7,286 probes (belonging to 2,582 probesets) containing SNPs, out of which 325 probes contained at least one SNP at position 13. Against dbSNP-build-126, 8,758 probes (belonging to 3,002 probesets) contained SNPs, of which 409 probes contained at least one SNP at position 13. Therefore, depending on the expressed allele, the MM probe can sometimes be the transcript complement. This information was used to characterize probe measurements reported in a published, well-replicated lung adenocarcinoma study. The total intensity distributions showed that the SNP-containing probes had a larger negative mean intensity difference (PM-MM) and greater range of the difference than did probes without SNPs. In the sample replicates, SNP-containing probes with reproducible intensity ratios were identified, allowing selection of SNP probesets that yielded unique sample signatures. At the gene expression level, use of the (MM-PM) value for SNP-containing probes resulted in different Presence/Absence calls for some genes. Such a change in status of the genes has the clear potential for influencing downstream clustering and classification results.
Conclusion Output from this tool characterizes SNP-containing probes on GeneChip™ microarrays, thus improving our understanding of factors contributing to expression measurements. The pattern of SNP binding examined so far indicates distinct behavior of the SNP-containing probes and has the potential to help us identify new SNPs. Knowing which probes contain SNPs provides flexibility in determining whether to include or exclude them from gene-expression intensity calculations; selected sets of SNP-containing probes produce sample-unique signatures. AffyMAPSDetector information is available at
|
30
|
Transcript-level annotation of Affymetrix probesets improves the interpretation of gene expression data. BMC Bioinformatics 2007; 8:194. [PMID: 17559689 PMCID: PMC1913542 DOI: 10.1186/1471-2105-8-194] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 11/21/2006] [Accepted: 06/11/2007] [Indexed: 11/22/2022]
Abstract
Background The wide use of Affymetrix microarrays across a broadening range of biological research fields has made probeset annotation an important issue. Standard Affymetrix probeset annotation is at the gene level, i.e. a probeset is linked to a single gene, and probeset intensity is interpreted as gene expression. The growing knowledge that one gene may have multiple transcript variants makes it necessary to refine this gene-level annotation to the transcript level. Results By performing rigorous alignments of the Affymetrix probe sequences against a comprehensive pool of currently available transcript sequences, and further linking the probesets to the International Protein Index, we generated transcript-level or protein-level annotation tables for two popular Affymetrix expression arrays, the Mouse Genome 430A 2.0 Array and the Human Genome U133A Array. Application of our new annotations in re-examining existing expression data sets shows increased expression consistency among synonymous probesets and strengthened expression correlation between interacting proteins. Conclusion By refining the standard Affymetrix annotation of microarray probesets from the gene level to the transcript level and protein level, one can achieve a more reliable interpretation of experimental data, which may lead to the discovery of more profound regulatory mechanisms.
|
31
|
A verification protocol for the probe sequences of Affymetrix genome arrays reveals high probe accuracy for studies in mouse, human and rat. BMC Bioinformatics 2007; 8:132. [PMID: 17448222 PMCID: PMC1865557 DOI: 10.1186/1471-2105-8-132] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/12/2006] [Accepted: 04/20/2007] [Indexed: 01/09/2023]
Abstract
Background The Affymetrix GeneChip technology uses multiple probes per gene to measure its expression level. Individual probe signals can vary widely, which hampers proper interpretation. This variation can be caused by probes that do not properly match their target gene or that match multiple genes. To determine the accuracy of Affymetrix arrays, we developed an extensive verification protocol, for mouse arrays incorporating the NCBI RefSeq, NCBI UniGene Unique, NIA Mouse Gene Index, and UCSC mouse genome databases. Results Applying this protocol to Affymetrix Mouse Genome arrays (the earlier U74Av2 and the newer 430 2.0 array), the number of sequence-verified probes with perfect matches was no less than 85% and 95%, respectively; and for 74% and 85% of the probe sets all probes were sequence verified. The latter percentages increased to 80% and 94% after discarding one or two unverifiable probes per probe set, and even further to 84% and 97% when, in addition, allowing for one or two mismatches between probe and target gene. Similar results were obtained for other mouse arrays, as well as for human and rat arrays. Based on these data, refined chip definition files for all arrays are provided online. Researchers can choose the version appropriate for their study to (re)analyze expression data. Conclusion The accuracy of Affymetrix probe sequences is higher than previously reported, particularly on newer arrays. Yet, refined probe set definitions have clear effects on the detection of differentially expressed genes. We demonstrate that the interpretation of the results of Affymetrix arrays is improved when the new chip definition files are used.
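The idea of sequence-verifying probes, first with perfect matches and then allowing one or two mismatches, can be illustrated with a minimal sketch. This is not the authors' protocol: the function name `verify_probe_set`, the toy sequences and the 8-mer probe length are all assumptions made for illustration (real Affymetrix probes are 25-mers, and a real pipeline would align against databases such as RefSeq rather than a single string).

```python
def verify_probe_set(probes, transcript, max_mismatches=0):
    """Return the probes that align somewhere on the transcript
    with at most `max_mismatches` mismatching bases (no gaps)."""
    verified = []
    for probe in probes:
        n = len(probe)
        # Slide the probe along the transcript and count mismatches per window.
        for start in range(len(transcript) - n + 1):
            window = transcript[start:start + n]
            mismatches = sum(a != b for a, b in zip(probe, window))
            if mismatches <= max_mismatches:
                verified.append(probe)
                break
    return verified


# Hypothetical toy data: one target transcript and a four-probe "probe set".
transcript = "ATGGCGTACGTTAGCCTAGGCTTAACGGAT"
probes = ["GCGTACGT", "CCTAGGCT", "GCGTACCT", "TTTTTTTT"]

perfect = verify_probe_set(probes, transcript)                    # exact matches only
relaxed = verify_probe_set(probes, transcript, max_mismatches=1)  # tolerate one mismatch

print(f"perfect-match probes: {len(perfect)}/{len(probes)}")
print(f"probes verified with <=1 mismatch: {len(relaxed)}/{len(probes)}")
```

Relaxing the mismatch tolerance raises the fraction of verified probes, which mirrors the direction of the percentages reported in the abstract; the specific numbers here come only from the toy data.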
|
32
|
A survey of methods for classification of gene expression data using evolutionary algorithms. Expert Rev Mol Diagn 2007; 6:101-10. [PMID: 16359271 DOI: 10.1586/14737159.6.1.101] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Abstract
The rapid increase in the quantity of available biologic data over the last decade, brought about by the introduction of massively parallel methods for gene expression measurements, has highlighted the need for more efficient computational techniques for analysis. This paper reviews the use of evolutionary algorithms (EAs) in connection with classification based on gene expression data matrices. Brief introductions to data classification methods and EAs are given, followed by a survey of studies dealing with the application of evolutionary algorithms to various (cancer related) data sets. The general conclusion, based on the published results surveyed here, is that EAs may constitute an efficient method for optimal gene selection, and can also help in reducing the size (number of features used) of classifiers. In many cases, the classification accuracy obtained using EAs, often in conjunction with other methods, represents a significant improvement over results obtained without the use of EAs. However, long-term, independent clinical follow-up studies will be essential to validate prognostic markers identified by the use of EA-based methods.
|
33
|
Cross-species microarray hybridizations: a developing tool for studying species diversity. Trends Genet 2007; 23:200-7. [PMID: 17313995 DOI: 10.1016/j.tig.2007.02.003] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.5] [Received: 07/16/2006] [Revised: 12/14/2006] [Accepted: 02/06/2007] [Indexed: 11/29/2022]
Abstract
The use of cross-species hybridization (CSH) to DNA microarrays, in which the target RNA and microarray probe are from different species, has increased in the past few years. CSH is used in comparative, evolutionary and ecological studies of closely related species, and for gene-expression profiling of many species that lack a representative microarray platform. However, unlike species-specific hybridization, CSH is still considered a non-standard use of microarrays. Here, we present the recent developments in the field of CSH for cDNA and oligomer microarray platforms. We discuss issues that influence the quality of CSH results, including platform choice, experiment design and data analysis, and suggest strategies that can lead to improvement of CSH studies to investigate species diversity.
|
34
|
Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: high-resolution annotation for microarrays. BMC Bioinformatics 2007; 8:108. [PMID: 17394657 PMCID: PMC1853115 DOI: 10.1186/1471-2105-8-108] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Received: 08/28/2006] [Accepted: 03/29/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Extracting biological information from high-density Affymetrix arrays is a multi-step process that begins with the accurate annotation of microarray probes. Shortfalls in the original Affymetrix probe annotation have been described; however, few studies have provided rigorous solutions for routine data analysis. RESULTS Using AceView, a comprehensive human transcript database, we have reannotated the probes by matching them to RNA transcripts instead of genes. Based on this transcript-level annotation, a new probe set definition was created in which every probe in a probe set maps to a common set of AceView gene transcripts. In addition, using artificial data sets we identified that a minimal probe set size of 4 is necessary for reliable statistical summarization. We further demonstrate that applying the new probe set definition can detect specific transcript variants contributing to differential expression and it also improves cross-platform concordance. CONCLUSION We conclude that our transcript-level reannotation and redefinition of probe sets complement the original Affymetrix design. Redefinitions introduce probe sets whose sizes may not support reliable statistical summarization; therefore, we advocate using our transcript-level mapping redefinition in a secondary analysis step rather than as a replacement. Knowing which specific transcripts are differentially expressed is important to properly design probe/primer pairs for validation purposes. For convenience, we have created custom chip-description-files (CDFs) and annotation files for our new probe set definitions that are compatible with Bioconductor, Affymetrix Expression Console or third party software.
|
35
|
Characteristics of oligonucleotide tiling arrays measured by hybridizing full-length cDNA clones: causes of signal variation and false positive signals. Genomics 2007; 89:541-51. [PMID: 17292583 DOI: 10.1016/j.ygeno.2006.12.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 07/11/2006] [Revised: 11/14/2006] [Accepted: 12/29/2006] [Indexed: 10/23/2022]
Abstract
An assessment of the hybridization characteristics of oligonucleotide tiling arrays was carried out using 162 full-length sequenced cDNA clones in spike-in experiments. The properties of array probes that influence signal intensity were investigated, and their capability in the detection of the cDNA exons was evaluated. The signal intensities detected in exonic and nonexonic genomic regions were examined by focusing on the features of probe sequences that raise or lower the level of intensity and on the causes of false positive signals found in nonexonic regions. The effectiveness of measures used in published protocols to improve the separation between signal and background intensity distributions, including the use of replicates and threshold parameterization of signal intensity, was assessed. Sensitivity and specificity in the detection of exons were measured using various sets of threshold parameters, and the effects of each parameter on the detection efficiency and the rate of false positives were evaluated. It was also demonstrated that hybridization of full-length cDNA clones is an excellent method to investigate the characteristics of oligonucleotide tiling arrays.
|
36
|
Interpretation of multiple probe sets mapping to the same gene in Affymetrix GeneChips. BMC Bioinformatics 2007; 8:13. [PMID: 17224057 PMCID: PMC1784106 DOI: 10.1186/1471-2105-8-13] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Received: 12/06/2006] [Accepted: 01/15/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Affymetrix GeneChip technology enables the parallel observations of tens of thousands of genes. It is important that the probe set annotations are reliable so that biological inferences can be made about genes which undergo differential expression. Probe sets representing the same gene might be expected to show similar fold changes/z-scores; however, this is in fact not the case. RESULTS We have made a case study of the mouse Surf4, chosen because it is a gene that was reported to be represented by the same eight probe sets on the MOE430A array by both Affymetrix and Bioconductor in early 2004. Only five of the probe sets actually detect Surf4 transcripts. Two of the probe sets detect splice variants of Surf2. We have also studied the expression changes of the eight probe sets in a public-domain microarray experiment. The transcripts for Surf4 are correlated in time, and similarly the transcripts for Surf2 are also correlated in time. However, the transcripts for Surf4 and Surf2 are not correlated. This proof of principle shows that observations of expression can be used to confirm, or otherwise, annotation discrepancies. We have also investigated groups of probe sets on the RAE230A array that are assigned to the same LocusID, but which show large variances in differential expression in any one of three different experiments on rat. The probe set groups with high variances are found to represent cases of alternative splicing, use of alternative poly(A) signals, or incorrect annotations. CONCLUSION Our results indicate that some probe sets should not be considered as unique measures of transcription, because the individual probes map to more than one transcript dependent upon the biological condition. Our results highlight the need for care when assessing whether groups of probe sets all measure the same transcript.
|
37
|
|
38
|
The promise and perils of microarray analysis. Am J Obstet Gynecol 2006; 195:389-93. [PMID: 16643826 DOI: 10.1016/j.ajog.2006.02.035] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 09/12/2005] [Revised: 02/15/2006] [Accepted: 02/26/2006] [Indexed: 01/17/2023]
Abstract
Microarray analysis has provided a novel means of identifying clues into the mechanisms of disease development. As a methodology, microarray analysis holds the promise of genome-wide screening in which 2 tissues (diseased and normal) are compared and the molecular pathways that define the phenotype of the disease can be precisely identified. Alternatively, microarray experiments can be used to differentially compare pathologically similar diseased tissues to predict response to chemotherapy and risk of recurrence. However, the clinician should be aware that various sources of error can influence microarray analysis results. Sources of error can be minimized but not eliminated, explaining why meticulously conducted experiments in different laboratories or using different platforms result in different lists of genes. Confirmation and validation of genome-wide microarray results using ancillary methods remain a critical step. With proper confirmatory studies and cautious interpretation, microarray analysis represents a powerful tool for molecular discovery.
|
39
|
An analysis of intra array repeats: the good, the bad and the non informative. BMC Genomics 2006; 7:136. [PMID: 16753054 PMCID: PMC1501018 DOI: 10.1186/1471-2164-7-136] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 03/19/2006] [Accepted: 06/05/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND On most common microarray platforms, many genes are represented by multiple probes. Although this is quite common, no one has systematically explored the concordance between probes mapped to the same gene. RESULTS Here we present an analysis of all the cases of multiple probe sets measuring the same gene on the Affymetrix U133a GeneChip. We found that, although in the majority of cases both measurements tend to agree, there are a significant number of cases in which the two measurements differ from each other. In these cases the measurements cannot simply be averaged but should rather be handled individually. CONCLUSION Our analysis allows us to provide a comprehensive list of the correlations between all pairs of probe sets that are mapped to the same gene, and thus allows microarray users to sort out the cases that deserve further analysis. Comparison between the set of highly correlated pairs and the set of pairs that tend to differ from each other reveals potential factors that may affect this agreement.
|
40
|
Abstract
The Affymetrix GeneChip is a popular microarray platform for genome-wide expression profiling and has been widely used in functional genomics especially in the classification of cancers. Due to the updating of genome data, much of the genome information with which the chips were designed is out-of-date and it has been reported that many of the genes/transcripts on the chips differ from their original definition when mapping the probes to the new genome information. Dai et al. have reported that the updated definition can cause as much as 30-50% discrepancy in the genes selected as differentially expressed on a heart tissue expression profiling dataset. Understanding the nature of this difference is therefore very important for the utilization of the data. In this work, with a large cancer dataset as an example, we compared two major definitions and investigated their effects on classification, clustering, discovery of differentially expressed genes and gene-set-based analysis. Results show that the two definitions agree well on clustering and classification results but genes and gene sets discovered as differentially expressed or enriched can be very different. Discoveries based on the Affymetrix definition can cover most of those based on the new definition, but tend to have more false positives.
|
41
|
Reliability and reproducibility issues in DNA microarray measurements. Trends Genet 2005; 22:101-9. [PMID: 16380191 PMCID: PMC2386979 DOI: 10.1016/j.tig.2005.12.005] [Citation(s) in RCA: 402] [Impact Index Per Article: 21.2] [Received: 10/01/2005] [Revised: 11/16/2005] [Accepted: 12/08/2005] [Indexed: 11/16/2022]
Abstract
DNA microarrays enable researchers to monitor the expression of thousands of genes simultaneously. However, the current technology has several limitations. Here we discuss problems related to the sensitivity, accuracy, specificity and reproducibility of microarray results. The existing data suggest that for relatively abundant transcripts the existence and direction (but not the magnitude) of expression changes can be reliably detected. However, accurate measurements of absolute expression levels and the reliable detection of low abundance genes are difficult to achieve. The main problems seem to be the sub-optimal design or choice of probes and some incorrect probe annotations. Well-designed data-analysis approaches can rectify some of these problems.
|
42
|
Targeted disruption of glycerol kinase gene in mice: expression analysis in liver shows alterations in network partners related to glycerol kinase activity. Hum Mol Genet 2005; 15:405-15. [PMID: 16368706 DOI: 10.1093/hmg/ddi457] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Abstract
Glycerol kinase deficiency (GKD) is an X-linked inborn error of metabolism with metabolic and neurological crises. Liver shows the highest level of glycerol kinase (GK) activity in humans and mice. Absence of genotype-phenotype correlations in patients with GKD indicates the involvement of modifier genes, including other network partners. To understand the molecular pathogenesis of GKD, we performed microarray analysis on liver mRNA from neonatal glycerol kinase (Gyk) knockout (KO) and wild-type (WT) mice. Unsupervised learning revealed that the overall gene expression profile of the KO mice was different from that of WT. Real-time PCR confirmed the differences for selected genes. Functional gene enrichment analysis was used to find 56 increased and 37 decreased gene functional categories. PathwayAssist analysis identified changes in the expression levels of genes involved in organic acid metabolism, indicating that GK is part of the same metabolic network; this correlates well with the metabolic acidemia that patients with GKD have during their episodic crises. Network component analysis (NCA) showed that transcription factors sterol regulatory element-binding protein (SREBP)-1c, carbohydrate response element-binding protein (ChREBP), hepatocyte nuclear factor-4 alpha (HNF-4alpha) and peroxisome proliferator-activated receptor-alpha (PPARalpha) had increased activity in the Gyk KO mice compared with WT mice, whereas SREBP-2 was less active in the Gyk KO mice. These studies show that Gyk deletion causes alterations in expression of genes in several regulatory networks, and this is the first time NCA has been used to expand on microarray data from a mouse KO model of a human disease.
|
43
|
Abstract
There is an urgent need for bioinformatic methods that allow integrative analysis of multiple microarray data sets. While previous studies have mainly concentrated on reproducibility of gene expression levels within or between different platforms, we propose a novel meta-analytic method that takes into account the vast amount of available probe-level information to combine the expression changes across different studies. We first show that the comparability of relative expression changes and the consistency of differentially expressed genes between different Affymetrix array generations can be considerably improved by determining the expression changes at the probe-level and by considering the latest information on probe-level sequence matching instead of the probe annotations provided by the manufacturer. With the improved probe-level expression change estimates, data from different generations of Affymetrix arrays can be combined more effectively. This will allow for the full exploitation of existing results when designing and analyzing new experiments.
|
44
|
Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 2005; 33:e175. [PMID: 16284200 PMCID: PMC1283542 DOI: 10.1093/nar/gni179] [Citation(s) in RCA: 1417] [Impact Index Per Article: 74.6] [Indexed: 11/15/2022]
Abstract
Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation, which is significantly different from current knowledge. The resultant informatics problems have a profound impact on the analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals approximately 30-50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.
|
45
|
Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends Genet 2005; 21:466-75. [PMID: 15979196 PMCID: PMC1855044 DOI: 10.1016/j.tig.2005.06.007] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Received: 04/01/2005] [Revised: 05/17/2005] [Accepted: 06/08/2005] [Indexed: 10/25/2022]
Abstract
Traditional microarrays use probes complementary to known genes to quantitate the differential gene expression between two or more conditions. Genomic tiling microarray experiments differ in that probes that span a genomic region at regular intervals are used to detect the presence or absence of transcription. This difference means the same sets of biases and the methods for addressing them are unlikely to be relevant to both types of experiment. We introduce the informatics challenges arising in the analysis of tiling microarray experiments as open problems to the scientific community and present initial approaches for the analysis of this nascent technology.
|
46
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|