Mutch DM, Berger A, Mansourian R, Rytz A, Roberts MA. Microarray data analysis: a practical approach for selecting differentially expressed genes. Genome Biol 2002; 2:PREPRINT0009. [PMID: 11790248 DOI: 10.1186/gb-2001-2-12-preprint0009]
[Received: 11/15/2001] [Indexed: 11/10/2022]
Abstract
BACKGROUND
The biomedical community is rapidly developing new methods of data analysis for microarray experiments, with the goal of establishing standards for objectively processing the massive datasets produced by functional genomic experiments. Each microarray experiment measures thousands of genes simultaneously, producing an unprecedented amount of biological information across increasingly numerous experiments; in general, however, only a very small percentage of the genes on any given array are identified as differentially regulated. The challenge, then, is to process this information objectively and efficiently, both to obtain knowledge of the biological system under study and to compare information gained across multiple experiments. In this context, systematic and objective mathematical approaches that are simple to apply across a large number of experimental designs become fundamental to handling the mass of data correctly and to understanding the true complexity of the biological systems under study.
RESULTS
The present report develops a method for extracting differentially expressed genes across any number of experimental samples. The model first evaluates the maximum fold change (FC) across all experimental conditions and across the entire range of absolute expression levels. The selection of genes within the top X% of FCs observed within absolute expression bins was evaluated both with and without the use of replicates. Lastly, the FC model was validated against both real-time polymerase chain reaction (RT-PCR) and variance data. Semi-quantitative RT-PCR analysis demonstrated 73% concordance with the microarray data from Mu11K Affymetrix GeneChips. Furthermore, 94.1% of the genes selected by the 5% FC model were found to lie above measurement variability at an SDwithin confidence level of 99.9%.
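The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of the mean across conditions as the "absolute expression level", the equal-count binning, and the toy data are all assumptions made for illustration, and the abstract does not specify how bins or model limits were actually defined.

```python
import math


def max_fold_change(values):
    """Largest ratio between any two conditions: max(values) / min(values)."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("expression values must be positive")
    return hi / lo


def select_top_fc(expression, n_bins=3, top_fraction=0.05):
    """Return genes in the top `top_fraction` of maximum FC within each bin.

    `expression` maps a gene name to its expression values across conditions.
    Assumption: genes are binned by mean expression into equal-count bins.
    """
    if not expression or n_bins < 1:
        return []
    # Order genes by absolute expression level (assumed: mean across conditions).
    ranked = sorted(expression, key=lambda g: sum(expression[g]) / len(expression[g]))
    bin_size = math.ceil(len(ranked) / n_bins)
    selected = []
    for start in range(0, len(ranked), bin_size):
        bin_genes = ranked[start:start + bin_size]
        # Keep at least one gene per bin so small bins are not silently skipped.
        n_keep = max(1, math.ceil(top_fraction * len(bin_genes)))
        by_fc = sorted(bin_genes, key=lambda g: max_fold_change(expression[g]), reverse=True)
        selected.extend(by_fc[:n_keep])
    return selected


# Hypothetical toy data: four genes measured under two conditions.
toy = {
    "g1": [10, 12],
    "g2": [20, 80],
    "g3": [100, 110],
    "g4": [200, 600],
}
print(select_top_fc(toy, n_bins=2, top_fraction=0.5))  # prints ['g2', 'g4']
```

Binning before ranking means that a gene is compared only with genes of similar absolute expression, so large ratios among weakly expressed genes do not crowd out smaller but meaningful changes among highly expressed ones.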
CONCLUSION
As evidenced by the high rate of validation, the FC model has the potential to minimize the number of replicates required in expensive microarray experiments by extracting information on gene expression patterns (e.g. characterizing biological and/or measurement variance) within an experiment. The simplicity of the overall process allows the analyst to easily select the model limits that best describe the data. The genes selected by this process can be compared between experiments, and the process is shown to objectively extract information that is biologically and statistically significant.