26
Zarbl H, Gallo MA, Glick J, Yeung KY, Vouros P. The vanishing zero revisited: thresholds in the age of genomics. Chem Biol Interact 2010; 184:273-8. [PMID: 20109442 DOI: 10.1016/j.cbi.2010.01.031] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 08/25/2009] [Revised: 01/11/2010] [Accepted: 01/18/2010] [Indexed: 10/19/2022]
Abstract
The concept of the vanishing zero, which was first discussed 50 years ago in relation to pesticide residues in foods and food crops, focused on the unintended regulatory consequences created by ever-increasing sensitivity and selectivity of analytical methods, in conjunction with the ambiguous wording of legislation meant to protect public health. In the interim, the ability to detect xenobiotics in most substrates has increased from tens of parts per million to parts per trillion or less, challenging our ability to interpret the biological significance of exposures at the lowest detectable levels. As a result, the focus of risk assessment, especially for potential carcinogens, has shifted from defining an acceptable level to extrapolating from the best available analytical results. Analysis of gene expression profiles in exposed target cells using genomic technologies can identify biological pathways induced or repressed by the exposure as a function of dose and time. This treatise explores how toxicogenomic responses at low doses may inform risk assessment and risk management by defining thresholds for cellular responses linked to modes or mechanisms of toxicity at the molecular level.
27
Abstract
In this chapter, we discuss a number of approaches to network inference from large-scale functional genomics data. Our goal is to describe current methods that can be used to infer predictive networks. At present, one of the most effective methods to produce networks with predictive value is the Bayesian network approach. This approach was initially instantiated by Friedman et al. and further refined by Eric Schadt and his research group. The Bayesian network approach has the virtue of identifying predictive relationships between genes from a combination of expression and eQTL data. However, the approach does not provide a mechanistic basis for predictive relationships and is ultimately hampered by an inability to model feedback. A challenge for the future is to produce networks that are both predictive and provide mechanistic understanding. To do so, the methods described in several chapters of this book will need to be integrated. Other chapters of this book describe a number of methods to identify or predict network components such as physical interactions. At the end of this chapter, we speculate that some of the approaches from other chapters could be integrated and used to "annotate" the edges of the Bayesian networks. This would take the Bayesian networks one step closer to providing mechanistic "explanations" for the relationships between the network nodes.
28
Annest A, Bumgarner RE, Raftery AE, Yeung KY. Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data. BMC Bioinformatics 2009; 10:72. [PMID: 19245714 PMCID: PMC2657791 DOI: 10.1186/1471-2105-10-72] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 11/21/2008] [Accepted: 02/26/2009] [Indexed: 11/17/2022]
Abstract
BACKGROUND Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes. RESULTS We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139). 
CONCLUSION The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.
29
Chu VT, Gottardo R, Raftery AE, Bumgarner RE, Yeung KY. MeV+R: using MeV as a graphical user interface for Bioconductor applications in microarray analysis. Genome Biol 2008; 9:R118. [PMID: 18652698 PMCID: PMC2530872 DOI: 10.1186/gb-2008-9-7-r118] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.9] [Received: 03/10/2008] [Revised: 06/01/2008] [Accepted: 07/24/2008] [Indexed: 11/10/2022]
Abstract
We present MeV+R, an integration of the JAVA MultiExperiment Viewer program with Bioconductor packages. This integration of MultiExperiment Viewer and R is easily extensible to other R packages and provides users with point and click access to traditionally command line driven tools written in R. We demonstrate the ability to use MultiExperiment Viewer as a graphical user interface for Bioconductor applications in microarray data analysis by incorporating three Bioconductor packages, RAMA, BRIDGE and iterativeBMA.
30
Gottardo R, Raftery AE, Yeung KY, Bumgarner RE. Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 2006; 62:10-8. [PMID: 16542223 DOI: 10.1111/j.1541-0420.2005.00397.x] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Indexed: 11/29/2022]
Abstract
We consider the problem of identifying differentially expressed genes under different conditions using gene expression microarrays. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Errors are modeled explicitly using a t-distribution, which accounts for outliers. The model includes an exchangeable prior for the variances, which allows different variances for the genes but still shrinks extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method is illustrated using two publicly available gene expression data sets. We compare our method to six other baseline and commonly used techniques, namely the t-test, the Bonferroni-adjusted t-test, significance analysis of microarrays (SAM), Efron's empirical Bayes, and EBarrays in both its lognormal-normal and gamma-gamma forms. In an experiment with HIV data, our method performed better than these alternatives, on the basis of between-replicate agreement and disagreement.
31
Liu X, Sivaganesan S, Yeung KY, Guo J, Bumgarner RE, Medvedovic M. Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset. Bioinformatics 2006; 22:1737-44. [PMID: 16709591 PMCID: PMC1617036 DOI: 10.1093/bioinformatics/btl184] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION Identifying groups of co-regulated genes by monitoring their expression over various experimental conditions is complicated by the fact that such co-regulation is condition-specific. Ignoring the context-specific nature of co-regulation significantly reduces the ability of clustering procedures to detect co-expressed genes due to additional 'noise' introduced by non-informative measurements. RESULTS We have developed a novel Bayesian hierarchical model and corresponding computational algorithms for clustering gene expression profiles across diverse experimental conditions and studies that accounts for context-specificity of gene expression patterns. The model is based on the Bayesian infinite mixtures framework and does not require a priori specification of the number of clusters. We demonstrate that explicit modeling of context-specificity results in increased accuracy of the cluster analysis by examining the specificity and sensitivity of clusters in microarray data. We also demonstrate that probabilities of co-expression derived from the posterior distribution of clusterings are valid estimates of statistical significance of created clusters. AVAILABILITY The open-source package gimm is available at http://eh3.uc.edu/gimm.
32
Gottardo R, Raftery AE, Yeung KY, Bumgarner RE. Quality Control and Robust Estimation for cDNA Microarrays With Replicates. J Am Stat Assoc 2006. [DOI: 10.1198/016214505000001096] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
33
Li Q, Fraley C, Bumgarner RE, Yeung KY, Raftery AE. Donuts, scratches and blanks: robust model-based segmentation of microarray images. Bioinformatics 2005; 21:2875-82. [PMID: 15845656 DOI: 10.1093/bioinformatics/bti447] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.0] [Indexed: 01/05/2023]
Abstract
MOTIVATION Inner holes, artifacts and blank spots are common in microarray images, but current image analysis methods do not pay them enough attention. We propose a new robust model-based method for processing microarray images so as to estimate foreground and background intensities. The method starts with a very simple but effective automatic gridding method, and then proceeds in two steps. The first step applies model-based clustering to the distribution of pixel intensities, using the Bayesian Information Criterion (BIC) to choose the number of groups up to a maximum of three. The second step is spatial, finding the large spatially connected components in each cluster of pixels. The method thus combines the strengths of the histogram-based and spatial approaches. It deals effectively with inner holes in spots and with artifacts. It also provides a formal inferential basis for deciding when the spot is blank, namely when the BIC favors one group over two or three. RESULTS We apply our methods for gridding and segmentation to cDNA microarray images from an HIV infection experiment. In these experiments, our method had better stability across replicates than a fixed-circle segmentation method or the seeded region growing method in the SPOT software, without introducing noticeable bias when estimating the intensities of differentially expressed genes. AVAILABILITY spotSegmentation, an R language package implementing both the gridding and segmentation methods is available through the Bioconductor project (http://www.bioconductor.org). The segmentation method requires the contributed R package MCLUST for model-based clustering (http://cran.us.r-project.org). CONTACT fraley@stat.washington.edu.
34
Yeung KY, Bumgarner RE, Raftery AE. Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 2005; 21:2394-402. [PMID: 15713736 DOI: 10.1093/bioinformatics/bti319] [Citation(s) in RCA: 200] [Impact Index Per Article: 10.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION Selecting a small number of relevant genes for accurate classification of samples is essential for the development of diagnostic tests. We present the Bayesian model averaging (BMA) method for gene selection and classification of microarray data. Typical gene selection and classification procedures ignore model uncertainty and use a single set of relevant genes (model) to predict the class. BMA accounts for the uncertainty about the best set to choose by averaging over multiple models (sets of potentially overlapping relevant genes). RESULTS We have shown that BMA selects smaller numbers of relevant genes (compared with other methods) and achieves a high prediction accuracy on three microarray datasets. Our BMA algorithm is applicable to microarray datasets with any number of classes, and outputs posterior probabilities for the selected genes and models. Our selected models typically consist of only a few genes. The combination of high accuracy, small numbers of genes and posterior probabilities for the predictions should make BMA a powerful tool for developing diagnostics from expression data. AVAILABILITY The source codes and datasets used are available from our Supplementary website.
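The model-averaging idea in this abstract can be illustrated with a minimal Python sketch. It averages per-model class probabilities, weighting each model by its posterior probability, and scores each gene by the summed probability of the models that contain it. The gene names, model probabilities, and predictions are hypothetical values chosen for illustration, not results from the paper, and the model search and fitting steps of the published algorithm are not shown.

```python
# Illustrative sketch of Bayesian model averaging (BMA) for a two-class problem.
# All gene names and numbers below are hypothetical.

# Each model: (set of genes, posterior model probability, P(class = 1 | sample, model))
models = [
    ({"geneA", "geneB"}, 0.5, 0.9),
    ({"geneA", "geneC"}, 0.3, 0.8),
    ({"geneD"}, 0.2, 0.4),
]


def bma_prediction(models):
    """Posterior-weighted average of the per-model class probabilities."""
    total = sum(weight for _, weight, _ in models)
    return sum(weight * prob for _, weight, prob in models) / total


def gene_posterior_probabilities(models):
    """Posterior probability of each gene: summed weight of the models that include it."""
    total = sum(weight for _, weight, _ in models)
    scores = {}
    for genes, weight, _ in models:
        for gene in genes:
            scores[gene] = scores.get(gene, 0.0) + weight / total
    return scores


averaged = bma_prediction(models)
gene_probs = gene_posterior_probabilities(models)
print(f"BMA P(class = 1) = {averaged:.2f}")
for gene, prob in sorted(gene_probs.items()):
    print(f"{gene}: {prob:.2f}")
```

In this toy setup a gene that appears in several well-supported models (here `geneA`) receives a higher posterior probability than one that appears in a single weak model, which is the sense in which the averaged output reflects uncertainty about the gene set.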
35
Vanasse GJ, Winn RK, Rodov S, Zieske AW, Li JT, Tupper JC, Tang J, Raines EW, Peters MA, Yeung KY, Harlan JM. Bcl-2 Overexpression Leads to Increases in Suppressor of Cytokine Signaling-3 Expression in B Cells and De novo Follicular Lymphoma. Mol Cancer Res 2004. [DOI: 10.1158/1541-7786.620.2.11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
The t(14;18)(q32;q21), resulting in deregulated expression of B-cell-leukemia/lymphoma-2 (Bcl-2), represents the genetic hallmark in human follicular lymphomas. Substantial evidence supports the hypothesis that the t(14;18) and Bcl-2 overexpression are necessary but not solely responsible for neoplastic transformation and require cooperating genetic derangements for neoplastic transformation to occur. To investigate genes that cooperate with Bcl-2 to influence cellular signaling pathways important for neoplastic transformation, we used oligonucleotide microarrays to determine differential gene expression patterns in CD19+ B cells isolated from Eμ-Bcl-2 transgenic mice and wild-type littermate control mice. Fifty-seven genes were induced and 94 genes were repressed by ≥2-fold in Eμ-Bcl-2 transgenic mice (P < 0.05). The suppressor of cytokine signaling-3 (SOCS3) gene was found to be overexpressed 5-fold in B cells from Eμ-Bcl-2 transgenic mice. Overexpression of Bcl-2 in both mouse embryo fibroblast-1 and hematopoietic cell lines resulted in induction of SOCS3 protein, suggesting a Bcl-2-associated mechanism underlying SOCS3 induction. Immunohistochemistry with SOCS3 antisera on tissue from a cohort of patients with de novo follicular lymphoma revealed marked overexpression of SOCS3 protein that, within the follicular center cell region, was limited to neoplastic follicular lymphoma cells and colocalized with Bcl-2 expression in 9 of 12 de novo follicular lymphoma cases examined. In contrast, SOCS3 protein expression was not detected in the follicular center cell region of benign hyperplastic tonsil tissue. These data suggest that Bcl-2 overexpression leads to the induction of activated signal transducer and activator of transcription 3 (STAT3) and to the induction of SOCS3, which may contribute to the pathogenesis of follicular lymphoma.
36
Vanasse GJ, Winn RK, Rodov S, Zieske AW, Li JT, Tupper JC, Tang J, Raines EW, Peters MA, Yeung KY, Harlan JM. Bcl-2 overexpression leads to increases in suppressor of cytokine signaling-3 expression in B cells and de novo follicular lymphoma. Mol Cancer Res 2004; 2:620-31. [PMID: 15561778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
The t(14;18)(q32;q21), resulting in deregulated expression of B-cell-leukemia/lymphoma-2 (Bcl-2), represents the genetic hallmark in human follicular lymphomas. Substantial evidence supports the hypothesis that the t(14;18) and Bcl-2 overexpression are necessary but not solely responsible for neoplastic transformation and require cooperating genetic derangements for neoplastic transformation to occur. To investigate genes that cooperate with Bcl-2 to influence cellular signaling pathways important for neoplastic transformation, we used oligonucleotide microarrays to determine differential gene expression patterns in CD19+ B cells isolated from Eμ-Bcl-2 transgenic mice and wild-type littermate control mice. Fifty-seven genes were induced and 94 genes were repressed by ≥2-fold in Eμ-Bcl-2 transgenic mice (P < 0.05). The suppressor of cytokine signaling-3 (SOCS3) gene was found to be overexpressed 5-fold in B cells from Eμ-Bcl-2 transgenic mice. Overexpression of Bcl-2 in both mouse embryo fibroblast-1 and hematopoietic cell lines resulted in induction of SOCS3 protein, suggesting a Bcl-2-associated mechanism underlying SOCS3 induction. Immunohistochemistry with SOCS3 antisera on tissue from a cohort of patients with de novo follicular lymphoma revealed marked overexpression of SOCS3 protein that, within the follicular center cell region, was limited to neoplastic follicular lymphoma cells and colocalized with Bcl-2 expression in 9 of 12 de novo follicular lymphoma cases examined. In contrast, SOCS3 protein expression was not detected in the follicular center cell region of benign hyperplastic tonsil tissue. These data suggest that Bcl-2 overexpression leads to the induction of activated signal transducer and activator of transcription 3 (STAT3) and to the induction of SOCS3, which may contribute to the pathogenesis of follicular lymphoma.
37
Yeung KY, Medvedovic M, Bumgarner RE. From co-expression to co-regulation: how many microarray experiments do we need? Genome Biol 2004; 5:R48. [PMID: 15239833 PMCID: PMC463312 DOI: 10.1186/gb-2004-5-7-r48] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.8] [Received: 02/18/2004] [Revised: 04/19/2004] [Accepted: 05/28/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND Cluster analysis is often used to infer regulatory modules or biological function by associating unknown genes with other genes that have similar expression patterns and known regulatory elements or functions. However, clustering results may not have any biological relevance. RESULTS We applied various clustering algorithms to microarray datasets with different sizes, and we evaluated the clustering results by determining the fraction of gene pairs from the same clusters that share at least one known common transcription factor. We used both yeast transcription factor databases (SCPD, YPD) and chromatin immunoprecipitation (ChIP) data to evaluate our clustering results. We showed that the ability to identify co-regulated genes from clustering results is strongly dependent on the number of microarray experiments used in cluster analysis and the accuracy of these associations plateaus at between 50 and 100 experiments on yeast data. Moreover, the model-based clustering algorithm MCLUST consistently outperforms more traditional methods in accurately assigning co-regulated genes to the same clusters on standardized data. CONCLUSIONS Our results are consistent with respect to independent evaluation criteria that strengthen our confidence in our results. However, when one compares ChIP data to YPD, the false-negative rate is approximately 80% using the recommended p-value of 0.001. In addition, we showed that even with large numbers of experiments, the false-positive rate may exceed the true-positive rate. In particular, even when all experiments are included, the best results produce clusters with only a 28% true-positive rate using known gene transcription factor interactions.
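The evaluation criterion described in this abstract, the fraction of same-cluster gene pairs that share at least one known transcription factor, can be sketched in a few lines of Python. The cluster assignments and annotations below are hypothetical, and skipping genes without annotations is an assumption made here for simplicity rather than a detail taken from the paper.

```python
from itertools import combinations

# Hypothetical cluster assignments and known transcription-factor (TF) annotations.
clusters = {
    "c1": ["g1", "g2", "g3"],
    "c2": ["g4", "g5"],
}
known_tfs = {
    "g1": {"TF_A"},
    "g2": {"TF_A", "TF_B"},
    "g3": {"TF_C"},
    "g4": {"TF_B"},
    "g5": {"TF_B"},
}


def coregulated_pair_fraction(clusters, known_tfs):
    """Fraction of same-cluster gene pairs sharing at least one known TF.

    Genes without TF annotations are skipped (an assumption of this sketch),
    so only pairs that can be checked against the annotations are counted.
    """
    shared = 0
    total = 0
    for genes in clusters.values():
        annotated = [g for g in genes if g in known_tfs]
        for a, b in combinations(annotated, 2):
            total += 1
            if known_tfs[a] & known_tfs[b]:
                shared += 1
    return shared / total if total else 0.0


fraction = coregulated_pair_fraction(clusters, known_tfs)
print(f"Fraction of co-regulated same-cluster pairs: {fraction:.2f}")
```

Computing this fraction for clusterings built from increasing numbers of experiments would give the kind of curve the abstract refers to when it reports a plateau.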
38
Medvedovic M, Yeung KY, Bumgarner RE. Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 2004; 20:1222-32. [PMID: 14871871 DOI: 10.1093/bioinformatics/bth068] [Citation(s) in RCA: 145] [Impact Index Per Article: 7.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION Identifying patterns of co-expression in microarray data by cluster analysis has been a productive approach to uncovering molecular mechanisms underlying biological processes under investigation. Using experimental replicates can generally improve the precision of the cluster analysis by reducing the experimental variability of measurements. In such situations, Bayesian mixtures allow for an efficient use of information by precisely modeling between-replicates variability. RESULTS We developed different variants of Bayesian mixture based clustering procedures for clustering gene expression data with experimental replicates. In this approach, the statistical distribution of microarray data is described by a Bayesian mixture model. Clusters of co-expressed genes are created from the posterior distribution of clusterings, which is estimated by a Gibbs sampler. We define infinite and finite Bayesian mixture models with different between-replicates variance structures and investigate their utility by analyzing synthetic and the real-world datasets. Results of our analyses demonstrate that (1) improvements in precision achieved by performing only two experimental replicates can be dramatic when the between-replicates variability is high, (2) precise modeling of intra-gene variability is important for accurate identification of co-expressed genes and (3) the infinite mixture model with the 'elliptical' between-replicates variance structure performed overall better than any other method tested. We also introduce a heuristic modification to the Gibbs sampler based on the 'reverse annealing' principle. This modification effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions. 
Finally, we demonstrate that the Bayesian infinite mixture model with 'elliptical' variance structure is capable of identifying the underlying structure of the data without knowing the 'correct' number of clusters. AVAILABILITY The MS Windows based program named Gaussian Infinite Mixture Modeling (GIMM) implementing the Gibbs sampler and corresponding C++ code are available at http://homepages.uc.edu/~medvedm/GIMM.htm SUPPLEMENTAL INFORMATION: http://expression.microslu.washington.edu/expression/kayee/medvedovic2003/medvedovic_bioinf2003.html
39
Yeung KY, Bumgarner RE. Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biol 2003; 4:R83. [PMID: 14659020 PMCID: PMC329422 DOI: 10.1186/gb-2003-4-12-r83] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.3] [Received: 06/04/2003] [Revised: 08/14/2003] [Accepted: 10/17/2003] [Indexed: 11/21/2022]
Abstract
Prediction of the diagnostic category of a tissue sample from its gene-expression profile and selection of relevant genes for class prediction have important applications in cancer research. We have developed the uncorrelated shrunken centroid (USC) and error-weighted, uncorrelated shrunken centroid (EWUSC) algorithms that are applicable to microarray data with any number of classes. We show that removing highly correlated genes typically improves classification results using a small set of genes.
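The step of removing highly correlated genes can be illustrated with a simplified Python sketch. This is a generic greedy correlation filter, not the USC or EWUSC algorithm itself: genes are visited in a given relevance order, and a gene is kept only if its absolute Pearson correlation with every gene already kept stays at or below a threshold. The expression values, the ranking, and the 0.9 threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_correlated(ranked_genes, expression, threshold=0.9):
    """Greedily keep genes that are not highly correlated with any gene already kept."""
    kept = []
    for gene in ranked_genes:
        if all(abs(pearson(expression[gene], expression[k])) <= threshold for k in kept):
            kept.append(gene)
    return kept


# Hypothetical expression profiles across four samples.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 4.0, 6.2, 8.1],  # nearly proportional to g1
    "g3": [4.0, 1.0, 3.0, 2.0],
}
ranked = ["g1", "g2", "g3"]  # hypothetical relevance order, most relevant first
selected = drop_correlated(ranked, expression)
print(selected)
```

Here `g2` is dropped because it carries almost the same information as `g1`, which leaves a smaller gene set for the classifier.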
40
Yeung KY, Medvedovic M, Bumgarner RE. Clustering gene-expression data with repeated measurements. Genome Biol 2003; 4:R34. [PMID: 12734014 PMCID: PMC156590 DOI: 10.1186/gb-2003-4-5-r34] [Citation(s) in RCA: 136] [Impact Index Per Article: 6.5] [Received: 12/18/2002] [Revised: 02/11/2003] [Accepted: 03/07/2003] [Indexed: 11/26/2022]
Abstract
Clustering is a common methodology for the analysis of array data, and many research laboratories are generating array data with repeated measurements. We evaluated several clustering algorithms that incorporate repeated measurements, and show that algorithms that take advantage of repeated measurements yield more accurate and more stable clusters. In particular, we show that the infinite mixture model-based approach with a built-in error model produces superior results.
41
Barrett MT, Yeung KY, Ruzzo WL, Hsu L, Blount PL, Sullivan R, Zarbl H, Delrow J, Rabinovitch PS, Reid BJ. Transcriptional analyses of Barrett's metaplasia and normal upper GI mucosae. Neoplasia 2002; 4:121-8. [PMID: 11896567 PMCID: PMC1550324 DOI: 10.1038/sj.neo.7900221] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Received: 08/22/2001] [Accepted: 09/14/2001] [Indexed: 12/29/2022]
Abstract
Over the last two decades, the incidence of esophageal adenocarcinoma (EA) has increased dramatically in the US and Western Europe. It has been shown that EAs evolve from premalignant Barrett's esophagus (BE) tissue by a process of clonal expansion and evolution. However, the molecular phenotype of the premalignant metaplasia, and its relationship to those of the normal upper gastrointestinal (GI) mucosae, including gastric, duodenal, and squamous epithelium of the esophagus, has not been systematically characterized. Therefore, we used oligonucleotide-based microarrays to characterize gene expression profiles in each of these tissues. The similarity of BE to each of the normal tissues was compared using a series of computational approaches. Our analyses included esophageal squamous epithelium, which is present at the same anatomic site and exposed to similar conditions as Barrett's epithelium, duodenum that shares morphologic similarity to Barrett's epithelium, and adjacent gastric epithelium. There was a clear distinction among the expression profiles of gastric, duodenal, and squamous epithelium whereas the BE profiles showed considerable overlap with normal tissues. Furthermore, we identified clusters of genes that are specific to each of the tissues, to the Barrett's metaplastic epithelia, and a cluster of genes that was distinct between squamous and non-squamous epithelia.
42
Yeung KY, Baum L, Chan WM, Lam DS, Kwok AK, Pang CP. Molecular diagnostics for retinitis pigmentosa. Clin Chim Acta 2001; 313:209-15. [PMID: 11694261 DOI: 10.1016/s0009-8981(01)00674-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
BACKGROUND At least 1 million people worldwide have retinitis pigmentosa (RP), making it relatively common among the inherited forms of blindness. Mutations in many genes may cause RP. The most common known mutation, Pro347Leu in rhodopsin, is found in no more than about 1% of unrelated patients, implying the impracticality of a diagnostic test which would screen only for a few, common mutation sites. CONCLUSIONS Ongoing discovery and study of RP genes makes it feasible to consider a molecular diagnostic test which would screen coding regions of all known RP genes by a mutation detection method such as conformation-sensitive gel electrophoresis followed by sequencing. The parallel development of RP genetic knowledge and treatments such as gene therapy will make such tests both possible and necessary.
43
Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL. Model-based clustering and data transformations for gene expression data. Bioinformatics 2001; 17:977-87. [PMID: 11673243 DOI: 10.1093/bioinformatics/17.10.977] [Citation(s) in RCA: 594] [Impact Index Per Article: 25.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. The issues of selecting a 'good' clustering method and determining the 'correct' number of clusters are reduced to model selection problems in the probability framework. Gaussian mixture models have been shown to be a powerful tool for clustering in many applications. RESULTS We benchmarked the performance of model-based clustering on several synthetic and real gene expression data sets for which external evaluation criteria were available. The model-based approach has superior performance on our synthetic data sets, consistently selecting the correct model and the number of clusters. On real expression data, the model-based approach produced clusters of quality comparable to a leading heuristic clustering algorithm, but with the key advantage of suggesting the number of clusters and an appropriate model. We also explored the validity of the Gaussian mixture assumption on different transformations of real data. We also assessed the degree to which these real gene expression data sets fit multivariate Gaussian distributions both before and after subjecting them to commonly used data transformations. Suitably chosen transformations seem to result in reasonable fits. AVAILABILITY MCLUST is available at http://www.stat.washington.edu/fraley/mclust. The software for the diagonal model is under development. CONTACT kayee@cs.washington.edu. SUPPLEMENTARY INFORMATION http://www.cs.washington.edu/homes/kayee/model.
|
44
|
Chan WM, Yeung KY, Pang CP, Baum L, Lau TC, Kwok AK, Lam DS. Rhodopsin mutations in Chinese patients with retinitis pigmentosa. Br J Ophthalmol 2001; 85:1046-8. [PMID: 11520753 PMCID: PMC1724134 DOI: 10.1136/bjo.85.9.1046] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
Abstract
AIM: To determine the pattern of rhodopsin mutations in Chinese retinitis pigmentosa (RP) patients. METHODS: The rhodopsin gene was examined in 101 RP patients and 190 controls from Hong Kong. RESULTS: Three coding changes were identified: Pro347Leu, Ala299Ser, and 5211delC. Each protein sequence alteration was found in one patient. Ala299Ser also existed in two controls. CONCLUSION: The C-terminal nonsense mutation may cause mis-sorting of rhodopsin protein. The finding of controls with Ala299Ser suggests this is only the third missense alteration reported that does not cause RP. The expected frequency of rhodopsin mutations in RP is <7% (2/101 = 2.0%, 95% confidence interval: 0.2%-7.0%).
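The abstract does not say how the confidence interval was computed. As a minimal sketch, assuming an exact binomial (Clopper-Pearson) interval, the reported bounds for 2 mutation carriers among 101 patients can be reproduced as follows. The function names are illustrative and do not come from the paper.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for f(p) == target, where f is decreasing in p on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes in n trials.

    Assumption: the paper's interval is of this type; the abstract does not say.
    """
    # Lower bound solves P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


low, high = clopper_pearson(2, 101)
print(f"{2 / 101:.1%} (95% CI: {low:.1%}-{high:.1%})")
```

Under this assumption the bounds round to the 0.2% and 7.0% quoted in the abstract.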
|
45
|
Abstract
MOTIVATION: There is a great need to develop analytical methodology to analyze and to exploit the information contained in gene expression data. Because of the large number of genes and the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. Other classical techniques, such as principal component analysis (PCA), have also been applied to analyze gene expression data. Using different data analysis techniques and different clustering algorithms to analyze the same data set can lead to very different conclusions. Our goal is to study the effectiveness of principal components (PCs) in capturing cluster structure. Specifically, using both real and synthetic gene expression data sets, we compared the quality of clusters obtained from the original data to the quality of clusters obtained after projecting onto subsets of the principal component axes. RESULTS: Our empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality. In particular, the first few PCs (which contain most of the variation in the data) do not necessarily capture most of the cluster structure. We also showed that clustering with PCs has different impact on different algorithms and different similarity metrics. Overall, we would not recommend PCA before clustering except in special circumstances.
|
46
|
Baum L, Chan WM, Yeung KY, Lam DS, Kwok AK, Pang CP. RP1 in Chinese: Eight novel variants and evidence that truncation of the extreme C-terminal does not cause retinitis pigmentosa. Hum Mutat 2001; 17:436. [PMID: 11317367 DOI: 10.1002/humu.1127] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
Heterozygous truncating mutations in the RP1 gene cause approximately 7% of autosomal dominant retinitis pigmentosa (RP) cases. To examine the role of RP1 mutations in RP, we screened 101 unrelated Chinese RP patients (unselected for mode of inheritance) and 190 elderly normal control subjects for sequence changes in the coding exons for the 2156 amino acid RP1 protein. One patient had a mutation, thus RP1 mutations cause about 0.0% to 5.4% (95% confidence interval) of all RP among Chinese. The mutation was R677X, the most common found in Americans. Five other known sequence changes were found. In addition, nine novel sequence alterations were identified: 746G>A (R249H), 1437G>T (M479I), 2116G>C (G706R), 3024G>A (Q1008Q), 3188G>A (Q1063R), 5797C>T (R1933X), 6423A>G (I2141M), and the variants 6542C>T and 6676T>A, both in the 3' untranslated region. One control subject and three members of a non-RP family were heterozygous for R1933X, which is therefore likely to be a non-disease-causing variant. The most C-terminal truncation previously reported was due to Tyr1053 (1-bp del) and occurred in RP patients. Thus the presence of a normal level of at least part of RP1 between amino acids 1052 and 1933 appears necessary to prevent RP. Hum Mutat 17:436, 2001.
|
47
|
Yeung KY, Barrett M, Delrow J, Blount P, Reid B, Rabinovitch P. Transcriptional analysis of Barrett's epithelium and normal gastrointestinal tissues. Nat Genet 2001. [DOI: 10.1038/87376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
48
|
Abstract
MOTIVATION: Many clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. We provide a systematic framework for assessing the results of clustering algorithms. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters: meaningful clusters should exhibit less variation in the remaining condition than clusters formed by chance. RESULTS: We successfully applied our methodology to compare six clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
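The leave-one-condition-out idea can be sketched in a few lines. This is an illustration only. The abstract does not give the scoring formula, so the root-mean-square deviation from the cluster mean, summed over left-out conditions, is an assumption. The toy data and the two labelling functions are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def figure_of_merit(data, cluster_fn):
    """Score a clustering method by leaving out one condition at a time.

    data: list of genes, each a list of expression values (one per condition).
    cluster_fn: maps the reduced data to one cluster label per gene.
    Lower scores mean less within-cluster variation in the left-out condition.
    """
    n_genes, n_conditions = len(data), len(data[0])
    total = 0.0
    for left_out in range(n_conditions):
        # Cluster using every condition except the one held out.
        reduced = [[v for j, v in enumerate(row) if j != left_out] for row in data]
        labels = cluster_fn(reduced)
        # Collect the held-out values of the genes in each cluster.
        held_out = defaultdict(list)
        for label, row in zip(labels, data):
            held_out[label].append(row[left_out])
        # Assumed measure: RMS deviation from each cluster's mean.
        squared = 0.0
        for values in held_out.values():
            mean = sum(values) / len(values)
            squared += sum((v - mean) ** 2 for v in values)
        total += sqrt(squared / n_genes)
    return total


# Hypothetical toy data: three up-regulated and three down-regulated genes.
toy = [
    [1.0, 1.2, 0.9, 1.1], [0.8, 1.0, 1.1, 0.9], [1.1, 0.9, 1.0, 1.2],
    [-1.0, -1.1, -0.9, -1.2], [-0.9, -1.0, -1.2, -0.8], [-1.1, -0.8, -1.0, -1.0],
]


def by_mean_sign(rows):
    """Stand-in 'meaningful' clustering: split genes by the sign of their mean."""
    return [int(sum(r) > 0) for r in rows]


def alternating(rows):
    """Stand-in 'chance' clustering: labels that ignore the data."""
    return [i % 2 for i in range(len(rows))]


print(figure_of_merit(toy, by_mean_sign) < figure_of_merit(toy, alternating))
```

On this toy data the data-driven labelling scores lower than the arbitrary one, which is the behaviour the abstract describes for meaningful clusters.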
|
49
|
Ahlgren JD, Ellison NM, Gottlieb RJ, Laluna F, Lokich JJ, Sinclair PR, Ueno W, Wampler GL, Yeung KY, Alt D. Hormonal palliation of chemoresistant ovarian cancer: three consecutive phase II trials of the Mid-Atlantic Oncology Program. J Clin Oncol 1993; 11:1957-68. [PMID: 7691999 DOI: 10.1200/jco.1993.11.10.1957] [Citation(s) in RCA: 82] [Impact Index Per Article: 2.6] [Indexed: 01/26/2023]
Abstract
PURPOSE: To evaluate the efficacy of three hormonal manipulations in the palliation of chemoresistant ovarian cancer, and to analyze the results in the light of other clinical trials. PATIENTS AND METHODS: Three sequential phase II trials were performed in patients with refractory epithelial ovarian carcinoma, using high-dose megestrol acetate (800 mg/d for 30 days, then 400 mg/d), high-dose tamoxifen (80 mg/d for 30 days, then 40 mg/d), and aminoglutethimide (1 g/d plus tapering doses of hydrocortisone). Results were compared with those described in the world literature from trials of the same or similar agents. RESULTS: No responses were seen among 30 assessable patients treated with megestrol acetate, and most (but not all) similar trials have reported low response rates. Five responses (17%) were seen among 29 patients treated with tamoxifen. Two responses exceeded 5 years in duration. No responses were seen among 15 patients treated with aminoglutethimide. CONCLUSION: Antiestrogen therapy may offer the possibility of useful and, occasionally, long-term palliation of refractory epithelial ovarian carcinoma, with little toxicity. There may be a trend toward a dose-response effect, which represents a suitable topic for a future prospective trial.
|
50
|
Cheung WH, Ha DK, Yeung KY, Hung RP. Methods for enumerating Escherichia coli in subtropical waters. Epidemiol Infect 1991; 106:345-54. [PMID: 2019302 PMCID: PMC2272005 DOI: 10.1017/s0950268800048494] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/29/2022]
Abstract
The standard membrane filtration method of the UK has been modified in order to improve its specificity for enumerating Escherichia coli in the subtropical waters of Hong Kong. This involves incorporating into the membrane lauryl sulphate (mLS) method either an in situ urease test (the mLS-UA method), or an in situ beta-glucuronidase test (the mLS-GUD method). The false-positive errors of the mLS-UA and mLS-GUD methods are low, ranging from 3-5%. A comparison between the membrane filtration (mLS-UA) method and the multiple tube technique in testing E. coli in subtropical beach-waters has demonstrated that the former can give much more precise counts, and is the method of choice for such a purpose. The mLS-GUD method, for which automated counting of E. coli colonies is possible, is a good alternative to mLS-UA in routine enumeration of this bacterial indicator in environmental waters.
|