901
Ghosh D, Chinnaiyan AM. Empirical Bayes Identification of Tumor Progression Genes from Microarray Data. Biom J 2007; 49:68-77. [PMID: 17342950 DOI: 10.1002/bimj.200610312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The use of microarray data has become quite commonplace in medical and scientific experiments. We focus here on microarray data generated from cancer studies. It is potentially important for the discovery of biomarkers to identify genes whose expression levels correlate with tumor progression. In this article, we propose a simple procedure for the identification of such genes, which we term tumor progression genes. The first stage involves estimation based on the proportional odds model. At the second stage, we calculate two quantities to adjust for the multiple testing problem: a q-value and a shrinkage estimator of the test statistic. The relationship between the proposed method and the false discovery rate is studied. The proposed methods are applied to data from a prostate cancer microarray study.
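The q-value mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a p-value is already available for each gene and applies the Benjamini-Hochberg step-up adjustment, which corresponds to a q-value with the proportion of null genes fixed at 1. The function name `bh_qvalues` is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values assuming all genes may be null)."""
    m = len(pvalues)
    # Gene indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    print(bh_qvalues([0.01, 0.04, 0.03, 0.20]))
```

Genes whose q-value falls below a chosen threshold would then be reported; the threshold itself is a design choice the abstract does not specify.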
Affiliation(s)
- Debashis Ghosh
- Department of Biostatistics, University of Michigan, 1420 Washington Heights, Ann Arbor, Michigan 48109-2029, USA.
902
Grade M, Hörmann P, Becker S, Hummon AB, Wangsa D, Varma S, Simon R, Liersch T, Becker H, Difilippantonio MJ, Ghadimi BM, Ried T. Gene expression profiling reveals a massive, aneuploidy-dependent transcriptional deregulation and distinct differences between lymph node-negative and lymph node-positive colon carcinomas. Cancer Res 2007; 67:41-56. [PMID: 17210682 PMCID: PMC4721580 DOI: 10.1158/0008-5472.can-06-1514] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.4] [Indexed: 11/16/2022]
Abstract
To characterize patterns of global transcriptional deregulation in primary colon carcinomas, we did gene expression profiling of 73 tumors [Unio Internationale Contra Cancrum stage II (n = 33) and stage III (n = 40)] using oligonucleotide microarrays. For 30 of the tumors, expression profiles were compared with those from matched normal mucosa samples. We identified a set of 1,950 genes with highly significant deregulation between tumors and mucosa samples (P < 1e-7). A significant proportion of these genes mapped to chromosome 20 (P = 0.01). Seventeen genes had a >5-fold average expression difference between normal colon mucosa and carcinomas, including up-regulation of MYC and of HMGA1, a putative oncogene. Furthermore, we identified 68 genes that were significantly differentially expressed between lymph node-negative and lymph node-positive tumors (P < 0.001), the functional annotation of which revealed a preponderance of genes that play a role in cellular immune response and surveillance. The microarray-derived gene expression levels of 20 deregulated genes were validated using quantitative real-time reverse transcription-PCR in >40 tumor and normal mucosa samples with good concordance between the techniques. Finally, we established a relationship between specific genomic imbalances, which were mapped for 32 of the analyzed colon tumors by comparative genomic hybridization, and alterations of global transcriptional activity. Previously, we had conducted a similar analysis of primary rectal carcinomas. The systematic comparison of colon and rectal carcinomas revealed a significant overlap of genomic imbalances and transcriptional deregulation, including activation of the Wnt/beta-catenin signaling cascade, suggesting similar pathogenic pathways.
Affiliation(s)
- Marian Grade
- Department of General Surgery, University Medical Center, Göttingen, Germany
- Patrick Hörmann
- Genetics Branch, National Cancer Institute, NIH, Bethesda, Maryland
- Sandra Becker
- Genetics Branch, National Cancer Institute, NIH, Bethesda, Maryland
- Amanda B. Hummon
- Genetics Branch, National Cancer Institute, NIH, Bethesda, Maryland
- Danny Wangsa
- Genetics Branch, National Cancer Institute, NIH, Bethesda, Maryland
- Sudhir Varma
- Biometrics Research Branch, National Cancer Institute, NIH, Bethesda, Maryland
- Richard Simon
- Biometrics Research Branch, National Cancer Institute, NIH, Bethesda, Maryland
- Torsten Liersch
- Department of General Surgery, University Medical Center, Göttingen, Germany
- Heinz Becker
- Department of General Surgery, University Medical Center, Göttingen, Germany
- B. Michael Ghadimi
- Department of General Surgery, University Medical Center, Göttingen, Germany
- Thomas Ried
- Genetics Branch, National Cancer Institute, NIH, Bethesda, Maryland
903
GO-2D: identifying 2-dimensional cellular-localized functional modules in Gene Ontology. BMC Genomics 2007; 8:30. [PMID: 17250772 PMCID: PMC1794235 DOI: 10.1186/1471-2164-8-30] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 08/04/2006] [Accepted: 01/24/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Rapid progress in high-throughput biotechnologies (e.g. microarrays) and exponential accumulation of gene functional knowledge make it promising for systematic understanding of complex human diseases at functional modules level. Based on Gene Ontology, a large number of automatic tools have been developed for the functional analysis and biological interpretation of the high-throughput microarray data. RESULTS Different from the existing tools such as Onto-Express and FatiGO, we develop a tool named GO-2D for identifying 2-dimensional functional modules based on combined GO categories. For example, it refines biological process categories by sorting their genes into different cellular component categories, and then extracts those combined categories enriched with the interesting genes (e.g., the differentially expressed genes) for identifying the cellular-localized functional modules. Applications of GO-2D to the analyses of two human cancer datasets show that very specific disease-relevant processes can be identified by using cellular location information. CONCLUSION For studying complex human diseases, GO-2D can extract functionally compact and detailed modules such as the cellular-localized ones, characterizing disease-relevant modules in terms of both biological processes and cellular locations. The application results clearly demonstrate that 2-dimensional approach complementary to current 1-dimensional approach is powerful for finding modules highly relevant to diseases.
904
Abstract
Nonparametric and parametric approaches have been proposed to estimate false discovery rate under the independent hypothesis testing assumption. The parametric approach has been shown to have better performance than the nonparametric approaches. In this article, we study the nonparametric approaches and quantify the underlying relations between parametric and nonparametric approaches. Our study reveals the conservative nature of the nonparametric approaches, and establishes the connections between the empirical Bayes method and p-value-based nonparametric methods. Based on our results, we advocate using the parametric approach, or directly modeling the test statistics using the empirical Bayes method.
Affiliation(s)
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455, USA.
905
Yang WH, Dai DQ, Yan H. Biclustering of Microarray Data Based on Singular Value Decomposition. Emerging Technologies in Knowledge Discovery and Data Mining 2007. [DOI: 10.1007/978-3-540-77018-3_21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 02/11/2023]
906
Linder R, Richards T, Wagner M. Microarray data classified by artificial neural networks. Methods Mol Biol 2007; 382:345-72. [PMID: 18220242 DOI: 10.1007/978-1-59745-304-2_22] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
Systems biology has enjoyed explosive growth in both the number of people participating in this area of research and the number of publications on the topic. The field of systems biology encompasses the in silico analysis of high-throughput data as provided by DNA or protein microarrays. Along with the increasing availability of microarray data, attention is focused on methods of analyzing the expression rates. One important type of analysis is the classification task, for example, distinguishing different types of cell functions or tumors. Recently, interest has been awakened toward artificial neural networks (ANN), which have many appealing characteristics such as an exceptional degree of accuracy. Nonlinear relationships or independence from certain assumptions regarding the data distribution are also considered. The current work reviews advantages as well as disadvantages of neural networks in the context of microarray analysis. Comparisons are drawn to alternative methods. Selected solutions are discussed, and finally algorithms for the effective combination of multiple ANNs are presented. The development of approaches to use ANN-processed microarray data applicable to run cell and tissue simulations may be slated for future investigation.
Affiliation(s)
- Roland Linder
- Institute of Medical Informatics, University of Lübeck, Germany
907
Lusa L, McShane LM, Radmacher MD, Shih JH, Wright GW, Simon R. Appropriateness of some resampling-based inference procedures for assessing performance of prognostic classifiers derived from microarray data. Stat Med 2007; 26:1102-13. [PMID: 16755534 DOI: 10.1002/sim.2598] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
The goal of many gene-expression microarray profiling clinical studies is to develop a multivariate classifier to predict patient disease outcome from a gene-expression profile measured on some biological specimen from the patient. Often some preliminary validation of the predictive power of a profile-based classifier is carried out using the same data set that was used to derive the classifier. Techniques such as cross-validation or bootstrapping can be used in this setting to assess predictive power, and if applied correctly, can result in a less biased estimate of predictive accuracy of a classifier. However, some investigators have attempted to apply standard statistical inference procedures to assess the statistical significance of associations between true and cross-validated predicted outcomes. We demonstrate in this paper that naïve application of standard statistical inference procedures to these measures of association under null situations can result in greatly inflated testing type I error rates. Under alternatives of small to moderate associations, confidence interval coverage probabilities may be too low, although for very large associations coverage probabilities approach their intended values. Our results suggest that caution should be exercised in interpreting some of the claims of exceptional prognostic classifier performance that have been reported in prominent biomedical journals in the past few years.
Affiliation(s)
- Lara Lusa
- Department of Experimental Oncology, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano, Italy.
908
Wang L, Chu F, Xie W. Accurate cancer classification using expressions of very few genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2007; 4:40-53. [PMID: 17277412 DOI: 10.1109/tcbb.2007.1006] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.4] [Indexed: 05/13/2023]
Abstract
We aim at finding the smallest set of genes that can ensure highly accurate classification of cancers from microarray data by using supervised machine learning algorithms. The significance of finding the minimum gene subsets is three-fold: 1) It greatly reduces the computational burden and "noise" arising from irrelevant genes. In the examples studied in this paper, finding the minimum gene subsets even allows for extraction of simple diagnostic rules which lead to accurate diagnosis without the need for any classifiers. 2) It simplifies gene expression tests to include only a very small number of genes rather than thousands of genes, which can bring down the cost for cancer testing significantly. 3) It calls for further investigation into the possible biological relationship between these small numbers of genes and cancer development and treatment. Our simple yet very effective method involves two steps. In the first step, we choose some important genes using a feature importance ranking scheme. In the second step, we test the classification capability of all simple combinations of those important genes by using a good classifier. For three "small" and "simple" data sets with two, three, and four cancer (sub)types, our approach obtained very high accuracy with only two or three genes. For a "large" and "complex" data set with 14 cancer types, we divided the whole problem into a group of binary classification problems and applied the 2-step approach to each of these binary classification problems. Through this "divide-and-conquer" approach, we obtained accuracy comparable to previously reported results but with only 28 genes rather than 16,063 genes. In general, our method can significantly reduce the number of genes required for highly reliable diagnosis.
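The two-step approach described in this abstract (rank genes, then test small combinations of the top-ranked ones) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the ranking score, the nearest-centroid classifier, leave-one-out evaluation, class labels coded as 0 and 1, and all function names are choices made here for the example.

```python
from itertools import combinations
from statistics import mean, pstdev


def rank_genes(X, y):
    """Step 1: rank gene indices by a simple two-class separation score (assumed here)."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b) + 1e-9  # small constant avoids division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[g] for r in rows) for g in genes]
        point = [X[i][g] for g in genes]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def best_small_subset(X, y, n_top=3, subset_size=2):
    """Step 2: try every combination of the top-ranked genes and keep the most accurate."""
    # Note: ranking on all samples before cross-validation gives optimistic
    # accuracy estimates; this sketch ignores that for brevity.
    top = rank_genes(X, y)[:n_top]
    return max(combinations(top, subset_size), key=lambda s: loo_accuracy(X, y, s))


if __name__ == "__main__":
    # Rows are samples, columns are genes; genes 0 and 2 separate the classes.
    X = [
        [1.0, 3.0, 0.0, 7.0],
        [1.2, 1.0, 0.1, 7.0],
        [0.9, 2.0, 0.2, 7.1],
        [5.0, 2.0, 2.0, 7.0],
        [5.1, 3.0, 2.1, 7.1],
        [4.8, 1.0, 1.9, 7.0],
    ]
    y = [0, 0, 0, 1, 1, 1]
    print(best_small_subset(X, y))
```

Exhaustive search is only feasible because the first step shrinks the candidate pool to a handful of genes.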
Affiliation(s)
- Lipo Wang
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.
909
Do KA, McLachlan G, Bean R, Wen S. Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data. Cancer Inform 2007. [DOI: 10.1177/117693510700500002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Researchers are frequently faced with the analysis of microarray data of a relatively large number of genes using a small number of tissue samples. We examine the application of two statistical methods for clustering such microarray expression data: EMMIX-GENE and GeneClust. EMMIX-GENE is a mixture-model based clustering approach, designed primarily to cluster tissue samples on the basis of the genes. GeneClust is an implementation of the gene shaving methodology, motivated by research to identify distinct sets of genes for which variation in expression could be related to a biological property of the tissue samples. We illustrate the use of these two methods in the analysis of Affymetrix oligonucleotide arrays of well-known data sets from colon tissue samples with and without tumors, and of tumor tissue samples from patients with leukemia. Although the two approaches have been developed from different perspectives, the results demonstrate a clear correspondence between gene clusters produced by GeneClust and EMMIX-GENE for the colon tissue data. It is demonstrated, for the case of ribosomal proteins and smooth muscle genes in the colon data set, that both methods can classify genes into co-regulated families. It is further demonstrated that tissue types (tumor and normal) can be separated on the basis of subtle distributed patterns of genes. Application to the leukemia tissue data produces a division of tissues corresponding closely to the external classification, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), for both methods. In addition, we also identify genes specific for the subgroup of ALL-Tcell samples.
Overall, we find that the gene shaving method produces gene clusters at great speed; allows variable cluster sizes and can incorporate partial or full supervision; and finds clusters of genes in which the gene expression varies greatly over the tissue samples while maintaining a high level of coherence between the gene expression profiles. The intent of the EMMIX-GENE method is to cluster the tissue samples. It performs a filtering step that results in a subset of relevant genes, followed by gene clustering, and then tissue clustering, and is favorable in its accuracy of ranking the clusters produced.
Affiliation(s)
- K-A. Do
- University of Texas, M.D. Anderson Cancer Center, Houston, Texas, U.S.A
- G.J. McLachlan
- Department of Mathematics & Institute for Molecular Bioscience, University of Queensland Brisbane, 4072, Australia
- R. Bean
- Department of Mathematics & Institute for Molecular Bioscience, University of Queensland Brisbane, 4072, Australia
- S. Wen
- University of Texas, M.D. Anderson Cancer Center, Houston, Texas, U.S.A
910
911
Niijima S, Kuhara S. Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE. BMC Bioinformatics 2006; 7:543. [PMID: 17187691 PMCID: PMC1790716 DOI: 10.1186/1471-2105-7-543] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Received: 07/27/2006] [Accepted: 12/25/2006] [Indexed: 11/23/2022] Open
Abstract
Background: In class prediction problems using microarray data, gene selection is essential to improve the prediction accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVM-RFE) has become one of the leading methods and is being widely used. The SVM-based approach performs gene selection using the weight vector of the hyperplane constructed by the samples on the margin. However, the performance can be easily affected by noise and outliers, when it is applied to noisy, small sample size microarray data. Results: In this paper, we propose a recursive gene selection method using the discriminant vector of the maximum margin criterion (MMC), which is a variant of classical linear discriminant analysis (LDA). To overcome the computational drawback of classical LDA and the problem of high dimensionality, we present efficient and stable algorithms for MMC-based RFE (MMC-RFE). The MMC-RFE algorithms naturally extend to multi-class cases. The performance of MMC-RFE was extensively compared with that of SVM-RFE using nine cancer microarray datasets, including four multi-class datasets. Conclusion: Our extensive comparison has demonstrated that for binary-class datasets MMC-RFE tends to show intermediate performance between hard-margin SVM-RFE and SVM-RFE with a properly chosen soft-margin parameter. Notably, MMC-RFE achieves significantly better performance with a smaller number of genes than SVM-RFE for multi-class datasets. The results suggest that MMC-RFE is less sensitive to noise and outliers due to the use of average margin, and thus may be useful for biomarker discovery from noisy data.
Affiliation(s)
- Satoshi Niijima
- Department of Bioinformatics, Graduate School of Systems Life Sciences, Kyushu University, Hakozaki 6-10-1, Higashi-ku, Fukuoka 812-8581, Japan
- Satoru Kuhara
- Faculty of Agriculture, Kyushu University, Hakozaki 6-10-1, Higashi-ku, Fukuoka 812-8581, Japan
912
Abstract
As microarray analyses become increasingly routine, involving the simultaneous investigation of huge numbers of genes, researchers can easily search for and uncover what appear to be promising patterns in their data. In such circumstances tools are needed to help decide the extent to which these patterns are meaningful or can be explained by chance alone. The purpose of this chapter is to describe examples of the use of microarray analysis for inferential purposes and how validation of inference is addressed by Monte-Carlo techniques, which essentially amounts to investigation of statistical methods on synthetic or random data sets.
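The idea of checking an apparent pattern against random data can be sketched with a Monte Carlo permutation test. This is an illustrative example, not taken from the chapter: the statistic (largest absolute difference in class means over all genes), the 0/1 label coding, and the function names are assumptions made here.

```python
import random
from statistics import mean


def max_abs_mean_diff(X, y):
    """Largest absolute difference in class means over all genes."""
    best = 0.0
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        best = max(best, abs(mean(a) - mean(b)))
    return best


def permutation_pvalue(X, y, n_perm=999, seed=0):
    """Monte Carlo p-value: how often do shuffled labels look as extreme as the real ones?"""
    rng = random.Random(seed)
    observed = max_abs_mean_diff(X, y)
    labels = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any real link between expression and class
        if max_abs_mean_diff(X, labels) >= observed:
            exceed += 1
    # Adding 1 to both counts keeps the estimate away from an impossible p = 0.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    gene0 = [1.0, 1.1, 0.9, 1.2, 0.8, 3.0, 3.1, 2.9, 3.2, 2.8]
    gene1 = [0.5, 0.4, 0.6, 0.5, 0.4, 0.6, 0.5, 0.4, 0.6, 0.5]
    X = [[a, b] for a, b in zip(gene0, gene1)]
    y = [0] * 5 + [1] * 5
    print(permutation_pvalue(X, y))
```

Taking the maximum over genes inside each permutation is what accounts for having searched many genes at once.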
Affiliation(s)
- Daniel Q Naiman
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
913
Ma S, Huang J. Clustering threshold gradient descent regularization: with applications to microarray studies. Bioinformatics 2006; 23:466-72. [PMID: 17182700 DOI: 10.1093/bioinformatics/btl632] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION An important goal of microarray studies is to discover genes that are associated with clinical outcomes, such as disease status and patient survival. While a typical experiment surveys gene expressions on a global scale, there may be only a small number of genes that have significant influence on a clinical outcome. Moreover, expression data have cluster structures and the genes within a cluster have correlated expressions and coordinated functions, but the effects of individual genes in the same cluster may be different. Accordingly, we seek to build statistical models with the following properties. First, the model is sparse in the sense that only a subset of the parameter vector is non-zero. Second, the cluster structures of gene expressions are properly accounted for. RESULTS For gene expression data without pathway information, we divide genes into clusters using commonly used methods, such as K-means or hierarchical approaches. The optimal number of clusters is determined using the Gap statistic. We propose a clustering threshold gradient descent regularization (CTGDR) method, for simultaneous cluster selection and within cluster gene selection. We apply this method to binary classification and censored survival analysis. Compared to the standard TGDR and other regularization methods, the CTGDR takes into account the cluster structure and carries out feature selection at both the cluster level and within-cluster gene level. We demonstrate the CTGDR on two studies of cancer classification and two studies correlating survival of lymphoma patients with microarray expressions. AVAILABILITY R code is available upon request. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shuangge Ma
- Department of Epidemiology and Public Health, Yale University, New Haven, CT, USA.
914
Abstract
Background Microarray studies provide a way of linking variations of phenotypes with their genetic causes. Constructing predictive models using high dimensional microarray measurements usually consists of three steps: (1) unsupervised gene screening; (2) supervised gene screening; and (3) statistical model building. Supervised gene screening based on marginal gene ranking is commonly used to reduce the number of genes in the model building. Various simple statistics, such as the t-statistic or signal-to-noise ratio, have been used to rank genes in the supervised screening. Despite its extensive use, statistical study of supervised gene screening remains scarce. Our study is partly motivated by the differences in gene discovery results caused by using different supervised gene screening methods. Results We investigate concordance and reproducibility of supervised gene screening based on eight commonly used marginal statistics. Concordance is assessed by the relative fractions of overlaps between top ranked genes screened using different marginal statistics. We propose a Bootstrap Reproducibility Index, which measures reproducibility of individual genes under the supervised screening. Empirical studies are based on four public microarray data sets. We consider the cases where the top 20%, 40% and 60% genes are screened. Conclusion From a gene discovery point of view, the effect of supervised gene screening based on different marginal statistics cannot be ignored. Empirical studies show that (1) genes that pass different supervised screenings may be considerably different; (2) concordance may vary, depending on the underlying data structure and percentage of selected genes; (3) evaluated with the Bootstrap Reproducibility Index, genes that pass supervised screenings are only moderately reproducible; and (4) concordance cannot be improved by supervised screening based on reproducibility.
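Concordance as a relative fraction of overlap between top-ranked gene lists can be sketched in a few lines. This is a hedged illustration only: the abstract does not give the exact formula, so the definition below (shared genes divided by the size of the screened set) and the function names are assumptions.

```python
def top_fraction(scores, fraction):
    """Indices of the genes in the top `fraction` when ranked by descending score."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(ranked[:k])


def concordance(scores_a, scores_b, fraction):
    """Share of top-ranked genes that two marginal statistics have in common."""
    top_a = top_fraction(scores_a, fraction)
    top_b = top_fraction(scores_b, fraction)
    return len(top_a & top_b) / len(top_a)


if __name__ == "__main__":
    # Hypothetical per-gene scores from two different marginal statistics.
    stat_a = [5.0, 4.0, 3.0, 2.0, 1.0]
    stat_b = [5.0, 1.0, 4.0, 2.0, 3.0]
    print(concordance(stat_a, stat_b, 0.4))
```

A value of 1 means both statistics screen exactly the same genes; lower values indicate that the choice of statistic changes which genes reach model building.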
Affiliation(s)
- Shuangge Ma
- Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA.
915
Kemming D, Vogt U, Tidow N, Schlotter CM, Bürger H, Helms MW, Korsching E, Granetzny A, Boseila A, Hillejan L, Marra A, Ergönenc Y, Adigüzel H, Brandt B. Whole genome expression analysis for biologic rational pathway modeling: application in cancer prognosis and therapy prediction. Mol Diagn Ther 2006; 10:271-80. [PMID: 17022690 DOI: 10.1007/bf03256202] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Using semi-quantitative microarray technology, almost every one of the approximately 30 000 human genes can be analyzed simultaneously with a low rate of false-positives, a high specificity, and a high quantification accuracy. This is supported by data from comparative studies of microarrays and reverse-transcription PCR for established cancer genes including those for epidermal growth factor receptor (EGFR), human epidermal growth factor receptor-2 (HER2/ERBB2), estrogen receptor (ESR1), progesterone receptor (PGR), urokinase-type plasminogen activator (PLAU), and plasminogen activator inhibitor-1 (SERPINE1). As such, semi-quantitative expression data provide an almost completely comprehensive background of biological knowledge that can be applied to cancer diagnostics. In clinical terms, expression profiling may be able to provide significant information regarding (i) the identification of high-risk patients requiring aggressive chemotherapy; (ii) the pathway control of therapy predictive parameters (e.g. ESR1 and HER2); (iii) the discovery of targets for biologically rational therapeutics (e.g. capecitabine and trastuzumab); (iv) additional support for decisions about switching therapy; (v) target discovery; and (vi) the prediction of the course of new therapies in clinical trials. In conclusion, whole genome expression analysis might be able to determine important genes related to cancer progression and adjuvant chemotherapy resistance, especially in the context of new approaches involving primary systemic chemotherapy. In this review, we will survey the current progress in whole genome expression analyses for cancer prognosis and prediction. Special emphasis is given to the approach of combining biostatistical analysis of expression data with knowledge of biochemical and genetic pathways.
Affiliation(s)
- D Kemming
- Institute for Tumor Biology, Hamburg, Germany
916
Abstract
Background Recently several statistical methods have been proposed to identify genes with differential expression between two conditions. However, very few studies consider the problem of sample imbalance, and no study has investigated the impact of sample imbalance on identifying differentially expressed genes. In addition, it is not clear which method is more suitable for unbalanced data. Results Based on random sampling, two evaluation models are proposed to investigate the impact of sample imbalance on identifying differentially expressed genes. Using the proposed evaluation models, the performances of six well-known methods are compared on unbalanced data. The experimental results indicate that sample imbalance has a great influence on selecting differentially expressed genes. Furthermore, different methods perform very differently on unbalanced data. Among the six methods, the Welch t-test appears to perform best when the sample size of the large-variance group is larger than that of the small-variance group, while the Regularized t-test and SAM outperform the others on unbalanced data in other cases. Conclusion The two proposed evaluation models are effective, and sample imbalance should be taken into account in microarray experiment design and gene expression data analysis. The results and the two proposed evaluation models can provide some help in selecting a suitable method to process unbalanced data.
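For reference, the Welch t-test named in this abstract handles unequal group sizes and variances by estimating each group's variance separately. The sketch below uses the standard Welch statistic and the Welch-Satterthwaite degrees of freedom; the function name `welch_t` and the sample values are hypothetical.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    # Per-group variance of the mean; groups may differ in size and spread.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Unbalanced example: five samples in one condition, three in the other.
    t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6])
    print(round(t, 4), round(df, 4))
```

When both groups have the same size and variance, the degrees of freedom reduce to the pooled value of n1 + n2 - 2.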
Affiliation(s)
- Kun Yang
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Jianzhong Li
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Hong Gao
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
917
Lambert-Lacroix S, Peyre J. Local likelihood regression in generalized linear single-index models with applications to microarray data. Comput Stat Data Anal 2006. [DOI: 10.1016/j.csda.2006.06.021] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
918
Kim Y, Kwon S, Heun Song S. Multiclass sparse logistic regression for classification of multiple cancer types using gene expression data. Comput Stat Data Anal 2006. [DOI: 10.1016/j.csda.2006.06.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]
919
920
Díaz-Uriarte R, Alvarez de Andrés S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics 2006. [PMID: 16398926 DOI: 10.1186/1471-2105-7-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
Background
Selection of relevant genes for sample classification is a common task in most gene expression studies, where researchers try to identify the smallest possible set of genes that can still achieve good predictive performance (for instance, for future use with diagnostic purposes in clinical practice). Many gene selection approaches use univariate (gene-by-gene) rankings of gene relevance and arbitrary thresholds to select the number of genes, can only be applied to two-class problems, and use gene selection ranking criteria unrelated to the classification algorithm. In contrast, random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of observations and in problems involving more than two classes, and returns measures of variable importance. Thus, it is important to understand the performance of random forest with microarray data and its possible use for gene selection.
Results
We investigate the use of random forest for classification of microarray data (including multi-class problems) and propose a new method of gene selection in classification problems based on random forest. Using simulated and nine microarray data sets we show that random forest has comparable performance to other classification methods, including DLDA, KNN, and SVM, and that the new gene selection procedure yields very small sets of genes (often smaller than alternative methods) while preserving predictive accuracy.
Conclusion
Because of its performance and features, random forest and gene selection using random forest should probably become part of the "standard tool-box" of methods for class prediction and gene selection with microarray data.
Collapse
|
921
|
Medina I, Montaner D, Tárraga J, Dopazo J. Prophet, a web-based tool for class prediction using microarray data. Bioinformatics 2006; 23:390-1. [PMID: 17138587 DOI: 10.1093/bioinformatics/btl602] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Sample classification and class prediction are the aim of many gene expression studies. We present a web-based application, Prophet, which builds prediction rules and allows using them for further sample classification. Prophet automatically chooses the best classifier, along with the optimal selection of genes, using a strategy that renders unbiased cross-validated errors. Prophet is linked to different microarray data analysis modules, and includes a unique feature: the possibility of performing the functional interpretation of the molecular signature found. AVAILABILITY Prophet can be found at the URL http://prophet.bioinfo.cipf.es/ or within the GEPAS package at http://www.gepas.org/ SUPPLEMENTARY INFORMATION http://gepas.bioinfo.cipf.es/tutorial/prophet.html.
Collapse
Affiliation(s)
- Ignacio Medina
- Department of Bioinformatics, Centro de Investigación Príncipe Felipe, Valencia, E46013, Spain
| | | | | | | |
Collapse
|
922
|
Local Linear Logistic Classification of Microarray Data Using Orthogonal Components. KOREAN JOURNAL OF APPLIED STATISTICS 2006. [DOI: 10.5351/kjas.2006.19.3.587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|
923
|
Sun Y, Goodison S, Li J, Liu L, Farmerie W. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics 2006; 23:30-7. [PMID: 17130137 PMCID: PMC3431620 DOI: 10.1093/bioinformatics/btl543] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
MOTIVATION Accurate prognosis of breast cancer can spare a significant number of breast cancer patients from receiving unnecessary adjuvant systemic treatment and its related expensive medical costs. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of post-surgical disease recurrence. However, these studies all attempt to develop genetic marker-based prognostic systems to replace the existing clinical criteria, while ignoring the rich information contained in established clinical markers. Given the complexity of breast cancer prognosis, a more practical strategy would be to utilize both clinical and genetic marker information that may be complementary. METHODS A computational study is performed on publicly available microarray data, which has spawned a 70-gene prognostic signature. The recently proposed I-RELIEF algorithm is used to identify a hybrid signature through the combination of both genetic and clinical markers. A rigorous experimental protocol is used to estimate the prognostic performance of the hybrid signature and other prognostic approaches. Survival data analysis is performed to compare different prognostic approaches. RESULTS The hybrid signature performs significantly better than other methods, including the 70-gene signature, clinical markers alone and the St. Gallen consensus criterion. At the 90% sensitivity level, the hybrid signature achieves 67% specificity, as compared to 47% for the 70-gene signature and 48% for the clinical markers. The odds ratio of the hybrid signature for developing distant metastases within five years between the patients with a good prognosis signature and the patients with a bad prognosis is 21.0 (95% CI: 6.5-68.3), far higher than either genetic or clinical markers alone. AVAILABILITY The breast cancer dataset is available at www.nature.com and Matlab codes are available upon request.
Collapse
Affiliation(s)
- Yijun Sun
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL 32611, USA
| | | | | | | | | |
Collapse
|
924
|
Abstract
Microarray data contain a large number of genes (usually more than 1000) and a relatively small number of samples (usually fewer than 100). This presents problems for discriminant analysis of microarray data. One way to alleviate the problem is to reduce the dimensionality of the data by selecting genes that are important to the discriminant problem. Gene selection can be cast as a feature selection problem in the context of pattern classification. Feature selection approaches are broadly grouped into filter methods and wrapper methods. The wrapper method outperforms the filter method but at the cost of more intensive computation. In the present study, we proposed a wrapper-like gene selection algorithm based on the Regularization Network. Compared with the classical wrapper method, the computational cost of our gene selection algorithm is significantly reduced, because the evaluation criterion we proposed does not demand repeated training in the leave-one-out procedure.
Collapse
Affiliation(s)
- Xin Zhou
- Bioinformatics Research Centre, School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore.
| | | |
Collapse
|
925
|
Gene expression patterns that predict sensitivity to epidermal growth factor receptor tyrosine kinase inhibitors in lung cancer cell lines and human lung tumors. BMC Genomics 2006; 7:289. [PMID: 17096850 PMCID: PMC1660550 DOI: 10.1186/1471-2164-7-289] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2006] [Accepted: 11/10/2006] [Indexed: 02/01/2023] Open
Abstract
Background Increased focus surrounds identifying patients with advanced non-small cell lung cancer (NSCLC) who will benefit from treatment with epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKI). EGFR mutation, gene copy number, coexpression of ErbB proteins and ligands, and epithelial to mesenchymal transition markers all correlate with EGFR TKI sensitivity, and while prediction of sensitivity using any one of the markers does identify responders, individual markers do not encompass all potential responders due to high levels of inter-patient and inter-tumor variability. We hypothesized that a multivariate predictor of EGFR TKI sensitivity based on gene expression data would offer a clinically useful method of accounting for the increased variability inherent in predicting response to EGFR TKI and for elucidation of mechanisms of aberrant EGFR signalling. Furthermore, we anticipated that this methodology would result in improved predictions compared to single parameters alone both in vitro and in vivo. Results Gene expression data derived from cell lines that demonstrate differential sensitivity to EGFR TKI, such as erlotinib, were used to generate models for a priori prediction of response. The gene expression signature of EGFR TKI sensitivity displays significant biological relevance in lung cancer biology in that pertinent signalling molecules and downstream effector molecules are present in the signature. Diagonal linear discriminant analysis using this gene signature was highly effective in classifying out-of-sample cancer cell lines by sensitivity to EGFR inhibition, and was more accurate than classifying by mutational status alone. Using the same predictor, we classified human lung adenocarcinomas and captured the majority of tumors with high levels of EGFR activation as well as those harbouring activating mutations in the kinase domain. 
We have demonstrated that predictive models of EGFR TKI sensitivity can classify both out-of-sample cell lines and lung adenocarcinomas. Conclusion These data suggest that multivariate predictors of response to EGFR TKI have potential for clinical use and likely provide a robust and accurate predictor of EGFR TKI sensitivity that is not achieved with single biomarkers or clinical characteristics in non-small cell lung cancers.
Collapse
|
926
|
Kelemen JZ, Kertész-Farkas A, Kocsor A, Puskás LG. Kalman filtering for disease-state estimation from microarray data. Bioinformatics 2006; 22:3047-53. [PMID: 17065158 DOI: 10.1093/bioinformatics/btl545] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION In this paper, we propose using the Kalman filter (KF) as a pre-processing step in microarray-based molecular diagnosis. Incorporating the expression covariance between genes is important in such classification problems, since this represents the functional relationships that govern tissue state. Failing to fulfil such requirements may result in biologically implausible class prediction models. Here, we show that employing the KF to remove noise (while retaining meaningful covariance and thus being able to estimate the underlying biological state from microarray measurements) yields linearly separable data suitable for most classification algorithms. RESULTS We demonstrate the utility and performance of the KF as a robust disease-state estimator on publicly available binary and multi-class microarray datasets in combination with the most widely used classification methods to date. Moreover, using popular graphical representation schemes we show that our filtered datasets also have an improved visualization capability.
Collapse
Affiliation(s)
- János Z Kelemen
- Laboratory of Functional Genomics, Biological Research Centre, Hungarian Academy of Sciences, Szeged Temesvári krt. 62, H-6726, Hungary.
| | | | | | | |
Collapse
|
927
|
Bogaerts J, Cardoso F, Buyse M, Braga S, Loi S, Harrison JA, Bines J, Mook S, Decker N, Ravdin P, Therasse P, Rutgers E, van 't Veer LJ, Piccart M. Gene signature evaluation as a prognostic tool: challenges in the design of the MINDACT trial. Nat Clin Pract Oncol 2006; 3:540-51. [PMID: 17019432 DOI: 10.1038/ncponc0591] [Citation(s) in RCA: 164] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2005] [Accepted: 05/02/2006] [Indexed: 11/09/2022]
Abstract
This Review describes the work conducted by the TRANSBIG consortium in the development of the MINDACT (Microarray In Node negative Disease may Avoid ChemoTherapy) trial. The goal of the trial is to provide definitive evidence regarding the clinical relevance of the 70-gene prognosis signature, and to assess the performance of this signature compared with that of traditional prognostic indicators for assigning adjuvant chemotherapy to patients with node-negative breast cancer. We outline the background work and the key questions in node-negative early-stage breast cancer, and then focus on the MINDACT trial design and statistical considerations. The challenges inherent in this trial in terms of logistics, implementation and interpretation of the results are also discussed. We hope that this article will trigger further discussion about the difficulties of setting up and analyzing trials aimed at establishing the worth of new methods for better selection of patients for cancer treatment.
Collapse
Affiliation(s)
- Jan Bogaerts
- Medical Oncology & Translational Research, Jules Bordet Institute, Boulevard de Waterloo 125, 1000 Brussels, Belgium
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
928
|
Mamtani MR, Thakre TP, Kalkonde MY, Amin MA, Kalkonde YV, Amin AP, Kulkarni H. A simple method to combine multiple molecular biomarkers for dichotomous diagnostic classification. BMC Bioinformatics 2006; 7:442. [PMID: 17032455 PMCID: PMC1618410 DOI: 10.1186/1471-2105-7-442] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2006] [Accepted: 10/10/2006] [Indexed: 11/29/2022] Open
Abstract
Background In spite of the recognized diagnostic potential of biomarkers, the quest for squelching noise and wringing in information from a given set of biomarkers continues. Here, we suggest a statistical algorithm that – assuming each molecular biomarker to be a diagnostic test – enriches the diagnostic performance of an optimized set of independent biomarkers employing established statistical techniques. We validated the proposed algorithm using several simulation datasets in addition to four publicly available real datasets that compared i) subjects having cancer with those without; ii) subjects with two different cancers; iii) subjects with two different types of one cancer; and iv) subjects with the same cancer resulting in differential time to metastasis. Results Our algorithm comprises three steps: estimating the area under the receiver operating characteristic curve for each biomarker, identifying a subset of biomarkers using linear regression and combining the chosen biomarkers using linear discriminant function analysis. Combining these established statistical methods that are available in most statistical packages, we observed that the diagnostic accuracy of our approach was 100%, 99.94%, 96.67% and 93.92% for the real datasets used in the study. These estimates were comparable to or better than the ones previously reported using alternative methods. In a synthetic dataset, we also observed that all the biomarkers chosen by our algorithm were indeed truly differentially expressed. Conclusion The proposed algorithm can be used for accurate diagnosis in the setting of dichotomous classification of disease states.
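The first step named in this abstract, estimating the area under the ROC curve for each biomarker, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `auc` and `rank_biomarkers`, the marker names and the toy values are all hypothetical, and the AUC is computed through its standard equivalence with the Mann-Whitney statistic. The later regression and discriminant steps are not shown.

```python
def auc(pos, neg):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher value; ties count as half a win.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_biomarkers(cases, controls):
    """Return (name, AUC) pairs sorted by discriminative strength.

    `cases` and `controls` map a biomarker name to its measured values.
    An AUC below 0.5 means the marker tends to be lower in cases, so
    strength is measured as the distance from 0.5.
    """
    scores = {name: auc(cases[name], controls[name]) for name in cases}
    return sorted(scores.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True)


# Toy, made-up measurements (hypothetical marker names)
cases = {"marker_a": [5.1, 6.0, 5.8, 6.4], "marker_b": [2.0, 2.2, 1.9, 2.1]}
controls = {"marker_a": [4.0, 4.2, 5.0, 3.9], "marker_b": [2.1, 2.0, 2.3, 1.8]}
ranking = rank_biomarkers(cases, controls)
```

In this toy data `marker_a` separates the two groups completely while `marker_b` carries no information, so a screening step of this kind would rank `marker_a` first.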
Collapse
Affiliation(s)
| | - Tushar P Thakre
- Lata Medical Research Foundation, Nagpur, India
- University of North Texas Health Science Center, Fort Worth, Texas, USA
| | | | | | | | - Amit P Amin
- Lata Medical Research Foundation, Nagpur, India
| | | |
Collapse
|
929
|
Barrier A, Roser F, Boëlle PY, Franc B, Tse C, Brault D, Lacaine F, Houry S, Callard P, Penna C, Debuire B, Flahault A, Dudoit S, Lemoine A. Prognosis of stage II colon cancer by non-neoplastic mucosa gene expression profiling. Oncogene 2006; 26:2642-8. [PMID: 17043639 DOI: 10.1038/sj.onc.1210060] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
We have assessed the possibility of building a prognosis predictor (PP), based on non-neoplastic mucosa microarray gene expression measures, for stage II colon cancer patients. Non-neoplastic colonic mucosa mRNA samples from 24 patients (10 with a metachronous metastasis, 14 with no recurrence) were profiled using the Affymetrix HGU133A GeneChip. Patients were repeatedly and randomly divided into 1000 training sets (TSs) of size 16 and validation sets (VSs) of size 8. For each TS/VS split, a 70-gene PP, identified on the TS by selecting the 70 most differentially expressed genes and applying diagonal linear discriminant analysis, was used to predict the prognoses of VS patients. Mean prognosis prediction performances of the 70-gene PP were 81.8% for accuracy, 73.0% for sensitivity and 87.1% for specificity. Informative genes suggested branching signal-transduction pathways with possible extensive networks between individual pathways. They also included genes coding for proteins involved in immune surveillance. In conclusion, our study suggests that one can build an accurate PP for stage II colon cancer patients, based on non-neoplastic mucosa microarray gene expression measures.
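The predictor described here combines a gene filter with diagonal linear discriminant analysis (DLDA). A minimal sketch of that combination is given below, with several assumptions that are not taken from the abstract: genes are ranked by a t-like statistic with a pooled variance, the two classes are given equal priors, and the names `fit_dlda` and `predict_dlda` and the toy data are hypothetical.

```python
from statistics import mean


def fit_dlda(X, y, n_genes):
    """Select the top-ranked genes and fit a two-class diagonal LDA rule.

    X: list of samples, each a list of expression values (one per gene).
    y: list of 0/1 class labels.
    Genes are ranked by |mean difference| / sqrt(pooled variance).
    """
    groups = {c: [x for x, lab in zip(X, y) if lab == c] for c in (0, 1)}
    n = len(X)
    stats = []
    for g in range(len(X[0])):
        m = {c: mean(s[g] for s in groups[c]) for c in (0, 1)}
        # Pooled within-class variance; the small constant guards against zero
        ss = sum((s[g] - m[c]) ** 2 for c in (0, 1) for s in groups[c])
        var = ss / (n - 2) + 1e-8
        stats.append((abs(m[1] - m[0]) / var ** 0.5, g, m[0], m[1], var))
    top = sorted(stats, reverse=True)[:n_genes]
    return [(g, m0, m1, var) for _, g, m0, m1, var in top]


def predict_dlda(model, x):
    """Assign x to the class with the smaller variance-scaled distance."""
    d0 = sum((x[g] - m0) ** 2 / var for g, m0, m1, var in model)
    d1 = sum((x[g] - m1) ** 2 / var for g, m0, m1, var in model)
    return 0 if d0 <= d1 else 1


# Toy data: only gene 0 differs between the two classes
X = [[1.0, 5.0, 0.2], [1.2, 5.1, 0.9], [0.9, 4.8, 0.4],
     [3.0, 5.0, 0.5], [3.2, 4.9, 0.1], [2.9, 5.2, 0.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_dlda(X, y, n_genes=1)
```

Because DLDA ignores correlations between genes, fitting reduces to per-gene means and variances, which is why it remains usable when genes far outnumber samples.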
Collapse
Affiliation(s)
- A Barrier
- Service de Chirurgie Digestive, Hôpital Tenon, AP-HP, Paris, France
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
930
|
Tai YC, Speed TP. A multivariate empirical Bayes statistic for replicated microarray time course data. Ann Stat 2006. [DOI: 10.1214/009053606000000759] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
932
|
Jiang D, Pei J, Ramanathan M, Lin C, Tang C, Zhang A. Mining gene–sample–time microarray data: a coherent gene cluster discovery approach. Knowl Inf Syst 2006. [DOI: 10.1007/s10115-006-0031-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
933
|
Asgharzadeh S, Pique-Regi R, Sposto R, Wang H, Yang Y, Shimada H, Matthay K, Buckley J, Ortega A, Seeger RC. Prognostic significance of gene expression profiles of metastatic neuroblastomas lacking MYCN gene amplification. J Natl Cancer Inst 2006; 98:1193-203. [PMID: 16954472 DOI: 10.1093/jnci/djj330] [Citation(s) in RCA: 177] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND The aggressiveness of metastatic neuroblastomas that lack MYCN gene amplification varies with age--they are least aggressive when diagnosed in patients younger than 12 months and most aggressive when diagnosed in patients older than 24 months. However, age at diagnosis is not always associated with patient survival. We examined whether molecular classification of metastatic neuroblastomas without MYCN gene amplification at diagnosis using gene expression profiling could improve the prediction of risk of disease progression. METHODS We used Affymetrix microarrays to determine the gene expression profiles of 102 untreated primary neuroblastomas without MYCN gene amplification obtained from children whose ages at diagnosis ranged from 0.1 to 151 months. A supervised method using diagonal linear discriminant analysis was devised to build a multigene model for predicting risk of disease progression. The accuracy of the model was evaluated using nested cross-validations, permutation analyses, and gene expression data from 15 additional tumors obtained at disease progression. RESULTS An expression profile model using 55 genes defined a tumor signature that distinguished two groups of patients from among those older than 12 months at diagnosis and clinically classified as having high-risk disease, those with a progression-free survival (PFS) rate of 16% (95% confidence interval [CI] = 8% to 28%), and those with a PFS rate of 79% (95% CI = 57% to 91%) (P<.01). These tumor signatures also identified two groups of patients with PFS of 15% (95% CI = 7% to 27%) and 69% (95% CI = 40% to 86%) (P<.01) from among patients who were older than 18 months at diagnosis. The gene expression signature of untreated molecular high-risk tumors was also present in progressively growing tumors. 
CONCLUSION Gene expression signatures of tumors obtained at diagnosis from patients with clinically indistinguishable high-risk, metastatic neuroblastomas identify subgroups with different outcomes. Accurate identification of these subgroups with gene expression profiles may facilitate development, implementation, and analysis of clinical trials aimed at improving outcome.
Collapse
Affiliation(s)
- Shahab Asgharzadeh
- Department of Pediatrics, Division of Hematology-Oncology, Childrens Hospital Los Angeles and Saban Research Institute, University of Southern California, Los Angeles, CA, USA
| | | | | | | | | | | | | | | | | | | |
Collapse
|
934
|
Somiari SB, Somiari RI, Heckman CM, Olsen CH, Jordan RM, Russell SJ, Shriver CD. Circulating MMP2 and MMP9 in breast cancer -- potential role in classification of patients into low risk, high risk, benign disease and breast cancer categories. Int J Cancer 2006; 119:1403-11. [PMID: 16615109 DOI: 10.1002/ijc.21989] [Citation(s) in RCA: 109] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
Matrix metalloproteinase (MMP) 2 and 9 are involved in cancer invasion and metastasis, and increased levels occur in serum and plasma of breast cancer (BC) patients. It is, however, unclear whether changes in serum levels can be exploited for early detection or classification of patients into different risk/disease categories. In our study, we measured concentration and activity of MMP2/9 in sera of 345 donors classified as low risk (Gail score <1.7), high risk (HR) (Gail score ≥1.7), benign disease or BC. Kruskal-Wallis and Mann-Whitney nonparametric tests showed that total-MMP2 concentration is higher in HR compared to control (p = 0.012), benign (p = 0.001) and cancer (p = 0.007). Active MMP2 (aMMP2) concentration is higher in control than benign and cancer (p < 0.001, respectively). Total and aMMP9 concentrations are higher in cancer than benign (p < 0.001, p = 0.002, respectively). Total-MMP2 and total-MMP9 activities are lower in control than benign (p < 0.001, p = 0.002, respectively) and cancer (p < 0.001, respectively). Total-MMP2 and MMP9 activities are also higher in cancer than benign (p = 0.004, p < 0.001) and HR (p = 0.008, p = 0.007, respectively). These results were not affected by age or inclusion/exclusion of donors with noninvasive cancer or atypical hyperplasia. Linear discriminant analysis revealed that HR donors are characterized by lower total-MMP2 and higher aMMP2. Overall group classification accuracy was 64.5%. Independent validation based on the leave-one-out cross validation approach gave an overall classification of 63%. Our study provides evidence supporting the potential role of serum MMP2/9 as biomarkers for breast disease classification.
Collapse
Affiliation(s)
- Stella B Somiari
- Clinical Breast Care Project, Windber Research Institute, Windber, PA 15963, USA.
| | | | | | | | | | | | | |
Collapse
|
935
|
Barrier A, Boelle PY, Roser F, Gregg J, Tse C, Brault D, Lacaine F, Houry S, Huguier M, Franc B, Flahault A, Lemoine A, Dudoit S. Stage II colon cancer prognosis prediction by tumor gene expression profiling. J Clin Oncol 2006; 24:4685-91. [PMID: 16966692 DOI: 10.1200/jco.2005.05.0229] [Citation(s) in RCA: 167] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
PURPOSE This study mainly aimed to identify and assess the performance of a microarray-based prognosis predictor (PP) for stage II colon cancer. A previously suggested 23-gene prognosis signature (PS) was also evaluated. PATIENTS AND METHODS Tumor mRNA samples from 50 patients were profiled using oligonucleotide microarrays. PPs were built and assessed by random divisions of patients into training and validation sets (TSs and VSs, respectively). For each TS/VS split, a 30-gene PP, identified on the TS by selecting the 30 most differentially expressed genes and applying diagonal linear discriminant analysis, was used to predict the prognoses of VS patients. Two schemes were considered: single-split validation, based on a single random split of patients into two groups of equal size (group 1 and group 2), and Monte Carlo cross validation (MCCV), whereby patients were repeatedly and randomly divided into TSs and VSs of various sizes. RESULTS The 30-gene PP, identified from group 1 patients, yielded an 80% prognosis prediction accuracy on group 2 patients. MCCV yielded the following average prognosis prediction performance measures: 76.3% accuracy, 85.1% sensitivity, and 67.5% specificity. Improvements in prognosis prediction were observed with increasing TS size. The 30-gene PSs were found to be highly variable across TS/VS splits. Assessed on the same random splits of patients, the previously suggested 23-gene PS yielded a 67.7% mean prognosis prediction accuracy. CONCLUSION Microarray gene expression profiling is able to predict the prognosis of stage II colon cancer patients. The present study also illustrates the usefulness of resampling techniques for honest performance assessment of microarray-based PPs.
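The Monte Carlo cross validation scheme described in this abstract can be sketched as a small harness. The sketch below is an assumption-laden illustration rather than the study's code: `mccv`, `fit_midpoint` and `predict_midpoint` are hypothetical names, the one-feature midpoint classifier is a stand-in for the 30-gene DLDA predictor, and the data are made up. The point it shows is that every model-building step, gene selection included, runs inside `fit` on the training set only, which is what makes the resulting estimate honest.

```python
import random


def mccv(samples, labels, fit, predict, train_size, n_splits, seed=0):
    """Monte Carlo cross-validation: mean accuracy over random TS/VS splits."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    accuracies = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:train_size], idx[train_size:]
        # The model sees the training set only; validation samples stay unseen
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def fit_midpoint(xs, ys):
    """Stand-in classifier: threshold halfway between the two class means."""
    m0 = sum(x for x, lab in zip(xs, ys) if lab == 0) / ys.count(0)
    m1 = sum(x for x, lab in zip(xs, ys) if lab == 1) / ys.count(1)
    return (m0 + m1) / 2.0, m1 > m0


def predict_midpoint(model, x):
    threshold, class1_is_higher = model
    return 1 if (x > threshold) == class1_is_higher else 0


# Toy, perfectly separable data: 10 samples per class
samples = [i / 10 for i in range(10)] + [2 + i / 10 for i in range(10)]
labels = [0] * 10 + [1] * 10
# train_size exceeds the size of either class, so both classes always appear
accuracy = mccv(samples, labels, fit_midpoint, predict_midpoint,
                train_size=12, n_splits=200)
```

Varying `train_size` in such a harness is one way to reproduce the kind of training-set-size comparison the abstract reports.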
Collapse
Affiliation(s)
- Alain Barrier
- Service de Chirurgie digestive, Hôpital Tenon, 4 rue de la Chine, 75020 Paris, France.
| | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
936
|
Baker SG, Kramer BS. Identifying genes that contribute most to good classification in microarrays. BMC Bioinformatics 2006; 7:407. [PMID: 16959042 PMCID: PMC1574352 DOI: 10.1186/1471-2105-7-407] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2006] [Accepted: 09/07/2006] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance. Our strategy is to search for classification rules that perform well with few genes and, if they are found, identify genes that occur relatively frequently under multiple random validation (random splits into training and test samples). RESULTS We analyzed data from four published studies related to cancer. For classification we used a filter with a nearest centroid rule that is easy to implement and has been previously shown to perform well. To comprehensively measure classification performance we used receiver operating characteristic curves. In the three data sets with good classification performance, the classification rules for 5 genes were only slightly worse than for 20 or 50 genes and somewhat better than for 1 gene. In two of these data sets, one or two genes had relatively high frequencies not noticeable with rules involving 20 or 50 genes: desmin for classifying colon cancer versus normal tissue; and zyxin and secretory granule proteoglycan genes for classifying two types of leukemia. CONCLUSION Using multiple random validation, investigators should look for classification rules that perform well with few genes and select, for further study, genes with relatively high frequencies of occurrence in these classification rules.
Collapse
Affiliation(s)
- Stuart G Baker
- Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD 20892-7354, USA
| | - Barnett S Kramer
- Office of Disease Prevention, National Institutes of Health, Bethesda, MD 20892, USA
| |
Collapse
|
937
|
Faith J, Mintram R, Angelova M. Targeted projection pursuit for visualizing gene expression data classifications. Bioinformatics 2006; 22:2667-73. [PMID: 16954139 DOI: 10.1093/bioinformatics/btl463] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
We present a novel method for finding low-dimensional views of high-dimensional data: Targeted Projection Pursuit. The method proceeds by finding projections of the data that best approximate a target view. Two versions of the method are introduced: one based on Procrustes analysis and one based on an artificial neural network. These versions are capable of finding orthogonal or non-orthogonal projections, respectively. The method is quantitatively and qualitatively compared with other dimension reduction techniques. It is shown to find 2D views that display the classification of cancers from gene expression data with a visual separation equal to, or better than, existing dimension reduction techniques. AVAILABILITY Source code, additional diagrams, and original data are available from http://computing.unn.ac.uk/staff/CGJF1/tpp/bioinf.html
Collapse
Affiliation(s)
- Joe Faith
- Northumbria University Newcastle, UK.
| | | | | |
Collapse
|
939
|
Guillot G, Olsson M, Benson M, Rudemo M. Discrimination and scoring using small sets of genes for two-sample microarray data. Math Biosci 2006; 205:195-203. [PMID: 17087979 DOI: 10.1016/j.mbs.2006.08.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2006] [Revised: 06/01/2006] [Accepted: 08/07/2006] [Indexed: 10/24/2022]
Abstract
Comparison of gene expression for two groups of individuals forms an important subclass of microarray experiments. We study multivariate procedures, in particular the use of Hotelling's T² for discrimination between the groups, with a special emphasis on methods based on few genes only. We apply the methods to data from an experiment with a group of atopic dermatitis patients compared with a control group. We also compare our methodology to other recently proposed methods on publicly available datasets. It is found that (i) use of several genes gives a much improved discrimination of the groups as compared to one gene only, (ii) the genes that play the most important role in the multivariate analysis are not necessarily those that rank first in univariate comparisons of the groups, (iii) Linear Discriminant Analysis carried out with sets of 2-5 genes selected according to their Hotelling T² gives results comparable to state-of-the-art methods using many more genes, a feature of our method which might be crucial in clinical applications. Finding groups of genes that together give optimal multivariate discrimination (given the size of the group) can identify crucial pathways and networks of genes responsible for a disease. The computer code that we developed to make computations is available as an R package.
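For readers unfamiliar with the statistic, the two-sample Hotelling statistic for a small gene set is T² = (n1·n2/(n1+n2)) · d'S⁻¹d, where d is the difference of the group mean vectors and S is the pooled covariance matrix. The sketch below works this out for a pair of genes so the 2×2 inverse can be written by hand. It is an illustration only and is not the R package mentioned in the abstract; the function name and the toy data are hypothetical, and a singular covariance matrix is not handled.

```python
from statistics import mean


def hotelling_t2_pair(group1, group2):
    """Two-sample Hotelling T^2 for a pair of genes.

    Each group is a list of (gene_a, gene_b) expression pairs.
    """
    n1, n2 = len(group1), len(group2)
    means = []
    s_aa = s_ab = s_bb = 0.0
    for grp in (group1, group2):
        ma = mean(a for a, _ in grp)
        mb = mean(b for _, b in grp)
        means.append((ma, mb))
        # Accumulate within-group sums of squares and cross-products
        for a, b in grp:
            s_aa += (a - ma) ** 2
            s_ab += (a - ma) * (b - mb)
            s_bb += (b - mb) ** 2
    dof = n1 + n2 - 2
    s_aa, s_ab, s_bb = s_aa / dof, s_ab / dof, s_bb / dof
    da = means[0][0] - means[1][0]
    db = means[0][1] - means[1][1]
    # Quadratic form d' S^{-1} d using the closed-form 2x2 inverse
    det = s_aa * s_bb - s_ab ** 2
    quad = (s_bb * da * da - 2 * s_ab * da * db + s_aa * db * db) / det
    return n1 * n2 / (n1 + n2) * quad


# Toy data: the groups differ by a shift of 2 in the first gene only
group1 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
group2 = [(3, 0), (1, 0), (2, 1), (2, -1)]
t2 = hotelling_t2_pair(group1, group2)
```

Scoring every candidate pair this way and keeping the highest-scoring ones is one plausible reading of "selected according to their Hotelling T²"; the cross-product term `s_ab` is what lets a pair score well even when neither gene ranks highly on its own.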
Collapse
|
940
|
Shen R, Ghosh D, Chinnaiyan A, Meng Z. Eigengene-based linear discriminant model for tumor classification using gene expression microarray data. Bioinformatics 2006; 22:2635-42. [PMID: 16926220 DOI: 10.1093/bioinformatics/btl442] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION The nearest shrunken centroids classifier has become a popular algorithm in tumor classification problems using gene expression microarray data. Feature selection is an embedded part of the method to select top-ranking genes based on a univariate distance statistic calculated for each gene individually. The univariate statistics summarize gene expression profiles outside of the gene co-regulation network context, leading to redundant information being included in the selection procedure. RESULTS We propose an Eigengene-based Linear Discriminant Analysis (ELDA) to address gene selection in a multivariate framework. The algorithm uses a modified rotated Spectral Decomposition (SpD) technique to select 'hub' genes that associate with the most important eigenvectors. Using three benchmark cancer microarray datasets, we show that ELDA selects the most characteristic genes, leading to substantially smaller classifiers than the univariate feature selection based analogues. The resulting de-correlated expression profiles make the gene-wise independence assumption more realistic and applicable for the shrunken centroids classifier and other diagonal linear discriminant types of models. Our algorithm further incorporates a misclassification cost matrix, allowing differential penalization of one type of error over another. In the breast cancer data, we show that false negative prognosis can be controlled via a cost-adjusted discriminant function. AVAILABILITY R code for the ELDA algorithm is available from the author upon request.
Affiliation(s)
- Ronglai Shen
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-0602, USA
941
Collins CD, Purohit S, Podolsky RH, Zhao HS, Schatz D, Eckenrode SE, Yang P, Hopkins D, Muir A, Hoffman M, McIndoe RA, Rewers M, She JX. The application of genomic and proteomic technologies in predictive, preventive and personalized medicine. Vascul Pharmacol 2006; 45:258-67. [PMID: 17030152 DOI: 10.1016/j.vph.2006.08.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Received: 08/05/2006] [Revised: 08/05/2006] [Accepted: 08/05/2006] [Indexed: 11/17/2022]
Abstract
The long asymptomatic period before the onset of chronic diseases offers good opportunities for disease prevention. Indeed, many chronic diseases may be preventable by avoiding those factors that trigger the disease process (primary prevention) or by use of therapy that modulates the disease process before the onset of clinical symptoms (secondary prevention). Accurate prediction is vital for disease prevention so that therapy can be given to those individuals who are most likely to develop the disease. The utility of predictive markers is dependent on three parameters, which must be carefully assessed: sensitivity, specificity and positive predictive value. Specificity is important if a biomarker is to be used to identify individuals either for counseling or for preventive therapy. However, a reciprocal relationship exists between sensitivity and specificity. Thus, successful biomarkers will be highly specific without sacrificing sensitivity. Unfortunately, biomarkers with ideal specificity and sensitivity are difficult to find for many diseases. One potential solution is to use the combinatorial power of a large number of biomarkers, each of which alone may not offer satisfactory specificity and sensitivity. Recent technological advances in genetics, genomics, proteomics, and bioinformatics offer a great opportunity for biomarker discovery. The newly identified biomarkers have the potential to bring increased accuracy in disease diagnosis and classification, as well as therapeutic monitoring. In this review, we will use type 1 diabetes (T1D) as an example, when appropriate, to discuss pertinent issues related to high throughput biomarker discovery.
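The link between the three parameters named in this abstract follows from Bayes' rule. The sketch below is illustrative only: the function name `positive_predictive_value` and the example numbers are assumptions chosen for the arithmetic, not figures from the review.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(disease | positive test), computed with Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical marker for a rare disease (prevalence 0.4%).
ppv_95 = positive_predictive_value(0.90, 0.95, 0.004)
ppv_99 = positive_predictive_value(0.90, 0.99, 0.004)
print(f"specificity 0.95 -> PPV {ppv_95:.3f}")
print(f"specificity 0.99 -> PPV {ppv_99:.3f}")
```

For a rare disease, even a small false-positive rate swamps the true positives, which is why the abstract stresses specificity for markers used to select individuals for preventive therapy.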
Affiliation(s)
- C D Collins
- Center for Biotechnology and Genomic Medicine, Medical College of Georgia, 1120 15th Street, CA4124, Augusta, GA 30912-2400, United States
942
Zhou X, Mao KZ. The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms. Bioinformatics 2006; 22:2507-15. [PMID: 16908500 DOI: 10.1093/bioinformatics/btl438] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Feature selection approaches, such as filter and wrapper methods, have been applied to the gene selection problem in the microarray data analysis literature. In wrapper methods, the classification error is usually used as the evaluation criterion for feature subsets. Because microarray data are high-dimensional with small sample sizes, however, counting-based error estimation may not be an ideal criterion for the gene selection problem. RESULTS: Our study reveals that evaluating genes in terms of counting-based error estimators such as resubstitution error, leave-one-out error, cross-validation error and bootstrap error may encounter a severe ties problem, i.e. two or more gene subsets score equally, and this in turn results in uncertainty in gene selection. Our analysis finds that the ties problem is caused by the discrete nature of counting-based error estimators and could be avoided by using continuous evaluation criteria instead. Experimental results show that continuous evaluation criteria, such as the generalised |w|² measure for support vector machines and the modified Relief's measure for k-nearest neighbors, produce improved gene selection compared with counting-based error estimators. AVAILABILITY: The companion website, http://www.ntu.edu.sg/home5/pg02776030/wrappers/, contains (1) the source code of all the gene selection algorithms and (2) the complete set of tables and figures of experiments.
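The discreteness argument can be sketched numerically: with n samples, a counting-based error can take at most n + 1 distinct values, so comparing many more gene subsets than that forces ties. The simulation below is a toy illustration, not the authors' experiment; the stand-in "classifier" (each sample predicted correctly with probability 0.8) and all names are hypothetical.

```python
import random
from collections import Counter


def counting_error(y_true, y_pred):
    """Fraction of misclassified samples: a counting-based estimator."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def count_tied_subsets(n_samples=20, n_subsets=200, seed=0):
    """Return (number of distinct error values, number of subsets in a tie)."""
    rng = random.Random(seed)
    y_true = [rng.randint(0, 1) for _ in range(n_samples)]
    errors = []
    for _ in range(n_subsets):
        # Hypothetical stand-in for a classifier built on one gene subset:
        # each sample is predicted correctly with probability 0.8.
        y_pred = [t if rng.random() < 0.8 else 1 - t for t in y_true]
        errors.append(counting_error(y_true, y_pred))
    counts = Counter(errors)
    tied = sum(c for c in counts.values() if c > 1)
    return len(counts), tied


distinct, tied = count_tied_subsets()
print(f"{distinct} distinct error values, {tied} of 200 subsets share a score")
```

Because at most 21 distinct scores exist for 20 samples, at least 179 of the 200 simulated subsets must share their score with another subset, regardless of the random seed.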
Affiliation(s)
- Xin Zhou
- School of Electrical & Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore
943
Jiang R, Mircean C, Shmulevich I, Cogdell D, Jia Y, Tabus I, Aldape K, Sawaya R, Bruner JM, Fuller GN, Zhang W. Pathway alterations during glioma progression revealed by reverse phase protein lysate arrays. Proteomics 2006; 6:2964-71. [PMID: 16619307 DOI: 10.1002/pmic.200500555] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Indexed: 11/11/2022]
Abstract
The progression of gliomas has been extensively studied at the genomic level using cDNA microarrays. However, systematic examinations at the protein translational and post-translational levels are far more limited. We constructed a glioma protein lysate array from 82 different primary glioma tissues, and surveyed the expression and phosphorylation of 46 different proteins involved in signaling pathways of cell proliferation, cell survival, apoptosis, angiogenesis, and cell invasion. An analysis algorithm was employed to robustly estimate protein expression levels in these samples. When ranked by their discriminating power to separate 37 glioblastomas (high-grade gliomas) from 45 lower-grade gliomas, the following 12 proteins were identified as the most powerful discriminators: IκBα, EGFRpTyr845, AKTpThr308, phosphatidylinositol 3-kinase (PI3K), BadpSer136, insulin-like growth factor binding protein (IGFBP) 2, IGFBP5, matrix metalloproteinase 9 (MMP9), vascular endothelial growth factor (VEGF), phosphorylated retinoblastoma protein (pRB), Bcl-2, and c-Abl. Clustering analysis showed a close link between PI3K and AKTpThr308, IGFBP5 and IGFBP2, and IκBα and EGFRpTyr845. Another cluster includes MMP9, Bcl-2, VEGF, and pRB. These clustering patterns may suggest functional relationships, which warrant further investigation. The marked association of phosphorylation of AKT at Thr308, but not Ser473, with glioblastoma suggests a specific event of PI3K pathway activation in glioma progression.
Affiliation(s)
- Rongcai Jiang
- Department of Pathology, The University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030, USA
944
945
Identifying disease feature genes based on cellular localized gene functional modules and regulation networks. CHINESE SCIENCE BULLETIN-CHINESE 2006. [DOI: 10.1007/s11434-006-2067-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
946
Abstract
The study of gene expression profiling of cells and tissue has become a major tool for discovery in medicine. Microarray experiments allow description of genome-wide expression changes in health and disease. The results of such experiments are expected to change the methods employed in the diagnosis and prognosis of disease in obstetrics and gynecology. Moreover, an unbiased and systematic study of gene expression profiling should allow the establishment of a new taxonomy of disease for obstetric and gynecologic syndromes. Thus, a new era is emerging in which reproductive processes and disorders could be characterized using molecular tools and fingerprinting. The design, analysis, and interpretation of microarray experiments require specialized knowledge that is not part of the standard curriculum of our discipline. This article describes the types of studies that can be conducted with microarray experiments (class comparison, class prediction, class discovery). We discuss key issues pertaining to experimental design, data preprocessing, and gene selection methods. Common types of data representation are illustrated. Potential pitfalls in the interpretation of microarray experiments, as well as the strengths and limitations of this technology, are highlighted. This article is intended to assist clinicians in appraising the quality of the scientific evidence now reported in the obstetric and gynecologic literature.
Affiliation(s)
- Adi L. Tarca
- Perinatology Research Branch, National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI
- Department of Computer Science, Wayne State University
- Roberto Romero
- Perinatology Research Branch, National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI
- Center for Molecular Medicine and Genetics, Wayne State University
- Sorin Draghici
- Department of Computer Science, Wayne State University
- Karmanos Cancer Institute, Detroit, MI
947
Simon R, Wang SJ. Use of genomic signatures in therapeutics development in oncology and other diseases. The Pharmacogenomics Journal 2006; 6:166-73. [PMID: 16415922 DOI: 10.1038/sj.tpj.6500349] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.1] [Indexed: 02/01/2023]
Abstract
Pharmacogenomics is the science of determining how the benefits and adverse effects of a drug vary among a target population of patients based on genomic features of the patient's germ line and diseased tissue. By identifying those patients who are most likely to respond while eliminating serious adverse effects, the therapeutic index of a drug can be substantially increased. This may facilitate demonstrating the effectiveness of the drug and may avoid subsequent problems due to serious adverse events. Our objective here is to provide clinical trial designs and analysis strategies for the utilization of genomic signatures as classifiers for patient stratification or patient selection in therapeutics development. We review methods for the development of genomic signature classifiers of treatment outcome in high-dimensional settings, where the number of variables available for prediction far exceeds the number of cases. The split-sample and cross-validation methods for obtaining estimates of prediction accuracy in developmental studies are described. We present clinical trial designs for utilizing genomic signature classifiers in therapeutics development. The purpose of the classifier is to facilitate the identification of groups of patients with a high probability of benefiting from the drug and avoiding serious adverse events. We distinguish exploratory analysis during the development of the genomic classifier from prospective planning and rigorous testing of therapeutic hypotheses in studies that utilize the genomic classifier in therapeutics development. We discuss a variety of clinical trial designs including those utilizing specimen collection and assay prospectively for newly accrued patients and those involving a prospectively planned analysis of archived specimens from a previously conducted clinical trial.
Our discussion of the development and use of classifiers of efficacy is mostly focused on applications in oncology using classifiers based on biomarkers measured in tumors. Some of the same considerations apply, however, to development of efficacy and safety classifiers in nononcologic diseases based on single-nucleotide germline polymorphisms.
Affiliation(s)
- R Simon
- Biometric Research Branch, Division of Cancer Treatment & Diagnosis, National Cancer Institute, Bethesda, MD 20892-7434, USA.
948
Davis CA, Gerick F, Hintermair V, Friedel CC, Fundel K, Küffner R, Zimmer R. Reliable gene signatures for microarray classification: assessment of stability and performance. Bioinformatics 2006; 22:2356-63. [PMID: 16882647 DOI: 10.1093/bioinformatics/btl400] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Two important questions for the analysis of gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) exhibiting the differences between classes and sample subsets. Solutions to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method needs to be specifically selected for a given dataset. The selected gene signatures can be unstable and the resulting classification accuracy unreliable, particularly when considering different subsets of samples. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions. METHODS: We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, for random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting those genes frequently selected over many samplings. Sampling additionally permits measurement of the stability of the classification performance for each model, which serves as a measure of model reliability. RESULTS: We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition to reliable classification performance, we determined stable consensus signatures (i.e. gene lists) for sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage. We compared our approach to others based on a publicly available dataset on breast cancer. AVAILABILITY: R package at http://www.bio.ifi.lmu.de/~davis/edaprakt
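The consensus-signature step described in METHODS can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R package: the ranking statistic (absolute difference in class means), the stratified subsampling, the function names, and the toy data are all hypothetical, and the model-scoring part of the approach is omitted.

```python
import random
from collections import Counter
from statistics import mean


def top_genes(expr, labels, sample_idx, k):
    """Rank genes by absolute difference in class means on a sample subset."""
    scores = {}
    for gene, values in expr.items():
        a = [values[i] for i in sample_idx if labels[i] == 0]
        b = [values[i] for i in sample_idx if labels[i] == 1]
        scores[gene] = abs(mean(a) - mean(b))
    return sorted(scores, key=scores.get, reverse=True)[:k]


def consensus_signature(expr, labels, k=2, n_rounds=100,
                        fraction=0.7, min_freq=0.8, seed=0):
    """Keep genes selected in at least min_freq of the random subsamplings."""
    rng = random.Random(seed)
    class0 = [i for i, y in enumerate(labels) if y == 0]
    class1 = [i for i, y in enumerate(labels) if y == 1]
    counts = Counter()
    for _ in range(n_rounds):
        # Stratified subsample so both classes are always represented.
        idx = (rng.sample(class0, max(1, int(fraction * len(class0))))
               + rng.sample(class1, max(1, int(fraction * len(class1)))))
        counts.update(top_genes(expr, labels, idx, k))
    return sorted(g for g, c in counts.items() if c / n_rounds >= min_freq)


# Toy data: g1 and g2 separate the classes, g3 and g4 are noise.
expr = {
    "g1": [0, 0, 0, 0, 0, 5, 5, 5, 5, 5],
    "g2": [1, 1, 1, 1, 1, 4, 4, 4, 4, 4],
    "g3": [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
    "g4": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
}
labels = [0] * 5 + [1] * 5
signature = consensus_signature(expr, labels)
```

Counting how often each gene is selected across subsamplings, rather than trusting a single ranking, is what makes the resulting list less sensitive to which arrays happen to be included.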
Affiliation(s)
- Chad A Davis
- Institute of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 Munich, Germany
949
Sievertzon M, Nilsson P, Lundeberg J. Improving reliability and performance of DNA microarrays. Expert Rev Mol Diagn 2006; 6:481-92. [PMID: 16706748 DOI: 10.1586/14737159.6.3.481] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
Abstract
Many platforms and versions of microarray technology, with different characteristics and applications, have been developed. This review describes some key issues in reliability and performance for the two most commonly used platforms for gene expression analysis: in situ-synthesized oligonucleotide microarrays (GeneChips) and spotted microarrays. Some recent advances and new applications within the field are mentioned briefly.
Affiliation(s)
- Maria Sievertzon
- Royal Institute of Technology, AlbaNova University Center, KTH Genome Center, Department of Biotechnology, S-106 91 Stockholm, Sweden.
950
Verhaak RGW, Sanders MA, Bijl MA, Delwel R, Horsman S, Moorhouse MJ, van der Spek PJ, Löwenberg B, Valk PJM. HeatMapper: powerful combined visualization of gene expression profile correlations, genotypes, phenotypes and sample characteristics. BMC Bioinformatics 2006; 7:337. [PMID: 16836741 PMCID: PMC1574351 DOI: 10.1186/1471-2105-7-337] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 04/25/2006] [Accepted: 07/12/2006] [Indexed: 11/10/2022]
Abstract
Background: Data obtained by unsupervised analysis of large-scale expression profiling studies are currently often interpreted by visually combining sample-gene heatmaps and sample characteristics. This method is not optimal for comparing individual samples or groups of samples. Here, we describe an approach to visually integrate the results of unsupervised and supervised cluster analysis using a correlation plot and additional sample metadata. Results: We have developed a tool called the HeatMapper that provides such visualizations in a dynamic and flexible manner and is available from . Conclusion: The HeatMapper allows an accessible and comprehensive visualization of the results of gene expression profiling and cluster analysis.
Affiliation(s)
- Roel GW Verhaak
- Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Mathijs A Sanders
- Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Maarten A Bijl
- Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Ruud Delwel
- Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Sebastiaan Horsman
- Department of Bioinformatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Michael J Moorhouse
- Department of Bioinformatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Peter J van der Spek
- Department of Bioinformatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Peter JM Valk
- Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands