Reference Citation Analysis

For: Dudoit S, Fridlyand J, Speed TP. Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. J Am Stat Assoc 2002. [DOI: 10.1198/016214502753479248] [Citation(s) in RCA: 1691] [Impact Index Per Article: 73.5] [Indexed: 11/21/2022]

Number

Cited by Other Article(s)

451

Improved shrunken centroid classifiers for high-dimensional class-imbalanced data. BMC Bioinformatics 2013;14:64. [PMID: 23433084 PMCID: PMC3687811 DOI: 10.1186/1471-2105-14-64] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 08/16/2012] [Accepted: 01/31/2013] [Indexed: 11/21/2022] Open

452

Hajiloo M, Sapkota Y, Mackey JR, Robson P, Greiner R, Damaraju S. ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction. BMC Bioinformatics 2013;14:61. [PMID: 23432980 PMCID: PMC3618021 DOI: 10.1186/1471-2105-14-61] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/28/2013] [Accepted: 02/14/2013] [Indexed: 01/09/2023] Open

Abstract

BACKGROUND

Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification but each has limitations. We provide an alternative technique to address population stratification.

RESULTS

We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual's continental and sub-continental ancestry. To predict an individual's continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control's λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of 86.5% ± 2.4%, 95.6% ± 3.9%, 95.6% ± 2.1%, 98.3% ± 2.0%, and 95.9% ± 1.5%. However, ETHNOPRED was unable to produce a classifier that can accurately distinguish Chinese in Beijing vs. Chinese in Denver.

CONCLUSIONS

ETHNOPRED is a novel technique for producing classifiers that can identify an individual's continental and sub-continental heritage, based on a small number of SNPs. We show that its learned classifiers are simple, cost-efficient, accurate, transparent, flexible, fast, applicable to large scale GWASs, and robust to missing values.
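The abstract above describes ensembles of disjoint decision trees that stay accurate when some SNP values are missing. The following is a minimal sketch of one way such behaviour could arise, not ETHNOPRED's actual implementation: each tree is reduced to a single-SNP lookup that abstains when its SNP is missing, and the ensemble takes a majority vote over the trees that did answer. The names `make_stump`, `majority_vote`, the SNP identifiers, and the genotype-to-label tables are all hypothetical.

```python
from collections import Counter
from typing import Callable, Mapping, Optional, Sequence

# A "tree" here is any function that maps a genotype record to a label,
# or returns None when it cannot decide (hypothetical convention).
Sample = Mapping[str, Optional[str]]
Tree = Callable[[Sample], Optional[str]]


def make_stump(snp: str, label_by_genotype: Mapping[str, str]) -> Tree:
    """Build a one-SNP classifier that abstains on missing or unseen genotypes."""
    def predict(sample: Sample) -> Optional[str]:
        genotype = sample.get(snp)
        if genotype is None:
            return None  # SNP not genotyped: this tree abstains
        return label_by_genotype.get(genotype)
    return predict


def majority_vote(trees: Sequence[Tree], sample: Sample) -> Optional[str]:
    """Return the most common label among the trees that did not abstain."""
    votes = Counter(v for v in (tree(sample) for tree in trees) if v is not None)
    if not votes:
        return None  # every tree abstained
    return votes.most_common(1)[0][0]


# Illustrative, made-up SNPs and labels. Each tree uses a different SNP,
# so the trees are disjoint in the features they read.
trees = [
    make_stump("snp_a", {"AA": "pop1", "AG": "pop1", "GG": "pop2"}),
    make_stump("snp_b", {"CC": "pop1", "CT": "pop2", "TT": "pop2"}),
    make_stump("snp_c", {"GG": "pop1", "GT": "pop1", "TT": "pop2"}),
]
sample = {"snp_a": "AA", "snp_b": None, "snp_c": "GT"}  # snp_b is missing
prediction = majority_vote(trees, sample)
```

Because the trees share no SNPs in this sketch, a single missing value silences at most one voter, which is one intuitive reason an ensemble of disjoint trees could tolerate missing data.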

453

Ulfenborg B, Klinga-Levan K, Olsson B. Classification of tumor samples from expression data using decision trunks. Cancer Inform 2013;12:53-66. [PMID: 23467331 PMCID: PMC3579425 DOI: 10.4137/cin.s10356] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022] Open

454

An extension of PPLS-DA for classification and comparison to ordinary PLS-DA. PLoS One 2013;8:e55267. [PMID: 23408965 PMCID: PMC3569448 DOI: 10.1371/journal.pone.0055267] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/17/2012] [Accepted: 12/27/2012] [Indexed: 11/19/2022] Open

455

Telaar A, Repsilber D, Nürnberg G. Biomarker discovery: classification using pooled samples. Comput Stat 2013. [DOI: 10.1007/s00180-011-0302-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]

456

Papillon-Cavanagh S, De Jay N, Hachem N, Olsen C, Bontempi G, Aerts HJWL, Quackenbush J, Haibe-Kains B. Comparison and validation of genomic predictors for anticancer drug sensitivity. J Am Med Inform Assoc 2013;20:597-602. [PMID: 23355484 DOI: 10.1136/amiajnl-2012-001442] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Indexed: 01/13/2023] Open

457

Gadegbeku CA, Gipson DS, Holzman LB, Ojo AO, Song PXK, Barisoni L, Sampson MG, Kopp JB, Lemley KV, Nelson PJ, Lienczewski CC, Adler SG, Appel GB, Cattran DC, Choi MJ, Contreras G, Dell KM, Fervenza FC, Gibson KL, Greenbaum LA, Hernandez JD, Hewitt SM, Hingorani SR, Hladunewich M, Hogan MC, Hogan SL, Kaskel FJ, Lieske JC, Meyers KEC, Nachman PH, Nast CC, Neu AM, Reich HN, Sedor JR, Sethna CB, Trachtman H, Tuttle KR, Zhdanova O, Zilleruelo GE, Kretzler M. Design of the Nephrotic Syndrome Study Network (NEPTUNE) to evaluate primary glomerular nephropathy by a multidisciplinary approach. Kidney Int 2013;83:749-56. [PMID: 23325076 PMCID: PMC3612359 DOI: 10.1038/ki.2012.428] [Citation(s) in RCA: 255] [Impact Index Per Article: 21.3] [Indexed: 01/14/2023]

458

Song L, Langfelder P, Horvath S. Random generalized linear model: a highly accurate and interpretable ensemble predictor. BMC Bioinformatics 2013;14:5. [PMID: 23323760 PMCID: PMC3645958 DOI: 10.1186/1471-2105-14-5] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Received: 10/01/2012] [Accepted: 01/03/2013] [Indexed: 01/13/2023] Open

Abstract

BACKGROUND

Ensemble predictors such as the random forest are known to have superior accuracy but their black-box predictions are difficult to interpret. In contrast, a generalized linear model (GLM) is very interpretable especially when forward feature selection is used to construct the model. However, forward feature selection tends to overfit the data and leads to low predictive accuracy. Therefore, it remains an important research goal to combine the advantages of ensemble predictors (high accuracy) with the advantages of forward regression modeling (interpretability). To address this goal several articles have explored GLM based ensemble predictors. Since limited evaluations suggested that these ensemble predictors were less accurate than alternative predictors, they have found little attention in the literature.

RESULTS

Comprehensive evaluations involving hundreds of genomic data sets, the UCI machine learning benchmark data, and simulations are used to give GLM based ensemble predictors a new and careful look. A novel bootstrap aggregated (bagged) GLM predictor that incorporates several elements of randomness and instability (random subspace method, optional interaction terms, forward variable selection) often outperforms a host of alternative prediction methods including random forests and penalized regression models (ridge regression, elastic net, lasso). This random generalized linear model (RGLM) predictor provides variable importance measures that can be used to define a "thinned" ensemble predictor (involving few features) that retains excellent predictive accuracy.

CONCLUSION

RGLM is a state of the art predictor that shares the advantages of a random forest (excellent predictive accuracy, feature importance measures, out-of-bag estimates of accuracy) with those of a forward selected generalized linear model (interpretability). These methods are implemented in the freely available R software package randomGLM.

459

Rajapakse JC, Mundra PA. Multiclass gene selection using Pareto-fronts. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013;10:87-97. [PMID: 23702546 DOI: 10.1109/tcbb.2013.1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 06/02/2023]

460

Wagaman A. Efficient k-NN graph construction for graphs on variables. Stat Anal Data Min 2013. [DOI: 10.1002/sam.11186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]

461

Discriminant and Class-Modelling Chemometric Techniques for Food PDO Verification. FOOD PROTECTED DESIGNATION OF ORIGIN - METHODOLOGIES AND APPLICATIONS 2013. [DOI: 10.1016/b978-0-444-59562-1.00013-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/03/2022]

462

Lee YD, Cook D, Park JW, Lee EK. PPtree: Projection pursuit classification tree. Electron J Stat 2013. [DOI: 10.1214/13-ejs810] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]

463

Global Top-Scoring Pair Decision Tree for Gene Expression Data Analysis. ACTA ACUST UNITED AC 2013. [DOI: 10.1007/978-3-642-37207-0_20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2]

464

Alonso-Betanzos A, Bolón-Canedo V, Fernández-Francos D, Porto-Díaz I, Sánchez-Maroño N. Up-to-Date Feature Selection Methods for Scalable and Efficient Machine Learning. EFFICIENCY AND SCALABILITY METHODS FOR COMPUTATIONAL INTELLECT 2013:1-26. [DOI: 10.4018/978-1-4666-3942-3.ch001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/04/2025]

465

Wang T, Zhu L. Sparse sufficient dimension reduction using optimal scoring. Comput Stat Data Anal 2013. [DOI: 10.1016/j.csda.2012.06.015] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]

466

Improving clustering with pairwise constraints: a discriminative approach. Knowl Inf Syst 2012. [DOI: 10.1007/s10115-012-0592-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]

467

Wang Y, Zhou Y, Li Y, Ling Z, Zhu Y, Guo X, Sun H. An improved dimensionality reduction method for meta-transcriptome indexing based diseases classification. BMC SYSTEMS BIOLOGY 2012;6 Suppl 3:S12. [PMID: 23281712 PMCID: PMC3524076 DOI: 10.1186/1752-0509-6-s3-s12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/11/2022]

468

Fan Y, Tang CY. Tuning parameter selection in high dimensional penalized likelihood. J R Stat Soc Series B Stat Methodol 2012. [DOI: 10.1111/rssb.12001] [Citation(s) in RCA: 157] [Impact Index Per Article: 12.1] [Indexed: 11/29/2022]

469

Chen CK. The classification of cancer stage microarray data. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2012;108:1070-1077. [PMID: 22925656 DOI: 10.1016/j.cmpb.2012.07.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/30/2012] [Revised: 06/20/2012] [Accepted: 07/17/2012] [Indexed: 06/01/2023]

470

A model selection criterion for discriminant analysis of high-dimensional data with fewer observations. J Stat Plan Inference 2012. [DOI: 10.1016/j.jspi.2012.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

471

Hochrein J, Klein MS, Zacharias HU, Li J, Wijffels G, Schirra HJ, Spang R, Oefner PJ, Gronwald W. Performance Evaluation of Algorithms for the Classification of Metabolic 1H NMR Fingerprints. J Proteome Res 2012;11:6242-51. [DOI: 10.1021/pr3009034] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 01/16/2023]

472

Sanz-Pamplona R, Berenguer A, Cordero D, Riccadonna S, Solé X, Crous-Bou M, Guinó E, Sanjuan X, Biondo S, Soriano A, Jurman G, Capella G, Furlanello C, Moreno V. Clinical value of prognosis gene expression signatures in colorectal cancer: a systematic review. PLoS One 2012;7:e48877. [PMID: 23145004 PMCID: PMC3492249 DOI: 10.1371/journal.pone.0048877] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Received: 05/30/2012] [Accepted: 10/02/2012] [Indexed: 12/01/2022] Open

Abstract

Introduction

The traditional staging system is inadequate to identify those patients with stage II colorectal cancer (CRC) at high risk of recurrence or with stage III CRC at low risk. A number of gene expression signatures to predict CRC prognosis have been proposed, but none is routinely used in the clinic. The aim of this work was to assess the prediction ability and potential clinical usefulness of these signatures in a series of independent datasets.

Methods

A literature review identified 31 gene expression signatures that used gene expression data to predict prognosis in CRC tissue. The search was based on the PubMed database and was restricted to papers published from January 2004 to December 2011. Eleven CRC gene expression datasets with outcome information were identified and downloaded from public repositories. A Random Forest classifier was used to build predictors from the gene lists. The Matthews correlation coefficient was chosen as a measure of classification accuracy, and its associated p-value was used to assess association with prognosis. For clinical usefulness evaluation, positive and negative post-test probabilities were computed in stage II and III samples.

Results

Five gene signatures showed significant association with prognosis and provided reasonable prediction accuracy in their own training datasets. Nevertheless, all signatures showed low reproducibility in independent data. Stratified analyses by stage or microsatellite instability status showed significant association but limited discrimination ability, especially in stage II tumors. From a clinical perspective, the most predictive signatures showed a minor but significant improvement over the classical staging system.

Conclusions

The published signatures show low prediction accuracy but moderate clinical usefulness. Although gene expression data may inform prognosis, better strategies for signature validation are needed to encourage their widespread use in the clinic.
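The Methods paragraph above names the Matthews correlation coefficient as its accuracy measure. As a reminder of what that statistic computes, here is a minimal sketch of the standard binary-case formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name and the zero-denominator convention are choices made for this illustration and are not taken from the cited paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a row or column of the confusion
    # matrix is empty and the coefficient is otherwise undefined.
    return numerator / denominator if denominator else 0.0


# A perfect classifier scores 1.0 and a perfectly inverted one scores -1.0.
perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)
inverted = matthews_corrcoef(tp=0, tn=0, fp=5, fn=5)
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, so a classifier that always predicts the majority class scores 0 rather than a misleadingly high value.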

473

Srivastava MS, Reid N. Testing the structure of the covariance matrix with fewer observations than the dimension. J MULTIVARIATE ANAL 2012. [DOI: 10.1016/j.jmva.2012.06.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]

474

Kandaswamy KK, Pugalenthi G, Kalies KU, Hartmann E, Martinetz T. EcmPred: prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection. J Theor Biol 2012;317:377-83. [PMID: 23123454 DOI: 10.1016/j.jtbi.2012.10.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 01/18/2012] [Revised: 10/08/2012] [Accepted: 10/09/2012] [Indexed: 12/11/2022]

475

Wu MY, Dai DQ, Shi Y, Yan H, Zhang XF. Biomarker identification and cancer classification based on microarray data using Laplace naive Bayes model with mean shrinkage. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012;9:1649-1662. [PMID: 22868679 DOI: 10.1109/tcbb.2012.105] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 06/01/2023]

476

Ameling S, Herda LR, Hammer E, Steil L, Teumer A, Trimpert C, Dörr M, Kroemer HK, Klingel K, Kandolf R, Völker U, Felix SB. Myocardial gene expression profiles and cardiodepressant autoantibodies predict response of patients with dilated cardiomyopathy to immunoadsorption therapy. Eur Heart J 2012;34:666-75. [PMID: 23100283 PMCID: PMC3584995 DOI: 10.1093/eurheartj/ehs330] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Indexed: 12/31/2022] Open

Abstract

Aims

Immunoadsorption with subsequent immunoglobulin G substitution (IA/IgG) represents a novel therapeutic approach in the treatment of dilated cardiomyopathy (DCM) which leads to the improvement of left ventricular ejection fraction (LVEF). However, response to this therapeutic intervention shows wide inter-individual variability. In this pilot study, we tested the value of clinical, biochemical, and molecular parameters for the prediction of the response of patients with DCM to IA/IgG.

Methods and results

Forty DCM patients underwent endomyocardial biopsies (EMBs) before IA/IgG. In eight patients with normal LVEF (controls), EMBs were obtained for clinical reasons. Clinical parameters, negative inotropic activity (NIA) of antibodies on isolated rat cardiomyocytes, and gene expression profiles of EMBs were analysed. Dilated cardiomyopathy patients displaying improvement of LVEF (≥20 relative and ≥5% absolute) 6 months after IA/IgG were considered responders. Compared with non-responders (n = 16), responders (n = 24) displayed shorter disease duration (P = 0.006), smaller LV internal diameter in diastole (P = 0.019), and stronger NIA of antibodies. Antibodies obtained from controls were devoid of NIA. Myocardial gene expression patterns were different in responders and non-responders for genes of oxidative phosphorylation, mitochondrial dysfunction, hypertrophy, and ubiquitin–proteasome pathway. The integration of scores of NIA and expression levels of four genes allowed robust discrimination of responders from non-responders at baseline (BL) [sensitivity of 100% (95% CI 85.8–100%); specificity up to 100% (95% CI 79.4–100%); cut-off value: −0.28] and was superior to scores derived from antibodies, gene expression, or clinical parameters only.

Conclusion

Combined assessment of NIA of antibodies and gene expression patterns of DCM patients at BL predicts response to IA/IgG therapy and may enable appropriate selection of patients who benefit from this therapeutic intervention.

477

Computational gene mapping to analyze continuous automated physiologic monitoring data in neuro-trauma intensive care. J Trauma Acute Care Surg 2012;73:419-24; discussion 424-5. [PMID: 22846949 DOI: 10.1097/ta.0b013e31825ff59a] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]

Abstract

BACKGROUND

We asked whether the advanced machine learning applications used in microarray gene profiling could assess critical thresholds in the massive databases generated by continuous electronic physiologic vital signs (VS) monitoring in the neuro-trauma intensive care unit.

METHODS

We used Class Prediction Analysis to predict binary outcomes (life/death, good/bad Extended Glasgow Outcome Score, etc.) based on data accrued within 12, 24, 48, and 72 hours after admission to the neuro-trauma intensive care unit. Univariate analyses selected "features," discriminator VS segments or "genes," in each individual's data set. Prediction models were then constructed from these selected features using six different statistical modeling techniques to predict outcome for other individuals in the sample cohort, and the models were cross-validated with a leave-one-out method.

RESULTS

We gleaned complete sets of 588 VS monitoring segment features for each of four periods and outcomes from 52 of 60 patients with severe traumatic brain injury who met study inclusion criteria. Overall, intracranial pressures and blood pressures over time (e.g., intracranial pressure >20 mm Hg for 20 minutes) provided the best discrimination for outcomes. Modeling performed best in the first 12 hours of care and for mortality. The mean number of selected features included 76 predicting 14-day hospital stay in that period, 11 predicting mortality, and 4 predicting 3-month Extended Glasgow Outcome Score. Four of the six techniques constructed models that correctly identified mortality by 12 hours 75% of the time or higher.

CONCLUSION

Our results suggest that valid prediction models after severe traumatic brain injury can be constructed using gene mapping techniques to analyze large data sets from conventional electronic monitoring data, but that this methodology needs validation in larger data sets, and that additional unstructured learning techniques may also prove useful.

478

Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data. BMC Bioinformatics 2012;13:270. [PMID: 23075381 PMCID: PMC3542193 DOI: 10.1186/1471-2105-13-270] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/03/2012] [Accepted: 09/18/2012] [Indexed: 01/19/2023] Open

479

Miguéis VL, Van den Poel D, Camanho AS, Falcão e Cunha J. Predicting partial customer churn using Markov for discrimination for modeling first purchase sequences. ADV DATA ANAL CLASSI 2012. [DOI: 10.1007/s11634-012-0121-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]

480

Use of gene expression data for predicting continuous phenotypes for animal production and breeding. Animal 2012;2:1413-20. [PMID: 22443898 DOI: 10.1017/s1751731108002632] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022] Open

481

Nikolovski N, Rubtsov D, Segura MP, Miles GP, Stevens TJ, Dunkley TP, Munro S, Lilley KS, Dupree P. Putative glycosyltransferases and other plant Golgi apparatus proteins are revealed by LOPIT proteomics. PLANT PHYSIOLOGY 2012;160:1037-51. [PMID: 22923678 PMCID: PMC3461528 DOI: 10.1104/pp.112.204263] [Citation(s) in RCA: 125] [Impact Index Per Article: 9.6] [Received: 07/27/2012] [Accepted: 08/22/2012] [Indexed: 05/18/2023]

482

Yang J, Miescke K, McCullagh P. Classification based on a permanental process with cyclic approximation. Biometrika 2012. [DOI: 10.1093/biomet/ass047] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open

483

Modelling Forest α-Diversity and Floristic Composition — On the Added Value of LiDAR plus Hyperspectral Remote Sensing. REMOTE SENSING 2012. [DOI: 10.3390/rs4092818] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]

484

Jiang W, Chen BE. Estimating prediction error in microarray classification: Modifications of the 0.632+ bootstrap when $n < p$. CAN J STAT 2012. [DOI: 10.1002/cjs.11158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]

485

Optimal gene subset selection using the modified SFFS algorithm for tumor classification. Neural Comput Appl 2012. [DOI: 10.1007/s00521-012-1148-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]

486

Hanczar B, Bar-Hen A. A new measure of classifier performance for gene expression data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012;9:1379-1386. [PMID: 22291161 DOI: 10.1109/tcbb.2012.21] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]

487

Liu S, Mundra PA, Rajapakse JC. Features for cells and nuclei classification. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2012;2011:6601-4. [PMID: 22255852 DOI: 10.1109/iembs.2011.6091628] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]

488

Zhou K, Ai C, Dong P, Fan X, Yang L. A novel model to predict O-glycosylation sites using a highly unbalanced dataset. Glycoconj J 2012;29:551-64. [DOI: 10.1007/s10719-012-9434-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/26/2012] [Revised: 07/11/2012] [Accepted: 07/17/2012] [Indexed: 10/28/2022]

489

Jahandideh S, Srinivasasainagendra V, Zhi D. Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection. J Theor Biol 2012;312:65-75. [PMID: 22884576 DOI: 10.1016/j.jtbi.2012.07.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 02/29/2012] [Revised: 07/09/2012] [Accepted: 07/13/2012] [Indexed: 01/11/2023]

490

Wang T, Xu PR, Zhu LX. Non-convex penalized estimation in high-dimensional models with single-index structure. J MULTIVARIATE ANAL 2012. [DOI: 10.1016/j.jmva.2012.03.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/27/2022]

491

Wit EC, Bakewell DJG. Borrowing strength: a likelihood ratio test for related sparse signals. Bioinformatics 2012;28:1980-9. [PMID: 22668791 DOI: 10.1093/bioinformatics/bts316] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open

492

Fan Y, Li R. VARIABLE SELECTION IN LINEAR MIXED EFFECTS MODELS. Ann Stat 2012;40:2043-2068. [PMID: 24850975 DOI: 10.1214/12-aos1028] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Indexed: 11/19/2022]

493

Wang SL, Li XL, Fang J. Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification. BMC Bioinformatics 2012;13:178. [PMID: 22830977 PMCID: PMC3465202 DOI: 10.1186/1471-2105-13-178] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 12/22/2011] [Accepted: 05/18/2012] [Indexed: 01/03/2023] Open

Abstract

Background

Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development.

Results

This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, an HBSA-based ensemble classifier is constructed using a majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes.

Conclusions

It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network.
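The abstract above ranks genes by how often they occur across the selected gene subsets. A minimal sketch of that counting step follows. It is an illustration of frequency-based ranking in general, not the HBSA procedure itself, and the function name and the gene identifiers are made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_genes_by_frequency(subsets: Iterable[Iterable[str]]) -> List[Tuple[str, int]]:
    """Rank genes by the number of selected subsets in which they appear."""
    counts: Counter = Counter()
    for subset in subsets:
        counts.update(set(subset))  # count a gene at most once per subset
    # Sort by descending frequency, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical subsets, standing in for the output of a gene-subset search.
selected_subsets = [
    ["gene1", "gene2"],
    ["gene1", "gene3"],
    ["gene1", "gene2"],
]
ranking = rank_genes_by_frequency(selected_subsets)
```

In this toy input, `gene1` appears in every subset and would top the ranking. A heavily skewed frequency distribution of this kind is what the abstract describes as a power law.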


494

Analysis of high dimensional data using pre-defined set and subset information, with applications to genomic data. BMC Bioinformatics 2012;13:177. [PMID: 22827252 PMCID: PMC3443674 DOI: 10.1186/1471-2105-13-177] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2011] [Accepted: 05/11/2012] [Indexed: 11/25/2022] Open

495

van Vliet MH, Horlings HM, van de Vijver MJ, Reinders MJT, Wessels LFA. Integration of clinical and gene expression data has a synergetic effect on predicting breast cancer outcome. PLoS One 2012;7:e40358. [PMID: 22808140 PMCID: PMC3394805 DOI: 10.1371/journal.pone.0040358] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2012] [Accepted: 06/06/2012] [Indexed: 12/12/2022] Open

496

Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowé A. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans Comput Biol Bioinform 2012;9:1106-19. [PMID: 22350210 DOI: 10.1109/tcbb.2012.33] [Citation(s) in RCA: 219] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]

497

Silva-Fortes C, Amaral Turkman MA, Sousa L. Arrow plot: a new graphical tool for selecting up and down regulated genes and genes differentially expressed on sample subgroups. BMC Bioinformatics 2012;13:147. [PMID: 22734592 PMCID: PMC3542259 DOI: 10.1186/1471-2105-13-147] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2011] [Accepted: 06/14/2012] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples or samples submitted under experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments that involve molecular classification of tumors, it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of subsample mixtures. Consequently, there can be genes differentially expressed in sample subgroups that are missed if the usual statistical approaches are used. In this paper we propose a new graphical tool that identifies not only up- and down-regulated genes but also genes with differential expression in different subclasses, which are usually missed by current statistical methods. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software.

RESULTS

This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with those obtained using some of the standard methods for detecting differentially expressed genes, namely the Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). On both datasets all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) the area between the ROC curve and the rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and allows different variances in the two samples. Another advantage of our method is that the behavior of different kinds of differentially expressed genes can be analyzed graphically.

CONCLUSION

Our results indicate that the arrow plot represents a new flexible and useful tool for the analysis of gene expression profiles from microarrays.
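The two quantities the abstract above names, the overlapping coefficient (OVL) between two densities and the area under the ROC curve (AUC), can be illustrated for a single gene as follows. This is a rough sketch under stated assumptions and not the authors' R implementation: the OVL here uses a simple shared-bin histogram estimate (a swapped-in estimator chosen for brevity), the AUC is the empirical pairwise estimate, and the function names, bin count, and toy expression values are hypothetical.

```python
def auc(x, y):
    """Empirical AUC: fraction of (x, y) pairs with y > x, ties counted as 1/2."""
    wins = sum((b > a) + 0.5 * (b == a) for a in x for b in y)
    return wins / (len(x) * len(y))


def ovl_histogram(x, y, bins=10):
    """Histogram estimate of OVL, the shared area under the two densities.

    Both samples are binned on a common grid; OVL is approximated by the sum
    over bins of the smaller of the two bin proportions.
    """
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    if hi == lo:
        return 1.0  # all values identical: complete overlap
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp the maximum value into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    px, py = proportions(x), proportions(y)
    return sum(min(a, b) for a, b in zip(px, py))


# Toy values: one group is unimodal, the other splits into two subgroups.
group_a = [0, 0, 0, 0]
group_b = [-2, -2, 2, 2]
gene_auc = auc(group_a, group_b)            # 0.5: no overall up/down shift
gene_ovl = ovl_histogram(group_a, group_b)  # 0.0: the distributions do not overlap
```

In this toy case the AUC alone suggests no difference while the OVL shows the two groups barely overlap, which is one plausible reading of why combining the two measures can flag genes that are differentially expressed only in sample subgroups.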


498

Hu P, Bull SB, Jiang H. Gene network modular-based classification of microarray samples. BMC Bioinformatics 2012;13 Suppl 10:S17. [PMID: 22759422 PMCID: PMC3314572 DOI: 10.1186/1471-2105-13-s10-s17] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

499

Sharma A, Imoto S, Miyano S. A between-class overlapping filter-based method for transcriptome data analysis. J Bioinform Comput Biol 2012;10:1250010. [PMID: 22849365 DOI: 10.1142/s0219720012500102] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

500

Shah RD, Samworth RJ. Variable selection with error control: another look at stability selection. J R Stat Soc Series B Stat Methodol 2012. [DOI: 10.1111/j.1467-9868.2011.01034.x] [Citation(s) in RCA: 158] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]