Reference Citation Analysis

For: Ziegler A, DeStefano AL, König IR, Bardel C, Brinza D, Bull S, Cai Z, Glaser B, Jiang W, Lee KE, Li CX, Li J, Li X, Majoram P, Meng Y, Nicodemus KK, Platt A, Schwarz DF, Shi W, Shugart YY, Stassen HH, Sun YV, Won S, Wang W, Wahba G, Zagaar UA, Zhao Z. Data mining, neural nets, trees--problems 2 and 3 of Genetic Analysis Workshop 15. Genet Epidemiol 2007;31 Suppl 1:S51-60. [PMID: 18046765 DOI: 10.1002/gepi.20280] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/11/2022]

Cited by Other Article(s)

Epistasis Detection via the Joint Cumulant. Stat Biosci 2022. [DOI: 10.1007/s12561-022-09336-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Grinberg NF, Orhobor OI, King RD. An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat. Mach Learn 2019;109:251-277. [PMID: 32174648 PMCID: PMC7048706 DOI: 10.1007/s10994-019-05848-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 10/28/2015] [Revised: 09/17/2019] [Accepted: 09/19/2019] [Indexed: 11/01/2022]

Abstract

In phenotype prediction, the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance because they are central to medicine, crop breeding, and related fields. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods, namely elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM), with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies, the two best methods were SVM and BLUP. The most robust method in the presence of noise, missing data, and similar perturbations was random forest. The classical statistical genetics method of genomic BLUP was found to perform well on problems with population structure. This suggests that standard machine learning methods need to be refined to include population structure information when it is present.
We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which method is likely to perform well on any given problem is elusive and non-trivial.

Romagnoni A, Jégou S, Van Steen K, Wainrib G, Hugot JP. Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data. Sci Rep 2019;9:10351. [PMID: 31316157 PMCID: PMC6637191 DOI: 10.1038/s41598-019-46649-z] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 03/11/2019] [Accepted: 07/03/2019] [Indexed: 02/08/2023]

Boulesteix AL, Wright MN, Hoffmann S, König IR. Statistical learning approaches in the genetic epidemiology of complex diseases. Hum Genet 2019;139:73-84. [PMID: 31049651 DOI: 10.1007/s00439-019-01996-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/30/2018] [Accepted: 03/04/2019] [Indexed: 02/07/2023]

Dorani F, Hu T, Woods MO, Zhai G. Ensemble learning for detecting gene-gene interactions in colorectal cancer. PeerJ 2018;6:e5854. [PMID: 30397551 PMCID: PMC6211269 DOI: 10.7717/peerj.5854] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/05/2018] [Accepted: 09/28/2018] [Indexed: 11/20/2022]

Predictors of surgical site infection after open lower extremity revascularization. J Vasc Surg 2017;65:1769-1778.e3. [PMID: 28527931 DOI: 10.1016/j.jvs.2016.11.053] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 09/26/2016] [Accepted: 11/19/2016] [Indexed: 11/23/2022]

Abstract

OBJECTIVE

Surgical site infection (SSI) after open lower extremity bypass (LEB) is a serious complication leading to higher rates of graft failure and hospital readmission and to increased health care costs. This study sought to identify predictors of SSI after LEB for arterial occlusive disease, as well as potentially modifiable factors that could improve outcomes.

METHODS

Data from a statewide cardiovascular consortium of 35 hospitals were used to obtain demographic, procedural, and hospital risk factors for patients undergoing elective or urgent open LEB between January 2012 and June 2015. Bivariate comparisons and targeted maximum likelihood estimation were used to identify independent risk factors for SSI. Adjusted odds ratios (ORs) were calculated for patient demographics, comorbidities, operative details, and hospital-level factors.

RESULTS

Our study population included 3033 patients who underwent 703 femoral-femoral bypasses, 1431 femoral-popliteal bypasses, and 899 femoral-distal vessel bypasses. An SSI was diagnosed in 320 patients (10.6%) ≤30 days after the index operation. Adjusted patient and procedural predictors of SSI included renal failure currently requiring dialysis (OR, 4.35; 95% confidence interval [CI], 3.45-5.47; P < .001), hypertension (OR, 4.29; 95% CI, 2.74-6.72; P < .001), body mass index ≥25 kg/m² (OR, 1.78; 95% CI, 1.23-2.57; P = .002), procedural time >240 minutes (OR, 2.95; 95% CI, 1.89-4.62; P < .001), and iodine-only skin preparation (OR, 1.73; 95% CI, 1.02-2.91; P = .04). Hospital factors associated with increased SSI included hospital size <500 beds (OR, 2.22; 95% CI, 1.09-4.55; P = .028) and major teaching hospital (OR, 1.66; 95% CI, 1.07-2.58; P = .024). SSI resulted in increased risk of major amputation and surgical reoperation (P < .01), but did not affect 30-day mortality.

CONCLUSIONS

SSI after LEB is associated with increased rates of amputation and reoperation. Several patient, operative, and hospital-related risk factors that predict postoperative SSI were identified, suggesting that targeted improvements in perioperative care may decrease complications and improve vascular patient outcomes.

Wright MN, Ziegler A, König IR. Do little interactions get lost in dark random forests? BMC Bioinformatics 2016;17:145. [PMID: 27029549 PMCID: PMC4815164 DOI: 10.1186/s12859-016-0995-8] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 01/12/2016] [Accepted: 03/21/2016] [Indexed: 12/16/2022]

Grinberg NF, Lovatt A, Hegarty M, Lovatt A, Skøt KP, Kelly R, Blackmore T, Thorogood D, King RD, Armstead I, Powell W, Skøt L. Implementation of Genomic Prediction in Lolium perenne (L.) Breeding Populations. Front Plant Sci 2016;7:133. [PMID: 26904088 PMCID: PMC4751346 DOI: 10.3389/fpls.2016.00133] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 09/11/2015] [Accepted: 01/25/2016] [Indexed: 05/23/2023]

Abstract

Perennial ryegrass (Lolium perenne L.) is one of the most widely grown forage grasses in temperate agriculture. To maintain and increase its use as forage in livestock agriculture, continued improvement is needed in biomass yield, quality, disease resistance, and seed yield. Genetic gain for traits such as biomass yield has been relatively modest, which has been attributed to the species' long breeding cycle and the necessity of population-based breeding methods. Thanks to recent advances in genotyping techniques, there is increasing interest in genomic selection (GS), from which genomically estimated breeding values are derived. In this paper we compare the classical RRBLUP model with state-of-the-art machine learning techniques that should lend themselves easily to use in GS, and demonstrate their application to predicting quantitative traits in a breeding population of L. perenne. Prediction accuracies varied from 0 to 0.59 depending on trait, prediction model, and composition of the training population. The BLUP model produced the highest prediction accuracies for most traits and training populations. Forage quality traits had higher accuracies than yield-related traits. There appeared to be no clear pattern in the effect of training population composition on prediction accuracy. The heritability of the forage quality traits was generally higher than that of the yield-related traits, which could partly explain the difference in accuracy. Some population structure was evident in the breeding populations and probably contributed to the varying effects of the training population on the predictions. The average linkage disequilibrium between adjacent markers ranged from 0.121 to 0.215. Higher marker density and larger training populations closely related to the test population are likely to improve prediction accuracy.

König IR, Auerbach J, Gola D, Held E, Holzinger ER, Legault MA, Sun R, Tintle N, Yang HC. Machine learning and data mining in complex genomic data--a review on the lessons learned in Genetic Analysis Workshop 19. BMC Genet 2016;17 Suppl 2:1. [PMID: 26866367 PMCID: PMC4895282 DOI: 10.1186/s12863-015-0315-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/03/2022]

Mitchell L, Sloan TM, Mewissen M, Ghazal P, Forster T, Piotrowski M, Trew A. Parallel classification and feature selection in microarray data using SPRINT. Concurr Comput 2014;26:854-865. [PMID: 24883047 PMCID: PMC4038771 DOI: 10.1002/cpe.2928] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/03/2023]

Jensen TM, Witte DR, Pieragostino D, McGuire JN, Schjerning ED, Nardi C, Urbani A, Kivimäki M, Brunner EJ, Tabák AG, Vistisen D. Association between protein signals and type 2 diabetes incidence. Acta Diabetol 2013;50:697-704. [PMID: 22310914 PMCID: PMC4181558 DOI: 10.1007/s00592-012-0376-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/18/2011] [Accepted: 01/18/2012] [Indexed: 01/04/2023]

Rose S. Mortality risk score prediction in an elderly population using machine learning. Am J Epidemiol 2013;177:443-52. [PMID: 23364879 DOI: 10.1093/aje/kws241] [Citation(s) in RCA: 105] [Impact Index Per Article: 9.5] [Indexed: 12/20/2022]

Schunkert H, König IR, Erdmann J. Molecular Signatures of Cardiovascular Disease Risk. Mol Diagn Ther 2008;12:281-7. [DOI: 10.1007/bf03256293] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]

Schwarz DF, König IR, Ziegler A. On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data. Bioinformatics 2010;26:1752-8. [PMID: 20505004 DOI: 10.1093/bioinformatics/btq257] [Citation(s) in RCA: 184] [Impact Index Per Article: 13.1] [Indexed: 11/14/2022]

Ziegler A. Genome-wide association studies: quality control and population-based measures. Genet Epidemiol 2009;33 Suppl 1:S45-50. [PMID: 19924716 DOI: 10.1002/gepi.20472] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]

Tang R, Sinnwell JP, Li J, Rider DN, de Andrade M, Biernacka JM. Identification of genes and haplotypes that predict rheumatoid arthritis using random forests. BMC Proc 2009;3 Suppl 7:S68. [PMID: 20018062 PMCID: PMC2795969 DOI: 10.1186/1753-6561-3-s7-s68] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/17/2022]

Szymczak S, Biernacka JM, Cordell HJ, González-Recio O, König IR, Zhang H, Sun YV. Machine learning in genome-wide association studies. Genet Epidemiol 2009;33 Suppl 1:S51-7. [PMID: 19924717 DOI: 10.1002/gepi.20473] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 01/28/2023]

Ziegler A, König IR, Thompson JR. Biostatistical Aspects of Genome-Wide Association Studies. Biom J 2008;50:8-28. [DOI: 10.1002/bimj.200710398] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.1] [Indexed: 01/17/2023]

König I, Malley J, Pajevic S, Weimar C, Diener HC, Ziegler A. Patient-centered yes/no prognosis using learning machines. Int J Data Min Bioinform 2008;2:289-341. [PMID: 19216340 PMCID: PMC2754835 DOI: 10.1504/ijdmb.2008.022149] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]

Falk CT, Finch SJ, Kim W, Mukhopadhyay ND, Gong B, Hinrichs A, Li X, Liu X, Malhotra A, Mehta T, Page G, Rao S, Saccone N, Shete S, Yang Y, Yu R, Zhao JH, Zhou X. Data mining of RNA expression and DNA genotype data: presentation group 5 contributions to Genetic Analysis Workshop 15. Genet Epidemiol 2007;31 Suppl 1:S43-50. [PMID: 18046764 DOI: 10.1002/gepi.20279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

de Andrade M, Allen AS. Summary of contributions to GAW15 Group 13: candidate gene association studies. Genet Epidemiol 2007;31 Suppl 1:S110-7. [DOI: 10.1002/gepi.20287] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]