1
Cromer SJ, Lakhani CM, Mercader JM, Majarian TD, Schroeder P, Cole JB, Florez JC, Patel CJ, Manning AK, Burnett-Bowie SAM, Merino J, Udler MS. Association and Interaction of Genetics and Area-Level Socioeconomic Factors on the Prevalence of Type 2 Diabetes and Obesity. Diabetes Care 2023; 46:944-952. [PMID: 36787958 PMCID: PMC10154653 DOI: 10.2337/dc22-1954] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/06/2022] [Accepted: 12/17/2022] [Indexed: 02/16/2023]
Abstract
OBJECTIVE: Quantify the impact of genetic and socioeconomic factors on risk of type 2 diabetes (T2D) and obesity. RESEARCH DESIGN AND METHODS: Among participants in the Mass General Brigham Biobank (MGBB) and UK Biobank (UKB), we used logistic regression models to calculate cross-sectional odds of T2D and obesity using 1) polygenic risk scores for T2D and BMI and 2) area-level socioeconomic risk (educational attainment) measures. The primary analysis included 26,737 participants of European genetic ancestry in MGBB with replication in UKB (N = 223,843), as well as in participants of non-European ancestry (MGBB N = 3,468; UKB N = 7,459). RESULTS: The area-level socioeconomic measure most strongly associated with both T2D and obesity was percent without a college degree, and associations with disease prevalence were independent of genetic risk (P < 0.001 for each). Moving from lowest to highest quintiles of combined genetic and socioeconomic burden more than tripled T2D (3.1% to 22.2%) and obesity (20.9% to 69.0%) prevalence. Favorable socioeconomic risk was associated with lower disease prevalence, even in those with highest genetic risk (T2D 13.0% vs. 22.2%, obesity 53.6% vs. 69.0% in lowest vs. highest socioeconomic risk quintiles). Additive effects of genetic and socioeconomic factors accounted for 13.2% and 16.7% of T2D and obesity prevalence, respectively, explained by these models. Findings were replicated in independent European and non-European ancestral populations. CONCLUSIONS: Genetic and socioeconomic factors significantly interact to increase risk of T2D and obesity. Favorable area-level socioeconomic status was associated with an almost 50% lower T2D prevalence in those with high genetic risk.
Affiliation(s)
- Sara J. Cromer
  - Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, MA
  - Department of Medicine, Harvard Medical School, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Chirag M. Lakhani
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Josep M. Mercader
  - Department of Medicine, Harvard Medical School, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Timothy D. Majarian
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Philip Schroeder
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Joanne B. Cole
  - Department of Medicine, Harvard Medical School, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Division of Endocrinology, Boston Children’s Hospital, Boston, MA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO
- Jose C. Florez
  - Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, MA
  - Department of Medicine, Harvard Medical School, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Chirag J. Patel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Alisa K. Manning
  - Department of Medicine, Harvard Medical School, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Clinical and Translational Epidemiology Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Sherri-Ann M. Burnett-Bowie
  - Department of Medicine, Harvard Medical School, Boston, MA
  - Endocrine Unit, Endocrine Division, Massachusetts General Hospital, Boston, MA
- Jordi Merino
  - Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, MA
  - Department of Medicine, Harvard Medical School, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Miriam S. Udler
  - Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, MA
  - Department of Medicine, Harvard Medical School, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
2
Carrera C, Cárcel-Márquez J, Cullell N, Torres-Águila N, Muiño E, Castillo J, Sobrino T, Campos F, Rodríguez-Castro E, Llucià-Carol L, Millán M, Muñoz-Narbona L, López-Cancio E, Bustamante A, Ribó M, Álvarez-Sabín J, Jiménez-Conde J, Roquer J, Giralt-Steinhauer E, Soriano-Tárraga C, Mola-Caminal M, Vives-Bauza C, Navarro RD, Tur S, Obach V, Arenillas JF, Segura T, Serrano-Heras G, Martí-Fàbregas J, Delgado-Mederos R, Freijo-Guerrero MM, Moniche F, Cabezas JA, Castellanos M, Gallego-Fabrega C, González-Sanchez J, Krupinsky J, Strbian D, Tatlisumak T, Thijs V, Lemmens R, Slowik A, Pera J, Kittner S, Cole J, Heitsch L, Ibañez L, Cruchaga C, Lee JM, Montaner J, Fernández-Cadenas I. Single nucleotide variations in ZBTB46 are associated with post-thrombolytic parenchymal haematoma. Brain 2021; 144:2416-2426. [PMID: 33723576 PMCID: PMC8418348 DOI: 10.1093/brain/awab090] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/24/2020] [Revised: 02/12/2021] [Accepted: 02/25/2021] [Indexed: 12/13/2022]
Abstract
Haemorrhagic transformation is a complication of recombinant tissue-plasminogen activator treatment. The most severe form, parenchymal haematoma, can result in neurological deterioration, disability, and death. Our objective was to identify single nucleotide variations associated with a risk of parenchymal haematoma following thrombolytic therapy in patients with acute ischaemic stroke. A fixed-effect genome-wide meta-analysis was performed combining two-stage genome-wide association studies (n = 1904). The discovery stage (three cohorts) comprised 1324 ischaemic stroke individuals, 5.4% of whom had a parenchymal haematoma. Genetic variants yielding a P-value < 1 × 10⁻⁵ were analysed in the validation stage (six cohorts), formed by 580 ischaemic stroke patients with 12.1% haemorrhagic events. All participants received recombinant tissue-plasminogen activator; cases were parenchymal haematoma type 1 or 2 as defined by the European Cooperative Acute Stroke Study (ECASS) criteria. Genome-wide significant findings (P < 5 × 10⁻⁸) were characterized by in silico functional annotation, gene expression, and DNA regulatory elements. We analysed 7 989 272 single nucleotide polymorphisms and identified a genome-wide association locus on chromosome 20 in the discovery cohort; functional annotation indicated that the ZBTB46 gene was driving the association for chromosome 20. The top single nucleotide polymorphism was rs76484331 in the ZBTB46 gene [P = 2.49 × 10⁻⁸; odds ratio (OR): 11.21; 95% confidence interval (CI): 4.82-26.55]. In the replication cohort (n = 580), the rs76484331 polymorphism was associated with parenchymal haematoma (P = 0.01), and the overall association after meta-analysis increased (P = 1.61 × 10⁻⁸; OR: 5.84; 95% CI: 3.16-10.76). ZBTB46 encodes the zinc finger and BTB domain-containing protein 46, which acts as a transcription factor. In silico studies indicated that ZBTB46 is expressed in brain tissue by neurons and endothelial cells. Moreover, rs76484331 interacts with the promoter sites located at 20q13. In conclusion, we identified single nucleotide variants in the ZBTB46 gene associated with a higher risk of parenchymal haematoma following recombinant tissue-plasminogen activator treatment.
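The abstract above reports a fixed-effect meta-analysis across a discovery and a validation stage. As a rough illustration only, the sketch below shows the standard inverse-variance fixed-effect pooling of log odds ratios. It is not the authors' pipeline: the function names are hypothetical, the standard error is approximated from a reported 95% CI under a normal assumption, and the second-stage inputs in the usage lines are made-up placeholders because the abstract does not report a replication odds ratio.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate the standard error of a log odds ratio from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect pooling of per-study log odds ratios.

    Returns (pooled log OR, pooled standard error, two-sided normal p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return pooled, pooled_se, p_value


if __name__ == "__main__":
    # Discovery-stage OR and CI as quoted in the abstract.
    discovery_beta = math.log(11.21)
    discovery_se = se_from_ci(4.82, 26.55)
    # Hypothetical second-stage estimate, for illustration only.
    replication_beta = math.log(3.0)
    replication_se = 0.45
    beta, se, p = fixed_effect_meta(
        [discovery_beta, replication_beta], [discovery_se, replication_se]
    )
    print(f"pooled OR = {math.exp(beta):.2f}, SE = {se:.3f}, p = {p:.2e}")
```

Each study is weighted by the inverse of its variance, so a more precise stage pulls the pooled estimate toward its own value.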
Affiliation(s)
- Caty Carrera
  - Neurovascular Research Laboratory, VHIR, Universitat Autònoma de Barcelona, Barcelona 08035, Spain
  - Stroke Pharmacogenomics and Genetics, IIB-Sant Pau, Barcelona 08025, Spain
- Natalia Cullell
  - Stroke Pharmacogenomics and Genetics, IIB-Sant Pau, Barcelona 08025, Spain
  - Stroke Pharmacogenomics and Genetics, Fundació Docència i Recerca Mútua Terrassa, Terrassa 08221, Spain
- Nuria Torres-Águila
  - Stroke Pharmacogenomics and Genetics, IIB-Sant Pau, Barcelona 08025, Spain
  - Stroke Pharmacogenomics and Genetics, Fundació Docència i Recerca Mútua Terrassa, Terrassa 08221, Spain
- Elena Muiño
  - Stroke Pharmacogenomics and Genetics, IIB-Sant Pau, Barcelona 08025, Spain
- José Castillo
  - Clinical Neurosciences Research Laboratory, IDIS, Santiago de Compostela, 15706, Spain
- Tomás Sobrino
  - Clinical Neurosciences Research Laboratory, IDIS, Santiago de Compostela, 15706, Spain
- Francisco Campos
  - Clinical Neurosciences Research Laboratory, IDIS, Santiago de Compostela, 15706, Spain
- Laia Llucià-Carol
  - Stroke Pharmacogenomics and Genetics, IIB-Sant Pau, Barcelona 08025, Spain
- Mònica Millán
  - Department of Neuroscience, HUGTP, Badalona 08916, Spain
- Alejandro Bustamante
  - Neurovascular Research Laboratory, VHIR, Universitat Autònoma de Barcelona, Barcelona 08035, Spain
- Marc Ribó
  - Stroke Unit, HUVH, Barcelona 08035, Spain
- Jordi Jiménez-Conde
  - Department of Neurology, Neurovascular Research Group, IMIM-Hospital del Mar, Barcelona 08003, Spain
- Jaume Roquer
  - Department of Neurology, Neurovascular Research Group, IMIM-Hospital del Mar, Barcelona 08003, Spain
- Eva Giralt-Steinhauer
  - Department of Neurology, Neurovascular Research Group, IMIM-Hospital del Mar, Barcelona 08003, Spain
- Carolina Soriano-Tárraga
  - Department of Neurology, Neurovascular Research Group, IMIM-Hospital del Mar, Barcelona 08003, Spain
- Marina Mola-Caminal
  - Department of Neurology, Neurovascular Research Group, IMIM-Hospital del Mar, Barcelona 08003, Spain
- Silvia Tur
  - Department of Neurology, HUSE, Mallorca 07120, Spain
- Victor Obach
  - Department of Neurology, Hospital Clínic i Provincial de Barcelona, Barcelona 08036, Spain
- Juan Francisco Arenillas
  - Department of Neurology, Hospital Clínico Universitario, University of Valladolid, Valladolid 47003, Spain
- Tomás Segura
  - Department of Neurology, CHUA, Albacete 02006, Spain
- Joan Martí-Fàbregas
  - Department of Neurology, Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona 08025, Spain
- Raquel Delgado-Mederos
  - Department of Neurology, Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona 08025, Spain
- M Mar Freijo-Guerrero
  - Neurovascular Unit, Biocruces Bizkaia Health Research Institute, Bilbao 48903, Spain
- Francisco Moniche
  - Department of Neurology, Virgen del Rocío, IBIS, Seville 41023, Spain
- Cristina Gallego-Fabrega
  - Stroke Pharmacogenomics and Genetics, IIB-Sant Pau, Barcelona 08025, Spain
  - Stroke Pharmacogenomics and Genetics, Fundació Docència i Recerca Mútua Terrassa, Terrassa 08221, Spain
- Jonathan González-Sanchez
  - Stroke Pharmacogenomics and Genetics, Fundació Docència i Recerca Mútua Terrassa, Terrassa 08221, Spain
  - School of Healthcare Science, Manchester Metropolitan University, Manchester M15 6BH, UK
- Jurek Krupinsky
  - School of Healthcare Science, Manchester Metropolitan University, Manchester M15 6BH, UK
  - Neurology Unit, Hospital Universitari Mútua Terrassa, Terrassa 08221, Spain
- Daniel Strbian
  - Department of Neurology, Helsinki University Hospital, Helsinki FI-00029, Finland
- Turgut Tatlisumak
  - Sahlgrenska Academy at University of Gothenburg and Sahlgrenska University Hospital, Gothenburg 41345, Sweden
- Vincent Thijs
  - Stroke Division, Florey Institute of Neuroscience and Mental Health, University of Melbourne, Heidelberg, VIC 3072, Australia
  - Department of Neurology, Austin Health, Heidelberg, VIC 3072, Australia
- Robin Lemmens
  - Department of Neurology, University Hospitals Leuven, Campus Gasthuisberg, Leuven 3000, Belgium
- Agnieszka Slowik
  - Department of Neurology, Jagiellonian University Medical College, Kraków 31-007, Poland
- Johanna Pera
  - Department of Neurology, Jagiellonian University Medical College, Kraków 31-007, Poland
- Steven Kittner
  - Department of Neurology, University of Maryland School of Medicine and Baltimore, Baltimore, MD 21201-1559, USA
- John Cole
  - Department of Neurology, University of Maryland School of Medicine and Baltimore, Baltimore, MD 21201-1559, USA
- Laura Heitsch
  - Division of Emergency Medicine, Washington University School of Medicine, St. Louis, MO 63110-1010, USA
  - Department of Neurology, Washington University School of Medicine, St. Louis, MO 63110-1010, USA
- Laura Ibañez
  - Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110-1010, USA
- Carlos Cruchaga
  - Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110-1010, USA
- Jin-Moo Lee
  - Department of Neurology, Washington University School of Medicine, St. Louis, MO 63110-1010, USA
- Joan Montaner
  - Neurovascular Research Laboratory, VHIR, Universitat Autònoma de Barcelona, Barcelona 08035, Spain
  - Department of Neurology, Virgen del Rocío, IBIS, Seville 41023, Spain
3
Liu C, Zeinomar N, Chung WK, Kiryluk K, Gharavi AG, Hripcsak G, Crew KD, Shang N, Khan A, Fasel D, Manolio TA, Jarvik GP, Rowley R, Justice AE, Rahm AK, Fullerton SM, Smoller JW, Larson EB, Crane PK, Dikilitas O, Wiesner GL, Bick AG, Terry MB, Weng C. Generalizability of Polygenic Risk Scores for Breast Cancer Among Women With European, African, and Latinx Ancestry. JAMA Netw Open 2021; 4:e2119084. [PMID: 34347061 PMCID: PMC8339934 DOI: 10.1001/jamanetworkopen.2021.19084] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/23/2022]
Abstract
IMPORTANCE: Multiple polygenic risk scores (PRSs) for breast cancer have been developed from large research consortia; however, their generalizability to diverse clinical settings is unknown. OBJECTIVE: To examine the performance of previously developed breast cancer PRSs in a clinical setting for women of European, African, and Latinx ancestry. DESIGN, SETTING, AND PARTICIPANTS: This cohort study using the Electronic Medical Records and Genomics (eMERGE) network data set included 39 591 women from 9 contributing medical centers in the US that had electronic medical records (EMR) linked to genotype data. Breast cancer cases and controls were identified through a validated EMR phenotyping algorithm. MAIN OUTCOMES AND MEASURES: Multivariable logistic regression was used to assess the association between breast cancer risk and 7 previously developed PRSs, adjusting for age, study site, breast cancer family history, and first 3 ancestry informative principal components. RESULTS: This study included 39 591 women: 33 594 with European, 3801 with African, and 2196 with Latinx ancestry. The mean (SD) age at breast cancer diagnosis was 60.7 (13.0), 58.8 (12.5), and 60.1 (13.0) years for women with European, African, and Latinx ancestry, respectively. PRSs derived from women with European ancestry were associated with breast cancer risk in women with European ancestry (highest odds ratio [OR] per 1-SD increase, 1.46; 95% CI, 1.41-1.51), women with Latinx ancestry (highest OR, 1.31; 95% CI, 1.09-1.58), and women with African ancestry (OR, 1.19; 95% CI, 1.05-1.35). For women with European ancestry, this association with breast cancer risk was largest in the extremes of the PRS distribution, with ORs ranging from 2.19 (95% CI, 1.84-2.53) to 2.48 (95% CI, 1.89-3.25) for the 3 different PRSs examined for those in the highest 1% of the PRS compared with those in the middle quantile. Among women with Latinx and African ancestries at the extremes of the PRS distribution, there were no statistically significant associations. CONCLUSIONS AND RELEVANCE: This cohort study found that PRS models derived from women with European ancestry for breast cancer risk generalized well for women with European, Latinx, and African ancestries across different clinical settings, although the effect sizes for women with African ancestry were smaller, likely because of differences in risk allele frequencies and linkage disequilibrium patterns. These results highlight the need to improve representation of diverse population groups, particularly women with African ancestry, in genomic research cohorts.
Affiliation(s)
- Cong Liu
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
- Nur Zeinomar
  - Department of Epidemiology, Columbia University Irving Medical Center, New York, New York
  - Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, New Jersey
- Wendy K. Chung
  - Department of Pediatrics, Columbia University Irving Medical Center, New York, New York
- Krzysztof Kiryluk
  - Department of Medicine, Columbia University Irving Medical Center, New York, New York
- Ali G. Gharavi
  - Department of Medicine, Columbia University Irving Medical Center, New York, New York
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
- Katherine D. Crew
  - Department of Medicine, Columbia University Irving Medical Center, New York, New York
- Ning Shang
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
- Atlas Khan
  - Department of Medicine, Columbia University Irving Medical Center, New York, New York
- David Fasel
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
- Teri A. Manolio
  - National Human Genome Research Institute, Bethesda, Maryland
- Gail P. Jarvik
  - Department of Medicine, University of Washington, Seattle
- Robb Rowley
  - National Human Genome Research Institute, Bethesda, Maryland
- Ann E. Justice
  - Department of Population Health Sciences, Geisinger, Danville, Pennsylvania
- Alanna K. Rahm
  - Genomic Medicine Institute, Geisinger, Danville, Pennsylvania
- Jordan W. Smoller
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Eric B. Larson
  - Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Paul K. Crane
  - Department of Medicine, University of Washington, Seattle
- Ozan Dikilitas
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota
- Georgia L. Wiesner
  - Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Alexander G. Bick
  - Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Mary Beth Terry
  - Department of Epidemiology, Columbia University Irving Medical Center, New York, New York
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
4
Li Z, Löytynoja A, Fraimout A, Merilä J. Effects of marker type and filtering criteria on QST-FST comparisons. R Soc Open Sci 2019; 6:190666. [PMID: 31827824 PMCID: PMC6894560 DOI: 10.1098/rsos.190666] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/09/2019] [Accepted: 09/16/2019] [Indexed: 06/10/2023]
Abstract
Comparative studies of quantitative and neutral genetic differentiation (QST-FST tests) provide means to detect adaptive population differentiation. However, QST-FST tests can be overly liberal if the markers used deflate FST below its expectation, or overly conservative if methodological biases lead to inflated FST estimates. We investigated how marker type and filtering criteria for marker selection influence QST-FST comparisons through their effects on FST using simulations and empirical data on over 18 000 in silico genotyped microsatellites and 3.8 million single-nucleotide polymorphism (SNP) loci from four populations of nine-spined sticklebacks (Pungitius pungitius). Empirical and simulated data revealed that FST decreased with increasing marker variability, and was generally higher with SNPs than with microsatellites. The estimated baseline FST levels were also sensitive to filtering criteria for SNPs: both minor alleles and linkage disequilibrium (LD) pruning influenced FST estimation, as did marker ascertainment. However, in the case of stickleback data used here where QST is high, the choice of marker type, their genomic location, ascertainment and filtering made little difference to outcomes of QST-FST tests. Nevertheless, we recommend that QST-FST tests using microsatellites should discard the most variable loci, and those using SNPs should pay attention to marker ascertainment and properly account for LD before filtering SNPs. This may be especially important when the level of quantitative trait differentiation is low and levels of neutral differentiation are high.
Affiliation(s)
- Zitong Li
  - Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki 00014, Finland
- Ari Löytynoja
  - Institute of Biotechnology, University of Helsinki, Helsinki 00014, Finland
- Antoine Fraimout
  - Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki 00014, Finland
- Juha Merilä
  - Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki 00014, Finland
5
Affiliation(s)
- Christopher Naugler
  - Department of Pathology and Laboratory Medicine, University of Calgary, Calgary, Canada
  - Department of Family Medicine, University of Calgary, Calgary, Canada
  - Department of Community Health Sciences, University of Calgary, Calgary, Canada
- Deirdre L. Church
  - Department of Pathology and Laboratory Medicine, University of Calgary, Calgary, Canada
  - Department of Medicine, University of Calgary, Calgary, Canada
6
Stanaway IB, Hall TO, Rosenthal EA, Palmer M, Naranbhai V, Knevel R, Namjou-Khales B, Carroll RJ, Kiryluk K, Gordon AS, Linder J, Howell KM, Mapes BM, Lin FTJ, Joo YY, Hayes MG, Gharavi AG, Pendergrass SA, Ritchie MD, de Andrade M, Croteau-Chonka DC, Raychaudhuri S, Weiss ST, Lebo M, Amr SS, Carrell D, Larson EB, Chute CG, Rasmussen-Torvik LJ, Roy-Puckelwartz MJ, Sleiman P, Hakonarson H, Li R, Karlson EW, Peterson JF, Kullo IJ, Chisholm R, Denny JC, Jarvik GP, Crosslin DR. The eMERGE genotype set of 83,717 subjects imputed to ~40 million variants genome wide and association with the herpes zoster medical record phenotype. Genet Epidemiol 2018; 43:63-81. [PMID: 30298529 PMCID: PMC6375696 DOI: 10.1002/gepi.22167] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 04/12/2018] [Revised: 08/10/2018] [Accepted: 08/28/2018] [Indexed: 12/30/2022]
Abstract
The Electronic Medical Records and Genomics (eMERGE) network is a network of medical centers with electronic medical records linked to existing biorepository samples for genomic discovery and genomic medicine research. The network sought to unify the genetic results from 78 Illumina and Affymetrix genotype array batches from 12 contributing medical centers for joint association analysis of 83,717 human participants. In this report, we describe the imputation of eMERGE results and methods to create the unified imputed merged set of genome-wide variant genotype data. We imputed the data using the Michigan Imputation Server, which provides a missing single-nucleotide variant genotype imputation service using the minimac3 imputation algorithm with the Haplotype Reference Consortium genotype reference set. We describe the quality control and filtering steps used in the generation of this data set and suggest generalizable quality thresholds for imputation and phenotype association studies. To test the merged imputed genotype set, we replicated a previously reported chromosome 6 HLA-B herpes zoster (shingles) association and discovered a novel zoster-associated locus in an epigenetic binding site near the terminus of chromosome 3 (3p29).
Affiliation(s)
- Ian B Stanaway
  - Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, Washington
- Taryn O Hall
  - Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, Washington
- Elisabeth A Rosenthal
  - Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington
- Melody Palmer
  - Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington
- Vivek Naranbhai
  - Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, Washington
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
- Rachel Knevel
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
- Bahram Namjou-Khales
  - Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Robert J Carroll
  - Departments of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, Tennessee
- Krzysztof Kiryluk
  - Department of Medicine, Columbia University, New York City, New York
- Adam S Gordon
  - Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington
- Jodell Linder
  - Vanderbilt Institute for Clinical and Translational Research, School of Medicine, Vanderbilt University, Nashville, Tennessee
- Kayla Marie Howell
  - Vanderbilt Institute for Clinical and Translational Research, School of Medicine, Vanderbilt University, Nashville, Tennessee
- Brandy M Mapes
  - Vanderbilt Institute for Clinical and Translational Research, School of Medicine, Vanderbilt University, Nashville, Tennessee
- Frederick T J Lin
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- M Geoffrey Hayes
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Ali G Gharavi
  - Department of Medicine, Columbia University, New York City, New York
- Marylyn D Ritchie
  - Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania
- Soumya Raychaudhuri
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
  - Program in Medical and Population Genetics, Broad Institute of Massachusetts Technical Institute and Harvard University, Cambridge, Massachusetts
- Scott T Weiss
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
- Matt Lebo
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
- Sami S Amr
  - Harvard Medical School, Harvard University, Cambridge, Massachusetts
- David Carrell
  - Kaiser Permanente Washington Health Research Institute (Formerly Group Health Cooperative-Seattle), Kaiser Permanente, Seattle, Washington
- Eric B Larson
  - Kaiser Permanente Washington Health Research Institute (Formerly Group Health Cooperative-Seattle), Kaiser Permanente, Seattle, Washington
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland
- Patrick Sleiman
  - Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Rongling Li
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
- Elizabeth W Karlson
  - Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania
- Josh F Peterson
  - Departments of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, Tennessee
- Rex Chisholm
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Joshua Charles Denny
  - Departments of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, Tennessee
- Gail P Jarvik
  - Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington
-
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
- David R Crosslin
  - Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, Washington
7
Cohn EG, Hamilton N, Larson EL, Williams JK. Self-reported race and ethnicity of US biobank participants compared to the US Census. J Community Genet 2017; 8:229-238. [PMID: 28623623 PMCID: PMC5496846 DOI: 10.1007/s12687-017-0308-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/24/2016] [Accepted: 05/11/2017] [Indexed: 12/17/2022]
Abstract
Precision medicine envisions a future of effective diagnosis, treatment, and prevention grounded in precise understandings of the genetic and environmental determinants of disease. Given that the original genome-wide association studies represented a predominately European White population, and that diversity in genomic studies must account for genetic variation both within and across racial categories, new research studies are at a heightened risk for inadequate representation. Currently biological samples are being made available for sequencing in biobanks across the USA, but the diversity of those samples is unknown. The aims of this study were to describe the types of recruitment and enrollment materials used by US biobanks and the diversity of the samples contained within their collection. Biobank websites and brochures were evaluated for reading level, health literacy, and factors known to encourage the recruitment of minorities, such as showing pictures of diverse populations. Biobank managers were surveyed by mail on the methods and materials used for enrollment, recruitment, consent, and the self-reported race/ethnicity of biobank participants. From 51 US biobanks (68% response rate), recruitment and enrollment materials were in English only, and most of the websites and brochures exceeded a fifth-grade reading level. When compared to the 2015 US Census, self-reported race/ethnicity of participants was not significantly different for Whites (61%) and blacks (13%). The percentages were significantly lower for Hispanics and Latinos (18 vs. 7%, p = 0.00) and Hawaiian/Pacific Islanders (0.2 vs. 0.01%; p = 0.01) and higher for Asians (13 vs. 5%, p = 0.01). Materials for recruitment predominantly in English may limit participation by underrepresented populations.
Affiliation(s)
- Elizabeth Gross Cohn
  - School of Nursing, Columbia University, New York, NY, USA
  - Adelphi University, Garden City, NY, USA
- Nalo Hamilton
  - School of Nursing, University of California, Los Angeles, Los Angeles, CA, USA
|
8
|
Jackson KL, Mbagwu M, Pacheco JA, Baldridge AS, Viox DJ, Linneman JG, Shukla SK, Peissig PL, Borthwick KM, Carrell DA, Bielinski SJ, Kirby JC, Denny JC, Mentch FD, Vazquez LM, Rasmussen-Torvik LJ, Kho AN. Performance of an electronic health record-based phenotype algorithm to identify community associated methicillin-resistant Staphylococcus aureus cases and controls for genetic association studies. BMC Infect Dis 2016; 16:684. [PMID: 27855652 PMCID: PMC5114817 DOI: 10.1186/s12879-016-2020-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/09/2016] [Accepted: 11/11/2016] [Indexed: 12/25/2022]
Abstract
Background: Community associated methicillin-resistant Staphylococcus aureus (CA-MRSA) is one of the most common causes of skin and soft tissue infections in the United States, and a variety of genetic host factors are suspected to be risk factors for recurrent infection. Based on the CDC definition, we have developed and validated an electronic health record (EHR) based CA-MRSA phenotype algorithm utilizing both structured and unstructured data. Methods: The algorithm was validated at three eMERGE consortium sites, and positive predictive value (PPV), negative predictive value, and sensitivity were calculated. The algorithm was then run and data collected across seven total sites. The resulting data were used in GWAS analysis. Results: Across seven sites, the CA-MRSA phenotype algorithm identified a total of 349 cases and 7761 controls among the genotyped European and African American biobank populations. PPV ranged from 68 to 100% for cases and 96 to 100% for controls; sensitivity ranged from 94 to 100% for cases and 75 to 100% for controls. Frequency of cases in the populations varied widely by site. There were no plausible GWAS-significant (p < 5 × 10⁻⁸) findings. Conclusions: Differences in EHR data representation and screening patterns across sites may have affected identification of cases and controls and accounted for varying frequencies across sites. Future work identifying these patterns is necessary.
Affiliation(s)
- Kathryn L Jackson
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Michael Mbagwu
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Daniel J Viox
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
  - Emory University School of Medicine, Atlanta, GA, USA
- James G Linneman
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Peggy L Peissig
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- David A Carrell
  - Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
- Jacqueline C Kirby
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Frank D Mentch
  - The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Lyam M Vazquez
  - The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Abel N Kho
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
|
9
|
Zhang YP, Zhang YY, Duan DD. From Genome-Wide Association Study to Phenome-Wide Association Study: New Paradigms in Obesity Research. Prog Mol Biol Transl Sci 2016; 140:185-231. [PMID: 27288830 DOI: 10.1016/bs.pmbts.2016.02.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]
Abstract
Obesity is a condition in which excess body fat has accumulated to an extent that increases the risk of many chronic diseases. The current clinical classification of obesity is based on measurement of body mass index (BMI), waist-hip ratio, and body fat percentage. However, these measurements do not account for the wide individual variations in fat distribution, degree of fatness or health risks, and genetic variants identified in the genome-wide association studies (GWAS). In this review, we will address this important issue with the introduction of the phenome, phenomics, and the phenome-wide association study (PheWAS). We will discuss the new paradigm shift from GWAS to PheWAS in obesity research. In the era of precision medicine, phenomics and PheWAS provide the required approaches to a better definition and classification of obesity according to the association of the obese phenome with its unique molecular makeup, lifestyle, and environmental impact.
Affiliation(s)
- Y-P Zhang
  - Pediatric Heart Center, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Y-Y Zhang
  - Department of Cardiology, Changzhou Second People's Hospital, Changzhou, Jiangsu, China
- D D Duan
  - Laboratory of Cardiovascular Phenomics, Center for Cardiovascular Research, Department of Pharmacology, and Center for Molecular Medicine, University of Nevada School of Medicine, Reno, NV, United States
|
10
|
Simonti CN, Vernot B, Bastarache L, Bottinger E, Carrell DS, Chisholm RL, Crosslin DR, Hebbring SJ, Jarvik GP, Kullo IJ, Li R, Pathak J, Ritchie MD, Roden DM, Verma SS, Tromp G, Prato JD, Bush WS, Akey JM, Denny JC, Capra JA. The phenotypic legacy of admixture between modern humans and Neandertals. Science 2016; 351:737-41. [PMID: 26912863 DOI: 10.1126/science.aad2149] [Citation(s) in RCA: 152] [Impact Index Per Article: 19.0] [Indexed: 12/12/2022]
Abstract
Many modern human genomes retain DNA inherited from interbreeding with archaic hominins, such as Neandertals, yet the influence of this admixture on human traits is largely unknown. We analyzed the contribution of common Neandertal variants to over 1000 electronic health record (EHR)-derived phenotypes in ~28,000 adults of European ancestry. We discovered and replicated associations of Neandertal alleles with neurological, psychiatric, immunological, and dermatological phenotypes. Neandertal alleles together explained a significant fraction of the variation in risk for depression and skin lesions resulting from sun exposure (actinic keratosis), and individual Neandertal alleles were significantly associated with specific human phenotypes, including hypercoagulation and tobacco use. Our results establish that archaic admixture influences disease risk in modern humans, provide hypotheses about the effects of hundreds of Neandertal haplotypes, and demonstrate the utility of EHR data in evolutionary analyses.
Affiliation(s)
- Corinne N Simonti
  - Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Benjamin Vernot
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- David S Carrell
  - Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
- Rex L Chisholm
  - Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- David R Crosslin
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
- Scott J Hebbring
  - Center for Human Genetics, Marshfield Clinic, Marshfield, WI, USA
- Gail P Jarvik
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
- Iftikhar J Kullo
  - Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Rongling Li
  - Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Jyotishman Pathak
  - Division of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Marylyn D Ritchie
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
  - Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA
- Dan M Roden
  - Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
  - Department of Medicine, Vanderbilt University, Nashville, TN, USA
  - Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
- Shefali S Verma
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Gerard Tromp
  - Weis Center for Research, Geisinger Health System, Danville, PA, USA
  - Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Health Science, Stellenbosch University, Tygerberg, South Africa
- Jeffrey D Prato
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- William S Bush
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Joshua M Akey
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Joshua C Denny
  - Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
  - Department of Medicine, Vanderbilt University, Nashville, TN, USA
- John A Capra
  - Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
  - Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA
|
11
|
Maglo KN, Mersha TB, Martin LJ. Population Genomics and the Statistical Values of Race: An Interdisciplinary Perspective on the Biological Classification of Human Populations and Implications for Clinical Genetic Epidemiological Research. Front Genet 2016; 7:22. [PMID: 26925096 PMCID: PMC4756148 DOI: 10.3389/fgene.2016.00022] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 06/08/2015] [Accepted: 02/02/2016] [Indexed: 01/14/2023]
Abstract
The biological status and biomedical significance of the concept of race as applied to humans continue to be contentious issues despite the use of advanced statistical and clustering methods to determine continental ancestry. It is thus imperative for researchers to understand the limitations as well as potential uses of the concept of race in biology and biomedicine. This paper deals with the theoretical assumptions behind cluster analysis in human population genomics. Adopting an interdisciplinary approach, it demonstrates that the hypothesis that attributes the clustering of human populations to "frictional" effects of landform barriers at continental boundaries is empirically incoherent. It then contrasts the scientific status of the "cluster" and "cline" constructs in human population genomics, and shows how cluster may be instrumentally produced. It also shows how statistical values of race vindicate Darwin's argument that race is evolutionarily meaningless. Finally, the paper explains why, due to spatiotemporal parameters, evolutionary forces, and socio-cultural factors influencing population structure, continental ancestry may be pragmatically relevant to global and public health genomics. Overall, this work demonstrates that, from a biological systematic and evolutionary taxonomical perspective, human races/continental groups or clusters have no natural meaning or objective biological reality. In fact, the utility of racial categorizations in research and in clinics can be explained by spatiotemporal parameters, socio-cultural factors, and evolutionary forces affecting disease causation and treatment response.
Affiliation(s)
- Koffi N Maglo
  - Department of Philosophy, Center for Clinical and Translational Science and Training, University of Cincinnati, Cincinnati, OH, USA
- Tesfaye B Mersha
  - Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Lisa J Martin
  - Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
|
12
|
Ritchie MD, de Andrade M, Kuivaniemi H. The foundation of precision medicine: integration of electronic health records with genomics through basic, clinical, and translational research. Front Genet 2015; 6:104. [PMID: 25852745 PMCID: PMC4362332 DOI: 10.3389/fgene.2015.00104] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 02/17/2015] [Accepted: 02/27/2015] [Indexed: 12/30/2022]
Affiliation(s)
- Marylyn D Ritchie
  - Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA, USA
  - Institute of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA
- Mariza de Andrade
  - Division of Biomedical Statistics and Informatics, Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
- Helena Kuivaniemi
  - The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
  - Department of Surgery, Temple University School of Medicine, Philadelphia, PA, USA
|
13
|
Verma SS, de Andrade M, Tromp G, Kuivaniemi H, Pugh E, Namjou-Khales B, Mukherjee S, Jarvik GP, Kottyan LC, Burt A, Bradford Y, Armstrong GD, Derr K, Crawford DC, Haines JL, Li R, Crosslin D, Ritchie MD. Imputation and quality control steps for combining multiple genome-wide datasets. Front Genet 2014; 5:370. [PMID: 25566314 PMCID: PMC4263197 DOI: 10.3389/fgene.2014.00370] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 04/02/2014] [Accepted: 10/03/2014] [Indexed: 12/16/2022]
Abstract
The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
Affiliation(s)
- Shefali S Verma
  - Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, Pennsylvania, PA, USA
- Mariza de Andrade
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Gerard Tromp
  - The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Helena Kuivaniemi
  - The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Elizabeth Pugh
  - Center for Inherited Disease Research, Johns Hopkins University, Baltimore, MD, USA
- Gail P Jarvik
  - Department of Medicine, University of Washington, Seattle, WA, USA
- Leah C Kottyan
  - Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Amber Burt
  - Department of Medicine, University of Washington, Seattle, WA, USA
- Yuki Bradford
  - Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, Pennsylvania, PA, USA
- Gretta D Armstrong
  - Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, Pennsylvania, PA, USA
- Kimberly Derr
  - The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Dana C Crawford
  - Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
  - Department of Epidemiology and Biostatistics, Case Western University, Cleveland, OH, USA
- Jonathan L Haines
  - Department of Epidemiology and Biostatistics, Case Western University, Cleveland, OH, USA
- Rongling Li
  - Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD, USA
- David Crosslin
  - Department of Medicine, University of Washington, Seattle, WA, USA
- Marylyn D Ritchie
  - Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, Pennsylvania, PA, USA
|