1
Cruz LA, Cooke Bailey JN, Crawford DC. Importance of Diversity in Precision Medicine: Generalizability of Genetic Associations Across Ancestry Groups Toward Better Identification of Disease Susceptibility Variants. Annu Rev Biomed Data Sci 2023; 6:339-356. [PMID: 37196357 PMCID: PMC10720270 DOI: 10.1146/annurev-biodatasci-122220-113250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2023]
Abstract
Genome-wide association studies (GWAS) revolutionized our understanding of common genetic variation and its impact on common human disease and traits. Developed and adopted in the mid-2000s, GWAS led to searchable genotype-phenotype catalogs and genome-wide datasets available for further data mining and analysis for the eventual development of translational applications. The GWAS revolution was swift and specific, including almost exclusively populations of European descent, to the neglect of the majority of the world's genetic diversity. In this narrative review, we recount the GWAS landscape of the early years that established a genotype-phenotype catalog that is now universally understood to be inadequate for a complete understanding of complex human genetics. We then describe approaches taken to augment the genotype-phenotype catalog, including the study populations, collaborative consortia, and study design approaches aimed to generalize and then ultimately discover genome-wide associations in non-European descent populations. The collaborations and data resources established in the efforts to diversify genomic findings undoubtedly provide the foundations of the next chapters of genetic association studies with the advent of budget-friendly whole-genome sequencing.
Affiliation(s)
- Lauren A Cruz
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Jessica N Cooke Bailey
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
  - Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio, USA
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Dana C Crawford
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
  - Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio, USA
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
2
Drouet DE, Liu S, Crawford DC. Assessment of multi-population polygenic risk scores for lipid traits in African Americans. PeerJ 2023; 11:e14910. [PMID: 37214096 PMCID: PMC10198155 DOI: 10.7717/peerj.14910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 01/25/2023] [Indexed: 05/24/2023] [Open access]
Abstract
Polygenic risk scores (PRS) based on genome-wide discoveries are promising predictors or classifiers of disease development, severity, and/or progression for common clinical outcomes. A major limitation of most risk scores is the paucity of genome-wide discoveries in diverse populations, prompting an emphasis to generate these needed data for trans-population and population-specific PRS construction. Given diverse genome-wide discoveries are just now being completed, there has been little opportunity for PRS to be evaluated in diverse populations independent from the discovery efforts. To fill this gap, we leverage here summary data from a recent genome-wide discovery study of lipid traits (HDL-C, LDL-C, triglycerides, and total cholesterol) conducted in diverse populations represented by African Americans, Hispanics, Asians, Native Hawaiians, Native Americans, and others by the Population Architecture using Genomics and Epidemiology (PAGE) Study. We constructed lipid trait PRS using PAGE Study published genetic variants and weights in an independent African American adult patient population linked to de-identified electronic health records and genotypes from the Illumina Metabochip (n = 3,254). Using multi-population lipid trait PRS, we assessed levels of association for their respective lipid traits, clinical outcomes (cardiovascular disease and type 2 diabetes), and common clinical labs. While none of the multi-population PRS were strongly associated with the tested trait or outcome, the PRS for LDL-C was nominally associated with cardiovascular disease. These data demonstrate the complexity in applying PRS to real-world clinical data even when data from multiple populations are available.
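The abstract above does not spell out how each score was computed. A common construction, assumed here, is a weighted sum of effect-allele dosages using published per-variant weights. The sketch below illustrates that construction only; the variant IDs, weights, and the function name `polygenic_risk_score` are hypothetical and are not taken from the PAGE Study or from this paper.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages.

    Only variants that have both a genotype dosage and a published weight
    contribute; variants missing from either input are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[variant] * dosages[variant] for variant in shared)


# Hypothetical weights (per-allele effect sizes) and one person's dosages (0, 1, or 2).
weights = {"rsA": 0.5, "rsB": -0.25, "rsC": 0.1}
person = {"rsA": 2, "rsB": 1, "rsD": 0}  # rsC not genotyped, rsD has no weight

score = polygenic_risk_score(person, weights)
print(score)
```

Skipping variants that lack a weight or a genotype is one possible design choice; array-based data such as the Metabochip may not cover every published variant, and a real analysis would need to decide how to handle that explicitly.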
Affiliation(s)
- Domenica E. Drouet
  - Department of Medicine, Case Western Reserve University, Cleveland, OH, United States of America
- Shiying Liu
  - Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
- Dana C. Crawford
  - Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States of America
  - Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH, United States of America
3
Nealon CL, Halladay CW, Kinzy TG, Simpson P, Canania RL, Anthony SA, Roncone DP, Sawicki Rogers LR, Leber JN, Dougherty JM, Sullivan JM, Wu WC, Greenberg PB, Iyengar SK, Crawford DC, Peachey NS, Bailey JNC, VA Million Veteran Program. Development and Evaluation of a Rules-based Algorithm for Primary Open-Angle Glaucoma in the VA Million Veteran Program. Ophthalmic Epidemiol 2022; 29:640-648. [PMID: 34822319 PMCID: PMC9583190 DOI: 10.1080/09286586.2021.1992784] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/14/2021] [Revised: 08/20/2021] [Accepted: 10/09/2021] [Indexed: 10/19/2022]
Abstract
The availability of electronic health record (EHR)-linked biobank data for research presents opportunities to better understand complex ocular diseases. Developing accurate computable phenotypes for ocular diseases for which gold standard diagnosis includes imaging remains inaccessible in most biobank-linked EHRs. The objective of this study was to develop and validate a computable phenotype to identify primary open-angle glaucoma (POAG) through accessing the Department of Veterans Affairs (VA) Computerized Patient Record System (CPRS) and Million Veteran Program (MVP) biobank. Accessing CPRS clinical ophthalmology data from VA Medical Center Eye Clinic (VAMCEC) patients, we developed and iteratively refined POAG case and control algorithms based on clinical, prescription, and structured diagnosis data (ICD-CM codes). Refinement was performed via detailed chart review, initially at a single VAMCEC (n = 200) and validated at two additional VAMCECs (n = 100 each). Positive and negative predictive values (PPV, NPV) were computed as the proportion of CPRS patients correctly classified with POAG or without POAG, respectively, by the algorithms, validated by ophthalmologists and optometrists with access to gold-standard clinical diagnosis data. The final algorithms performed better than previously reported approaches in assuring the accuracy and reproducibility of POAG classification (PPV >83% and NPV >97%) with consistent performance in Black or African American and in White Veterans. Applied to the MVP to identify cases and controls, genetic analysis of a known POAG-associated locus further validated the algorithms. We conclude that ours is a viable approach to use combined EHR-genetic data to study patients with complex diseases that require imaging confirmation.
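The PPV and NPV definitions in the abstract above reduce to two ratios over chart-review counts. The sketch below shows that arithmetic under stated assumptions: the counts are hypothetical (they are not the study's data), and the function name `predictive_values` is illustrative.

```python
from typing import Tuple


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (PPV, NPV) from chart-review counts.

    PPV = algorithm-positive patients confirmed as cases / all algorithm-positive.
    NPV = algorithm-negative patients confirmed as non-cases / all algorithm-negative.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one algorithm-positive and one algorithm-negative patient")
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical chart-review counts for one site.
ppv, npv = predictive_values(tp=45, fp=5, tn=48, fn=2)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both values depend on how common the disease is in the reviewed sample, which is one reason an algorithm might be re-validated at additional sites rather than assumed to transfer.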
Affiliation(s)
- Tyler G. Kinzy
  - VA Northeast Ohio Healthcare System, Cleveland, OH
  - Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, OH
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH
- Jenna N. Leber
  - Ophthalmology Section, VA Western NY Health Care System, Buffalo NY
- Jack M. Sullivan
  - Ophthalmology Section, VA Western NY Health Care System, Buffalo NY
- Wen-Chih Wu
  - Cardiology Section, Medical Service, Providence VA Medical Center, Providence, RI
- Paul B. Greenberg
  - Ophthalmology Section, Providence VA Medical Center, Providence, RI
  - Division of Ophthalmology, Alpert Medical School, Brown University, Providence, RI
- Sudha K. Iyengar
  - VA Northeast Ohio Healthcare System, Cleveland, OH
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH
- Dana C. Crawford
  - VA Northeast Ohio Healthcare System, Cleveland, OH
  - Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, OH
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH
- Neal S. Peachey
  - VA Northeast Ohio Healthcare System, Cleveland, OH
  - Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, OH
  - Department of Ophthalmology, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH
- Jessica N. Cooke Bailey
  - VA Northeast Ohio Healthcare System, Cleveland, OH
  - Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, OH
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH
4
Kaur H, Crawford DC, Liang J, Benchek P, COGENT BP Consortium, Zhu X, Kallianpur AR, Bush WS. Replication of European hypertension associations in a case-control study of 9,534 African Americans. PLoS One 2021; 16:e0259962. [PMID: 34793544 PMCID: PMC8601554 DOI: 10.1371/journal.pone.0259962] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/09/2021] [Accepted: 10/29/2021] [Indexed: 12/01/2022] [Open access]
Abstract
OBJECTIVE Hypertension is more prevalent in African Americans (AA) than other ethnic groups. Genome-wide association studies (GWAS) have identified loci associated with hypertension and other cardio-metabolic traits like type 2 diabetes, coronary artery disease, and body mass index (BMI); however, the AA population is underrepresented in these studies. In this study, we examined a large AA cohort for the generalizability of 14 Metabochip array SNPs with previously reported European hypertension associations. METHODS To evaluate associations, we analyzed genotype data of 14 SNPs for their associations with a diagnosis of hypertension, systolic blood pressure (SBP), and diastolic blood pressure (DBP) in a case-control study of an AA population (N = 9,534). We also performed an age-stratified analysis (<30, 30-59, and ≥60 years) following the hypertension definition described by the 8th Joint National Committee (JNC). Associations were adjusted for BMI, age, age², sex, clinical confounders, and genetic ancestry using multivariable regression models to estimate odds ratios (ORs) and beta-coefficients. Analyses stratified by sex were also conducted. Meta-analyses (including both BioVU and COGENT-BP cohorts) were performed using a random-effects model. RESULTS We found rs880315 to be associated with systolic hypertension (SBP≥140 mmHg) in the entire cohort (OR = 1.14, p = 0.003) and within women only (OR = 1.16, p = 0.012). Variant rs17080093 associated with lower SBP and DBP (β = -2.99, p = 0.0352 and β = -1.69, p = 0.0184) among younger individuals, particularly in younger women (β = -3.92, p = 0.0025 and β = -1.87, p = 0.0241 for SBP and DBP respectively). SNP rs1530440 associated with higher SBP and DBP measurements (younger individuals: β = 4.1, p = 0.039 and β = 2.5, p = 0.043 for SBP and DBP; younger women: β = 4.5, p = 0.025 and β = 2.9, p = 0.028 for SBP and DBP), and hypertension risk in older women (OR = 1.4, p = 0.050). rs16948048 increases hypertension risk in younger individuals (OR = 1.31, p = 0.011). Among mid-age women rs880315 associated with higher risk of hypertension (OR = 1.20, p = 0.027). rs1361831 associated with DBP (β = -1.96, p = 0.02) among individuals older than 60 years. rs3096277 increases hypertension risk among older individuals (OR = 1.26, p = 0.0015); however, this variant also reduces SBP among younger women (β = -2.63, p = 0.0102). CONCLUSION These findings suggest that European-descent and AA populations share genetic loci that contribute to blood pressure traits and hypertension. However, the OR and beta-coefficient estimates differ, and some are age-dependent. Additional genetic studies of hypertension in AA are warranted to identify new loci associated with hypertension and blood pressure traits in this population.
Affiliation(s)
- Harpreet Kaur
  - Genomic Medicine Institute, Cleveland Clinic/Lerner Research Institute, Cleveland, OH, United States of America
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States of America
- Dana C. Crawford
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States of America
- Jingjing Liang
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States of America
- Penelope Benchek
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States of America
- Xiaofeng Zhu
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States of America
- Asha R. Kallianpur
  - Genomic Medicine Institute, Cleveland Clinic/Lerner Research Institute, Cleveland, OH, United States of America
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH, United States of America
- William S. Bush
  - Genomic Medicine Institute, Cleveland Clinic/Lerner Research Institute, Cleveland, OH, United States of America
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States of America
5
Veatch OJ, Bauer CR, Keenan BT, Josyula NS, Mazzotti DR, Bagai K, Malow BA, Robishaw JD, Pack AI, Pendergrass SA. Characterization of genetic and phenotypic heterogeneity of obstructive sleep apnea using electronic health records. BMC Med Genomics 2020; 13:105. [PMID: 32711518 PMCID: PMC7382070 DOI: 10.1186/s12920-020-00755-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 12/12/2019] [Accepted: 07/13/2020] [Indexed: 12/22/2022] [Open access]
Abstract
Background Obstructive sleep apnea (OSA) is defined by frequent episodes of reduced or complete cessation of airflow during sleep and is linked to negative health outcomes. Understanding the genetic factors influencing expression of OSA may lead to new treatment strategies. Electronic health records (EHRs) can be leveraged to both validate previously reported OSA-associated genomic variation and detect novel relationships between these variants and comorbidities. Methods We identified candidate single nucleotide polymorphisms (SNPs) via systematic literature review of existing research. Using datasets available at Geisinger (n = 39,407) and Vanderbilt University Medical Center (n = 24,084), we evaluated associations between 40 previously implicated SNPs and OSA diagnosis, defined using clinical codes. We also evaluated associations between these SNPs and OSA severity measures obtained from sleep reports at Geisinger (n = 6571). Finally, we used a phenome-wide association study approach to help reveal pleiotropic genetic effects between OSA candidate SNPs and other clinical codes and laboratory values available in the EHR. Results Most previously reported OSA candidate SNPs showed minimal to no evidence for associations with OSA diagnosis or severity in the EHR-derived datasets. Three SNPs in LEPR, MMP-9, and GABBR1 validated for an association with OSA diagnosis in European Americans; the SNP in GABBR1 was associated following meta-analysis of results from both clinical populations. The GABBR1 and LEPR SNPs, and one additional SNP, were associated with OSA severity measures in European Americans from Geisinger. Three additional candidate OSA SNPs were not associated with OSA-related traits but instead with hyperlipidemia and autoimmune diseases of the thyroid. Conclusions To our knowledge, this is one of the largest candidate gene studies and one of the first phenome-wide association studies of OSA genomic variation. 
Results validate genetic associations with OSA in the LEPR, MMP-9 and GABBR1 genes, but suggest that the majority of previously identified genetic associations with OSA may be false positives. Phenome-wide analyses provide evidence of mediated pleiotropy. Future well-powered genome-wide association analyses of OSA risk and severity across populations with diverse ancestral backgrounds are needed. The comprehensive nature of the analyses represents a platform for informing future work focused on understanding how genetic data can be useful in informing treatment of OSA and related comorbidities.
Affiliation(s)
- Olivia J Veatch
  - Division of Sleep Medicine/Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, 125 S. 31st St, Office 2123, Philadelphia, PA, 19104, USA
  - Sleep Disorders Division/Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
  - Department of Psychiatry & Behavioral Sciences, University of Kansas Medical Center, Mail-Stop 4015, 3901 Rainbow Blvd., Kansas City, KS, 66160, USA
- Brendan T Keenan
  - Division of Sleep Medicine/Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, 125 S. 31st St, Office 2123, Philadelphia, PA, 19104, USA
- Diego R Mazzotti
  - Division of Sleep Medicine/Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, 125 S. 31st St, Office 2123, Philadelphia, PA, 19104, USA
- Kanika Bagai
  - Sleep Disorders Division/Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Beth A Malow
  - Sleep Disorders Division/Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Janet D Robishaw
  - Department of Biomedical Science, Charles E. Schmidt College of Medicine, Florida Atlantic University, Boca Raton, FL, 33431, USA
- Allan I Pack
  - Division of Sleep Medicine/Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, 125 S. 31st St, Office 2123, Philadelphia, PA, 19104, USA
6
Pendergrass SA, Buyske S, Jeff JM, Frase A, Dudek S, Bradford Y, Ambite JL, Avery CL, Buzkova P, Deelman E, Fesinmeyer MD, Haiman C, Heiss G, Hindorff LA, Hsu CN, Jackson RD, Lin Y, Le Marchand L, Matise TC, Monroe KR, Moreland L, North KE, Park SL, Reiner A, Wallace R, Wilkens LR, Kooperberg C, Ritchie MD, Crawford DC. A phenome-wide association study (PheWAS) in the Population Architecture using Genomics and Epidemiology (PAGE) study reveals potential pleiotropy in African Americans. PLoS One 2019; 14:e0226771. [PMID: 31891604 PMCID: PMC6938343 DOI: 10.1371/journal.pone.0226771] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/02/2019] [Accepted: 12/03/2019] [Indexed: 12/11/2022] [Open access]
Abstract
We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study. The PAGE I study was a National Human Genome Research Institute-funded collaboration of four study sites accessing diverse epidemiologic studies genotyped on the Metabochip, a custom genotyping chip that has dense coverage of regions in the genome previously associated with cardio-metabolic traits and outcomes in mostly European-descent populations. Here we focus on identifying novel phenome-genome relationships, where SNPs are associated with more than one phenotype. To do this, we performed a PheWAS, testing each SNP on the Metabochip for an association with up to 273 phenotypes in the participating PAGE I study sites. We identified 133 putative pleiotropic variants, defined as SNPs associated at an empirically derived p-value threshold of p<0.01 in two or more PAGE study sites for two or more phenotype classes. We further annotated these PheWAS-identified variants using publicly available functional data and local genetic ancestry. Amongst our novel findings is SPARC rs4958487, associated with increased glucose levels and hypertension. SPARC has been implicated in the pathogenesis of diabetes and is also known to have a potential role in fibrosis, a common consequence of multiple conditions including hypertension. The SPARC example and others highlight the potential that PheWAS approaches have in improving our understanding of complex disease architecture by identifying novel relationships between genetic variants and an array of common human phenotypes.
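The filtering rule stated in the abstract above (p<0.01 in two or more study sites for two or more phenotype classes) can be read in more than one way. The sketch below implements one reading, assumed here: a SNP and phenotype class pair counts only if it passes the threshold in at least two sites, and a SNP is flagged when at least two classes count. The record layout, the example rows, and the function name `putative_pleiotropic` are hypothetical; only rs4958487 and the threshold come from the abstract, and the p-values shown are invented for illustration.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# One association result: (snp, study_site, phenotype_class, p_value)
Result = Tuple[str, str, str, float]


def putative_pleiotropic(results: Iterable[Result],
                         p_threshold: float = 0.01,
                         min_sites: int = 2,
                         min_classes: int = 2) -> Set[str]:
    """Flag SNPs that pass the threshold in >= min_sites sites for >= min_classes classes."""
    # Sites in which each (SNP, phenotype class) pair passes the p-value threshold.
    sites_by_pair = defaultdict(set)
    for snp, site, pheno_class, p_value in results:
        if p_value < p_threshold:
            sites_by_pair[(snp, pheno_class)].add(site)

    # Phenotype classes that replicate across enough sites, per SNP.
    classes_by_snp = defaultdict(set)
    for (snp, pheno_class), sites in sites_by_pair.items():
        if len(sites) >= min_sites:
            classes_by_snp[snp].add(pheno_class)

    return {snp for snp, classes in classes_by_snp.items() if len(classes) >= min_classes}


# Invented example rows; site and class labels are placeholders.
example = [
    ("rs4958487", "siteA", "glucose", 0.004),
    ("rs4958487", "siteB", "glucose", 0.008),
    ("rs4958487", "siteA", "hypertension", 0.002),
    ("rs4958487", "siteC", "hypertension", 0.009),
    ("rsX", "siteA", "glucose", 0.001),
    ("rsX", "siteB", "glucose", 0.003),
    ("rsX", "siteA", "lipids", 0.005),  # passes in only one site for this class
]
flagged = putative_pleiotropic(example)
print(sorted(flagged))
```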
Affiliation(s)
- Steven Buyske
  - Department of Statistics, Rutgers University, Piscataway, New Jersey, United States of America
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
- Janina M. Jeff
  - Illumina, Inc., San Diego, California, United States of America
- Alex Frase
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Scott Dudek
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Yuki Bradford
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jose-Luis Ambite
  - Information Sciences Institute; University of Southern California, Marina del Rey, California, United States of America
- Christy L. Avery
  - Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Petra Buzkova
  - Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Ewa Deelman
  - Information Sciences Institute; University of Southern California, Marina del Rey, California, United States of America
- Christopher Haiman
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
- Gerardo Heiss
  - Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
  - Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Lucia A. Hindorff
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Chun-Nan Hsu
  - Center for Research in Biological Systems, Department of Neurosciences, University of California, San Diego, La Jolla, California, United States of America
- Yi Lin
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Loic Le Marchand
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Tara C. Matise
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
- Kristine R. Monroe
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
- Larry Moreland
  - University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Kari E. North
  - Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
  - Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Sungshim L. Park
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
- Alex Reiner
  - Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
- Robert Wallace
  - Departments of Epidemiology and Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
- Lynne R. Wilkens
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Marylyn D. Ritchie
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Dana C. Crawford
  - Cleveland Institute for Computational Biology, Cleveland, Ohio, United States of America
  - Departments of Population and Quantitative Health Sciences and Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio, United States of America
7
Hollister BM, Farber-Eger E, Aldrich MC, Crawford DC. A Social Determinant of Health May Modify Genetic Associations for Blood Pressure: Evidence From a SNP by Education Interaction in an African American Population. Front Genet 2019; 10:428. [PMID: 31134134 PMCID: PMC6523518 DOI: 10.3389/fgene.2019.00428] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/30/2018] [Accepted: 04/18/2019] [Indexed: 01/11/2023] [Open access]
Abstract
African Americans experience the highest burden of hypertension in the United States compared with other groups. Genetic contributions to this complex condition are now emerging in this as well as other populations through large-scale genome-wide association studies (GWAS) and meta-analyses. Despite these recent discovery efforts, relatively few large-scale studies of blood pressure have considered the joint influence of genetics and social determinants of health despite extensive evidence supporting their impact on hypertension. To identify these expected interactions, we accessed a subset of the Vanderbilt University Medical Center (VUMC) biorepository linked to de-identified electronic health records (EHRs) of adult African Americans genotyped using the Illumina Metabochip (n = 2,577). To examine potential interactions between education, a recognized social determinant of health, and genetic variants contributing to blood pressure, we used linear regression models to investigate two-way interactions for systolic and diastolic blood pressure (DBP). We identified a two-way interaction between rs6687976 and education affecting DBP (p = 0.052). Individuals homozygous for the minor allele and having less than a high school education had higher DBP compared with (1) individuals homozygous for the minor allele and high school education or greater and (2) individuals not homozygous for the minor allele and less than a high school education. To our knowledge, this is the first EHR-based study to suggest a gene-environment interaction for blood pressure in African Americans, supporting the hypothesis that genetic contributions to hypertension may be modulated by social factors.
Affiliation(s)
- Brittany M Hollister
  - Social and Behavioral Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
- Eric Farber-Eger
  - Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, United States
- Melinda C Aldrich
  - Department of Thoracic Surgery, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, United States
- Dana C Crawford
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States
8
Using high-throughput sequencing for investigating intra-host hepatitis C evolution over long retrospective periods. Infect Genet Evol 2018; 67:136-144. [PMID: 30395998 DOI: 10.1016/j.meegid.2018.11.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/26/2018] [Revised: 09/11/2018] [Accepted: 11/02/2018] [Indexed: 12/13/2022]
Abstract
Collections of biological samples held by hospitals represent invaluable resources for conducting retrospective evolutionary studies of chronic infections. Using high-throughput sequencing, those collections permit analysis of within-host genetic diversity over long follow-up periods, and allow a better understanding of resistance to treatment regimes during disease evolution. Here, we studied the evolution of hepatitis C virus (HCV) populations in two patients with an absence of response to dual therapies. We implemented amplicon sequencing to survey genomic variation at the Core and NS5B regions of HCV over a period of 13 years from blood samples obtained at multiple time points. We observed mixed infection by multiple HCV genotypes in both patients. Genetic heterogeneity and sample composition analysis provided information about the changes in viral population over the course of clinical treatment, with NS5B experiencing an increase in diversity after treatment initiation. Secondary infections were estimated to predate treatment year, and our results pointed towards diversifying selection occurring post-treatment, acting on standing genomic variation and maintaining high genetic heterogeneity during infection. For these two patients infected with multiple HCV genotypes, the maintenance of viral diversity was explained with the hypothesis of soft selective sweep started at the same time as antiviral treatment was initiated.
|
9
|
Restrepo NA, Laper SM, Farber-Eger E, Crawford DC. Local genetic ancestry in CDKN2B-AS1 is associated with primary open-angle glaucoma in an African American cohort extracted from de-identified electronic health records. BMC Med Genomics 2018; 11:70. [PMID: 30255811 PMCID: PMC6157155 DOI: 10.1186/s12920-018-0392-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/21/2023] [Open Access]
Abstract
BACKGROUND: Glaucoma is a leading cause of blindness in developed countries. Primary open-angle glaucoma (POAG), the most prevalent clinical subtype of glaucoma in the United States, affects African Americans at a higher rate compared with European Americans. Risk factors identified for POAG include increased age and family history, which, coupled with heritability estimates, suggest this complex condition is associated with genetic and environmental factors. To date, several genome-wide studies have identified loci significantly associated with POAG risk, but most of these studies were performed in populations of European descent. METHODS: To identify population-specific and trans-population genetic associations for POAG, we genotyped 11,521 African Americans using the Illumina Metabochip as part of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study accessing BioVU, the Vanderbilt University Medical Center's biorepository linked to de-identified electronic health records. Among this study population, we identified 138 cases of POAG and 1376 controls and performed Metabochip-wide tests of association. We also estimated local genetic ancestry at CDKN2B-AS1, a POAG-associated locus established in European-descent populations. RESULTS: Overall, we did not identify significant single SNP-POAG associations after adjusting for multiple testing. We did, however, detect a significant association between POAG risk and local African genetic ancestry at CDKN2B-AS1, where on average cases were of 90% African descent compared with controls at 58% (p = 2 × 10⁻⁶). CONCLUSIONS: These data suggest that CDKN2B-AS1 is an important locus for POAG risk among African Americans, warranting further investigation to identify the variants underlying this association.
Affiliation(s)
- Nicole A Restrepo
- Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University, 2103 Cornell Road, Wolstein Research Building, Suite 2-527, Cleveland, OH, 44106, USA
| | | | - Eric Farber-Eger
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Dana C Crawford
- Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University, 2103 Cornell Road, Wolstein Research Building, Suite 2-527, Cleveland, OH, 44106, USA.
|
10
|
Smieszek S, Mitchell SL, Farber-Eger EH, Veatch OJ, Wheeler NR, Goodloe RJ, Wells QS, Murdock DG, Crawford DC. Hi-MC: a novel method for high-throughput mitochondrial haplogroup classification. PeerJ 2018; 6:e5149. [PMID: 29967758 PMCID: PMC6022720 DOI: 10.7717/peerj.5149] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/20/2017] [Accepted: 06/12/2018] [Indexed: 11/20/2022] [Open Access]
Abstract
Effective approaches for assessing mitochondrial DNA (mtDNA) variation are important to multiple scientific disciplines. Mitochondrial haplogroups characterize branch points in the phylogeny of mtDNA. Several tools exist for mitochondrial haplogroup classification. However, most require full or partial mtDNA sequence, which is often cost-prohibitive for studies with large sample sizes. The purpose of this study was to develop Hi-MC, a high-throughput method for mitochondrial haplogroup classification that is cost-effective and applicable to large sample sizes, making mitochondrial analysis more accessible in genetic studies. Using rigorous selection criteria, we defined and validated a custom panel of mtDNA single nucleotide polymorphisms that allows for accurate classification of European, African, and Native American mitochondrial haplogroups at broad resolution with minimal genotyping and cost. We demonstrate that Hi-MC performs well in samples of European, African, and Native American ancestries, and that Hi-MC performs comparably to a commonly used classifier. Implementation as a software package in R enables users to download and run the program locally, grants greater flexibility in the number of samples that can be run, and allows for easy expansion in future revisions. Hi-MC is available in the CRAN repository and the source code is freely available at https://github.com/vserch/himc.
Affiliation(s)
- Sandra Smieszek
- Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
| | - Sabrina L. Mitchell
- Vanderbilt Eye Institute and Department of Ophthalmology & Visual Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Eric H. Farber-Eger
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Olivia J. Veatch
- Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Nicholas R. Wheeler
- Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
| | - Robert J. Goodloe
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
| | - Quinn S. Wells
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Deborah G. Murdock
- Center for Mitochondrial and Epigenomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
| | - Dana C. Crawford
- Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
|
11
|
Fish AE, Crawford DC, Capra JA, Bush WS. Local ancestry transitions modify SNP-trait associations. Pacific Symposium on Biocomputing 2018; 23:424-435. [PMID: 29218902 PMCID: PMC5728664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Genomic maps of local ancestry identify ancestry transitions: points on a chromosome where recent recombination events in admixed individuals have joined two different ancestral haplotypes. These events bring together alleles that evolved within separate continental populations, providing a unique opportunity to evaluate the joint effect of these alleles on health outcomes. In this work, we evaluate the impact of genetic variants in the context of nearby local ancestry transitions within a sample of nearly 10,000 adults of African ancestry with traits derived from electronic health records. Genetic data was located using the Metabochip, and used to derive local ancestry. We develop a model that captures the effect of both single variants and local ancestry, and use it to identify examples where local ancestry transitions significantly interact with nearby variants to influence metabolic traits. In our most compelling example, we find that the minor allele of rs16890640 occurring on a European background with a downstream local ancestry transition to African ancestry results in significantly lower mean corpuscular hemoglobin and volume. This finding represents a new way of discovering genetic interactions, and is supported by molecular data that suggest changes to local ancestry may impact local chromatin looping.
Affiliation(s)
- Alexandra E Fish
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37235, USA, ²Departments of Biological Sciences, Biomedical Informatics, and Computer Science, Vanderbilt University, Nashville, TN 37235, USA,
|
12
|
Crawford DC, Morgan AA, Denny JC, Aronow BJ, Brenner SE. Precision medicine: from diplotypes to disparities towards improved health and therapies. Pacific Symposium on Biocomputing 2018; 23:389-399. [PMID: 29218899 PMCID: PMC6182117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Precision medicine research efforts both in basic science discovery and clinical implementation are well underway and promise to provide individualized preventions and treatments, improving overall health care delivery. To achieve these goals, advances in data capture and analysis are needed spanning different types of 'omic and clinical data. The efforts to enhance precise treatments for all may accentuate healthcare disparities unless specific challenges are identified and addressed. This session of the 2018 Pacific Symposium on Biocomputing presents the latest developments in this transdisciplinary research space of genomics, medicine, and population health.
Affiliation(s)
- DANA C. CRAWFORD
- Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, 44106 USA,
| | | | - JOSHUA C. DENNY
- Vanderbilt University Medical Center, Nashville, TN 37203 USA,
| | - BRUCE J. ARONOW
- Center for Computational Medicine, Cincinnati Children’s Hospital Medical Center and the University of Cincinnati, Cincinnati, OH 45229 USA,
|
13
|
Martin-Sanchez FJ, Aguiar-Pulido V, Lopez-Campos GH, Peek N, Sacchi L. Secondary Use and Analysis of Big Data Collected for Patient Care. Yearb Med Inform 2017; 26:28-37. [PMID: 28480474 PMCID: PMC6239231 DOI: 10.15265/iy-2017-008] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 01/29/2023] [Open Access]
Abstract
Objectives: To identify common methodological challenges and review relevant initiatives related to the re-use of patient data collected in routine clinical care, as well as to analyze the economic benefits derived from the secondary use of this data. Through the use of several examples, this article aims to provide a glimpse into the different areas of application, namely clinical research, genomic research, study of environmental factors, and population and health services research. This paper describes some of the informatics methods and Big Data resources developed in this context, such as electronic phenotyping, clinical research networks, biorepositories, screening data banks, and wide association studies. Lastly, some of the potential limitations of these approaches are discussed, focusing on confounding factors and data quality. Methods: A series of literature searches in main bibliographic databases have been conducted in order to assess the extent to which existing patient data has been repurposed for research. This contribution from the IMIA working group on "Data mining and Big Data analytics" focuses on the literature published during the last two years, covering the timeframe since the working group's last survey. Results and Conclusions: Although most of the examples of secondary use of patient data lie in the arena of clinical and health services research, we have started to witness other important applications, particularly in the area of genomic research and the study of health effects of environmental factors. Further research is needed to characterize the economic impact of secondary use across the broad spectrum of translational research.
Affiliation(s)
- F. J. Martin-Sanchez
- Weill Cornell Medicine, Department of Healthcare Policy and Research, Division of Health Informatics, New York, USA
| | - V. Aguiar-Pulido
- Weill Cornell Medicine, Brain and Mind Research Institute, New York, USA
| | - G. H. Lopez-Campos
- The University of Melbourne, Health & Biomedical Informatics Centre, Melbourne, Australia
| | - N. Peek
- MRC Health e-Research Centre, Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
| | - L. Sacchi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
|
14
|
Goodloe R, Farber-Eger E, Boston J, Crawford DC, Bush WS. Reducing Clinical Noise for Body Mass Index Measures Due to Unit and Transcription Errors in the Electronic Health Record. AMIA Joint Summits on Translational Science Proceedings 2017; 2017:102-111. [PMID: 28815116 PMCID: PMC5543370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
Body mass index (BMI) is an important outcome and covariate adjustment for many clinical association studies. Accurate assessment of BMI, therefore, is a critical part of many study designs. Electronic health records (EHRs) are a growing source of clinical data for research purposes, and have proven useful for identifying and replicating genetic associations. EHR-based data collected for clinical and billing purposes have several unique properties, including a high degree of heterogeneity or "clinical noise." In this work, we propose a new method for reducing the problems of transcription and recording error for height and weight and apply these methods to a subset of the Vanderbilt University Medical Center biorepository known as EAGLE BioVU (n=15,863). After processing, we show that the distribution of BMI from EAGLE BioVU closely matches population-based estimates from the National Health and Nutrition Examination Surveys (NHANES), and that our approach retains far more data points than traditional outlier detection methods.
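To illustrate the kind of unit error the abstract refers to, the following minimal Python sketch tries each candidate unit for a recorded height and keeps the value only when exactly one interpretation falls in an adult range. This is not the authors' method, which the abstract does not specify. The plausibility window, the unit list, and the function names are all assumptions made for illustration.

```python
# Illustrative only: not the EAGLE BioVU cleaning algorithm.
HEIGHT_RANGE_M = (1.2, 2.3)  # assumed plausible adult height window, in meters
HEIGHT_TO_M = {"m": 1.0, "cm": 0.01, "in": 0.0254}  # candidate units and factors


def repair_height(value):
    """Return (height_m, unit) if exactly one unit makes the value plausible, else None."""
    low, high = HEIGHT_RANGE_M
    hits = [(value * factor, unit)
            for unit, factor in HEIGHT_TO_M.items()
            if low <= value * factor <= high]
    return hits[0] if len(hits) == 1 else None


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    # Hypothetical recorded heights with unknown units.
    for recorded in (1.75, 170, 67, 500):
        print(recorded, repair_height(recorded))
```

Under these assumed ranges the three unit windows do not overlap, so a value is either assigned one unit or set aside (as 500 is here) rather than guessed.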
Affiliation(s)
- Robert Goodloe
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Eric Farber-Eger
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jonathan Boston
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Dana C. Crawford
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
| | - William S. Bush
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
|
15
|
Extracting Country-of-Origin from Electronic Health Records for Gene-Environment Studies as Part of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) Study. AMIA Joint Summits on Translational Science Proceedings 2017; 2017:50-57. [PMID: 28815105 PMCID: PMC5543359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
We describe here the extraction of country-of-origin, an acculturation variable relevant for gene-environment studies, in a biorepository linked to de-identified electronic health records (EHRs) accessed by the Epidemiologic Architecture for Genes Linked to Environment (EAGLE), a study site of the Population Architecture using Genomics and Epidemiology (PAGE) I study. We extracted country-of-origin from the unstructured clinical free text using regular expressions within the MySQL relational database system in a cohort of 15,863 subjects of mostly non-European descent (including 11,519 African Americans, 1,702 Hispanics, and 1,118 Asians). We performed searches for 231 world countries (including independent sovereign states, dependent areas, and disputed territories) and common misspellings in >14 gigabytes of data including >13 billion characters of clinical text. Manual review of a fraction of the initial country-of-origin assignments established rules for data cleaning and quality control to achieve final country-of-origin status for each subject. After data cleaning, a total of 1,911/15,893 (12.02%) subjects were assigned to a country-of-origin outside of the United States. Mexico was the most commonly assigned country outside of the United States (264 subjects; 13.8% of subjects with a foreign country-of-origin assignment). The distribution of the countries assigned followed expectations based on known migration patterns to the United States with an emphasis on the southeastern region. These data suggest country-of-origin can be successfully extracted from unstructured clinical text for downstream genetic association studies.
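The regular-expression search described above can be sketched as follows. The study ran its expressions inside MySQL; this sketch swaps in Python's standard `re` module purely for illustration. The three-country lookup, the misspellings, and the function names are hypothetical stand-ins for the 231-country list, which the abstract does not reproduce.

```python
import re

# Hypothetical toy lookup: canonical country name -> spellings to search for.
COUNTRY_VARIANTS = {
    "Mexico": ["mexico", "mexcio"],
    "Guatemala": ["guatemala", "guatamala"],
    "Vietnam": ["vietnam", "viet nam"],
}


def build_pattern(variants):
    """Compile one case-insensitive, whole-word pattern covering all spellings."""
    alternation = "|".join(re.escape(v) for v in sorted(variants, key=len, reverse=True))
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERNS = {country: build_pattern(v) for country, v in COUNTRY_VARIANTS.items()}


def candidate_countries(note):
    """Return the sorted canonical countries mentioned in a free-text note."""
    return sorted(c for c, pattern in PATTERNS.items() if pattern.search(note))


if __name__ == "__main__":
    print(candidate_countries("Pt moved from Mexcio in 2001; sister lives in Viet Nam."))
    print(candidate_countries("Lives in New Mexico."))
```

The second call flags "Mexico" for a note about New Mexico. That is a false positive of the sort that a manual review step, like the one the abstract describes, would be needed to catch.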
|
16
|
Hollister BM, Restrepo NA, Farber-Eger E, Crawford DC, Aldrich MC, Non A. Development and performance of text-mining algorithms to extract socioeconomic status from de-identified electronic health records. Pacific Symposium on Biocomputing 2017; 22:230-241. [PMID: 27896978 DOI: 10.1142/9789813207813_0023] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022]
Abstract
Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather recorded in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables from racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses. Ultimately, incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
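For readers unfamiliar with the metric, positive predictive value is the share of algorithm-flagged records that manual review confirms, PPV = TP / (TP + FP). The short Python sketch below computes it from paired flags. The six example records are invented for illustration and are not the study's data. The function name is likewise an assumption.

```python
def positive_predictive_value(algorithm_flags, manual_flags):
    """PPV = true positives / all algorithm positives; None if nothing was flagged."""
    pairs = list(zip(algorithm_flags, manual_flags))
    true_pos = sum(1 for algo, manual in pairs if algo and manual)
    false_pos = sum(1 for algo, manual in pairs if algo and not manual)
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else None


if __name__ == "__main__":
    # Hypothetical flags for six records (True = category present).
    algorithm = [True, True, True, True, True, False]
    manual = [True, True, True, True, False, True]
    print(positive_predictive_value(algorithm, manual))
```

Records the algorithm misses (the last one here) do not enter the PPV at all, so a high PPV says nothing about how many true cases went undetected.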
Affiliation(s)
- Brittany M Hollister
- Vanderbilt Genetics Institute, Vanderbilt University, 519 Light Hall, 2215 Garland Ave. South, Nashville, TN, 37232, USA,
|
17
|
Fernández-Rhodes L, Gong J, Haessler J, Franceschini N, Graff M, Nishimura KK, Wang Y, Highland HM, Yoneyama S, Bush WS, Goodloe R, Ritchie MD, Crawford D, Gross M, Fornage M, Buzkova P, Tao R, Isasi C, Avilés-Santa L, Daviglus M, Mackey RH, Houston D, Gu CC, Ehret G, Nguyen KDH, Lewis CE, Leppert M, Irvin MR, Lim U, Haiman CA, Le Marchand L, Schumacher F, Wilkens L, Lu Y, Bottinger EP, Loos RJL, Sheu WHH, Guo X, Lee WJ, Hai Y, Hung YJ, Absher D, Wu IC, Taylor KD, Lee IT, Liu Y, Wang TD, Quertermous T, Juang JMJ, Rotter JI, Assimes T, Hsiung CA, Chen YDI, Prentice R, Kuller LH, Manson JE, Kooperberg C, Smokowski P, Robinson WR, Gordon-Larsen P, Li R, Hindorff L, Buyske S, Matise TC, Peters U, North KE. Trans-ethnic fine-mapping of genetic loci for body mass index in the diverse ancestral populations of the Population Architecture using Genomics and Epidemiology (PAGE) Study reveals evidence for multiple signals at established loci. Hum Genet 2017; 136:771-800. [PMID: 28391526 PMCID: PMC5485655 DOI: 10.1007/s00439-017-1787-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 12/05/2016] [Accepted: 03/23/2017] [Indexed: 11/26/2022]
Abstract
Most body mass index (BMI) genetic loci have been identified in studies of primarily European ancestries. The effect of these loci in other racial/ethnic groups is less clear. Thus, we aimed to characterize the generalizability of 170 established BMI variants, or their proxies, to diverse US populations and trans-ethnically fine-map 36 BMI loci using a sample of >102,000 adults of African, Hispanic/Latino, Asian, European and American Indian/Alaskan Native descent from the Population Architecture using Genomics and Epidemiology Study. We performed linear regression of the natural log of BMI (18.5-70 kg/m²) on the additive single nucleotide polymorphisms (SNPs) at BMI loci on the MetaboChip (Illumina, Inc.), adjusting for age, sex, population stratification, study site, or relatedness. We then performed fixed-effect meta-analyses and a Bayesian trans-ethnic meta-analysis to empirically cluster by allele frequency differences. Finally, we approximated conditional and joint associations to test for the presence of secondary signals. We noted directional consistency with the previously reported risk alleles beyond what would have been expected by chance (binomial p < 0.05). Nearly a quarter of the previously described BMI index SNPs and 29 of 36 densely genotyped BMI loci on the MetaboChip replicated/generalized in trans-ethnic analyses. We observed multiple signals at nine loci, including the description of seven loci with novel multiple signals. This study supports the generalization of most common genetic loci to diverse ancestral populations and emphasizes the importance of dense multiethnic genomic data in refining the functional variation at genetic loci of interest and describing several loci with multiple underlying genetic variants.
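As a rough illustration of the fixed-effect meta-analysis step, the Python sketch below pools per-study SNP effect estimates by inverse-variance weighting, the usual fixed-effect form. The abstract does not say which weighting scheme or software was used, so treat that choice as an assumption. The effect sizes and standard errors are made-up numbers, not PAGE results.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


if __name__ == "__main__":
    # Hypothetical per-study effects of one SNP on ln(BMI).
    betas = [0.010, 0.030]
    standard_errors = [0.005, 0.005]
    beta, se = fixed_effect_meta(betas, standard_errors)
    print(f"pooled beta = {beta:.4f}, se = {se:.4f}, z = {beta / se:.2f}")
```

Studies with smaller standard errors receive proportionally more weight, which is why equal standard errors in this toy input give a simple average of the two effects.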
Affiliation(s)
- Lindsay Fernández-Rhodes
- Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
| | - Jian Gong
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Jeffrey Haessler
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Nora Franceschini
- Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Mariaelisa Graff
- Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Katherine K Nishimura
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Yujie Wang
- Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Heather M Highland
- Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Sachiko Yoneyama
- Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - William S Bush
- Department of Epidemiology and Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
| | - Robert Goodloe
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
| | - Marylyn D Ritchie
- Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
| | - Dana Crawford
- Department of Epidemiology and Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
| | - Myron Gross
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
| | - Myriam Fornage
- Center for Human Genetics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Petra Buzkova
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
| | - Ran Tao
- Department of Biostatistics, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Carmen Isasi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | | | - Martha Daviglus
- Institute of Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
| | - Rachel H Mackey
- Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Denise Houston
- Geriatrics and Gerontology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - C Charles Gu
- Division of Biostatistics, School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
| | - Georg Ehret
- Center for Complex Disease Genomics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Division of Cardiology, Geneva University Hospital, Geneva, Switzerland
| | - Khanh-Dung H Nguyen
- Center for Complex Disease Genomics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Cora E Lewis
- Department of Medicine, University of Alabama, Birmingham, AL, USA
| | - Mark Leppert
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | | | - Unhee Lim
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| | - Christopher A Haiman
- Department of Preventive Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| | - Fredrick Schumacher
- Department of Preventive Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Lynne Wilkens
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| | - Yingchang Lu
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Erwin P Bottinger
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ruth J L Loos
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Wayne H-H Sheu
- Division of Endocrine and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
- School of Medicine, National Defense Medical Center, National Yang-Ming University, Taipei, Taiwan
| | - Xiuqing Guo
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, LABioMed at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Wen-Jane Lee
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
| | - Yang Hai
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, LABioMed at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Yi-Jen Hung
- Division of Endocrinology and Metabolism, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
| | - Devin Absher
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - I-Chien Wu
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | - Kent D Taylor
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, LABioMed at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - I-Te Lee
- Division of Endocrine and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
- School of Medicine, National Defense Medical Center, National Yang-Ming University, Taipei, Taiwan
| | - Yeheng Liu
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, LABioMed at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Tzung-Dau Wang
- Division of Cardiology, Department of Internal Medicine, Cardiovascular Center, National Taiwan University Hospital, Taipei, Taiwan
| | - Thomas Quertermous
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Jyh-Ming J Juang
- Division of Cardiology, Department of Internal Medicine, Cardiovascular Center, National Taiwan University Hospital, Taipei, Taiwan
| | - Jerome I Rotter
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, LABioMed at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Themistocles Assimes
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Chao A Hsiung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | - Yii-Der Ida Chen
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, LABioMed at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Ross Prentice
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Lewis H Kuller
- Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - JoAnn E Manson
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Paul Smokowski
- School of Social Welfare, The University of Kansas, Lawrence, KS, USA
| | - Whitney R Robinson
- Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Penny Gordon-Larsen
- Department of Nutrition, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Rongling Li
- Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Hindorff
- Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Steven Buyske
- Department of Statistics and Biostatistics, Rutgers University, Piscataway, NJ, USA
| | - Tara C Matise
- Department of Genetics, Rutgers University, Piscataway, NJ, USA
| | - Ulrike Peters
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Kari E North
- Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
18
|
Restrepo NA, Butkiewicz M, McGrath JA, Crawford DC. Shared Genetic Etiology of Autoimmune Diseases in Patients from a Biorepository Linked to De-identified Electronic Health Records. Front Genet 2016; 7:185. [PMID: 27812365 PMCID: PMC5071319 DOI: 10.3389/fgene.2016.00185] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/14/2016] [Accepted: 10/03/2016] [Indexed: 01/15/2023]
Abstract
Autoimmune diseases represent a significant medical burden, affecting an estimated 5–8% of the U.S. population. While genetics is known to play a role, studies of common autoimmune diseases are complicated by phenotype heterogeneity, limited sample sizes, and a single-disease approach. Here we performed a targeted genetic association study for cases of multiple sclerosis (MS), rheumatoid arthritis (RA), and Crohn's disease (CD) to assess which common genetic variants contribute individually and pleiotropically to disease risk. Joint modeling and pathway analysis combining the three phenotypes were performed to identify common underlying mechanisms of risk of autoimmune conditions. European American cases of MS, RA, and CD (n = 119, 53, and 129, respectively) and 1924 controls were identified using de-identified electronic health records (EHRs) through a combination of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) billing codes, Current Procedural Terminology (CPT) codes, medication lists, and text matching. As expected, hallmark SNPs in MS, such as DQA1 rs9271366 (OR = 1.91; p = 0.008), replicated in the present study. Both MS and CD were associated with TIMMDC1 rs2293370 (OR = 0.27, p = 0.01; OR = 0.25, p = 0.02; respectively). Additionally, PDE2A rs3781913 was significantly associated with both CD and RA (OR = 0.46, p = 0.02; OR = 0.32, p = 0.02; respectively). Joint modeling and pathway analysis identified variants within the KEGG NOD-like receptor signaling pathway and Shigellosis pathway as being correlated with the combined autoimmune phenotype. Our study replicated previously reported genetic associations for MS and CD in a population derived from de-identified EHRs. We found evidence to support a shared genetic etiology between CD/MS and CD/RA outside of the major histocompatibility complex region and identified KEGG pathways indicative of a bacterial pathogenesis risk for autoimmunity in a joint model. Future work to elucidate this shared etiology will be key in the development of risk models as envisioned in the era of precision medicine.
Affiliation(s)
- Nicole A Restrepo
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
| | - Mariusz Butkiewicz
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
| | - Josephine A McGrath
- Vanderbilt Eye Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Dana C Crawford
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA; Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
|
19
|
Stewart R, Davis K. 'Big data' in mental health research: current status and emerging possibilities. Soc Psychiatry Psychiatr Epidemiol 2016; 51:1055-72. [PMID: 27465245 PMCID: PMC4977335 DOI: 10.1007/s00127-016-1266-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 05/07/2016] [Accepted: 07/08/2016] [Indexed: 01/24/2023]
Abstract
PURPOSE 'Big data' are accumulating in a multitude of domains and offer novel opportunities for research. The role of these resources in mental health investigations remains relatively unexplored, although a number of datasets are in use and supporting a range of projects. We sought to review big data resources and their use in mental health research to characterise applications to date and consider directions for innovation in future. METHODS A narrative review. RESULTS Clear disparities were evident in geographic regions covered and in the disorders and interventions receiving most attention. DISCUSSION We discuss the strengths and weaknesses of the use of different types of data and the challenges of big data in general. Current research output from big data is still predominantly determined by the information and resources available and there is a need to reverse the situation so that big data platforms are more driven by the needs of clinical services and service users.
Affiliation(s)
- Robert Stewart
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, Box 63, De Crespigny Park, London, SE5 8AF, UK.
| | - Katrina Davis
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, Box 63, De Crespigny Park, London, SE5 8AF, UK
|
20
|
Butkiewicz M, Restrepo NA, Haines JL, Crawford DC. Drug-Drug Interaction Profiles of Medication Regimens Extracted from a De-identified Electronic Medical Records System. AMIA Joint Summits on Translational Science Proceedings 2016; 2016:33-40. [PMID: 27570646 PMCID: PMC5001747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
With age, the number of prescribed medications increases and subsequently raises the risk for adverse drug-drug interactions. These adverse effects lower quality of life and increase health care costs. Quantifying the potential burden of adverse effects before prescribing medications can be a valuable contribution to health care. This study evaluated medication lists extracted from a subset of the Vanderbilt de-identified electronic medical record system. Reported drugs were cross-referenced with the Kyoto Encyclopedia of Genes and Genomes DRUG database to identify known drug-drug interactions. On average, a medication regimen contained 6.58 medications and 2.68 drug-drug interactions. Here, we quantify the burden of potential adverse events from drug-drug interactions through drug-drug interaction profiles and include a number of alternative medications as provided by the Anatomical Therapeutic Chemical Classification System.
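The cross-referencing step described in this abstract can be illustrated with a minimal sketch. It assumes a regimen is a list of drug names and that known interactions are available as unordered pairs; the drug names, the `known` table, and the function name `interacting_pairs` are all hypothetical and are not taken from the study or from the KEGG DRUG database.

```python
from itertools import combinations


def interacting_pairs(regimen, known_interactions):
    """Return the drug pairs in `regimen` that appear in `known_interactions`.

    `known_interactions` is a set of frozensets, so pair order does not matter.
    """
    unique_drugs = sorted(set(regimen))  # ignore duplicate entries in a list
    return [
        (a, b)
        for a, b in combinations(unique_drugs, 2)
        if frozenset((a, b)) in known_interactions
    ]


# Hypothetical interaction table and regimens (placeholders, not real data).
known = {
    frozenset(("drug_a", "drug_b")),
    frozenset(("drug_b", "drug_c")),
}
regimens = [
    ["drug_a", "drug_b", "drug_c", "drug_d"],
    ["drug_a", "drug_d"],
]

# Per-regimen interaction counts and the averages across regimens.
counts = [len(interacting_pairs(r, known)) for r in regimens]
mean_medications = sum(len(set(r)) for r in regimens) / len(regimens)
mean_interactions = sum(counts) / len(regimens)
```

Storing each interaction as a `frozenset` keeps the lookup symmetric, so a pair is found regardless of the order in which the two drugs were recorded.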
Affiliation(s)
- Mariusz Butkiewicz
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, US
| | - Nicole A Restrepo
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, US
| | - Jonathan L Haines
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, US
| | - Dana C Crawford
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, US
|
21
|
Searching in the Dark: Phenotyping Diabetic Retinopathy in a De-Identified Electronic Medical Record Sample of African Americans. AMIA Joint Summits on Translational Science Proceedings 2016; 2016:221-30. [PMID: 27570675 PMCID: PMC5001772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
A hurdle to EMR-based studies is the characterization and extraction of complex phenotypes not readily defined by single diagnostic/procedural codes. Here we developed an algorithm utilizing data mining techniques to identify a diabetic retinopathy (DR) cohort of type-2 diabetic African Americans from the Vanderbilt University de-identified EMR system. The algorithm incorporates a combination of diagnostic codes, current procedural terminology billing codes, medications, and text matching to identify DR when gold-standard digital photography results were unavailable. DR cases were identified with a positive predictive value of 75.3% and an accuracy of 84.8%. Controls were classified with a negative predictive value of 1.0% as could be assessed. Limited studies of DR have been performed in African Americans who are at an elevated risk of DR. Identification of EMR-based African American cohorts may help stimulate new biomedical studies that could elucidate differences in risk for the development of DR and other complex diseases.
|
22
|
Oetjens MT, Bush WS, Denny JC, Birdwell K, Kodaman N, Verma A, Dilks HH, Pendergrass SA, Ritchie MD, Crawford DC. Evidence for extensive pleiotropy among pharmacogenes. Pharmacogenomics 2016; 17:853-66. [PMID: 27249515 DOI: 10.2217/pgs-2015-0007] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
AIM We sought to identify potential pleiotropy involving pharmacogenes. METHODS We tested 184 functional variants in 34 pharmacogenes for associations using a custom grouping of International Classification of Diseases, Ninth Revision billing codes extracted from deidentified electronic health records of 6892 patients. RESULTS We replicated several associations including ABCG2 (rs2231142) and gout (p = 1.73 × 10⁻⁷; odds ratio [OR]: 1.73; 95% CI: 1.40-2.12); and SLCO1B1 (rs4149056) and jaundice (p = 2.50 × 10⁻⁴; OR: 1.67; 95% CI: 1.27-2.20). CONCLUSION In this systematic screen for phenotypic associations with functional variants, several novel genotype-phenotype combinations also achieved phenome-wide significance, including SLC15A2 rs1143672 and renal osteodystrophy (p = 2.67 × 10⁻⁶; OR: 0.61; 95% CI: 0.49-0.75).
Affiliation(s)
- Matthew T Oetjens
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232, USA
| | - William S Bush
- Department of Epidemiology & Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH 44106, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA
| | - Kelly Birdwell
- Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA
| | - Nuri Kodaman
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232, USA
| | - Anurag Verma
- Center for Systems Genomics, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Holli H Dilks
- Sarah Cannon Research Institute, Nashville, TN 37203 USA
| | - Sarah A Pendergrass
- Center for Systems Genomics, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Marylyn D Ritchie
- Center for Systems Genomics, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Dana C Crawford
- Department of Epidemiology & Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH 44106, USA
|
23
|
Laper SM, Restrepo NA, Crawford DC. The Challenges in Using Electronic Health Records for Pharmacogenomics and Precision Medicine Research. Pacific Symposium on Biocomputing 2016; 21:369-80. [PMID: 26776201 PMCID: PMC4720980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Access to and utilization of electronic health records with extensive medication lists and genetic profiles are rapidly advancing discoveries in pharmacogenomics. In this study, we analyzed ~116,000 variants on the Illumina Metabochip for response to antihypertensive and lipid-lowering medications in African American adults from BioVU, the Vanderbilt University Medical Center's biorepository linked to de-identified electronic health records. Our study population included individuals who were prescribed an antihypertensive or lipid-lowering medication, and who had both pre- and post-medication blood pressure or low-density lipoprotein cholesterol (LDL-C) measurements, respectively. Among those with pre- and post-medication systolic and diastolic blood pressure measurements (n=2,268), the average change in systolic and diastolic blood pressure was -0.6 mm Hg and -0.8 mm Hg, respectively. Among those with pre- and post-medication LDL-C measurements (n=1,244), the average change in LDL-C was -26.3 mg/dL. SNPs were tested for an association with change and percent change in blood pressure or blood levels of LDL-C. After adjustment for multiple testing, we did not observe any significant associations, and we were not able to replicate previously reported associations, such as in APOE and LPA, from the literature. The present study illustrates the benefits and challenges of using electronic health records linked to biorepositories for pharmacogenomic studies.
Affiliation(s)
- Sarah M. Laper
- Eastern Virginia Medical School, Norfolk, VA, 23507, USA
| | - Nicole A. Restrepo
- Center for Human Genetics Research, Vanderbilt University, 519 Light Hall, 2215 Garland Avenue, Nashville, TN 37232, USA
| | - Dana C. Crawford
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Suite 2527, Cleveland, OH 44106, USA
|
24
|
Dumitrescu L, Restrepo NA, Goodloe R, Boston J, Farber-Eger E, Pendergrass SA, Bush WS, Crawford DC. Towards a phenome-wide catalog of human clinical traits impacted by genetic ancestry. BioData Min 2015; 8:35. [PMID: 26566401 PMCID: PMC4642611 DOI: 10.1186/s13040-015-0068-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/14/2014] [Accepted: 11/02/2015] [Indexed: 01/13/2023]
Abstract
Background Racial/ethnic differences for commonly measured clinical variables are well documented, and it has been postulated that population-specific genetic factors may play a role. The genetic heterogeneity of admixed populations, such as African Americans, provides a unique opportunity to identify genomic regions and variants associated with the clinical variability observed for diseases and traits across populations. Method To begin a systematic search for these population-specific genomic regions at the phenome-wide scale, we determined the relationship between global genetic ancestry, specifically European and African ancestry, and clinical variables measured in a population of African Americans from BioVU, Vanderbilt University’s biorepository linked to de-identified electronic medical records (EMRs) as part of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study. Through billing (ICD-9) codes, procedure codes, labs, and clinical notes, 36 common clinical and laboratory variables were mined from the EMR, including body mass index (BMI), kidney traits, lipid levels, blood pressure, and electrocardiographic measurements. A total of 15,863 DNA samples from non-European Americans were genotyped on the Illumina Metabochip containing ~200,000 variants, of which 11,166 were from African Americans. Tests of association were performed to examine associations between global ancestry and the phenotype of interest. Results Increased European ancestry, and conversely decreased African ancestry, was most strongly correlated with an increase in QRS duration, consistent with previous observations that African Americans tend to have a shorter QRS duration compared with European Americans. Despite known racial/ethnic disparities in blood pressure, European and African ancestry was associated with neither diastolic nor systolic blood pressure measurements.
Conclusion Collectively, these results suggest that this clinical population can be used to identify traits in which population differences may be due, in part, to population-specific genetics. Electronic supplementary material The online version of this article (doi:10.1186/s13040-015-0068-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Logan Dumitrescu
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232 USA
| | - Nicole A Restrepo
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232 USA
| | - Robert Goodloe
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232 USA
| | - Jonathan Boston
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232 USA
| | - Eric Farber-Eger
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232 USA
| | - Sarah A Pendergrass
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802 USA
| | - William S Bush
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Suite 2527, Cleveland, OH 44106 USA
| | - Dana C Crawford
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Suite 2527, Cleveland, OH 44106 USA
|
25
|
Restrepo NA, Farber-Eger E, Goodloe R, Haines JL, Crawford DC. Extracting Primary Open-Angle Glaucoma from Electronic Medical Records for Genetic Association Studies. PLoS One 2015; 10:e0127817. [PMID: 26061293 PMCID: PMC4465698 DOI: 10.1371/journal.pone.0127817] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/15/2014] [Accepted: 04/20/2015] [Indexed: 11/08/2022]
Abstract
Electronic medical records (EMRs) are being widely implemented for use in genetic and genomic studies. As a phenotypically rich resource, EMRs provide researchers with the opportunity to identify disease cohorts and perform genotype-phenotype association studies. The Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study, has genotyped more than 15,000 individuals of diverse genetic ancestry in BioVU, the Vanderbilt University Medical Center’s biorepository linked to a de-identified version of the EMR (EAGLE BioVU). Here we develop and deploy an algorithm utilizing data mining techniques to identify primary open-angle glaucoma (POAG) in African Americans from EAGLE BioVU for genetic association studies. The algorithm described here was designed using a combination of diagnostic codes, current procedural terminology billing codes, and free text searches to identify POAG status in situations where gold-standard digital photography cannot be accessed. The case algorithm identified 267 potential POAG subjects but underperformed after manual review, with a positive predictive value of 51.6% and an accuracy of 76.3%. The control algorithm identified controls with a negative predictive value of 98.3%. Although the case algorithm requires more downstream manual review for use in large-scale studies, it provides a basis by which to extract a specific clinical subtype of glaucoma from EMRs in the absence of digital photographs.
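The validation measures reported in this abstract (positive predictive value, negative predictive value, and accuracy) follow from comparing algorithm output with manual chart review. The sketch below shows the standard definitions; the function name `validation_metrics` and the counts passed to it are hypothetical placeholders, not figures from the study, and the function assumes every denominator is non-zero.

```python
def validation_metrics(tp, fp, tn, fn):
    """Compute PPV, NPV, and accuracy from confusion-matrix counts.

    tp/fp: algorithm-flagged cases confirmed / rejected on manual review.
    tn/fn: algorithm-flagged controls confirmed / rejected on manual review.
    Assumes tp + fp > 0 and tn + fn > 0.
    """
    ppv = tp / (tp + fp)                        # confirmed cases among flagged cases
    npv = tn / (tn + fn)                        # confirmed controls among flagged controls
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # all correct calls among all reviewed
    return {"ppv": ppv, "npv": npv, "accuracy": accuracy}


# Hypothetical review counts, chosen only to illustrate the arithmetic.
metrics = validation_metrics(tp=30, fp=10, tn=50, fn=10)
```

A low PPV alongside a high NPV, as in the abstract, indicates that flagged cases need manual confirmation while flagged controls can largely be trusted.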
Affiliation(s)
- Nicole A. Restrepo
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Eric Farber-Eger
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Robert Goodloe
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Jonathan L. Haines
- Department of Epidemiology & Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, United States of America
| | - Dana C. Crawford
- Department of Epidemiology & Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, United States of America
|
26
|
The effects of electronic medical record phenotyping details on genetic association studies: HDL-C as a case study. BioData Min 2015; 8:15. [PMID: 25969697 PMCID: PMC4428098 DOI: 10.1186/s13040-015-0048-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 05/22/2014] [Accepted: 04/28/2015] [Indexed: 02/01/2023]
Abstract
Background Biorepositories linked to de-identified electronic medical records (EMRs) have the potential to complement traditional epidemiologic studies in genotype-phenotype studies of complex human diseases and traits. A major challenge in meeting this potential is the use of EMR-derived data to extract phenotypes and covariates for genetic association studies. Unlike traditional epidemiologic data, EMR-derived data are collected for clinical care and are therefore highly variable across patients. The variability of clinical data coupled with the challenges associated with searching unstructured clinical notes requires the development of algorithms to extract phenotypes for analysis. Given the number of possible algorithms that could be developed for any one EMR-derived phenotype, we explored here the impact algorithm decision logic has on genetic association study results for a single quantitative trait, high density lipoprotein cholesterol (HDL-C). Results We used five different algorithms to extract HDL-C from African American subjects genotyped on the Illumina Metabochip (n = 11,519) as part of Epidemiologic Architecture for Genes Linked to Environment (EAGLE). Tests of association between HDL-C and genetic risk scores for HDL-C associated variants suggest that the genetic effect size does not vary substantially across the five HDL-C definitions. Conclusions These data collectively suggest that, at least for this quantitative trait, algorithm decision logic and phenotyping details do not appreciably impact genetic association study test statistics.
|