1
Stahl K, Papiol S, Budde M, Heilbronner M, Oraki Kohshour M, Falkai P, Schulze TG, Heilbronner U, Bickeböller H. Aggregating single nucleotide polymorphisms improves filtering for false-positive associations postimputation. G3 (BETHESDA, MD.) 2025; 15:jkaf043. [PMID: 40053832 PMCID: PMC12060241 DOI: 10.1093/g3journal/jkaf043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2025] [Revised: 02/19/2025] [Accepted: 02/21/2025] [Indexed: 03/09/2025]
Abstract
Imputation causes bias in P-values in downstream genome-wide association studies. Imputation quality measures such as IMPUTE info are used to discriminate between false and true associations. However, applying a high threshold often discards true associations, while a low threshold preserves false associations. This poses a challenge, especially for studies genotyped with SNP arrays. In practice, association signals register as spikes of low P-values for SNPs in close proximity owing to linkage disequilibrium, but postimputation filtering is conducted on SNPs independently. We simulated 1536 small case-control studies on human chromosome 19 both to quantify the introduced bias and to evaluate postimputation filtering. The established IMPUTE info thresholds 0.3 and 0.8 were compared on individual SNPs and aggregated spikes in the formats "best guess genotype" and "dosage." Furthermore, we applied 2 recently published methods, Iam hiQ and MagicalRsq, to assess their effect on filtering. We found differences in false signals and imputation quality between the genotype formats, especially in the midrange between thresholds. In this midrange, 51% and 60% of associated SNPs for the best guess and dosage formats, respectively, are true associations. For aggregated SNPs, the majority of spikes in the midrange are true associations. We propose a new method, the Midrange Filter, which uses both thresholds and formats to classify spikes instead of SNPs. This method discards up to the same number of false signals as the upper threshold while preserving all true associations in most simulation settings. The PsyCourse study is included as a real-data application.
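To make the threshold comparison concrete, here is a minimal sketch (an illustration under stated assumptions, not the authors' code) that sorts imputed SNPs into the three bands implied by the two IMPUTE info thresholds: below 0.3, the 0.3 to 0.8 midrange, and 0.8 or above. The `ImputedSnp` record and the helper names are hypothetical, and a SNP is assumed to pass a threshold when its info score is at least that value. It classifies individual SNPs only; the Midrange Filter described above additionally aggregates neighbouring SNPs into spikes and compares the best guess and dosage formats, which this sketch does not attempt.

```python
from dataclasses import dataclass

# The two established IMPUTE info thresholds named in the abstract.
LOWER_INFO = 0.3
UPPER_INFO = 0.8


@dataclass
class ImputedSnp:
    """Hypothetical record for one imputed SNP (fields chosen for illustration)."""
    rsid: str
    info: float  # IMPUTE info score reported by the imputation software


def info_band(snp: ImputedSnp) -> str:
    """Place a SNP in one of three bands, assuming a SNP passes a threshold
    when its info score is at least that value."""
    if snp.info < LOWER_INFO:
        return "low"       # removed by both thresholds
    if snp.info < UPPER_INFO:
        return "midrange"  # kept at 0.3, removed at 0.8
    return "high"          # kept by both thresholds


def band_counts(snps: list[ImputedSnp]) -> dict[str, int]:
    """Count SNPs per band; the midrange is where the two thresholds disagree."""
    counts = {"low": 0, "midrange": 0, "high": 0}
    for snp in snps:
        counts[info_band(snp)] += 1
    return counts


snps = [ImputedSnp("rs1", 0.25), ImputedSnp("rs2", 0.55), ImputedSnp("rs3", 0.91)]
print(band_counts(snps))  # {'low': 1, 'midrange': 1, 'high': 1}
```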
Affiliation(s)
- Katharina Stahl
- Department of Genetic Epidemiology, University Medical Center Göttingen, Göttingen 37073, Germany
- Sergi Papiol
- Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Department of Psychiatry and Psychotherapy, LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Department Clinical Translation, Max Planck Institute of Psychiatry, Munich 80804, Germany
- Monika Budde
- Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Maria Heilbronner
- Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Mojtaba Oraki Kohshour
- Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Department Clinical Translation, Max Planck Institute of Psychiatry, Munich 80804, Germany
- Department of Immunology, Faculty of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz 61357-15794, Iran
- Peter Falkai
- Department of Psychiatry and Psychotherapy, LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Department Clinical Translation, Max Planck Institute of Psychiatry, Munich 80804, Germany
- German Center for Mental Health (DZPG), partner site Munich/Augsburg, Munich 80336, Germany
- Thomas G Schulze
- Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- German Center for Mental Health (DZPG), partner site Munich/Augsburg, Munich 80336, Germany
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Johns Hopkins University, Baltimore, MD 21287, USA
- Urs Heilbronner
- Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Heike Bickeböller
- Department of Genetic Epidemiology, University Medical Center Göttingen, Göttingen 37073, Germany
2
Sun Q, Du J, Tang Y, Best LG, Haack K, Zhang Y, Cole SA, Franceschini N. Polygenic Scores of Cardiometabolic Risk Factors in American Indian Adults. JAMA Netw Open 2025; 8:e250535. [PMID: 40072435 PMCID: PMC11904716 DOI: 10.1001/jamanetworkopen.2025.0535] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/22/2024] [Accepted: 01/06/2025] [Indexed: 03/14/2025]
Abstract
Importance Numerous efforts have been made to include diverse populations in genetic studies, but American Indian populations are still severely underrepresented. Polygenic scores derived from genetic data have been proposed in clinical care, but how polygenic scores perform in American Indian individuals and whether they can predict disease risk in this population remains unknown.
Objective To study the performance of polygenic scores for cardiometabolic risk factors of lipid traits and C-reactive protein in American Indian adults and to determine whether such scores are helpful in clinical prediction for cardiometabolic diseases.
Design, Setting, and Participants The Strong Heart Study (SHS) is a large American Indian cohort recruited from 1989 to 1991, with ongoing follow-up (phase VII). In this genetic association study, data from SHS American Indian participants were used in addition to data from 2 large-scale, external, ancestry-mismatched genome-wide association studies (GWASs; 450 865 individuals from a European GWAS and 33 096 individuals from a multi-ancestry GWAS) and 1 small-scale internal ancestry-matched American Indian GWAS (2000 individuals). Analyses were conducted from February 2023 to August 2024.
Exposure Genetic risk score for cardiometabolic disease risk factors from 6 traits including 5 lipids (apolipoprotein A, apolipoprotein B, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and triglycerides), and an inflammatory biomarker (C-reactive protein [CRP]).
Main Outcomes and Measures Data from SHS participants and the 2 GWASs were used to construct 8 polygenic scores. The association of polygenic scores with cardiometabolic disease was assessed using 2-sided z tests and 1-sided likelihood ratio tests.
Results In the 3157 SHS participants (mean [SD] age, 56.44 [8.12] years; 1845 female [58.4%]), a large European-based polygenic score had the most robust performance (mean [SD] R² = 5.0% [1.7%]), but adding a small-scale ancestry-matched GWAS using American Indian data helped improve polygenic score prediction for 5 of 6 traits (all but CRP; mean [SD] R², 7.6% [3.2%]). Lipid polygenic scores developed in American Indian individuals improved prediction of diabetes compared with baseline clinical risk factors (area under the curve for absolute improvement, 0.86%; 95% CI, 0.78%-0.93%; likelihood ratio test P = 3.8 × 10⁻³).
Conclusions and Relevance In this genetic association study of lipids and CRP among American Indian individuals, polygenic scores of lipid traits were found to improve prediction of diabetes when added to clinical risk factors, although the magnitude of improvement was small. The transferability of polygenic scores derived from other populations is still a concern, with implications for the advancement of precision medicine and the potential of perpetuating health disparities, particularly in this underrepresented population.
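A polygenic score is conventionally a weighted sum of an individual's effect-allele dosages, with weights taken from GWAS effect sizes, and the R² values above describe how much trait variance the score explains. The sketch below assumes exactly that simple form and treats R² as the squared Pearson correlation between score and trait. The toy genotypes, weights, and trait values are made up, and the study's actual pipeline, which combined external and ancestry-matched GWAS results, is not reproduced.

```python
def polygenic_score(dosages: list[float], weights: list[float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2) using per-SNP GWAS effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation, i.e. R^2 of a one-predictor linear model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Toy data: 4 individuals x 3 SNPs; numbers are illustrative, not from the study.
weights = [0.10, -0.05, 0.20]
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2]]
trait = [1.2, 0.4, 1.1, 1.0]

scores = [polygenic_score(g, weights) for g in genotypes]
print(f"R^2 = {r_squared(scores, trait):.3f}")
```

In practice, scores are usually evaluated on individuals not used to estimate the weights; the toy example skips that split for brevity.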
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Now with: Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Jiawen Du
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Yihan Tang
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Lyle G. Best
- Missouri Breaks Industries Research Inc, Eagle Butte, South Dakota
- Karin Haack
- Texas Biomedical Research Institute, San Antonio
- Ying Zhang
- Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City
- Nora Franceschini
- Department of Epidemiology, University of North Carolina at Chapel Hill
3
Nguyen TV, Bolormaa S, Reich CM, Chamberlain AJ, Vander Jagt CJ, Daetwyler HD, MacLeod IM. Empirical versus estimated accuracy of imputation: optimising filtering thresholds for sequence imputation. Genet Sel Evol 2024; 56:72. [PMID: 39548370 PMCID: PMC11566673 DOI: 10.1186/s12711-024-00942-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 10/30/2024] [Indexed: 11/17/2024]
Abstract
BACKGROUND Genotype imputation is a cost-effective method for obtaining sequence genotypes for downstream analyses such as genome-wide association studies (GWAS). However, low imputation accuracy can increase the risk of false positives, so it is important to pre-filter data or at least assess the potential limitations due to imputation accuracy. In this study, we benchmarked three different imputation programs (Beagle 5.2, Minimac4 and IMPUTE5) and compared the empirical accuracy of imputation with the software-estimated accuracy of imputation (Rsqsoft). We also tested the accuracy of imputation in cattle for autosomal and X chromosomes, for SNPs and INDELs, when imputing from either low-density or high-density genotypes.
RESULTS The accuracy of imputing sequence variants from real high-density genotypes was higher than from low-density genotypes. In our software benchmark, all programs performed well with only minor differences in accuracy. While there was a close relationship between empirical imputation accuracy and the imputation Rsqsoft, this relationship differed considerably for Minimac4 compared to Beagle 5.2 and IMPUTE5. We found that the Rsqsoft threshold for removing poorly imputed variants must be customised according to the software, and this should be accounted for when merging data from multiple studies, such as in meta-GWAS. We also found that imposing an Rsqsoft filter has a positive impact on genomic regions with poor imputation accuracy due to large segmental duplications that are susceptible to error-prone alignment. Overall, our results showed that, on average, the imputation accuracy for INDELs was approximately 6% lower than for SNPs for all software programs. Importantly, the imputation accuracy for the non-PAR (non-Pseudo-Autosomal Region) of the X chromosome was comparable to autosomal imputation accuracy, while for the PAR it was substantially lower, particularly when starting from low-density genotypes.
CONCLUSIONS This study provides an empirically derived approach to apply customised, software-specific Rsqsoft thresholds for downstream analyses of imputed variants, as needed, for example, for a meta-GWAS. The very poor empirical imputation accuracy for variants in the PAR when starting from low-density genotypes demonstrates that this region should be imputed starting from a higher density of real genotypes.
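The comparison of empirical accuracy with Rsqsoft can be sketched as follows, under the common (assumed) definition of empirical accuracy as the squared Pearson correlation between imputed dosages and genotypes observed in real sequence data. The helper names, the `rsq_soft`/`dosages`/`truth` dictionary keys, and the toy values are hypothetical. The point is only to show how one might measure what a given Rsqsoft threshold retains, the step the abstract says must be tuned separately for each imputation program.

```python
def squared_pearson(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def empirical_accuracy(dosages: list[float], truth: list[int]) -> float:
    """Empirical accuracy of one variant: r^2 between imputed dosages and
    genotypes observed in real sequence data (coded 0, 1, 2)."""
    return squared_pearson(dosages, [float(g) for g in truth])


def filter_by_rsq(variants: list[dict], threshold: float) -> tuple[int, float]:
    """Keep variants whose software-estimated Rsq ('rsq_soft') is at least
    `threshold`; return (number kept, mean empirical accuracy of kept variants)."""
    kept = [v for v in variants if v["rsq_soft"] >= threshold]
    if not kept:
        return 0, float("nan")
    accuracies = [empirical_accuracy(v["dosages"], v["truth"]) for v in kept]
    return len(kept), sum(accuracies) / len(accuracies)


# Toy variants (illustrative only); real thresholds would be tuned per program.
variants = [
    {"rsq_soft": 0.95, "dosages": [0.0, 1.0, 2.0, 1.0], "truth": [0, 1, 2, 1]},
    {"rsq_soft": 0.40, "dosages": [0.9, 1.0, 1.1, 1.0], "truth": [0, 2, 1, 1]},
]
for threshold in (0.3, 0.8):
    print(threshold, filter_by_rsq(variants, threshold))
```

Sweeping `threshold` over a grid for each program and comparing the retained counts and mean accuracies is one way to pick software-specific cut-offs.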
Affiliation(s)
- Tuan V Nguyen
- Agriculture Victoria, Centre for AgriBiosciences, AgriBio, Bundoora, VIC, 3083, Australia
- Sunduimijid Bolormaa
- Agriculture Victoria, Centre for AgriBiosciences, AgriBio, Bundoora, VIC, 3083, Australia
- Coralie M Reich
- Agriculture Victoria, Centre for AgriBiosciences, AgriBio, Bundoora, VIC, 3083, Australia
- Amanda J Chamberlain
- Agriculture Victoria, Centre for AgriBiosciences, AgriBio, Bundoora, VIC, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Christy J Vander Jagt
- Agriculture Victoria, Centre for AgriBiosciences, AgriBio, Bundoora, VIC, 3083, Australia
- Hans D Daetwyler
- Agriculture Victoria, Centre for AgriBiosciences, AgriBio, Bundoora, VIC, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
- Agriculture Victoria, Centre for AgriBiosciences, AgriBio, Bundoora, VIC, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
4
Sun Q, Karafin MS, Garrett ME, Li Y, Ashley-Koch A, Telen MJ. A genome-wide association study of alloimmunization in the TOPMed OMG-SCD cohort identifies a locus on chromosome 12. Transfusion 2024; 64:1772-1783. [PMID: 38966903 PMCID: PMC11499043 DOI: 10.1111/trf.17944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 06/10/2024] [Accepted: 06/20/2024] [Indexed: 07/06/2024]
Abstract
BACKGROUND Red cell alloimmunization after exposure to donor red cells is a very common complication of transfusion for patients with sickle cell disease (SCD), resulting frequently in accelerated donor red blood cell destruction. Patients show substantial differences in their predisposition to alloimmunization, and genetic variability is one proposed component. Although several genetic association studies have been conducted for alloimmunization, the results have been inconsistent, and the genetic determinants of alloimmunization remain largely unknown.
STUDY DESIGN AND METHODS We performed a genome-wide association study (GWAS) in 236 African American (AA) SCD patients from the Outcome Modifying Genes in Sickle Cell Disease (OMG-SCD) cohort, which is part of Trans-Omics for Precision Medicine (TOPMed), with whole-genome sequencing data available. We also performed sensitivity analyses adjusting for different sets of covariates and applied different sample grouping strategies based on the number of alloantibodies patients developed.
RESULTS We identified one genome-wide significant locus on chr12 (p = 3.1e-9) with no evidence of genomic inflation (lambda = 1.003). Further leveraging QTL evidence from GTEx whole blood and/or Jackson Heart Study PBMC RNA-Seq data, we identified a number of potential genes, such as ARHGAP9, STAT6, and ATP23, that may be driving the association signal. We also discovered some suggestive loci using different analysis strategies.
DISCUSSION We call for the community to collect additional alloantibody information within SCD cohorts to further the understanding of the genetic basis of alloimmunization in order to improve transfusion outcomes.
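The reported lambda = 1.003 is the genomic inflation factor. Assuming the usual definition (the median 1-df association chi-square statistic divided by its expected null median of about 0.455), it can be computed from two-sided P-values with only the Python standard library, as in this sketch. Values near 1 indicate little systematic inflation, while values well above 1 can point to confounding such as population stratification. The helper names are illustrative.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of the chi-square distribution with 1 degree of freedom (about 0.4549):
# a 1-df chi-square is Z^2, so its median is the squared 75th percentile of Z.
EXPECTED_MEDIAN_CHI2 = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_from_p(p: float) -> float:
    """1-df chi-square statistic equivalent to a two-sided P-value (0 < p <= 1)."""
    z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation(p_values: list[float]) -> float:
    """lambda_GC: median observed chi-square divided by its expectation under the null."""
    return median(chi2_from_p(p) for p in p_values) / EXPECTED_MEDIAN_CHI2


# Evenly spread P-values mimic a null GWAS, so lambda should be close to 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_p), 3))
```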
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Matthew S. Karafin
- Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Melanie E. Garrett
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Allison Ashley-Koch
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC
- Marilyn J. Telen
- Division of Hematology, Department of Medicine, and Duke Comprehensive Sickle Cell Center, Duke University Medical Center, Durham, NC
5
Cahoon JL, Rui X, Tang E, Simons C, Langie J, Chen M, Lo YC, Chiang CWK. Imputation accuracy across global human populations. Am J Hum Genet 2024; 111:979-989. [PMID: 38604166 PMCID: PMC11080279 DOI: 10.1016/j.ajhg.2024.03.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 03/14/2024] [Accepted: 03/15/2024] [Indexed: 04/13/2024]
Abstract
Genotype imputation is now fundamental for genome-wide association studies but lacks fairness due to the underrepresentation of references from non-European ancestries. The state-of-the-art imputation reference panel released by the Trans-Omics for Precision Medicine (TOPMed) initiative improved the imputation of admixed African-ancestry and Hispanic/Latino samples, but imputation for populations primarily residing outside of North America may still fall short in performance due to persisting underrepresentation. To illustrate this point, we imputed the genotypes of over 43,000 individuals across 123 populations around the world and identified numerous populations where imputation accuracy paled in comparison to that of European-ancestry populations. For instance, the mean imputation r-squared (Rsq) for variants with minor allele frequencies between 1% and 5% in Saudi Arabians (n = 1,061), Vietnamese (n = 1,264), Thai (n = 2,435), and Papua New Guineans (n = 776) were 0.79, 0.78, 0.76, and 0.62, respectively, compared to 0.90-0.93 for comparable European populations matched in sample size and SNP array content. Outside of Africa and Latin America, Rsq appeared to decrease as genetic distances to European-ancestry reference increased, as predicted. Using sequencing data as ground truth, we also showed that Rsq may over-estimate imputation accuracy for non-European populations more than European populations, suggesting further disparity in accuracy between populations. Using 1,496 sequenced individuals from Taiwan Biobank as a second reference panel to TOPMed, we also assessed a strategy to improve imputation for non-European populations with meta-imputation, but this design did not improve accuracy across frequency spectra. Taken together, our analyses suggest that we must ultimately strive to increase diversity and size to promote equity within genetics research.
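The Rsq values above are estimated by the imputation software without access to true genotypes. One widely used construction, assumed here and only approximating the Rsq reported by tools such as Minimac, compares the observed variance of imputed dosages with the variance 2p(1-p) expected under Hardy-Weinberg equilibrium: confident calls keep the full variance, while uncertain imputation shrinks dosages toward the mean. The sketch computes that ratio and averages it over the 1% to 5% minor allele frequency bin quoted in the abstract; helper names and data are illustrative.

```python
def estimated_rsq(dosages: list[float]) -> float:
    """Dosage-based quality estimate: observed dosage variance divided by the
    variance expected under Hardy-Weinberg equilibrium, 2p(1-p). This only
    approximates the Rsq reported by imputation software."""
    n = len(dosages)
    mean = sum(dosages) / n
    p = mean / 2.0  # estimated alternate-allele frequency
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0.0:
        return 0.0  # monomorphic in the imputed data: no information
    observed_var = sum((d - mean) ** 2 for d in dosages) / n
    return observed_var / expected_var


def minor_allele_frequency(dosages: list[float]) -> float:
    """Minor allele frequency estimated from dosages."""
    p = sum(dosages) / (2.0 * len(dosages))
    return min(p, 1.0 - p)


def mean_rsq_in_maf_bin(variants: list[list[float]], lo: float = 0.01, hi: float = 0.05) -> float:
    """Mean estimated Rsq over variants whose MAF lies in [lo, hi]."""
    rsqs = [estimated_rsq(d) for d in variants if lo <= minor_allele_frequency(d) <= hi]
    return sum(rsqs) / len(rsqs) if rsqs else float("nan")


# Confident hard calls in Hardy-Weinberg proportions (p = 0.1) give Rsq close to 1;
# every sample receiving the same mean dosage (p = 0.03) gives Rsq close to 0.
confident = [0.0] * 81 + [1.0] * 18 + [2.0] * 1
uninformative = [0.06] * 50
print(round(estimated_rsq(confident), 3), round(estimated_rsq(uninformative), 3))
print(mean_rsq_in_maf_bin([confident, uninformative]))
```

Because this estimate never sees true genotypes, it can diverge from empirical accuracy, which is the over-estimation the abstract reports for non-European populations.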
Affiliation(s)
- Jordan L Cahoon
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, Los Angeles, CA 90033, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, Los Angeles, CA 90089, USA; Department of Computer Science, University of Southern California, Los Angeles, Los Angeles, CA 90089, USA
- Xinyue Rui
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, Los Angeles, CA 90033, USA
- Echo Tang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, Los Angeles, CA 90089, USA
- Christopher Simons
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, Los Angeles, CA 90089, USA
- Jalen Langie
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, Los Angeles, CA 90033, USA
- Minhui Chen
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, Los Angeles, CA 90033, USA
- Ying-Chu Lo
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, Los Angeles, CA 90033, USA
- Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, Los Angeles, CA 90033, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, Los Angeles, CA 90089, USA; Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, Los Angeles, CA 90033, USA
6
Sun Q, Yang Y, Rosen JD, Chen J, Li X, Guan W, Jiang MZ, Wen J, Pace RG, Blackman SM, Bamshad MJ, Gibson RL, Cutting GR, O'Neal WK, Knowles MR, Kooperberg C, Reiner AP, Raffield LM, Carson AP, Rich SS, Rotter JI, Loos RJF, Kenny E, Jaeger BC, Min YI, Fuchsberger C, Li Y. MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric. Am J Hum Genet 2024; 111:990-995. [PMID: 38636510 PMCID: PMC11080605 DOI: 10.1016/j.ajhg.2024.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 04/20/2024]
Abstract
Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.
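The evaluation metric quoted above, the squared Pearson correlation between a QC metric and the true R², can be sketched as below. The per-variant numbers are invented, and the `calibrated` column is only a stand-in for a better-calibrated score: MagicalRsq-X itself is a trained machine-learning model using features such as LD scores and recombination rates, which this sketch does not implement. The hypothetical `retained` helper illustrates how a threshold on each metric changes which variants survive QC.

```python
def squared_pearson(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def score_qc_metrics(true_r2: list[float], metrics: dict[str, list[float]]) -> dict[str, float]:
    """For each candidate QC metric, the squared Pearson correlation with true R^2."""
    return {name: squared_pearson(values, true_r2) for name, values in metrics.items()}


def retained(metric: list[float], threshold: float) -> int:
    """Number of variants that pass a QC threshold on the given metric."""
    return sum(1 for m in metric if m >= threshold)


# Illustrative per-variant values (not from the paper).
true_r2 = [0.90, 0.20, 0.60, 0.10, 0.80]
metrics = {
    "rsq": [0.70, 0.50, 0.60, 0.40, 0.65],
    "calibrated": [0.85, 0.25, 0.55, 0.15, 0.80],
}
print(score_qc_metrics(true_r2, metrics))
print({name: retained(values, 0.5) for name, values in metrics.items()})
```

In this toy data, the `rsq` column lets the poorly imputed second variant (true R² = 0.20) through at a threshold of 0.5, while the `calibrated` column excludes it.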
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yingxi Yang
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Jonathan D Rosen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Xihao Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Wyliena Guan
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Min-Zhi Jiang
- Department of Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Rhonda G Pace
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Scott M Blackman
- Division of Pediatric Endocrinology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Michael J Bamshad
- Department of Pediatrics, University of Washington, Seattle, WA 98105, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Ronald L Gibson
- Department of Pediatrics, University of Washington, Seattle, WA 98105, USA
- Garry R Cutting
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Wanda K O'Neal
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael R Knowles
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Alexander P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
- Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- April P Carson
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL 35249, USA
- Stephen S Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia School of Medicine, Charlottesville, VA 22908, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Eimear Kenny
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
- Byron C Jaeger
- Wake Forest School of Medicine, Department of Biostatistics and Data Science, Wake Forest University, Winston-Salem, NC 27109, USA
- Yuan-I Min
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Christian Fuchsberger
- Institute for Biomedicine, Eurac Research (affiliated with the University of Lübeck), Bolzano, Italy
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
7
Bhérer C, Eveleigh R, Trajanoska K, St-Cyr J, Paccard A, Nadukkalam Ravindran P, Caron E, Bader Asbah N, McClelland P, Wei C, Baumgartner I, Schindewolf M, Döring Y, Perley D, Lefebvre F, Lepage P, Bourgey M, Bourque G, Ragoussis J, Mooser V, Taliun D. A cost-effective sequencing method for genetic studies combining high-depth whole exome and low-depth whole genome. NPJ Genom Med 2024; 9:8. [PMID: 38326393 PMCID: PMC10850497 DOI: 10.1038/s41525-024-00390-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 12/07/2023] [Indexed: 02/09/2024]
Abstract
Whole genome sequencing (WGS) at high depth (30X) allows the accurate discovery of variants in coding and non-coding DNA regions and helps elucidate the genetic underpinnings of human health and diseases. Yet, due to the prohibitive cost of high-depth WGS, most large-scale genetic association studies use genotyping arrays or high-depth whole exome sequencing (WES). Here we propose a cost-effective method, which we call "Whole Exome Genome Sequencing" (WEGS), that combines low-depth WGS and high-depth WES with up to 8 samples pooled and sequenced simultaneously (multiplexed). We experimentally assess the performance of WEGS with four different configurations of depth of coverage and sample multiplexing. We show that the optimal WEGS configurations are 1.7-2.0 times cheaper than standard WES (no-plexing) and 1.8-2.1 times cheaper than high-depth WGS, reach recall and precision rates similar to WES in detecting coding variants, and capture more population-specific variants in the rest of the genome that are difficult to recover with genotype imputation methods. We apply WEGS to 862 patients with peripheral artery disease and show that it directly assesses more known disease-associated variants than a typical genotyping array, as well as thousands of non-imputable variants per disease-associated locus.
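Recall and precision, used above to compare WEGS with WES, follow the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). This minimal sketch assumes variants are matched by exact (chromosome, position, ref, alt) identity; real benchmarking tools also normalise variant representation and may compare genotypes, which is omitted here. The variant tuples and helper name are illustrative.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def precision_recall(calls: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Precision and recall of a call set against a truth set, matching variants
    by exact (chromosome, position, ref, alt) identity."""
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr3", 10, "A", "T")}
print(precision_recall(calls, truth))  # (0.75, 0.75)
```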
Affiliation(s)
- Claude Bhérer
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
| | - Robert Eveleigh
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
| | - Katerina Trajanoska
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Janick St-Cyr
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Antoine Paccard
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Praveen Nadukkalam Ravindran
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Elizabeth Caron
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Nimara Bader Asbah
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Peyton McClelland
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Clare Wei
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Iris Baumgartner
- Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- Marc Schindewolf
- Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- Yvonne Döring
- Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- Institute for Cardiovascular Prevention (IPEK), Ludwig-Maximilians University Munich, Pettenkoferstr 9, 80336, Munich, Germany
- Danielle Perley
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- François Lefebvre
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- Pierre Lepage
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Guillaume Bourque
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- Jiannis Ragoussis
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Vincent Mooser
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Daniel Taliun
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
8
Sun Q, Rowland BT, Chen J, Mikhaylova AV, Avery C, Peters U, Lundin J, Matise T, Buyske S, Tao R, Mathias RA, Reiner AP, Auer PL, Cox NJ, Kooperberg C, Thornton TA, Raffield LM, Li Y. Improving polygenic risk prediction in admixed populations by explicitly modeling ancestral-differential effects via GAUDI. Nat Commun 2024; 15:1016. [PMID: 38310129 PMCID: PMC10838303 DOI: 10.1038/s41467-024-45135-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 10/07/2022] [Accepted: 01/16/2024] [Indexed: 02/05/2024]
Abstract
Polygenic risk scores (PRS) have shown success in clinical settings, but most PRS methods focus only on participants with a distinct primary continental ancestry and do not accommodate recently admixed individuals, whose genomes carry mosaic continental ancestry backgrounds across different segments. Here, we develop GAUDI, a novel penalized-regression-based method specifically designed for admixed individuals. GAUDI explicitly models ancestry-differential effects while borrowing information across segments with shared ancestry in admixed genomes. We demonstrate marked advantages of GAUDI over other methods through comprehensive simulation and real-data analyses for traits whose associated variants exhibit ancestry-differential effects. Leveraging data from the Women's Health Initiative study, we show that GAUDI improves PRS prediction of white blood cell count and C-reactive protein in African Americans by more than 64% compared with alternative methods, and even outperforms PRS-CSx with large European GWAS in some scenarios. We believe GAUDI will be a valuable tool to mitigate disparities in PRS performance in admixed individuals.
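The following minimal Python sketch makes "ancestry-differential effects" concrete. It shows how one admixed individual could be scored once ancestry-specific effect sizes are already known: each effect allele is weighted by the effect for the local ancestry of the haplotype segment it sits on. This is a hypothetical illustration, not GAUDI itself. The function name `ancestry_aware_prs`, the placeholder ancestry labels "AFR" and "EUR", and the toy numbers are all assumptions. The penalized fitting step, which borrows information across ancestries, is not shown.

```python
"""Sketch: scoring one admixed individual with ancestry-specific SNP effects.

Hypothetical illustration only; this is NOT GAUDI's fitting procedure.
Assumes per-ancestry effect sizes were already estimated and that every
haplotype allele carries a local-ancestry call ("AFR"/"EUR" are placeholders).
"""
from typing import Dict, List, Tuple

# One haplotype allele: (effect-allele count 0 or 1, local ancestry label)
Allele = Tuple[int, str]


def ancestry_aware_prs(
    haplotype_pairs: List[Tuple[Allele, Allele]],
    effects: List[Dict[str, float]],
) -> float:
    """Sum beta[j][ancestry] * allele over both haplotypes at every SNP j."""
    if len(haplotype_pairs) != len(effects):
        raise ValueError("one effect dictionary is needed per SNP")
    score = 0.0
    for (hap1, hap2), beta in zip(haplotype_pairs, effects):
        for allele, ancestry in (hap1, hap2):
            # The same allele gets a different weight depending on the
            # ancestry of the segment it sits on.
            score += allele * beta[ancestry]
    return score


if __name__ == "__main__":
    effects = [{"AFR": 0.30, "EUR": 0.10}, {"AFR": 0.05, "EUR": 0.05}]
    person = [((1, "AFR"), (1, "EUR")), ((0, "AFR"), (1, "AFR"))]
    print(round(ancestry_aware_prs(person, effects), 6))  # 0.45
```

The first SNP contributes 0.30 + 0.10 because its two effect alleles lie on segments with different ancestries. The second SNP contributes 0.05. A conventional PRS would instead apply a single weight per SNP regardless of local ancestry.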
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Bryce T Rowland
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Anna V Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Christy Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Ulrike Peters
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Jessica Lundin
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Tara Matise
- Department of Genetics, Rutgers University, New Brunswick, NJ, 08901, USA
- Steve Buyske
- Department of Statistics, Rutgers University, New Brunswick, NJ, 08901, USA
- Ran Tao
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Rasika A Mathias
- Department of Medicine, Johns Hopkins University, Baltimore, MD, 21287, USA
- Alexander P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, 98195, USA
- Paul L Auer
- Division of Biostatistics, Institute for Health and Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
- Nancy J Cox
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Timothy A Thornton
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
9
Chi Duong V, Minh Vu G, Khac Nguyen T, Tran The Nguyen H, Luong Pham T, S Vo N, Hong Hoang T. A rapid and reference-free imputation method for low-cost genotyping platforms. Sci Rep 2023; 13:23083. [PMID: 38155188 PMCID: PMC10754833 DOI: 10.1038/s41598-023-50086-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 12/15/2023] [Indexed: 12/30/2023]
Abstract
Most current genotype imputation methods are reference-based, which poses several challenges for users, such as high computational costs and inaccessible reference panels. Deep learning models are therefore expected to enable reference-free imputation methods with higher accuracy and shorter running times. We propose an imputation method based on recurrent neural networks integrated with an additional discriminator network, named GRUD. This method was applied to datasets from genotyping chips and low-pass whole-genome sequencing (LP-WGS) with the reference panels from the 1000 Genomes Project (1KGP) phase 3, the dataset of 4810 Singaporeans (SG10K), and the 1000 Vietnamese Genome Project (VN1K). Our model performed more accurately than other existing methods on multiple datasets, especially for common variants with large minor allele frequencies, and reduced running time and memory usage. In summary, these results indicate that GRUD can be implemented in genomic analyses to improve the accuracy and running time of genotype imputation.
Affiliation(s)
- Vinh Chi Duong
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
- GeneStory Joint Stock Company, Hanoi, Vietnam
- Giang Minh Vu
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
- GeneStory Joint Stock Company, Hanoi, Vietnam
- Hung Tran The Nguyen
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
- Nanyang Technological University, Singapore, Singapore
- Nam S Vo
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
- GeneStory Joint Stock Company, Hanoi, Vietnam
- Tham Hong Hoang
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
- GeneStory Joint Stock Company, Hanoi, Vietnam
10
Shi M, Tanikawa C, Munter HM, Akiyama M, Koyama S, Tomizuka K, Matsuda K, Lathrop GM, Terao C, Koido M, Kamatani Y. Genotype imputation accuracy and the quality metrics of the minor ancestry in multi-ancestry reference panels. Brief Bioinform 2023; 25:bbad509. [PMID: 38221906 PMCID: PMC10788679 DOI: 10.1093/bib/bbad509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/20/2023] [Accepted: 12/13/2023] [Indexed: 01/16/2024]
Abstract
Large-scale imputation reference panels are now available and have contributed to efficient genome-wide association studies through genotype imputation. However, whether large multi-ancestry or small population-specific reference panels are the optimal choice for under-represented populations continues to be debated. We imputed genotypes of East Asian (180k Japanese) subjects using the Trans-Omics for Precision Medicine reference panel. We found that the standard imputation quality metric (Rsq) overestimated dosage r² (the squared correlation between imputed dosage and true genotype), particularly in marginal-quality bins. Variance component analysis of Rsq revealed that increased imputed-genotype certainty (dosages closer to 0, 1 or 2) caused upward bias, indicating some systematic bias in the imputation. Through systematic simulations using different template switching rates (θ value) in the hidden Markov model, we showed that a lower θ value increased imputed-genotype certainty and Rsq. Dosage r², however, was insensitive to the θ value, which caused the two measures to deviate. In simulated reference panels with different sizes and ancestral diversities, the θ value estimates from Minimac decreased with the size of a single ancestry and increased with ancestral diversity. Thus, Rsq can deviate from dosage r² for a subpopulation in a multi-ancestry panel, and the deviation reflects differing imputed-dosage distributions. Finally, despite the impact of the θ value, distant ancestries in the reference panel contributed only a few additional variants passing a predefined Rsq threshold. We conclude that the θ value substantially affects both the imputed dosage and the imputation quality metric.
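The two quantities contrasted in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's code. Dosage r² is the squared Pearson correlation of dosages with true genotypes. The Rsq-style estimate uses only the dosages: it compares their observed variance with the variance 2p(1-p) expected under Hardy-Weinberg equilibrium, where p = mean(dosage)/2. Tools such as Minimac may compute Rsq at the haplotype level, so treat this genotype-level version as an approximation. The function names are hypothetical.

```python
"""Sketch: dosage r^2 versus an Rsq-style estimate for one imputed variant.

Assumptions (not taken from the cited paper): Rsq is approximated on
genotype dosages (0-2) as var(dosage) / (2p(1-p)) with p = mean(dosage)/2,
using population (divide-by-n) variances. Function names are hypothetical.
"""
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def dosage_r2(dosages: Sequence[float], genotypes: Sequence[int]) -> float:
    """Squared Pearson correlation between imputed dosage and true genotype."""
    if not dosages or len(dosages) != len(genotypes):
        raise ValueError("need two non-empty sequences of equal length")
    md, mg = _mean(dosages), _mean(genotypes)
    cov = sum((d - md) * (g - mg) for d, g in zip(dosages, genotypes))
    ss_d = sum((d - md) ** 2 for d in dosages)
    ss_g = sum((g - mg) ** 2 for g in genotypes)
    if ss_d == 0 or ss_g == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov * cov / (ss_d * ss_g)


def rsq_estimate(dosages: Sequence[float]) -> float:
    """Rsq-style ratio computed from dosages alone (no true genotypes needed)."""
    if not dosages:
        raise ValueError("need at least one dosage")
    md = _mean(dosages)
    p = md / 2  # estimated allele frequency
    expected_var = 2 * p * (1 - p)
    if expected_var == 0:
        return 0.0  # monomorphic in the imputed data
    observed_var = sum((d - md) ** 2 for d in dosages) / len(dosages)
    return observed_var / expected_var


if __name__ == "__main__":
    truth = [0, 1, 2, 1]
    # Dosages sitting exactly on 0, 1 or 2 look "certain" but are partly wrong.
    confident_but_wrong = [0.0, 1.0, 1.0, 2.0]
    print(dosage_r2(confident_but_wrong, truth))  # 0.25
    print(rsq_estimate(confident_but_wrong))      # 1.0
```

The toy dosages sit exactly on 0, 1 or 2, so they look fully certain and the Rsq-style estimate is 1.0. Their correlation with the truth gives a dosage r² of only 0.25. This is an exaggerated version of the direction of bias that the abstract describes for high-certainty dosages.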
Affiliation(s)
- Mingyang Shi
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Chizu Tanikawa
- Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Hans Markus Munter
- Victor Phillip Dahdaleh Institute of Genomic Medicine, McGill University, Montreal, Québec, Canada
- Masato Akiyama
- Department of Ocular Pathology and Imaging Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Satoshi Koyama
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Koichi Matsuda
- Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Gregory Mark Lathrop
- Victor Phillip Dahdaleh Institute of Genomic Medicine, McGill University, Montreal, Québec, Canada
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Masaru Koido
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yoichiro Kamatani
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
11
Cahoon JL, Rui X, Tang E, Simons C, Langie J, Chen M, Lo YC, Chiang CWK. Imputation Accuracy Across Global Human Populations. BIORXIV: THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.22.541241. [PMID: 37292811 PMCID: PMC10245797 DOI: 10.1101/2023.05.22.541241] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Genotype imputation is now fundamental for genome-wide association studies but lacks fairness due to the underrepresentation of populations with non-European ancestries. The state-of-the-art imputation reference panel released by the Trans-Omics for Precision Medicine (TOPMed) initiative contains a substantial number of admixed African-ancestry and Hispanic/Latino samples, allowing these populations to be imputed with nearly the same accuracy as European-ancestry cohorts. However, imputation for populations primarily residing outside of North America may still fall short in performance due to persisting underrepresentation. To illustrate this point, we curated genome-wide array data from 23 publications published between 2008 and 2021. In total, we imputed over 43k individuals across 123 populations around the world. We identified a number of populations where imputation accuracy paled in comparison to that of European-ancestry populations. For instance, the mean imputation r-squared (Rsq) for alleles at 1-5% frequency in Saudi Arabians (N=1061), Vietnamese (N=1264), Thai (N=2435), and Papua New Guineans (N=776) was 0.79, 0.78, 0.76, and 0.62, respectively. In contrast, the mean Rsq ranged from 0.90 to 0.93 for comparable European populations matched in sample size and SNP content. Outside of Africa and Latin America, Rsq appeared to decrease as genetic distance to the European reference increased, as predicted. Further analysis using sequencing data as ground truth suggested that imputation software may overestimate imputation accuracy more for non-European populations than for European populations, pointing to further disparity between populations. Using 1496 whole-genome-sequenced individuals from the Taiwan Biobank as a reference, we also assessed a strategy to improve imputation for non-European populations with meta-imputation, which can combine results from TOPMed with smaller population-specific reference panels. We found that meta-imputation in this design did not improve Rsq genome-wide. Taken together, our analysis suggests that, with the current size of alternative reference panels, meta-imputation alone cannot improve imputation efficacy for underrepresented cohorts, and we must ultimately strive to increase the diversity and size of reference panels to promote equity within genetics research.
Affiliation(s)
- Jordan L. Cahoon
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Xinyue Rui
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Echo Tang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Christopher Simons
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jalen Langie
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Minhui Chen
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Ying-Chu Lo
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Charleston W. K. Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
12
Sun Q, Broadaway KA, Edmiston SN, Fajgenbaum K, Miller-Fleming T, Westerkam LL, Melendez-Gonzalez M, Bui H, Blum FR, Levitt B, Lin L, Hao H, Harris KM, Liu Z, Thomas NE, Cox NJ, Li Y, Mohlke KL, Sayed CJ. Genetic Variants Associated With Hidradenitis Suppurativa. JAMA Dermatol 2023; 159:930-938. [PMID: 37494057 PMCID: PMC10372759 DOI: 10.1001/jamadermatol.2023.2217] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 11/21/2022] [Accepted: 04/25/2023] [Indexed: 07/27/2023]
Abstract
Importance: Hidradenitis suppurativa (HS) is a common and severely morbid chronic inflammatory skin disease that is reported to be highly heritable. However, the genetic understanding of HS is insufficient: few genome-wide association studies (GWASs) have been performed for HS, and these have not identified significant risk loci.
Objective: To identify genetic variants associated with HS and to shed light on the underlying genes and genetic mechanisms.
Design, Setting, and Participants: This genetic association study recruited 753 patients with HS in the HS Program for Research and Care Excellence (HS ProCARE) at the University of North Carolina Department of Dermatology from August 2018 to July 2021. A GWAS was performed for 720 patients (after quality control) with controls from the Add Health study and then meta-analyzed with 2 large biobanks, UK Biobank (247 cases) and FinnGen (673 cases). Variants at 3 loci were tested for replication in the BioVU biobank (290 cases). Data analysis was performed from September 2021 to December 2022.
Main Outcomes and Measures: The main outcome measures were the loci identified, with association at P < 1 × 10⁻⁸ considered significant.
Results: A total of 753 patients were recruited, with 720 included in the analysis. Mean (SD) age at symptom onset was 20.3 (10.57) years and at enrollment was 35.3 (13.52) years; 360 (50.0%) patients were Black, and 575 (79.7%) were female. In a meta-analysis of the 4 studies, 2 HS-associated loci were identified and replicated, with lead variants rs10512572 (P = 2.3 × 10⁻¹¹) and rs17090189 (P = 2.1 × 10⁻⁸) near the SOX9 and KLF5 genes, respectively. Variants at these loci are located in enhancer regulatory elements detected in skin tissue.
Conclusions and Relevance: In this genetic association study, common variants located near the SOX9 and KLF5 genes were associated with risk of HS. These or other nearby genes may be associated with genetic risk of disease and with the development of clinical features unique to HS, such as cysts, comedones, and inflammatory tunnels. New insights into disease pathogenesis related to these genes may help predict disease progression and inform novel treatment approaches in the future.
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Sharon N. Edmiston
- Department of Dermatology, University of North Carolina at Chapel Hill School of Medicine
- Lineberger Comprehensive Cancer Center, Chapel Hill, North Carolina
- Kristen Fajgenbaum
- Department of Dermatology, University of North Carolina at Chapel Hill School of Medicine
- Tyne Miller-Fleming
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Linnea Lackstrom Westerkam
- Department of Dermatology, University of North Carolina at Chapel Hill School of Medicine
- University of North Carolina at Chapel Hill School of Medicine
- Helen Bui
- Department of Internal Medicine, University of North Carolina at Chapel Hill School of Medicine
- Brandt Levitt
- Carolina Population Center, University of North Carolina at Chapel Hill
- Lan Lin
- Department of Dermatology, University of North Carolina at Chapel Hill School of Medicine
- Honglin Hao
- Department of Dermatology, University of North Carolina at Chapel Hill School of Medicine
- Kathleen Mullan Harris
- Carolina Population Center, University of North Carolina at Chapel Hill
- Sociology Department, University of North Carolina at Chapel Hill
- Zhi Liu
- Department of Dermatology, University of North Carolina at Chapel Hill School of Medicine
- Lineberger Comprehensive Cancer Center, Chapel Hill, North Carolina
- Nancy E. Thomas
- Department of Dermatology, University of North Carolina at Chapel Hill School of Medicine
- Carolina Population Center, University of North Carolina at Chapel Hill
- Nancy J. Cox
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Karen L. Mohlke
- Department of Genetics, University of North Carolina at Chapel Hill
- Christopher J. Sayed
- Department of Dermatology, University of North Carolina at Chapel Hill School of Medicine