1
Lewis ACF, Chisholm RL, Connolly JJ, Esplin ED, Glessner J, Gordon A, Green RC, Hakonarson H, Harr M, Holm IA, Jarvik GP, Karlson E, Kenny EE, Kottyan L, Lennon N, Linder JE, Luo Y, Martin LJ, Perez E, Puckelwartz MJ, Rasmussen-Torvik LJ, Sabatello M, Sharp RR, Smoller JW, Sterling R, Terek S, Wei WQ, Fullerton SM. Managing differential performance of polygenic risk scores across groups: Real-world experience of the eMERGE Network. Am J Hum Genet 2024; 111:999-1005. [PMID: 38688278 PMCID: PMC11179244 DOI: 10.1016/j.ajhg.2024.04.005] [Received: 02/27/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 05/02/2024]
Abstract
The differential performance of polygenic risk scores (PRSs) by group is one of the major ethical barriers to their clinical use. It is also one of the main practical challenges for any implementation effort. The social repercussions of how people are grouped in PRS research must be considered in communications with research participants, including return of results. Here, we outline the decisions faced and choices made by a large multi-site clinical implementation study returning PRSs to diverse participants in handling this issue of differential performance. Our approach to managing the complexities associated with the differential performance of PRSs serves as a case study that can help future implementers of PRSs to plot an anticipatory course in response to this issue.
Affiliation(s)
- Anna C F Lewis
- Edmond and Lily Safra Center for Ethics, Harvard University, Cambridge, MA, USA; Department of Genetics, Brigham and Women's Hospital, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA.
- Rex L Chisholm
- Center for Genetic Medicine, Northwestern University, Evanston, IL, USA
- John J Connolly
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Joe Glessner
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Adam Gordon
- Center for Genetic Medicine, Northwestern University, Evanston, IL, USA; Department of Pharmacology, Northwestern University, Evanston, IL, USA
- Robert C Green
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Ariadne Labs, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Division of Pulmonary Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Margaret Harr
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ingrid A Holm
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Gail P Jarvik
- Division of Medical Genetics, Department of Medicine and Department of Genome Science, University of Washington Medical Center, Seattle, WA, USA
- Elizabeth Karlson
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Mass General Brigham Personalized Medicine, Boston, MA, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine, New York City, NY, USA; Center for Clinical Translational Genomics, Icahn School of Medicine, New York City, NY, USA; Division of Genomic Medicine, Department of Medicine, Icahn School of Medicine, New York City, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine, New York City, NY, USA
- Leah Kottyan
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Niall Lennon
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jodell E Linder
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
- Yuan Luo
- Department of Preventive Medicine, Northwestern University, Evanston, IL, USA
- Lisa J Martin
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Emma Perez
- Mass General Brigham Personalized Medicine, Boston, MA, USA
- Megan J Puckelwartz
- Center for Genetic Medicine, Northwestern University, Evanston, IL, USA; Department of Pharmacology, Northwestern University, Evanston, IL, USA
- Laura J Rasmussen-Torvik
- Center for Genetic Medicine, Northwestern University, Evanston, IL, USA; Department of Preventive Medicine, Northwestern University, Evanston, IL, USA
- Maya Sabatello
- Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Irving Medical Center, New York City, NY, USA; Division of Ethics, Department of Medical Humanities and Ethics, Columbia University Irving Medical Center, New York City, NY, USA
- Jordan W Smoller
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA; Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
- Rene Sterling
- Division of Genomics and Society, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shannon Terek
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Stephanie M Fullerton
- Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, USA
2
Gao Y, Cui Y. Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement. Genome Med 2024; 16:76. [PMID: 38835075 DOI: 10.1186/s13073-024-01345-0] [Received: 01/08/2024] [Accepted: 05/17/2024] [Indexed: 06/06/2024]
Abstract
BACKGROUND: Accurate prediction of an individual's predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets. METHODS: We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer's disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups. RESULTS: Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations. CONCLUSIONS: This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.
Affiliation(s)
- Yan Gao
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Yan Cui
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
- Center for Cancer Research, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
3
Ojima T, Namba S, Suzuki K, Yamamoto K, Sonehara K, Narita A, Kamatani Y, Tamiya G, Yamamoto M, Yamauchi T, Kadowaki T, Okada Y. Body mass index stratification optimizes polygenic prediction of type 2 diabetes in cross-biobank analyses. Nat Genet 2024; 56:1100-1109. [PMID: 38862855 DOI: 10.1038/s41588-024-01782-y] [Received: 07/17/2022] [Accepted: 04/26/2024] [Indexed: 06/13/2024]
Abstract
Type 2 diabetes (T2D) shows heterogeneous body mass index (BMI) sensitivity. Here, we performed stratification based on BMI to optimize predictions for BMI-related diseases. We obtained BMI-stratified datasets using data from more than 195,000 individuals (n_T2D = 55,284) from BioBank Japan (BBJ) and UK Biobank. T2D heritability in the low-BMI group was greater than that in the high-BMI group. Polygenic predictions of T2D toward low-BMI targets had pseudo-R² values that were more than 22% higher than BMI-unstratified targets. Polygenic risk scores (PRSs) from low-BMI discovery outperformed PRSs from high BMI, while PRSs from BMI-unstratified discovery performed best. Pathway-specific PRSs demonstrated the biological contributions of pathogenic pathways. Low-BMI T2D cases showed higher rates of neuropathy and retinopathy. Combining BMI stratification and a method integrating cross-population effects, T2D predictions showed greater than 37% improvements over unstratified-matched-population prediction. We replicated findings in the Tohoku Medical Megabank (n = 26,000) and the second BBJ cohort (n = 33,096). Our findings suggest that target stratification based on existing traits can improve the polygenic prediction of heterogeneous diseases.
Affiliation(s)
- Takafumi Ojima
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Graduate School of Medicine, Tohoku University, Sendai, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
- Shinichi Namba
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Ken Suzuki
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kenichi Yamamoto
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Pediatrics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
- Laboratory of Children's Health and Genetics, Division of Health Science, Osaka University Graduate School of Medicine, Osaka, Japan
- Kyuto Sonehara
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Department of Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Akira Narita
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Yoichiro Kamatani
- Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Gen Tamiya
- Graduate School of Medicine, Tohoku University, Sendai, Japan
- Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Masayuki Yamamoto
- Graduate School of Medicine, Tohoku University, Sendai, Japan
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Toshimasa Yamauchi
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan.
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Department of Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan.
- Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Osaka, Japan.
4
Edwards AC, Lannoy S, Stephenson ME, Kendler KS, Salvatore JE. Divorce, genetic risk, and suicidal thoughts and behaviors in a sample with recurrent major depressive disorder. J Affect Disord 2024; 354:642-648. [PMID: 38521136 PMCID: PMC11015957 DOI: 10.1016/j.jad.2024.03.100] [Received: 12/21/2023] [Revised: 03/18/2024] [Accepted: 03/20/2024] [Indexed: 03/25/2024]
Abstract
BACKGROUND: Theories of risk for suicidal thoughts and behaviors (STB) implicate both interpersonal and biological factors. Divorce/separation and aggregate genetic liability are robustly associated with STB, but have seldom been evaluated in conjunction with one another. Furthermore, whether these factors are effective predictors in high-risk populations is not clear. METHODS: Analyses were conducted in a sample of Han Chinese women with severe recurrent major depressive disorder (maximum N = 4380). Logistic regressions were used to evaluate the associations between divorce/separation and polygenic scores (PGS) for suicidal ideation or behavior with STB. Where appropriate, additive interactions between divorce and PGS were tested. RESULTS: Divorce/separation was significantly associated with increased risk of suicidal ideation, plans, and attempts (odds ratios = 1.28-1.61). PGS for suicidal ideation were not associated with STB, while PGS for suicidal behavior were associated with ideation and plans (odds ratios = 1.08-1.09). There were no significant interactions between divorce/separation and PGS. CONCLUSIONS: Consistent with theories of suicidality, the disruption or end of an important interpersonal relationship is an indicator of risk for STB. Aggregate genetic liability for suicidal behavior more modestly contributes to risk, but does not exacerbate the negative impact of divorce. Thus, even within a high-risk sample, interpersonal and biological exposures distinguish between those who do and do not experience STB, and could motivate targeted screening. Further research is necessary to evaluate whether and how the context of divorce contributes to variation in its effect on STB risk.
Affiliation(s)
- Alexis C Edwards
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University School of Medicine, Richmond, VA, USA.
- Séverine Lannoy
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Mallory E Stephenson
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Kenneth S Kendler
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Jessica E Salvatore
- Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University, Piscataway, NJ, USA
5
Wang Y, He Y, Shi Y, Qian DC, Gray KJ, Winn R, Martin AR. Aspiring toward equitable benefits from genomic advances to individuals of ancestrally diverse backgrounds. Am J Hum Genet 2024; 111:809-824. [PMID: 38642557 PMCID: PMC11080611 DOI: 10.1016/j.ajhg.2024.04.002] [Received: 10/05/2023] [Revised: 04/01/2024] [Accepted: 04/01/2024] [Indexed: 04/22/2024]
Abstract
Advancements in genomic technologies have shown remarkable promise for improving health trajectories. The Human Genome Project has catalyzed the integration of genomic tools into clinical practice, such as disease risk assessment, prenatal testing and reproductive genomics, cancer diagnostics and prognostication, and therapeutic decision making. Despite the promise of genomic technologies, their full potential remains untapped without including individuals of diverse ancestries and integrating social determinants of health (SDOHs). The NHGRI launched the 2020 Strategic Vision with ten bold predictions by 2030, including "individuals from ancestrally diverse backgrounds will benefit equitably from advances in human genomics." Meeting this goal requires a holistic approach that brings together genomic advancements with careful consideration to healthcare access as well as SDOHs to ensure that translation of genetics research is inclusive, affordable, and accessible and ultimately narrows rather than widens health disparities. With this prediction in mind, this review delves into the two paramount applications of genetic testing: reproductive genomics and precision oncology. When discussing these applications of genomic advancements, we evaluate current accessibility limitations, highlight challenges in achieving representativeness, and propose paths forward to realize the ultimate goal of their equitable applications.
Affiliation(s)
- Ying Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA.
- Yixuan He
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Yue Shi
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Reproductive Medicine Center, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510080, China
- David C Qian
- Department of Thoracic Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Kathryn J Gray
- Department of Obstetrics and Gynecology, University of Washington, Seattle, WA, USA
- Robert Winn
- Virginia Commonwealth University Massey Cancer Center, Richmond, VA, USA
- Alicia R Martin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA.
6
Vilhjálmsson BJ. Towards fair and clinically relevant polygenic predictions. Trends Genet 2024; 40:379-380. [PMID: 38643035 DOI: 10.1016/j.tig.2024.04.002] [Received: 03/26/2024] [Accepted: 04/03/2024] [Indexed: 04/22/2024]
Abstract
Lennon et al. recently proposed a clinical polygenic score (PGS) pipeline as part of the Electronic Medical Records and Genomics (eMERGE) network initiative. In this spotlight article we discuss the broader context for the use of PGS in preventive medicine and highlight key limitations and challenges facing their inclusion in prediction models.
Affiliation(s)
- Bjarni Jóhann Vilhjálmsson
- National Centre for Register-based Research, Aarhus BSS, Aarhus University, Aarhus, Denmark; Bioinformatics Research Centre, Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark; Novo Nordisk Foundation Centre for Genomics Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
7
Zheng Z, Liu S, Sidorenko J, Wang Y, Lin T, Yengo L, Turley P, Ani A, Wang R, Nolte IM, Snieder H, Yang J, Wray NR, Goddard ME, Visscher PM, Zeng J. Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries. Nat Genet 2024; 56:767-777. [PMID: 38689000 PMCID: PMC11096109 DOI: 10.1038/s41588-024-01704-y] [Received: 10/01/2022] [Accepted: 03/05/2024] [Indexed: 05/02/2024]
Abstract
We develop a method, SBayesRC, that integrates genome-wide association study (GWAS) summary statistics with functional genomic annotations to improve polygenic prediction of complex traits. Our method is scalable to whole-genome variant analysis and refines signals from functional annotations by allowing them to affect both causal variant probability and causal effect distribution. We analyze 50 complex traits and diseases using ∼7 million common single-nucleotide polymorphisms (SNPs) and 96 annotations. SBayesRC improves prediction accuracy by 14% in European ancestry and up to 34% in cross-ancestry prediction compared to the baseline method SBayesR, which does not use annotations, and outperforms other methods, including LDpred2, LDpred-funct, MegaPRS, PolyPred-S and PRS-CSx. Investigation of factors affecting prediction accuracy identifies a significant interaction between SNP density and annotation information, suggesting whole-genome sequence variants with annotations may further improve prediction. Functional partitioning analysis highlights a major contribution of evolutionary constrained regions to prediction accuracy and the largest per-SNP contribution from nonsynonymous SNPs.
Affiliation(s)
- Zhili Zheng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Shouye Liu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Julia Sidorenko
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Ying Wang
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Tian Lin
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Loic Yengo
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Patrick Turley
- Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA
- Department of Economics, University of Southern California, Los Angeles, CA, USA
- Alireza Ani
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Department of Bioinformatics, Isfahan University of Medical Sciences, Isfahan, Iran
- Rujia Wang
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Ilja M Nolte
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Harold Snieder
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Jian Yang
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
- Naomi R Wray
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Department of Psychiatry, University of Oxford, Oxford, UK
- Michael E Goddard
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, Victoria, Australia
- Biosciences Research Division, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Victoria, Australia
- Peter M Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jian Zeng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia.
8
Kingdom R, Beaumont RN, Wood AR, Weedon MN, Wright CF. Genetic modifiers of rare variants in monogenic developmental disorder loci. Nat Genet 2024; 56:861-868. [PMID: 38637616 PMCID: PMC11096126 DOI: 10.1038/s41588-024-01710-0] [Received: 12/07/2022] [Accepted: 03/06/2024] [Indexed: 04/20/2024]
Abstract
Rare damaging variants in a large number of genes are known to cause monogenic developmental disorders (DDs) and have also been shown to cause milder subclinical phenotypes in population cohorts. Here, we show that carrying multiple (2-5) rare damaging variants across 599 dominant DD genes has an additive adverse effect on numerous cognitive and socioeconomic traits in UK Biobank, which can be partially counterbalanced by a higher educational attainment polygenic score (EA-PGS). Phenotypic deviators from expected EA-PGS could be partly explained by the enrichment or depletion of rare DD variants. Among carriers of rare DD variants, those with a DD-related clinical diagnosis had a substantially lower EA-PGS and more severe phenotype than those without a clinical diagnosis. Our results suggest that the overall burden of both rare and common variants can modify the expressivity of a phenotype, which may then influence whether an individual reaches the threshold for clinical disease.
Affiliation(s)
- Rebecca Kingdom
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter, UK
- Robin N Beaumont
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter, UK
- Andrew R Wood
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter, UK
- Michael N Weedon
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter, UK
- Caroline F Wright
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter, UK.
9
Troubat L, Fettahoglu D, Henches L, Aschard H, Julienne H. Multi-trait GWAS for diverse ancestries: mapping the knowledge gap. BMC Genomics 2024; 25:375. [PMID: 38627641 PMCID: PMC11022331 DOI: 10.1186/s12864-024-10293-3] [Received: 07/13/2023] [Accepted: 04/09/2024] [Indexed: 04/19/2024]
Abstract
BACKGROUND: Approximately 95% of samples analyzed in univariate genome-wide association studies (GWAS) are of European ancestry. This bias toward European ancestry populations in association screening also exists for other analyses and methods that are often developed and tested on European ancestry only. However, existing data in non-European populations, which are often of modest sample size, could benefit from innovative approaches as recently illustrated in the context of polygenic risk scores. METHODS: Here, we extend and assess the potential limitations and gains of our multi-trait GWAS pipeline, JASS (Joint Analysis of Summary Statistics), for the analysis of non-European ancestries. To this end, we conducted the joint GWAS of 19 hematological traits and glycemic traits across five ancestries (European (EUR), admixed American (AMR), African (AFR), East Asian (EAS), and South-East Asian (SAS)). RESULTS: We detected 367 new genome-wide significant associations in non-European populations (15 in admixed American (AMR), 72 in African (AFR) and 280 in East Asian (EAS)). New associations detected represent 5%, 17% and 13% of associations in the AFR, AMR and EAS populations, respectively. Overall, multi-trait testing increases the replication of European associated loci in non-European ancestry by 15%. Pleiotropic effects were highly similar at significant loci across ancestries (e.g. the mean correlation between multi-trait genetic effects of EUR and EAS ancestries was 0.88). For hematological traits, strong discrepancies in multi-trait genetic effects are tied to known evolutionary divergences, such as the ACKR1 locus, which is adaptive against P. vivax-induced malaria. CONCLUSIONS: Multi-trait GWAS can be a valuable tool to narrow the genetic knowledge gap between European and non-European populations.
Affiliation(s)
- Lucie Troubat
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
- Deniz Fettahoglu
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
- Léo Henches
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
- Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Hanna Julienne
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, F-75015, France.
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, F-75015, France.
10
Hui D, Dudek S, Kiryluk K, Walunas TL, Kullo IJ, Wei WQ, Tiwari HK, Peterson JF, Chung WK, Davis B, Khan A, Kottyan L, Limdi NA, Feng Q, Puckelwartz MJ, Weng C, Smith JL, Karlson EW, Center RG, Jarvik GP, Ritchie MD. Risk factors affecting polygenic score performance across diverse cohorts. medRxiv 2024:2023.05.10.23289777. [PMID: 38645167 PMCID: PMC11030495 DOI: 10.1101/2023.05.10.23289777] [Indexed: 04/23/2024]
Abstract
Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed effects of covariate stratification and interaction on body mass index (BMI) PGS (PGS_BMI) across four cohorts of European (N=491,111) and African (N=21,612) ancestry. Stratifying on binary covariates and quintiles for continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² being nearly double between best and worst performing quintiles for certain covariates. 28 covariates had significant PGS_BMI-covariate interaction effects, modifying PGS_BMI effects by nearly 20% per standard deviation change. We observed overlap between covariates that had significant R² differences among strata and interaction effects: across all covariates, their main effects on BMI were correlated with their maximum R² differences and interaction effects (0.56 and 0.58, respectively), suggesting high-PGS_BMI individuals have highest R² and increase in PGS effect. Using quantile regression, we show the effect of PGS_BMI increases as BMI itself increases, and that these differences in effects are directly related to differences in R² when stratifying by different covariates. Given significant and replicable evidence for context-specific PGS_BMI performance and effects, we investigated ways to increase model performance taking into account non-linear effects. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating PGS_BMI directly from GxAge GWAS effects increased relative R² by 7.8%.
These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGS_BMI performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.
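To make the stratified comparison concrete, here is a minimal sketch of computing the PGS-phenotype R² separately within quantile bins of a continuous covariate. It assumes R² is taken as the squared Pearson correlation within each stratum and that bins are equal-sized after sorting; the function names and toy data are hypothetical, and the study's actual models (for example, any covariate adjustment) are not reproduced here.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def r2_by_quantile(pgs, pheno, covariate, n_bins=5):
    """Return the PGS-phenotype R^2 within each covariate quantile bin.

    Individuals are sorted by the covariate and cut into n_bins groups of
    (nearly) equal size; n_bins=5 gives quintiles.
    """
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    size = len(order) / n_bins
    results = []
    for b in range(n_bins):
        idx = order[int(b * size): int((b + 1) * size)]
        results.append(r_squared([pgs[i] for i in idx],
                                 [pheno[i] for i in idx]))
    return results


# Toy data: the score tracks the phenotype in the low-covariate half only.
pgs = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
pheno = [0, 1, 2, 3, 4, 0, 1, 0, 1, 0]
age = list(range(10))
per_stratum = r2_by_quantile(pgs, pheno, age, n_bins=2)
```

Comparing the largest and smallest entries of `per_stratum` gives the kind of best-versus-worst stratum contrast the abstract reports.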
|
11
|
Timmins IR, Dudbridge F. Bayesian approach to assessing population differences in genetic risk of disease with application to prostate cancer. PLoS Genet 2024; 20:e1011212. [PMID: 38630784 PMCID: PMC11023298 DOI: 10.1371/journal.pgen.1011212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 03/07/2024] [Indexed: 04/19/2024] Open
Abstract
Population differences in risk of disease are common, but the potential genetic basis for these differences is not well understood. A standard approach is to compare genetic risk across populations by testing for mean differences in polygenic scores, but existing studies that use this approach do not account for statistical noise in effect estimates (i.e., the GWAS betas) that arises due to the finite sample size of GWAS training data. Here, we show using Bayesian polygenic score methods that the level of uncertainty in estimates of genetic risk differences across populations is highly dependent on the GWAS training sample size, the polygenicity (number of causal variants), and genetic distance (FST) between the populations considered. We derive a Wald test for formally assessing the difference in genetic risk across populations, which we show to have calibrated type 1 error rates under a simplified assumption that all SNPs are independent, which we achieve in practice using linkage disequilibrium (LD) pruning. We further provide closed-form expressions for assessing the uncertainty in estimates of relative genetic risk across populations under the special case of an infinitesimal genetic architecture. We suggest that for many complex traits and diseases, particularly those with more polygenic architectures, current GWAS sample sizes are insufficient to detect moderate differences in genetic risk across populations, though more substantial differences in relative genetic risk (relative risk > 1.5) can be detected. We show that conventional approaches that do not account for sampling error from the training sample, such as using a simple t-test, have very high type 1 error rates. When applying our approach to prostate cancer, we demonstrate a higher genetic risk in African ancestry men, with lower risk in men of European followed by East Asian ancestry.
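The role of effect-size uncertainty can be illustrated with a simplified frequentist sketch. It assumes independent SNPs, treats population allele frequencies as known constants, and propagates only the GWAS standard errors into the variance of the mean score difference. This is not the authors' Bayesian derivation; the function name and inputs are hypothetical.

```python
import math


def wald_test_pgs_difference(betas, ses, freq_a, freq_b):
    """Wald test for a difference in mean polygenic score between two populations.

    betas, ses     : GWAS effect estimates and their standard errors
    freq_a, freq_b : effect-allele frequencies in populations A and B

    Assumes independent SNPs and known allele frequencies, so the only
    source of uncertainty is sampling error in the betas.
    """
    diff = 0.0
    var = 0.0
    for beta, se, pa, pb in zip(betas, ses, freq_a, freq_b):
        d = 2.0 * (pa - pb)      # difference in expected allele count
        diff += beta * d         # contribution to the mean score difference
        var += (d * se) ** 2     # variance contributed by this SNP's beta
    z = diff / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return diff, z, p


# Toy example with two SNPs; only the first differs in frequency.
diff, z, p = wald_test_pgs_difference(
    betas=[0.1, 0.2], ses=[0.05, 0.05],
    freq_a=[0.5, 0.3], freq_b=[0.4, 0.3],
)
```

Because the variance term grows with the standard errors, a smaller training GWAS widens the uncertainty around the same observed difference, which is the dependence on training sample size that the abstract highlights.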
Affiliation(s)
- Iain R. Timmins
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, United Kingdom
- Statistical Innovation, AstraZeneca, Cambridge, United Kingdom
| | - Frank Dudbridge
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
|
12
|
Kresge HA, Blostein F, Goleva S, Albiñana C, Revez JA, Wray NR, Vilhjálmsson BJ, Zhu Z, McGrath JJ, Davis LK. Phenomewide Association Study of Health Outcomes Associated With the Genetic Correlates of 25 Hydroxyvitamin D Concentration and Vitamin D Binding Protein Concentration. Twin Res Hum Genet 2024; 27:69-79. [PMID: 38644690 PMCID: PMC11138239 DOI: 10.1017/thg.2024.19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
While it is known that vitamin D deficiency is associated with adverse bone outcomes, it remains unclear whether low vitamin D status may increase the risk of a wider range of health outcomes. We had the opportunity to explore the association between common genetic variants associated with both 25 hydroxyvitamin D (25OHD) and the vitamin D binding protein (DBP, encoded by the GC gene) with a comprehensive range of health disorders and laboratory tests in a large academic medical center. We used summary statistics for 25OHD and DBP to generate polygenic scores (PGS) for 66,482 participants with primarily European ancestry and 13,285 participants with primarily African ancestry from the Vanderbilt University Medical Center Biobank (BioVU). We examined the predictive properties of PGS25OHD and two scores related to DBP concentration with respect to 1322 health-related phenotypes and 315 laboratory-measured phenotypes from electronic health records. In those with European ancestry: (a) the PGS25OHD and PGSDBP scores, and individual SNPs rs4588 and rs7041, were associated with both 25OHD concentration and 1,25 dihydroxyvitamin D concentrations; (b) higher PGS25OHD was associated with decreased concentrations of triglycerides and cholesterol, and reduced risks of vitamin D deficiency, disorders of lipid metabolism, and diabetes. In general, the findings for the African ancestry group were consistent with findings from the European ancestry analyses. Our study confirms the utility of PGS and two key variants within the GC gene (rs4588 and rs7041) to predict the risk of vitamin D deficiency in clinical settings and highlights the shared biology between vitamin D-related genetic pathways and a range of health outcomes.
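For readers unfamiliar with how a polygenic score is generated from summary statistics, the sketch below shows the basic weighted-sum form: each effect-allele dosage is multiplied by its per-variant weight and the products are added. The two rsIDs come from the abstract, but the weights and dosages are made up for illustration, and the study's actual scoring method (variant selection, weight shrinkage, ancestry handling) is not described here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages : dict mapping variant ID -> dosage (0 to 2) for one person
    weights : dict mapping variant ID -> per-allele effect weight
    Only variants present in both inputs contribute; the count of variants
    used is returned so missing genotypes can be noticed.
    """
    shared = [snp for snp in weights if snp in dosages]
    score = sum(weights[snp] * dosages[snp] for snp in shared)
    return score, len(shared)


# Hypothetical weights and genotypes, for illustration only.
weights = {"rs4588": -0.30, "rs7041": 0.20}
person = {"rs4588": 2, "rs7041": 1}
score, n_used = polygenic_score(person, weights)
```

In practice raw scores are usually standardized within a reference group before being tested against phenotypes, so that effect sizes are reported per standard deviation of the score.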
Affiliation(s)
- Hailey A. Kresge
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Freida Blostein
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Slavina Goleva
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Clara Albiñana
- National Centre for Register-Based Research, Aarhus University, Aarhus V, Denmark
- Department of Psychiatry, University of Oxford, Oxford, UK
| | - Joana A. Revez
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
| | - Naomi R. Wray
- Department of Psychiatry, University of Oxford, Oxford, UK
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, Australia
| | - Bjarni J. Vilhjálmsson
- National Centre for Register-Based Research, Aarhus University, Aarhus V, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
| | - Zhihong Zhu
- National Centre for Register-Based Research, Aarhus University, Aarhus V, Denmark
| | - John J. McGrath
- National Centre for Register-Based Research, Aarhus University, Aarhus V, Denmark
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, Australia
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
| | - Lea K. Davis
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Neurology, Pharmacology and Special Education, Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
|
13
|
Wang Z, Fu G, Ma G, Wang C, Wang Q, Lu C, Fu L, Zhang X, Cong B, Li S. The association between DNA methylation and human height and a prospective model of DNA methylation-based height prediction. Hum Genet 2024; 143:401-421. [PMID: 38507014 DOI: 10.1007/s00439-024-02659-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/13/2024] [Indexed: 03/22/2024]
Abstract
As a vital anthropometric characteristic, human height information not only helps to understand overall developmental status and genetic risk factors, but is also important for forensic DNA phenotyping. We utilized linear regression analysis to test the association between each CpG probe and the height phenotype. Next, we designed a methylation sequencing panel targeting 959 CpGs and subsequent height inference models were constructed for the Chinese population. A total of 11,730 height-associated sites were identified. By employing KPCA and deep neural networks, a prediction model was developed, of which the cross-validation RMSE, MAE and R2 were 5.62 cm, 4.45 cm and 0.64, respectively. Genetic factors could explain 39.4% of the methylation level variance of sites used in the height inference models. Collectively, we demonstrated an association between height and DNA methylation status through an EWAS analysis. Targeted methylation sequencing of only 959 CpGs combined with deep learning techniques could provide a model to estimate human height with higher accuracy than SNP-based prediction models.
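The three reported accuracy measures can be computed from paired observed and predicted heights as in the sketch below. It assumes R2 means the coefficient of determination (1 minus residual over total sum of squares); the paper may instead have used a squared correlation. The function name and the toy heights are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values.

    RMSE and MAE are in the units of the input (e.g. cm for height);
    R^2 is the coefficient of determination.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Toy heights in cm, for illustration only.
rmse, mae, r2 = regression_metrics([160.0, 170.0, 180.0],
                                   [162.0, 170.0, 178.0])
```

RMSE penalizes large misses more heavily than MAE, which is why the reported RMSE (5.62 cm) exceeds the MAE (4.45 cm).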
Affiliation(s)
- Zhonghua Wang
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Guangping Fu
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Guanju Ma
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Chunyan Wang
- Physical Examination Center of Shijiazhuang People's Hospital, Shijiazhuang, 050011, Hebei, China
| | - Qian Wang
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Chaolong Lu
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Lihong Fu
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Xiaojing Zhang
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Bin Cong
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Shujin Li
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China.
|
14
|
Lo Faro V, Bhattacharya A, Zhou W, Zhou D, Wang Y, Läll K, Kanai M, Lopera-Maya E, Straub P, Pawar P, Tao R, Zhong X, Namba S, Sanna S, Nolte IM, Okada Y, Ingold N, MacGregor S, Snieder H, Surakka I, Shortt J, Gignoux C, Rafaels N, Crooks K, Verma A, Verma SS, Guare L, Rader DJ, Willer C, Martin AR, Brantley MA, Gamazon ER, Jansonius NM, Joos K, Cox NJ, Hirbo J. Novel ancestry-specific primary open-angle glaucoma loci and shared biology with vascular mechanisms and cell proliferation. Cell Rep Med 2024; 5:101430. [PMID: 38382466 PMCID: PMC10897632 DOI: 10.1016/j.xcrm.2024.101430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 03/28/2023] [Accepted: 01/25/2024] [Indexed: 02/23/2024]
Abstract
Primary open-angle glaucoma (POAG), a leading cause of irreversible blindness globally, shows disparity in prevalence and manifestations across ancestries. We perform meta-analysis across 15 biobanks (of the Global Biobank Meta-analysis Initiative) (n = 1,487,441: cases = 26,848) and merge with previous multi-ancestry studies, with the combined dataset representing the largest and most diverse POAG study to date (n = 1,478,037: cases = 46,325) and identify 17 novel significant loci, 5 of which were ancestry specific. Gene-enrichment and transcriptome-wide association analyses implicate vascular and cancer genes, a fifth of which are primary ciliary related. We perform an extensive statistical analysis of SIX6 and CDKN2B-AS1 loci in human GTEx data and across large electronic health records showing interaction between SIX6 gene and causal variants in the chr9p21.3 locus, with expression effect on CDKN2A/B. Our results suggest that some POAG risk variants may be ancestry specific, sex specific, or both, and support the contribution of genes involved in programmed cell death in POAG pathogenesis.
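The abstract does not state which meta-analysis model was used across biobanks. As a generic illustration, the sketch below combines per-cohort effect estimates for a single variant with an inverse-variance-weighted fixed-effect model, a common default for GWAS meta-analysis. The function name and the toy estimates are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas : per-cohort effect estimates (e.g. log odds ratios)
    ses   : per-cohort standard errors
    Returns the pooled estimate, its standard error, and the Z-score.
    """
    weights = [1.0 / (se * se) for se in ses]   # precision of each cohort
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Toy example: two cohorts with equal precision.
beta, se, z = fixed_effect_meta([0.1, 0.3], [0.1, 0.1])
```

Pooling shrinks the standard error relative to any single cohort, which is how combining biobanks can push a variant past genome-wide significance when no single dataset does.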
Affiliation(s)
- Valeria Lo Faro
- Department of Ophthalmology, Amsterdam University Medical Center (AMC), Amsterdam, the Netherlands; Department of Clinical Genetics, Amsterdam University Medical Center (AMC), Amsterdam, the Netherlands; Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Wei Zhou
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Dan Zhou
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Ying Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Kristi Läll
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Esteban Lopera-Maya
- University of Groningen, UMCG, Department of Genetics, Groningen, the Netherlands
| | - Peter Straub
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Priyanka Pawar
- Vanderbilt Eye Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Ran Tao
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Xue Zhong
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Shinichi Namba
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan
| | - Serena Sanna
- University of Groningen, UMCG, Department of Genetics, Groningen, the Netherlands; Institute for Genetics and Biomedical Research (IRGB), National Research Council (CNR), Cagliari, Italy
| | - Ilja M Nolte
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan; Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka, Japan; Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka, Japan; Center for Infectious Disease Education and Research (CiDER), Osaka University, Osaka, Japan
| | - Nathan Ingold
- Statistical Genetics, QIMR Berghofer Medical Research Institute, Queensland University of Technology, Brisbane, QLD, Australia; School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, QLD, Australia
| | - Stuart MacGregor
- Statistical Genetics, QIMR Berghofer Medical Research Institute, Queensland University of Technology, Brisbane, QLD, Australia
| | - Harold Snieder
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Ida Surakka
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Jonathan Shortt
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Chris Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Nicholas Rafaels
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Kristy Crooks
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Anurag Verma
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, PA, USA; Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Shefali S Verma
- Department of Pathology, University of Pennsylvania, Philadelphia, PA, USA
| | - Lindsay Guare
- Department of Pathology, University of Pennsylvania, Philadelphia, PA, USA; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Daniel J Rader
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, PA, USA; Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA; Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
| | - Cristen Willer
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway; Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Alicia R Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Milam A Brantley
- Vanderbilt Eye Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Eric R Gamazon
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Nomdo M Jansonius
- Department of Ophthalmology, Amsterdam University Medical Center (AMC), Amsterdam, the Netherlands
| | - Karen Joos
- Vanderbilt Eye Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Nancy J Cox
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jibril Hirbo
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
|
15
|
Xiang R, Kelemen M, Xu Y, Harris LW, Parkinson H, Inouye M, Lambert SA. Recent advances in polygenic scores: translation, equitability, methods and FAIR tools. Genome Med 2024; 16:33. [PMID: 38373998 PMCID: PMC10875792 DOI: 10.1186/s13073-024-01304-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 02/07/2024] [Indexed: 02/21/2024] Open
Abstract
Polygenic scores (PGS) can be used for risk stratification by quantifying individuals' genetic predisposition to disease, and many potentially clinically useful applications have been proposed. Here, we review the latest potential benefits of PGS in the clinic and challenges to implementation. PGS could augment risk stratification through combined use with traditional risk factors (demographics, disease-specific risk factors, family history, etc.), to support diagnostic pathways, to predict groups with therapeutic benefits, and to increase the efficiency of clinical trials. However, there exist challenges to maximizing the clinical utility of PGS, including FAIR (Findable, Accessible, Interoperable, and Reusable) use and standardized sharing of the genomic data needed to develop and recalculate PGS, the equitable performance of PGS across populations and ancestries, the generation of robust and reproducible PGS calculations, and the responsible communication and interpretation of results. We outline how these challenges may be overcome analytically and with more diverse data as well as highlight sustained community efforts to achieve equitable, impactful, and responsible use of PGS in healthcare.
Affiliation(s)
- Ruidong Xiang
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Martin Kelemen
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
| | - Yu Xu
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
| | - Laura W Harris
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia.
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK.
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK.
| | - Samuel A Lambert
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
|
16
|
Aw AJ, McRae J, Rahmani E, Song YS. Highly parameterized polygenic scores tend to overfit to population stratification via random effects. bioRxiv 2024:2024.01.27.577589. [PMID: 38352303 PMCID: PMC10862757 DOI: 10.1101/2024.01.27.577589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
Polygenic scores (PGSs), increasingly used in clinical settings, frequently include many genetic variants, with performance typically peaking at thousands of variants. Such highly parameterized PGSs often include variants that do not pass a genome-wide significance threshold. We propose a mathematical perspective that renders the effects of many of these non-significant variants random rather than causal, with the randomness capturing population structure. We devise methods to assess variant effect randomness and population stratification bias. Applying these methods to 141 traits from the UK Biobank, we find that, for many PGSs, the effects of non-significant variants are considerably random, with the extent of randomness associated with the degree of overfitting to population structure of the discovery cohort. Our findings explain why highly parameterized PGSs simultaneously have superior cohort-specific performance and limited generalizability, suggesting the critical need for variant randomness tests in PGS evaluation. Supporting code and a dashboard are available at https://github.com/songlab-cal/StratPGS.
Affiliation(s)
- Alan J. Aw
- Department of Statistics, University of California, Berkeley
- Center for Computational Biology, University of California, Berkeley
- Artificial Intelligence Laboratory, Illumina Inc
| | - Jeremy McRae
- Artificial Intelligence Laboratory, Illumina Inc
| | - Elior Rahmani
- Department of Computational Medicine, University of California, Los Angeles
| | - Yun S. Song
- Department of Statistics, University of California, Berkeley
- Center for Computational Biology, University of California, Berkeley
- Computer Science Division, University of California, Berkeley
|
17
|
Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, Kenny EE, Pasaniuc B, Witte JS, Ge T. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024; 25:8-25. [PMID: 37620596 PMCID: PMC10961971 DOI: 10.1038/s41576-023-00637-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Accepted: 07/11/2023] [Indexed: 08/26/2023]
Abstract
Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
Affiliation(s)
- Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Jibril Hirbo
- Department of Medicine Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Daniel J Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Iman Martin
- Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - John S Witte
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA.
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Department of Genetics, Stanford University, Stanford, CA, USA.
| | - Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
18
|
Sohail M, Moreno-Estrada A. The Mexican Biobank Project promotes genetic discovery, inclusive science and local capacity building. Dis Model Mech 2024; 17:dmm050522. [PMID: 38299665 PMCID: PMC10855211 DOI: 10.1242/dmm.050522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2024] Open
Abstract
Diversifying genotype-phenotype databases is essential to understanding complex trait and disease etiology across different environments and genetic ancestries. The rise of biobanks across the world is helping reveal the genetic and environmental architecture of multiple disease traits but the diversity they capture remains limited. To help close this gap, the Mexican Biobank (MXB) Project was recently generated, and has already revealed fine-scale genetic ancestries and demographic histories across the country, and their impact on trait-relevant genetic variation. This will help guide future genetic epidemiology and public health efforts, and has also improved polygenic prediction for several traits in Mexican populations compared with using data from other genome-wide association studies, such as the UK Biobank. The MXB illustrates the importance of transnational initiatives and funding calls that prioritize local leadership and capacity building to move towards inclusive genomic science.
Affiliation(s)
- Mashaal Sohail
- Genómica Computacional, Centro de Ciencias Genómicas (CCG), Universidad Nacional Autónoma de México (UNAM), 62209 Cuernavaca, Morelos, México
| | - Andrés Moreno-Estrada
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), 36821 Irapuato, Guanajuato, México
|
19
|
Hoggart CJ, Choi SW, García-González J, Souaiaia T, Preuss M, O'Reilly PF. BridgePRS leverages shared genetic effects across ancestries to increase polygenic risk score portability. Nat Genet 2024; 56:180-186. [PMID: 38123642 PMCID: PMC10786716 DOI: 10.1038/s41588-023-01583-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Accepted: 10/20/2023] [Indexed: 12/23/2023]
Abstract
Here we present BridgePRS, a novel Bayesian polygenic risk score (PRS) method that leverages shared genetic effects across ancestries to increase PRS portability. We evaluate BridgePRS via simulations and real UK Biobank data across 19 traits in individuals of African, South Asian and East Asian ancestry, using both UK Biobank and Biobank Japan genome-wide association study summary statistics; out-of-cohort validation is performed in the Mount Sinai (New York) BioMe biobank. BridgePRS is compared with the leading alternative, PRS-CSx, and two other PRS methods. Simulations suggest that the performance of BridgePRS relative to PRS-CSx increases as uncertainty increases: with lower trait heritability, higher polygenicity and greater between-population genetic diversity; and when causal variants are not present in the data. In real data, BridgePRS has a 61% larger average R² than PRS-CSx in out-of-cohort prediction of African ancestry samples in BioMe (P = 6 × 10⁻⁵). BridgePRS is a computationally efficient, user-friendly and powerful approach for PRS analyses in non-European ancestries.
Affiliation(s)
- Clive J Hoggart
- Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York, NY, USA.
| | - Shing Wan Choi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York, NY, USA
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Judit García-González
- Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York, NY, USA
| | - Tade Souaiaia
- Department of Cellular Biology, Suny Downstate Health Sciences, Brooklyn, NY, USA
| | - Michael Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine, Mount Sinai, New York, NY, USA
| | - Paul F O'Reilly
- Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York, NY, USA.
|
20
|
Zhang MJ, Durvasula A, Chiang C, Koch EM, Strober BJ, Shi H, Barton AR, Kim SS, Weissbrod O, Loh PR, Gazal S, Sunyaev S, Price AL. Pervasive correlations between causal disease effects of proximal SNPs vary with functional annotations and implicate stabilizing selection. Research Square 2023:rs.3.rs-3707248. [PMID: 38168385 PMCID: PMC10760228 DOI: 10.21203/rs.3.rs-3707248/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., -0.37±0.09 for low-frequency positive-LD 0-100bp SNP pairs) that decayed with distance (e.g., -0.07±0.01 for low-frequency positive-LD 1-10kb), varied with allele frequency (e.g., -0.15±0.04 for common positive-LD 0-100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0-100bp) (because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results). We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., -0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb (due to alternative splicing)) and -0.32±0.04 for low-frequency positive-LD H3K27ac 0-1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs), even though these quantities are widely assumed to be equal.
We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.
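The gap between SNP-heritability and the sum of per-SNP causal effect variances can be illustrated with a short variance decomposition. This is only a sketch under assumptions that the abstract does not state: genotypes $x_i$ are standardized to unit variance, $r_{ij}$ is the LD (correlation) between SNPs $i$ and $j$, causal effects $\beta_i$ are zero-mean random variables independent of genotype, and the phenotype is scaled to unit variance so that the genetic variance plays the role of SNP-heritability.

```latex
\begin{align}
g &= \sum_i x_i \beta_i, \\
\mathbb{E}_{\beta}\!\left[\operatorname{Var}_x(g \mid \beta)\right]
  &= \sum_i \operatorname{Var}(\beta_i)
   + \sum_{i \neq j} r_{ij}\,\operatorname{Cov}(\beta_i, \beta_j).
\end{align}
```

If effects are independent, the second sum vanishes and the two quantities coincide. If effect correlations are negative for positive-LD pairs and positive for negative-LD pairs, as the reported estimates suggest, each cross term $r_{ij}\operatorname{Cov}(\beta_i, \beta_j)$ is negative and the genetic variance falls below $\sum_i \operatorname{Var}(\beta_i)$, which is the direction of the reported ratio of 0.87.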
Affiliation(s)
- Martin Jinye Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Arun Durvasula
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Colby Chiang
- Department of Pediatrics, Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA
| | - Evan M. Koch
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Benjamin J. Strober
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Alison R. Barton
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Samuel S. Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Po-Ru Loh
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Steven Gazal
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Department of Quantitative and Computational Biology, University of Southern California
- Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California
| | - Shamil Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Alkes L. Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
|
21
|
Zhang MJ, Durvasula A, Chiang C, Koch EM, Strober BJ, Shi H, Barton AR, Kim SS, Weissbrod O, Loh PR, Gazal S, Sunyaev S, Price AL. Pervasive correlations between causal disease effects of proximal SNPs vary with functional annotations and implicate stabilizing selection. medRxiv 2023:2023.12.04.23299391. [PMID: 38106023 PMCID: PMC10723494 DOI: 10.1101/2023.12.04.23299391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., -0.37±0.09 for low-frequency positive-LD 0-100bp SNP pairs) that decayed with distance (e.g., -0.07±0.01 for low-frequency positive-LD 1-10kb), varied with allele frequency (e.g., -0.15±0.04 for common positive-LD 0-100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0-100bp) (because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results). We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., -0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb (due to alternative splicing)) and -0.32±0.04 for low-frequency positive-LD H3K27ac 0-1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs), even though these quantities are widely assumed to be equal.
We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.
Affiliation(s)
- Martin Jinye Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Arun Durvasula
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Colby Chiang
- Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
| | - Evan M Koch
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Benjamin J Strober
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Alison R Barton
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Samuel S Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Po-Ru Loh
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Steven Gazal
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Department of Quantitative and Computational Biology, University of Southern California
- Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California
| | - Shamil Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
|
22
|
Maldonado BL, Piqué DG, Kaplan RC, Claw KG, Gignoux CR. Genetic risk prediction in Hispanics/Latinos: milestones, challenges, and social-ethical considerations. J Community Genet 2023; 14:543-553. [PMID: 37962783 PMCID: PMC10725387 DOI: 10.1007/s12687-023-00686-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 10/18/2023] [Indexed: 11/15/2023] Open
Abstract
Genome-wide association studies (GWAS) have allowed the identification of disease-associated variants, which can be leveraged to build polygenic scores (PGSs). Even though PGSs can be a valuable tool in personalized medicine, their predictive power is limited in populations of non-European ancestry, particularly in admixed populations. Recent efforts have focused on increasing racial and ethnic diversity in GWAS, thus, addressing some of the limitations of genetic risk prediction in these populations. Even with these efforts, few studies focus exclusively on Hispanics/Latinos. Additionally, Hispanic/Latino populations are often considered a single population despite varying admixture proportions between and within ethnic groups, diverse genetic heterogeneity, and demographic history. Combined with highly heterogeneous environmental and socioeconomic exposures, this diversity can reduce the transferability of genetic risk prediction models. Given the recent increase of genomic studies that include Hispanics/Latinos, we review the milestones and efforts that focus on genetic risk prediction, summarize the potential for improving PGS transferability, and highlight the challenges yet to be addressed. Additionally, we summarize social-ethical considerations and provide ideas to promote genetic risk prediction models that can be implemented equitably.
Affiliation(s)
- Betzaida L Maldonado
- Human Medical Genetics & Genomics Graduate Program, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA.
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA.
- Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA.
| | - Daniel G Piqué
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Section of Genetics and Metabolism, Department of Pediatrics, Children's Hospital Colorado, Aurora, CO, USA
| | - Robert C Kaplan
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Katrina G Claw
- Human Medical Genetics & Genomics Graduate Program, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
| | - Christopher R Gignoux
- Human Medical Genetics & Genomics Graduate Program, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
|
23
|
Zhai S, Mehrotra DV, Shen J. Applying polygenic risk score methods to pharmacogenomics GWAS: challenges and opportunities. Brief Bioinform 2023; 25:bbad470. [PMID: 38152980 PMCID: PMC10782924 DOI: 10.1093/bib/bbad470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 12/29/2023] Open
Abstract
Polygenic risk scores (PRSs) have emerged as promising tools for the prediction of human diseases and complex traits in disease genome-wide association studies (GWAS). Applying PRSs to pharmacogenomics (PGx) studies has begun to show great potential for improving patient stratification and drug response prediction. However, there are unique challenges that arise when applying PRSs to PGx GWAS beyond those typically encountered in disease GWAS (e.g. Eurocentric or trans-ethnic bias). These challenges include: (i) the lack of knowledge about whether PGx or disease GWAS/variants should be used in the base cohort (BC); (ii) the small sample sizes in PGx GWAS with corresponding low power and (iii) the more complex PRS statistical modeling required for handling both prognostic and predictive effects simultaneously. To gain insights in this landscape about the general trends, challenges and possible solutions, we first conduct a systematic review of both PRS applications and PRS method development in PGx GWAS. To further address the challenges, we propose (i) a novel PRS application strategy by leveraging both PGx and disease GWAS summary statistics in the BC for PRS construction and (ii) a new Bayesian method (PRS-PGx-Bayesx) to reduce Eurocentric or cross-population PRS prediction bias. Extensive simulations are conducted to demonstrate their advantages over existing PRS methods applied in PGx GWAS. Our systematic review and methodology research work not only highlights current gaps and key considerations while applying PRS methods to PGx GWAS, but also provides possible solutions for better PGx PRS applications and future research.
Affiliation(s)
- Song Zhai
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
| | - Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
|
24
|
Veller C, Przeworski M, Coop G. Causal interpretations of family GWAS in the presence of heterogeneous effects. bioRxiv 2023:2023.11.13.566950. [PMID: 38014124 PMCID: PMC10680648 DOI: 10.1101/2023.11.13.566950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Family-based genome-wide association studies (GWAS) have emerged as a gold standard for assessing causal effects of alleles and polygenic scores. Notably, family studies are often claimed to provide an unbiased estimate of the average causal effect (or average treatment effect; ATE) of an allele, on the basis of an analogy between the random transmission of alleles from parents to children and a randomized controlled trial. Here, we show that this interpretation does not hold in general. Because Mendelian segregation only randomizes alleles among children of heterozygotes, the effects of alleles in the children of homozygotes are not observable. Consequently, if an allele has different average effects in the children of homozygotes and heterozygotes, as can arise in the presence of gene-by-environment interactions, gene-by-gene interactions, or differences in LD patterns, family studies provide a biased estimate of the average effect in the sample. At a single locus, family-based association studies can be thought of as providing an unbiased estimate of the average effect in the children of heterozygotes (i.e., a local average treatment effect; LATE). This interpretation does not extend to polygenic scores, however, because different sets of SNPs are heterozygous in each family. Therefore, other than under specific conditions, the within-family regression slope of a PGS cannot be assumed to provide an unbiased estimate for any subset or weighted average of families. Instead, family-based studies can be reinterpreted as enabling an unbiased estimate of the extent to which Mendelian segregation at loci in the PGS contributes to the population-level variance in the trait. Because this estimate does not include the between-family variance, however, this interpretation applies to only (roughly) half of the sample PGS variance. 
In practice, the potential biases of a family-based GWAS are likely smaller than those arising from confounding in a standard, population-based GWAS, and so family studies remain important for the dissection of genetic contributions to phenotypic variation. Nonetheless, the causal interpretation of family-based GWAS estimates is less straightforward than has been widely appreciated.
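The single-locus argument above can be illustrated with a small simulation. The sketch below is a toy model, not the authors' analysis: it assumes one biallelic locus, Hardy-Weinberg parental genotypes, and an allelic effect that is `beta_het` in families with at least one heterozygous parent and `beta_hom` otherwise. The function name, parameters, and the way heterogeneity is tied directly to parental heterozygosity are all illustrative assumptions.

```python
import numpy as np


def simulate_family_gwas(n_families=200_000, p=0.2, beta_het=1.0,
                         beta_hom=0.0, seed=0):
    """Toy single-locus family study with heterogeneous allelic effects.

    Returns (within_family_slope, sample_average_effect).
    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Parental genotypes (0, 1 or 2 copies) under Hardy-Weinberg proportions.
    mother = rng.binomial(2, p, n_families)
    father = rng.binomial(2, p, n_families)

    # Mendelian segregation: a parent with genotype g transmits the allele
    # with probability g / 2, so only heterozygotes transmit at random.
    child = rng.binomial(1, mother / 2) + rng.binomial(1, father / 2)

    # Assumed heterogeneity: the effect depends on parental heterozygosity.
    any_het_parent = (mother == 1) | (father == 1)
    beta = np.where(any_het_parent, beta_het, beta_hom)
    phenotype = beta * child + rng.normal(0.0, 1.0, n_families)

    # Within-family signal: child genotype minus its expectation given the
    # parents. It is exactly zero when both parents are homozygous.
    deviation = child - (mother + father) / 2

    slope = np.cov(phenotype, deviation)[0, 1] / np.var(deviation, ddof=1)
    return slope, beta.mean()


if __name__ == "__main__":
    slope, average_effect = simulate_family_gwas()
    print(f"within-family slope:   {slope:.3f}")
    print(f"sample average effect: {average_effect:.3f}")
```

In this toy setting the within-family slope tracks `beta_het`, the effect among children of heterozygotes, while the sample average effect is pulled toward `beta_hom` by families that contribute no segregation variance (with the defaults its expectation is 1 - 0.68² ≈ 0.54).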
Affiliation(s)
- Carl Veller
- Department of Ecology and Evolution, University of Chicago
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University
- Department of Systems Biology, Columbia University
| | - Graham Coop
- Center for Population Biology and Department of Evolution and Ecology, University of California, Davis
|
25
|
Du Z, Iyyanki T, Lessard S, Chao M, Asbrand C, Nassar D, Klinger K, de Rinaldis E, Khader S, Chatelain C. Genome-wide association study analysis of disease severity in Acne reveals novel biological insights. medRxiv 2023:2023.11.13.23298473. [PMID: 38014089 PMCID: PMC10680891 DOI: 10.1101/2023.11.13.23298473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Acne vulgaris is a common skin disease that affects >85% of teenage young adults, among whom >8% develop severe lesions that leave permanent scars. Genetic heritability studies of acne in twin cohorts have estimated that the heritability for acne is 80%. Previous genome-wide association studies (GWAS) have identified 50 genetic loci associated with increased risk of developing acne when compared to healthy individuals. However, only a few studies have investigated genetic association with disease severity. GWAS of disease progression may provide a more effective approach to unveil potential disease-modifying therapeutic targets. Here, we performed a multi-ethnic GWAS analysis to capture disease severity in acne patients by using individuals with normal acne as a control. Our cohort consists of a total of 2,956 participants, including 290 severe acne cases and 930 normal acne controls from FinnGen, and 522 cases and 1,214 controls from BioVU. We also performed Mendelian randomization (MR), colocalization analyses and a transcriptome-wide association study (TWAS) to identify putative causal genes. Lastly, we performed gene-set enrichment analysis using MAGMA to implicate biological pathways that drive disease severity in acne. We identified two new loci associated with acne severity at the genome-wide significance level, six novel associated genes by MR, colocalization and TWAS analyses, including genes CDC7, SLC7A1, ADAM23, TTLL10, CDK20 and DNAJA4, and 5 novel pathways by MAGMA analyses. Our study suggests that the etiologies of acne susceptibility and severity have limited overlap, with only 26% of known acne risk loci presenting nominal association with acne severity and none of the novel severity-associated genes reported as associated with acne risk in previous GWAS.
|
26
|
Janardhanan M, Sen S, Shankarappa B, Purushottam M. Molecular genetics of neuropsychiatric illness: some musings. Front Genet 2023; 14:1203017. [PMID: 38028602 PMCID: PMC10646253 DOI: 10.3389/fgene.2023.1203017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open
Abstract
Research into the genetic underpinnings of neuropsychiatric illness has occurred at many levels. As more information accumulates, it appears that many approaches may each offer their unique perspective. The search for low penetrance and common variants, that may mediate risk, has necessitated the formation of many international consortia, to pool resources, and achieve the large sample sizes needed to discover these variants. There has been the parallel development of statistical methods to analyse large datasets and present summary statistics which allows data comparison across studies. Even so, the results of studies on well-characterised clinical datasets of modest sizes can be enlightening and provide important clues to understanding these complex disorders. We describe the use of common variants, at multiallelic loci like TOMM40 and APOE to study dementia, weighted genetic risk scores for alcohol-induced liver cirrhosis and whole exome sequencing to identify rare variants in genes like PLA2G6 in familial psychoses and schizophrenia in our Indian population.
Collapse
Affiliation(s)
- Meera Purushottam
- Molecular Genetics Laboratory, Department of Psychiatry, National Institute of Mental Health and Neurosciences, Bengaluru, India
|
27
|
Fatumo S, Sathan D, Samtal C, Isewon I, Tamuhla T, Soremekun C, Jafali J, Panji S, Tiffin N, Fakim YJ. Polygenic risk scores for disease risk prediction in Africa: current challenges and future directions. Genome Med 2023; 15:87. [PMID: 37904243 PMCID: PMC10614359 DOI: 10.1186/s13073-023-01245-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 10/12/2023] [Indexed: 11/01/2023] Open
Abstract
Early identification of genetic risk factors for complex diseases can enable timely interventions and prevent serious outcomes, including mortality. While the genetics underlying many Mendelian diseases have been elucidated, it is harder to predict risk for complex diseases arising from the combined effects of many genetic variants with smaller individual effects on disease aetiology. Polygenic risk scores (PRS), which combine multiple contributing variants to predict disease risk, have the potential to influence the implementation for precision medicine. However, the majority of existing PRS were developed from European data with limited transferability to African populations. Notably, African populations have diverse genetic backgrounds, and a genomic architecture with smaller haplotype blocks compared to European genomes. Subsequently, growing evidence shows that using large-scale African ancestry cohorts as discovery for PRS development may generate more generalizable findings. Here, we (1) discuss the factors contributing to the poor transferability of PRS in African populations, (2) showcase the novel Africa genomic datasets for PRS development, (3) explore the potential clinical utility of PRS in African populations, and (4) provide insight into the future of PRS in Africa.
Affiliation(s)
- Segun Fatumo
- The African Computational Genomics (TACG) Research Group, MRC/UVRI and LSHTM, Entebbe, Uganda.
- H3Africa Bioinformatics Network (H3ABioNet) Node, Centre for Genomics Research and Innovation, NABDA/FMST, Abuja, Nigeria.
- Department of Non-Communicable Disease Epidemiology (NCDE), London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, UK.
- Dassen Sathan
- H3Africa Bioinformatics Network (H3ABioNet) Node, University of Mauritius, Reduit, Mauritius
- Chaimae Samtal
- Laboratory of Biotechnology, Environment, Agri-Food and Health, Faculty of Sciences Dhar El Mahraz-Sidi Mohammed Ben Abdellah University, 30000, Fez, Morocco
- Itunuoluwa Isewon
- Department of Computer and Information Sciences, Covenant University, P. M. B. 1023, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Km 10 Idiroko Road, P.M.B. 1023, Ota, Ogun State, Nigeria
- Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Covenant University, P.M.B. 1023, Ota, Ogun State, Nigeria
- Tsaone Tamuhla
- Division of Computational Biology, Integrative Biomedical Sciences Department, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Chisom Soremekun
- The African Computational Genomics (TACG) Research Group, MRC/UVRI and LSHTM, Entebbe, Uganda
- H3Africa Bioinformatics Network (H3ABioNet) Node, Centre for Genomics Research and Innovation, NABDA/FMST, Abuja, Nigeria
- Department of Immunology and Molecular Biology, College of Health Science, Makerere University, Kampala, Uganda
- James Jafali
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi
- Clinical Infection, Microbiology & Immunology, The University of Liverpool, Liverpool, UK
- Sumir Panji
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Nicki Tiffin
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
28
Wang Y, Kanai M, Tan T, Kamariza M, Tsuo K, Yuan K, Zhou W, Okada Y, Huang H, Turley P, Atkinson EG, Martin AR. Polygenic prediction across populations is influenced by ancestry, genetic architecture, and methodology. Cell Genomics 2023; 3:100408. [PMID: 37868036 PMCID: PMC10589629 DOI: 10.1016/j.xgen.2023.100408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 06/21/2023] [Accepted: 08/22/2023] [Indexed: 10/24/2023]
Abstract
Polygenic risk scores (PRSs) developed from multi-ancestry genome-wide association studies (GWASs), PRSmulti, hold promise for improving PRS accuracy and generalizability across populations. To establish best practices for leveraging the increasing diversity of genomic studies, we investigated how various factors affect the performance of PRSmulti compared with PRSs constructed from single-ancestry GWASs (PRSsingle). Through extensive simulations and empirical analyses, we showed that PRSmulti overall outperformed PRSsingle in understudied populations, except when the understudied population represented a small proportion of the multi-ancestry GWAS. Furthermore, integrating PRSs based on local ancestry-informed GWASs and large-scale, European-based PRSs improved predictive performance in understudied African populations, especially for less polygenic traits with large-effect ancestry-enriched variants. Our work highlights the importance of diversifying genomic studies to achieve equitable PRS performance across ancestral populations and provides guidance for developing PRSs from multiple studies.
Affiliation(s)
- Ying Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Taotao Tan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Kristin Tsuo
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kai Yuan
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Wei Zhou
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Center for Infectious Disease Education and Research (CiDER), and Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita 565-0871, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo 113-0033, Japan
- the BioBank Japan Project
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Society of Fellows, Harvard University, Cambridge, MA 02138, USA
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Center for Infectious Disease Education and Research (CiDER), and Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita 565-0871, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo 113-0033, Japan
- Department of Economics, and Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA
- Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Patrick Turley
- Department of Economics, and Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA
- Elizabeth G. Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Alicia R. Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
29
van Duijvenboden S, Ramírez J, Young WJ, Olczak KJ, Ahmed F, Alhammadi MJAY, Bell CG, Morris AP, Munroe PB. Integration of genetic fine-mapping and multi-omics data reveals candidate effector genes for hypertension. Am J Hum Genet 2023; 110:1718-1734. [PMID: 37683633 PMCID: PMC10577090 DOI: 10.1016/j.ajhg.2023.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 08/11/2023] [Accepted: 08/11/2023] [Indexed: 09/10/2023] Open
Abstract
Genome-wide association studies of blood pressure (BP) have identified >1,000 loci, but the effector genes and biological pathways at these loci are mostly unknown. Using published association summary statistics, we conducted annotation-informed fine-mapping incorporating tissue-specific chromatin segmentation and colocalization to identify causal variants and candidate effector genes for systolic BP, diastolic BP, and pulse pressure. We observed 532 distinct signals associated with ≥2 BP traits and 84 with all three. For >20% of signals, a single variant accounted for >75% posterior probability; 65 were missense variants in known (SLC39A8, ADRB2, and DBH) and previously unreported BP candidate genes (NRIP1 and MMP14). In disease-relevant tissues, we colocalized >80 and >400 distinct signals for each BP trait with cis-eQTLs and regulatory regions from promoter capture Hi-C, respectively. Integrating mouse, human disorder, gene expression and tissue abundance data, and literature review, we provide consolidated evidence for 436 BP candidate genes for future functional validation and discover several potential drug targets.
Affiliation(s)
- Stefan van Duijvenboden
- William Harvey Research Institute, Barts and the London Faculty of Medicine and Dentistry, Queen Mary University of London, EC1M 6BQ London, UK; Institute of Cardiovascular Science, University College London, London, UK; Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Julia Ramírez
- William Harvey Research Institute, Barts and the London Faculty of Medicine and Dentistry, Queen Mary University of London, EC1M 6BQ London, UK; Aragon Institute of Engineering Research, University of Zaragoza, Zaragoza, Spain; Centro de Investigación Biomédica en Red - Bioingeniería, Biomateriales y Nanomedicina, Zaragoza, Spain
- William J Young
- William Harvey Research Institute, Barts and the London Faculty of Medicine and Dentistry, Queen Mary University of London, EC1M 6BQ London, UK; Barts Heart Centre, St Bartholomew's Hospital, EC1A 7BE London, UK
- Kaya J Olczak
- William Harvey Research Institute, Barts and the London Faculty of Medicine and Dentistry, Queen Mary University of London, EC1M 6BQ London, UK
- Farah Ahmed
- William Harvey Research Institute, Barts and the London Faculty of Medicine and Dentistry, Queen Mary University of London, EC1M 6BQ London, UK
- Christopher G Bell
- William Harvey Research Institute, Barts and the London Faculty of Medicine and Dentistry, Queen Mary University of London, EC1M 6BQ London, UK
- Andrew P Morris
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK; National Institute of Health and Care Research, Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK.
- Patricia B Munroe
- William Harvey Research Institute, Barts and the London Faculty of Medicine and Dentistry, Queen Mary University of London, EC1M 6BQ London, UK; National Institute of Health and Care Research, Barts Cardiovascular Biomedical Research Centre, Queen Mary University of London, EC1M 6BQ London, UK.
30
Campos AI, Namba S, Lin SC, Nam K, Sidorenko J, Wang H, Kamatani Y, Wang LH, Lee S, Lin YF, Feng YCA, Okada Y, Visscher PM, Yengo L. Boosting the power of genome-wide association studies within and across ancestries by using polygenic scores. Nat Genet 2023; 55:1769-1776. [PMID: 37723263 DOI: 10.1038/s41588-023-01500-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 08/14/2023] [Indexed: 09/20/2023]
Abstract
Genome-wide association studies (GWASs) have been mostly conducted in populations of European ancestry, which currently limits the transferability of their findings to other populations. Here, we show, through theory, simulations and applications to real data, that adjustment of GWAS analyses for polygenic scores (PGSs) increases the statistical power for discovery across all ancestries. We applied this method to analyze seven traits available in three large biobanks with participants of East Asian ancestry (n = 340,000 in total) and report 139 additional associations across traits. We also present a two-stage meta-analysis strategy whereby, in contributing cohorts, a PGS-adjusted GWAS is rerun using PGSs derived from a first round of a standard meta-analysis. On average, across traits, this approach yields a 1.26-fold increase in the number of detected associations (range 1.07- to 1.76-fold increase). Altogether, our study demonstrates the value of using PGSs to increase the power of GWASs in underrepresented populations and promotes such an analytical strategy for future GWAS meta-analyses.
Affiliation(s)
- Adrian I Campos
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia.
- Shinichi Namba
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Shu-Chin Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
- Kisung Nam
- Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea
- Julia Sidorenko
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Huanwei Wang
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Yoichiro Kamatani
- Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Ling-Hua Wang
- Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei, Taiwan
- Seunggeun Lee
- Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea
- Yen-Feng Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
- Yen-Chen Anne Feng
- Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei, Taiwan
- Division of Biostatistics and Data Science, Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
- Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
- Peter M Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Loic Yengo
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia.
31
Zhao W, Zhang Z, Wang Z, Ma P, Pan Y, Wang Q, Zhang Z. Factors affecting the accuracy of genomic prediction in joint pig populations. Animal 2023; 17:100980. [PMID: 37797495 DOI: 10.1016/j.animal.2023.100980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 08/28/2023] [Accepted: 08/31/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic prediction (GP) has greatly advanced animal and plant breeding over the past two decades. GP in joint populations is a feasible method to improve the accuracy of genomic estimated breeding values in small populations. However, there is still a need to understand the factors that influence GP in joint populations. This study used simulated data and real data from Duroc pig populations to examine the impact of linkage disequilibrium (LD), causal variant effect sizes (CVESs), and minor allele frequencies (MAF) of SNPs on the accuracy of genomic prediction in joint populations. Three prediction methods were used: genomic best linear unbiased prediction (GBLUP), single-step GBLUP and multi-trait GBLUP. Results from the simulated datasets showed that the accuracies of GP in joint populations were always higher than those in a single population when only LD inconsistencies existed. However, single-step GBLUP accuracy in joint populations decreased as the correlation of MAF between populations decreased, while the accuracy of GBLUP was consistently higher in joint populations than in a single population. As the correlation of CVES between populations decreased, the accuracy of both GBLUP and single-step GBLUP in joint populations declined. Analysis of real Duroc populations showed low genetic correlation, similar to the simulated relationship between the most distant populations. In most cases in Duroc populations, GP had higher accuracy in joint populations than in individual populations. In conclusion, the consistency of CVES plays a more important role in multi-population GP. The genetic relatedness of the Duroc populations is so weak that the prediction accuracy of GP in joint populations is reduced for some traits. Multi-trait GBLUP is a competitive method for the joint breeding evaluation.
Affiliation(s)
- Wei Zhao
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiaotong University, 800# Dongchuan Road, Shang, East 200240, China
- Zhenyang Zhang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China
- Zhen Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China
- Peipei Ma
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiaotong University, 800# Dongchuan Road, Shang, East 200240, China
- Yuchun Pan
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China; Hainan Institute, Zhejiang University, Yongyou Industrial Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
- Qishan Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China; Hainan Institute, Zhejiang University, Yongyou Industrial Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
- Zhe Zhang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China.
32
Zhang H, Zhan J, Jin J, Zhang J, Lu W, Zhao R, Ahearn TU, Yu Z, O'Connell J, Jiang Y, Chen T, Okuhara D, Garcia-Closas M, Lin X, Koelsch BL, Chatterjee N. A new method for multiancestry polygenic prediction improves performance across diverse populations. Nat Genet 2023; 55:1757-1768. [PMID: 37749244 PMCID: PMC10923245 DOI: 10.1038/s41588-023-01501-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 08/16/2023] [Indexed: 09/27/2023]
Abstract
Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank, involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across 13 complex traits. Results demonstrated that CT-SLEB significantly improves PRS performance in non-European populations compared with simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offered insights into sample size requirements and SNP density effects on multiancestry risk prediction.
Affiliation(s)
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Jin Jin
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jingning Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Wenxuan Lu
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
- Ruzhang Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Thomas U Ahearn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Zhi Yu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Tony Chen
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Division of Genetics and Epidemiology, Institute of Cancer Research, London, UK
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA.
33
Yang X, Kar S, Antoniou AC, Pharoah PDP. Polygenic scores in cancer. Nat Rev Cancer 2023; 23:619-630. [PMID: 37479830 DOI: 10.1038/s41568-023-00599-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/12/2023] [Indexed: 07/23/2023]
Abstract
Since the publication of the first genome-wide association study for cancer in 2007, thousands of common alleles that are associated with the risk of cancer have been identified. The relative risk associated with individual variants is small and of limited clinical significance. However, the combined effect of multiple risk variants as captured by polygenic scores (PGSs) may be much greater and therefore provide risk discrimination that is clinically useful. We review the considerable research efforts over the past 15 years for developing statistical methods for PGSs and their application in large-scale genome-wide association studies to develop PGSs for various cancers. We review the predictive performance of these PGSs and the multiple challenges currently limiting the clinical application of PGSs. Despite this, PGSs are beginning to be incorporated into clinical multifactorial risk prediction models to stratify risk in both clinical trials and clinical implementation studies.
Affiliation(s)
- Xin Yang
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Siddhartha Kar
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Early Cancer Institute, Department of Oncology, University of Cambridge, Cambridge, UK
- Antonis C Antoniou
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Paul D P Pharoah
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
34
Blanc J, Berg JJ. Testing for differences in polygenic scores in the presence of confounding. bioRxiv 2023:2023.03.12.532301. [PMID: 36993707 PMCID: PMC10055004 DOI: 10.1101/2023.03.12.532301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Polygenic scores have become an important tool in human genetics, enabling the prediction of individuals' phenotypes from their genotypes. Understanding how the pattern of differences in polygenic score predictions across individuals intersects with variation in ancestry can provide insights into the evolutionary forces acting on the trait in question, and is important for understanding health disparities. However, because most polygenic scores are computed using effect estimates from population samples, they are susceptible to confounding by both genetic and environmental effects that are correlated with ancestry. The extent to which this confounding drives patterns in the distribution of polygenic scores depends on patterns of population structure in both the original estimation panel and in the prediction/test panel. Here, we use theory from population and statistical genetics, together with simulations, to study the procedure of testing for an association between polygenic scores and axes of ancestry variation in the presence of confounding. We use a general model of genetic relatedness to describe how confounding in the estimation panel biases the distribution of polygenic scores in a way that depends on the degree of overlap in population structure between panels. We then show how this confounding can bias tests for associations between polygenic scores and important axes of ancestry variation in the test panel. Finally, we use the understanding gained from this analysis to develop a method that uses patterns of genetic similarity between the two panels to guard against these biases, and show that this method can provide better protection against confounding than the standard PCA-based approach.
Affiliation(s)
- Jennifer Blanc
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Jeremy J. Berg
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
35
Gao Y, Sharma T, Cui Y. Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective. Annu Rev Biomed Data Sci 2023; 6:153-171. [PMID: 37104653 PMCID: PMC10529864 DOI: 10.1146/annurev-biodatasci-020722-020704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning.
Affiliation(s)
- Yan Gao
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA;
- Teena Sharma
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA;
- Yan Cui
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA;
36
Hou K, Xu Z, Ding Y, Harpak A, Pasaniuc B. Calibrated prediction intervals for polygenic scores across diverse contexts. medRxiv 2023:2023.07.24.23293056. [PMID: 37546999 PMCID: PMC10402211 DOI: 10.1101/2023.07.24.23293056] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 08/08/2023]
Abstract
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields from agriculture to personalized medicine. We analyze data from two large biobanks in the US (All of Us) and the UK (UK Biobank) to find widespread variability in PGS performance across contexts. Many contexts, including age, sex, and income, affect PGS accuracy with a magnitude similar to that of genetic ancestry. PGSs trained in single versus multi-ancestry cohorts show similar context-specificity in their accuracies. We introduce trait prediction intervals that are allowed to vary across contexts as a principled approach to account for context-specific PGS accuracy in genomic prediction. We model the impact of all contexts in a joint framework to enable PGS-based trait predictions that are well-calibrated (contain the trait value with 90% probability in all contexts), whereas methods that ignore context are mis-calibrated. We show that prediction intervals need to be adjusted for all considered traits ranging from 10% for diastolic blood pressure to 80% for waist circumference. Adjustment of prediction intervals depends on the dataset; for example, prediction intervals for education years need to be adjusted by 90% in All of Us versus 8% in UK Biobank. Our results provide a path forward towards utilization of PGS as a prediction tool across all individuals regardless of their contexts while highlighting the importance of a comprehensive profile of context information in study design and data collection.
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
| | - Ziqi Xu
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
| | - Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
| | - Arbel Harpak
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA
|
37
|
Raben TG, Lello L, Widen E, Hsu SDH. Biobank-scale methods and projections for sparse polygenic prediction from machine learning. Sci Rep 2023; 13:11662. [PMID: 37468507 PMCID: PMC10356957 DOI: 10.1038/s41598-023-37580-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 06/23/2023] [Indexed: 07/21/2023] Open
Abstract
In this paper we characterize the performance of linear models trained via widely-used sparse machine learning algorithms. We build polygenic scores and examine performance as a function of training set size, genetic ancestral background, and training method. We show that predictor performance is most strongly dependent on size of training data, with smaller gains from algorithmic improvements. We find that LASSO generally performs as well as the best methods, judged by a variety of metrics. We also investigate performance characteristics of predictors trained on one genetic ancestry group when applied to another. Using LASSO, we develop a novel method for projecting AUC and correlation as a function of data size (i.e., for new biobanks) and characterize the asymptotic limit of performance. Additionally, for LASSO (compressed sensing) we show that performance metrics and predictor sparsity are in agreement with theoretical predictions from the Donoho-Tanner phase transition. Specifically, a future predictor trained in the Taiwan Precision Medicine Initiative for asthma can achieve an AUC of [Formula: see text] and for height a correlation of [Formula: see text] for a Taiwanese population. This is above the measured values of [Formula: see text] and [Formula: see text], respectively, for UK Biobank trained predictors applied to a European population.
Affiliation(s)
- Timothy G Raben
- Department of Physics and Astronomy, Michigan State University, Michigan, USA.
| | - Louis Lello
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
| | - Erik Widen
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
| | - Stephen D H Hsu
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
|
38
|
Bocher O, Gilly A, Park YC, Zeggini E, Morris AP. Bridging the diversity gap: Analytical and study design considerations for improving the accuracy of trans-ancestry genetic prediction. HGG Adv 2023; 4:100214. [PMID: 37448981 PMCID: PMC10336686 DOI: 10.1016/j.xhgg.2023.100214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023] Open
Abstract
Genetic prediction of common complex disease risk is an essential component of precision medicine. Currently, genome-wide association studies (GWASs) are mostly composed of European-ancestry samples and resulting polygenic scores (PGSs) have been shown to poorly transfer to other ancestries partly due to heterogeneity of allelic effects between populations. Fixed-effects (FETA) and random-effects (RETA) trans-ancestry meta-analyses do not model such ancestry-related heterogeneity, while ancestry-specific (AS) scores may suffer from low power due to low sample sizes. In contrast, trans-ancestry meta-regression (TAMR) builds ancestry-aware PGS that account for more complex trans-ancestry architectures. Here, we examine the predictive performance of these four PGSs under multiple genetic architectures and ancestry configurations. We show that the predictive performance of FETA and RETA is strongly affected by cross-ancestry genetic heterogeneity, while AS PGS performance decreases in under-represented target populations. TAMR PGS is also impacted by heterogeneity but maintains good prediction performance in most situations, especially in ancestry-diverse scenarios. In simulations of human complex traits, TAMR scores currently explain 25% more phenotypic variance than AS in triglyceride levels and 33% more phenotypic variance than FETA in type 2 diabetes in most non-European populations. Importantly, a high proportion of non-European-ancestry individuals is needed to reach prediction levels that are comparable in those populations to the one observed in European-ancestry studies. Our results highlight the need to rebalance the ancestral composition of GWAS to enable accurate prediction in non-European-ancestry groups, and demonstrate the relevance of meta-regression approaches for compensating some of the current population biases in GWAS.
Affiliation(s)
| | | | | | - Eleftheria Zeggini
- ITG, Helmholtz Zentrum München, Munich, Germany
- Technical University of Munich, Munich, Germany
- Klinikum Rechts der Isar, Munich, Germany
| | - Andrew P. Morris
- ITG, Helmholtz Zentrum München, Munich, Germany
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester, UK
|
39
|
Wu Y, Goleva SB, Breidenbach LB, Kim M, MacGregor S, Gandal MJ, Davis LK, Wray NR. 150 risk variants for diverticular disease of intestine prioritize cell types and enable polygenic prediction of disease susceptibility. Cell Genom 2023; 3:100326. [PMID: 37492107 PMCID: PMC10363821 DOI: 10.1016/j.xgen.2023.100326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Revised: 03/11/2023] [Accepted: 04/20/2023] [Indexed: 07/27/2023]
Abstract
We conducted a genome-wide association study (GWAS) analysis of diverticular disease (DivD) of intestine within 724,372 individuals and identified 150 independent genome-wide significant DNA variants. Integration of the GWAS results with human gut single-cell RNA sequencing data implicated gut myocyte, mesothelial and stromal cells, and enteric neurons and glia in DivD development. Ninety-five genes were prioritized based on multiple lines of evidence, including SLC9A3, a drug target gene of tenapanor used for the treatment of the constipation subtype of irritable bowel syndrome. A DivD polygenic score (PGS) enables effective risk prediction (area under the curve [AUC], 0.688; 95% confidence interval [CI], 0.645-0.732) and the top 20% PGS was associated with ∼3.6-fold increased DivD risk relative to the remaining population. Our statistical and bioinformatic analyses suggest that the mechanism of DivD is through colon structure, gut motility, gastrointestinal mucus, and ionic homeostasis. Our analyses reinforce the link between gastrointestinal disorders and the enteric nervous system through genetics.
Affiliation(s)
- Yeda Wu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
- QIMR Berghofer Medical Research Institute, Brisbane, QLD 4029, Australia
| | - Slavina B. Goleva
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Lindsay B. Breidenbach
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Minsoo Kim
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Stuart MacGregor
- QIMR Berghofer Medical Research Institute, Brisbane, QLD 4029, Australia
| | - Michael J. Gandal
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Lea K. Davis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Psychiatry and Behavioural Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University, 511-A Light Hall, 2215 Garland Avenue, Nashville, TN 37232, USA
| | - Naomi R. Wray
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
|
40
|
Lehmann B, Mackintosh M, McVean G, Holmes C. Optimal strategies for learning multi-ancestry polygenic scores vary across traits. Nat Commun 2023; 14:4023. [PMID: 37419925 PMCID: PMC10328935 DOI: 10.1038/s41467-023-38930-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 05/22/2023] [Indexed: 07/09/2023] Open
Abstract
Polygenic scores (PGSs) are individual-level measures that aggregate the genome-wide genetic predisposition to a given trait. As PGS have predominantly been developed using European-ancestry samples, trait prediction using such European ancestry-derived PGS is less accurate in non-European ancestry individuals. Although there has been recent progress in combining multiple PGS trained on distinct populations, the problem of how to maximize performance given a multiple-ancestry cohort is largely unexplored. Here, we investigate the effect of sample size and ancestry composition on PGS performance for fifteen traits in UK Biobank. For some traits, PGS estimated using a relatively small African-ancestry training set outperformed, on an African-ancestry test set, PGS estimated using a much larger European-ancestry only training set. We observe similar, but not identical, results when considering other minority-ancestry groups within UK Biobank. Our results emphasise the importance of targeted data collection from underrepresented groups in order to address existing disparities in PGS performance.
Affiliation(s)
- Brieuc Lehmann
- Department of Statistical Science, University College London, London, UK.
| | | | - Gil McVean
- Big Data Institute, University of Oxford, Oxford, UK
| | - Chris Holmes
- The Alan Turing Institute, London, UK
- Big Data Institute, University of Oxford, Oxford, UK
- Department of Statistics, University of Oxford, Oxford, UK
|
41
|
Ding Y, Hou K, Xu Z, Pimplaskar A, Petter E, Boulier K, Privé F, Vilhjálmsson BJ, Olde Loohuis LM, Pasaniuc B. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 2023; 618:774-781. [PMID: 37198491 PMCID: PMC10284707 DOI: 10.1038/s41586-023-06079-4] [Citation(s) in RCA: 51] [Impact Index Per Article: 51.0] [Received: 09/28/2022] [Accepted: 04/12/2023] [Indexed: 05/19/2023]
Abstract
Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of -0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.
Affiliation(s)
- Yi Ding
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
| | - Kangcheng Hou
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Ziqi Xu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
| | - Aditya Pimplaskar
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Ella Petter
- Department of Computer Science, UCLA, Los Angeles, CA, USA
| | - Kristin Boulier
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Florian Privé
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
| | - Bjarni J Vilhjálmsson
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
| | - Loes M Olde Loohuis
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Institute for Precision Health, UCLA, Los Angeles, CA, USA.
|
42
|
Trastulla L, Moser S, Jiménez-Barrón LT, Andlauer TF, von Scheidt M, Budde M, Heilbronner U, Papiol S, Teumer A, Homuth G, Falkai P, Völzke H, Dörr M, Schulze TG, Gagneur J, Iorio F, Müller-Myhsok B, Schunkert H, Ziller MJ. Distinct genetic liability profiles define clinically relevant patient strata across common diseases. medRxiv 2023:2023.05.10.23289788. [PMID: 37214898 PMCID: PMC10197798 DOI: 10.1101/2023.05.10.23289788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Genome-wide association studies have unearthed a wealth of genetic associations across many complex diseases. However, translating these associations into biological mechanisms contributing to disease etiology and heterogeneity has been challenging. Here, we hypothesize that the effects of disease-associated genetic variants converge onto distinct cell type specific molecular pathways within distinct subgroups of patients. In order to test this hypothesis, we develop the CASTom-iGEx pipeline to operationalize individual level genotype data to interpret personal polygenic risk and identify the genetic basis of clinical heterogeneity. The paradigmatic application of this approach to coronary artery disease and schizophrenia reveals a convergence of disease associated variant effects onto known and novel genes, pathways, and biological processes. The biological process specific genetic liabilities are not equally distributed across patients. Instead, they defined genetically distinct groups of patients, characterized by different profiles across pathways, endophenotypes, and disease severity. These results provide further evidence for a genetic contribution to clinical heterogeneity and point to the existence of partially distinct pathomechanisms across patient subgroups. Thus, the universally applicable approach presented here has the potential to constitute an important component of future personalized medicine concepts.
Affiliation(s)
- Lucia Trastulla
- Max Planck Institute of Psychiatry, Munich, Germany
- Technische Universität München Medical Graduate Center Experimental Medicine, Munich, Germany
- Human Technopole, Milan, Italy
| | - Sylvain Moser
- Max Planck Institute of Psychiatry, Munich, Germany
- Technische Universität München Medical Graduate Center Experimental Medicine, Munich, Germany
- International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich, Germany
| | - Laura T. Jiménez-Barrón
- Max Planck Institute of Psychiatry, Munich, Germany
- International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich, Germany
| | | | - Moritz von Scheidt
- Klinik für Herz-und Kreislauferkrankungen, Deutsches Herzzentrum München, Technical University Munich, Munich, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Munich Heart Alliance, Munich, Germany
| | | | - Monika Budde
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich 80336, Germany
| | - Urs Heilbronner
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich 80336, Germany
| | - Sergi Papiol
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich 80336, Germany
| | - Alexander Teumer
- German Center for Cardiovascular Research (DZHK), Partner Site Greifswald, Greifswald, Germany
- Institute of Community Medicine, University Medicine Greifswald, Greifswald, Germany
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
| | - Georg Homuth
- Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald, Germany
| | - Peter Falkai
- Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich 80336, Germany
| | - Henry Völzke
- German Center for Cardiovascular Research (DZHK), Partner Site Greifswald, Greifswald, Germany
- Institute of Community Medicine, University Medicine Greifswald, Greifswald, Germany
| | - Marcus Dörr
- Department of Internal Medicine B, University Medicine Greifswald, Greifswald, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Greifswald, Greifswald, Germany
| | - Thomas G. Schulze
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich 80336, Germany
| | - Julien Gagneur
- Department of Informatics, Technical University of Munich, Garching, Germany
| | | | - Bertram Müller-Myhsok
- Max Planck Institute of Psychiatry, Munich, Germany
- Institute of Translational Medicine, University of Liverpool, Liverpool, UK
| | - Heribert Schunkert
- Klinik für Herz-und Kreislauferkrankungen, Deutsches Herzzentrum München, Technical University Munich, Munich, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Munich Heart Alliance, Munich, Germany
| | - Michael J. Ziller
- Max Planck Institute of Psychiatry, Munich, Germany
- Department of Psychiatry, University of Münster, Münster, Germany
- Center for Soft Nanoscience, University of Münster, Münster, Germany
|
43
|
Zhu C, Ming MJ, Cole JM, Edge MD, Kirkpatrick M, Harpak A. Amplification is the primary mode of gene-by-sex interaction in complex human traits. Cell Genom 2023; 3:100297. [PMID: 37228747 PMCID: PMC10203050 DOI: 10.1016/j.xgen.2023.100297] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 05/16/2022] [Revised: 12/15/2022] [Accepted: 03/13/2023] [Indexed: 05/27/2023]
Abstract
Sex differences in complex traits are suspected to be in part due to widespread gene-by-sex interactions (GxSex), but empirical evidence has been elusive. Here, we infer the mixture of ways in which polygenic effects on physiological traits covary between males and females. We find that GxSex is pervasive but acts primarily through systematic sex differences in the magnitude of many genetic effects ("amplification") rather than in the identity of causal variants. Amplification patterns account for sex differences in trait variance. In some cases, testosterone may mediate amplification. Finally, we develop a population-genetic test linking GxSex to contemporary natural selection and find evidence of sexually antagonistic selection on variants affecting testosterone levels. Our results suggest that amplification of polygenic effects is a common mode of GxSex that may contribute to sex differences and fuel their evolution.
Affiliation(s)
- Carrie Zhu
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Matthew J. Ming
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Jared M. Cole
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Mark Kirkpatrick
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Arbel Harpak
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
|
44
|
Ahern J, Thompson W, Fan CC, Loughnan R. Comparing Pruning and Thresholding with Continuous Shrinkage Polygenic Score Methods in a Large Sample of Ancestrally Diverse Adolescents from the ABCD Study®. Behav Genet 2023; 53:292-309. [PMID: 37017779 PMCID: PMC10655749 DOI: 10.1007/s10519-023-10139-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 02/28/2023] [Indexed: 04/06/2023]
Abstract
Using individuals' genetic data, researchers can generate Polygenic Scores (PS) that are able to predict risk for diseases, variability in different behaviors, and anthropometric measures. This is achieved by leveraging models learned from previously published large Genome-Wide Association Studies (GWASs) associating locations in the genome with a phenotype of interest. Previous GWASs have predominantly been performed in European ancestry individuals. This is of concern as PS generated in samples with a different ancestry from the original training GWAS have been shown to have lower performance and limited portability, and many efforts are now underway to collect genetic databases on individuals of diverse ancestries. In this study, we compare multiple methods of generating PS, including pruning and thresholding and Bayesian continuous shrinkage models, to determine which of them is best able to overcome these limitations. To do this we use the ABCD Study, a longitudinal cohort with deep phenotyping on individuals of diverse ancestry. We generate PS for anthropometric and psychiatric phenotypes using previously published GWAS summary statistics and examine their performance in three subsamples of ABCD: African ancestry individuals (n = 811), European ancestry individuals (n = 6703), and admixed ancestry individuals (n = 3664). We find that the single ancestry continuous shrinkage method, PRScs (CS), and the multi ancestry meta method, PRScsx Meta (CSx Meta), show the best performance across ancestries and phenotypes.
Affiliation(s)
- Jonathan Ahern
- Department of Cognitive Science, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
- Center for Human Development, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92161, USA.
| | - Wesley Thompson
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, 9500 Gilman Drive, La Jolla, San Diego, CA, 92161, USA
- Center for Population Neuroscience and Genetics, Laureate Institute for Brain Research, Tulsa, OK, 74103, USA
| | - Chun Chieh Fan
- Center for Population Neuroscience and Genetics, Laureate Institute for Brain Research, Tulsa, OK, 74103, USA
- Department of Radiology, University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, CA, 92037, USA
| | - Robert Loughnan
- Department of Cognitive Science, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Center for Human Development, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92161, USA
|
45
|
Majara L, Kalungi A, Koen N, Tsuo K, Wang Y, Gupta R, Nkambule LL, Zar H, Stein DJ, Kinyanda E, Atkinson EG, Martin AR. Low and differential polygenic score generalizability among African populations due largely to genetic diversity. HGG Adv 2023; 4:100184. [PMID: 36873096 PMCID: PMC9982687 DOI: 10.1016/j.xhgg.2023.100184] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 10/10/2022] [Accepted: 02/04/2023] [Indexed: 02/15/2023] Open
Abstract
African populations are vastly underrepresented in genetic studies but have the most genetic variation and face wide-ranging environmental exposures globally. Because systematic evaluations of genetic prediction had not yet been conducted in ancestries that span African diversity, we calculated polygenic risk scores (PRSs) in simulations across Africa and in empirical data from South Africa, Uganda, and the United Kingdom to better understand the generalizability of genetic studies. PRS accuracy improves more with ancestry-matched discovery cohorts than with ancestry-mismatched studies. Within ancestrally and ethnically diverse South African individuals, we find that PRS accuracy is low for all traits but varies across groups. Differences in African ancestries contribute more to variability in PRS accuracy than other large cohort differences considered between individuals in the United Kingdom versus Uganda. We computed PRS in African ancestry populations using existing European-only versus ancestrally diverse genetic studies; the increased diversity produced the largest accuracy gains for hemoglobin concentration and white blood cell count, reflecting large-effect ancestry-enriched variants in genes known to influence sickle cell anemia and the allergic response, respectively. Differences in PRS accuracy across African ancestries originating from diverse regions are as large as across out-of-Africa continental ancestries, requiring commensurate nuance.
Affiliation(s)
- Lerato Majara
- Global Initiative for Neuropsychiatric Genetics Education in Research (GINGER), Harvard T.H. Chan School of Public Health, Department of Epidemiology, Boston, MA, USA
- MRC Human Genetics Research Unit, Division of Human Genetics, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Observatory 7925, South Africa
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
| | - Allan Kalungi
- Global Initiative for Neuropsychiatric Genetics Education in Research (GINGER), Harvard T.H. Chan School of Public Health, Department of Epidemiology, Boston, MA, USA
- Department of Psychiatry, College of Health Sciences, Makerere University, Kampala, Uganda
- Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Mental Health Project, Medical Research Council/Uganda Virus Research Institute (MRC/UVRI) & London School of Hygiene and Tropical Medicine (LSHTM), Uganda Research Unit, Entebbe, Uganda
| | - Nastassja Koen
- Global Initiative for Neuropsychiatric Genetics Education in Research (GINGER), Harvard T.H. Chan School of Public Health, Department of Epidemiology, Boston, MA, USA
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council (SAMRC) Unit on Risk and Resilience in Mental Disorders, Cape Town, South Africa
| | - Kristin Tsuo
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA
| | - Ying Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Rahul Gupta
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA
| | - Lethukuthula L. Nkambule
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Heather Zar
- Department of Paediatrics and Child Health, Red Cross Children’s Hospital and Medical Research Council Unit on Child and Adolescent Health, University of Cape Town, Cape Town, South Africa
| | - Dan J. Stein
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council (SAMRC) Unit on Risk and Resilience in Mental Disorders, Cape Town, South Africa
| | - Eugene Kinyanda
- Mental Health Project, Medical Research Council/Uganda Virus Research Institute (MRC/UVRI) & London School of Hygiene and Tropical Medicine (LSHTM), Uganda Research Unit, Entebbe, Uganda
| | - Elizabeth G. Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Alicia R. Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
|
46
|
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 02/10/2023] [Indexed: 06/06/2023] Open
Abstract
Polygenic Risk Score (PRS) analysis is a method that predicts the genetic risk of an individual towards targeted traits. Even when there are no significant markers, it gives evidence of a genetic effect beyond the results of Genome-Wide Association Studies (GWAS). Moreover, it selects single nucleotide polymorphisms (SNPs) that contribute to the disease with low effect size making it more precise at individual level risk prediction. PRS analysis addresses the shortfall of GWAS by taking into account the SNPs/alleles with low effect size but play an indispensable role to the observed phenotypic/trait variance. PRS analysis has applications that investigate the genetic basis of several traits, which includes rare diseases. However, the accuracy of PRS analysis depends on the genomic data of the underlying population. For instance, several studies show that obtaining higher prediction power of PRS analysis is challenging for non-Europeans. In this manuscript, we review the conventional PRS methods and their application to sub-Saharan African communities. We conclude that lack of sufficient GWAS data and tools is the limiting factor of applying PRS analysis to sub-Saharan populations. We recommend developing Africa-specific PRS methods and tools for estimating and analyzing African population data for clinical evaluation of PRSs of interest and predicting rare diseases.
Affiliation(s)
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Suraju Sadeeq
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Judit Kumuthini
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Olabode Ajayi
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Gordon Wells
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Rotimi Solomon
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Olubanke Ogunlana
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Emmanuel Adetiba
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Electrical & Information Engineering (EIE), Covenant University, Ota, Ogun State, 112212, Nigeria
- HRA, Institute for Systems Science, Durban University of Technology, Durban, South Africa
| | - Emeka Iweala
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Benedikt Brors
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
| | - Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
|
47
|
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 02/10/2023] [Indexed: 11/23/2023] Open
|
48
|
Hou K, Ding Y, Xu Z, Wu Y, Bhattacharya A, Mester R, Belbin GM, Buyske S, Conti DV, Darst BF, Fornage M, Gignoux C, Guo X, Haiman C, Kenny EE, Kim M, Kooperberg C, Lange L, Manichaikul A, North KE, Peters U, Rasmussen-Torvik LJ, Rich SS, Rotter JI, Wheeler HE, Wojcik GL, Zhou Y, Sankararaman S, Pasaniuc B. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat Genet 2023; 55:549-558. [PMID: 36941441 PMCID: PMC11120833 DOI: 10.1038/s41588-023-01338-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 08/10/2022] [Accepted: 02/16/2023] [Indexed: 03/23/2023]
Abstract
Individuals of admixed ancestries (for example, African Americans) inherit a mosaic of ancestry segments (local ancestry) originating from multiple continental ancestral populations. This offers the unique opportunity of investigating the similarity of genetic effects on traits across ancestries within the same population. Here we introduce an approach to estimate correlation of causal genetic effects (r_admix) across local ancestries and analyze 38 complex traits in African-European admixed individuals (N = 53,001) to observe very high correlations (meta-analysis r_admix = 0.95, 95% credible interval 0.93-0.97), much higher than correlation of causal effects across continental ancestries. We replicate our results using regression-based methods from marginal genome-wide association study summary statistics. We also report realistic scenarios where regression-based methods yield inflated heterogeneity-by-ancestry due to ancestry-specific tagging of causal effects, and/or polygenicity. Our results motivate genetic analyses that assume minimal heterogeneity in causal effects by ancestry, with implications for the inclusion of ancestry-diverse individuals in studies.
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
| | - Yi Ding
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Ziqi Xu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
| | - Yue Wu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Rachel Mester
- Graduate Program in Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Gillian M Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Steve Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ, USA
| | - David V Conti
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Burcu F Darst
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Myriam Fornage
- Brown Foundation Institute for Molecular Medicine, The University of Texas Health Science Center, Houston, TX, USA
| | - Chris Gignoux
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO, USA
| | - Xiuqing Guo
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Christopher Haiman
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Michelle Kim
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Charles Kooperberg
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Leslie Lange
- Department of Medicine, University of Colorado, Aurora, CO, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Kari E North
- Department of Statistics, Rutgers University, Piscataway, NJ, USA
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Ulrike Peters
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Laura J Rasmussen-Torvik
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Stephen S Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Heather E Wheeler
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
| | - Genevieve L Wojcik
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
| | - Ying Zhou
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Sriram Sankararaman
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
|
49
|
Breedon JR, Marshall CR, Giovannoni G, van Heel DA, Dobson R, Jacobs BM. Polygenic risk score prediction of multiple sclerosis in individuals of South Asian ancestry. Brain Commun 2023; 5:fcad041. [PMID: 37006331 PMCID: PMC10053643 DOI: 10.1093/braincomms/fcad041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 10/12/2022] [Accepted: 02/21/2023] [Indexed: 02/24/2023] Open
Abstract
Polygenic risk scores aggregate an individual's burden of risk alleles to estimate the overall genetic risk for a specific trait or disease. Polygenic risk scores derived from genome-wide association studies of European populations perform poorly for other ancestral groups. Given the potential for future clinical utility, underperformance of polygenic risk scores in South Asian populations has the potential to reinforce health inequalities. To determine whether European-derived polygenic risk scores underperform at multiple sclerosis prediction in a South Asian-ancestry population compared with a European-ancestry cohort, we used data from two longitudinal genetic cohort studies: Genes & Health (2015-present), a study of ∼50 000 British-Bangladeshi and British-Pakistani individuals, and UK Biobank (2006-present), which is comprised of ∼500 000 predominantly White British individuals. We compared individuals with and without multiple sclerosis in both studies (Genes & Health: N Cases = 42, N Control = 40 490; UK Biobank: N Cases = 2091, N Control = 374 866). Polygenic risk scores were calculated using clumping and thresholding with risk allele effect sizes obtained from the largest multiple sclerosis genome-wide association study to date. Scores were calculated with and without the major histocompatibility complex region, the most influential locus in determining multiple sclerosis risk. Polygenic risk score prediction was evaluated using Nagelkerke's pseudo-R² metric adjusted for case ascertainment, age, sex and the first four genetic principal components. We found that, as expected, European-derived polygenic risk scores perform poorly in the Genes & Health cohort, explaining 1.1% (including the major histocompatibility complex) and 1.5% (excluding the major histocompatibility complex) of disease risk. In contrast, multiple sclerosis polygenic risk scores explained 4.8% (including the major histocompatibility complex) and 2.8% (excluding the major histocompatibility complex) of disease risk in European-ancestry UK Biobank participants. These findings suggest that polygenic risk score prediction of multiple sclerosis based on European genome-wide association study results is less accurate in a South Asian population. Genetic studies of ancestrally diverse populations are required to ensure that polygenic risk scores can be useful across ancestries.
Affiliation(s)
- Joshua R Breedon
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
| | - Charles R Marshall
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
- Department of Neurology, Royal London Hospital, London E1 1FR, UK
| | - Gavin Giovannoni
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
- Department of Neurology, Royal London Hospital, London E1 1FR, UK
- Blizard Institute, Queen Mary University of London, London E1 2AT, UK
| | - David A van Heel
- Blizard Institute, Queen Mary University of London, London E1 2AT, UK
| | - Ruth Dobson
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
- Department of Neurology, Royal London Hospital, London E1 1FR, UK
| | - Benjamin M Jacobs
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
- Department of Neurology, Royal London Hospital, London E1 1FR, UK
|
50
|
Hoggart C, Choi SW, García-González J, Souaiaia T, Preuss M, O'Reilly P. BridgePRS: A powerful trans-ancestry Polygenic Risk Score method. bioRxiv 2023:2023.02.17.528938. [PMID: 36865148 PMCID: PMC9979992 DOI: 10.1101/2023.02.17.528938] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/26/2023]
Abstract
Polygenic Risk Scores (PRS) have huge potential to contribute to biomedical research and to a future of precision medicine, but to date their calculation relies largely on European-ancestry GWAS data. This global bias makes most PRS substantially less accurate in individuals of non-European ancestry. Here we present BridgePRS, a novel Bayesian PRS method that leverages shared genetic effects across ancestries to increase the accuracy of PRS in non-European populations. The performance of BridgePRS is evaluated in simulated data and real UK Biobank (UKB) data across 19 traits in African, South Asian and East Asian ancestry individuals, using both UKB and Biobank Japan GWAS summary statistics. BridgePRS is compared to the leading alternative, PRS-CSx, and two single-ancestry PRS methods adapted for trans-ancestry prediction. PRS trained in the UK Biobank are then validated out-of-cohort in the independent Mount Sinai (New York) BioMe Biobank. Simulations reveal that BridgePRS performance, relative to PRS-CSx, increases as uncertainty increases: with lower heritability, higher polygenicity, greater between-population genetic diversity, and when causal variants are not present in the data. Our simulation results are consistent with real data analyses in which BridgePRS has better predictive accuracy in African ancestry samples, especially in out-of-cohort prediction (into BioMe), which shows a 60% boost in mean R² compared to PRS-CSx (P = 2 × 10⁻⁶). BridgePRS performs the full PRS analysis pipeline, is computationally efficient, and is a powerful method for deriving PRS in diverse and under-represented ancestry populations.
|