1
German J, Cordioli M, Tozzo V, Urbut S, Arumäe K, Smit RAJ, Lee J, Li JH, Janucik A, Ding Y, Akinkuolie A, Heyne HO, Eoli A, Saad C, Al-Sarraj Y, Abdel-Latif R, Mohammed S, Hail MA, Barry A, Wang Z, Cajuso T, Corbetta A, Natarajan P, Ripatti S, Philippakis A, Szczerbinski L, Pasaniuc B, Kutalik Z, Mbarek H, Loos RJF, Vainik U, Ganna A. Association between plausible genetic factors and weight loss from GLP1-RA and bariatric surgery. Nat Med 2025:10.1038/s41591-025-03645-3. [PMID: 40251273 DOI: 10.1038/s41591-025-03645-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Accepted: 03/07/2025] [Indexed: 04/20/2025]
Abstract
Obesity is a major public health challenge. Glucagon-like peptide-1 receptor agonists (GLP1-RA) and bariatric surgery (BS) are effective weight loss interventions; however, the genetic factors influencing treatment response remain largely unexplored. Moreover, most previous studies have focused on race and ethnicity rather than genetic ancestry. Here we analyzed 10,960 individuals from 9 multiancestry biobank studies across 6 countries to assess the impact of known genetic factors on weight loss. Between 6 and 12 months, GLP1-RA users had an average weight change of -3.93% or -6.00%, depending on the outcome definition, with modest ancestry-based differences. BS patients experienced -21.17% weight change between 6 and 48 months. We found no significant associations between GLP1-RA-induced weight loss and polygenic scores for body mass index or type 2 diabetes, nor with missense variants in GLP1R. A higher body mass index polygenic score was modestly linked to lower weight loss after BS (+0.7% per s.d., P = 1.24 × 10⁻⁴), but the effect attenuated in sensitivity analyses. Our findings suggest known genetic factors have limited impact on GLP1-RA effectiveness with respect to weight change and confirm treatment efficacy across ancestry groups.
Affiliation(s)
- Jakob German
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mattia Cordioli
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Veronica Tozzo
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Sarah Urbut
- Division of Cardiovascular Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Kadri Arumäe
- Institute of Psychology, University of Tartu, Tartu, Estonia
- Roelof A J Smit
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Environmental Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jiwoo Lee
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Josephine H Li
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Diabetes Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Adrian Janucik
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Digital Medicine, Medical University of Bialystok, Bialystok, Poland
- Yi Ding
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Akintunde Akinkuolie
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Henrike O Heyne
- Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Andrea Eoli
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chadi Saad
- Qatar Genome Program, Qatar Precision Health Institute, Qatar Foundation, Doha, Qatar
- Yasser Al-Sarraj
- Qatar Genome Program, Qatar Precision Health Institute, Qatar Foundation, Doha, Qatar
- Rania Abdel-Latif
- Qatar Genome Program, Qatar Precision Health Institute, Qatar Foundation, Doha, Qatar
- Shaban Mohammed
- Department of Pharmacy, Hamad Medical Corporation, Doha, Qatar
- Moza Al Hail
- Department of Pharmacy, Hamad Medical Corporation, Doha, Qatar
- Alexandra Barry
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Tatiana Cajuso
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Department of Pathology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
- Andrea Corbetta
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Health Data Science Centre, Human Technopole, Milan, Italy
- MOX - Laboratory for Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Milan, Italy
- Pradeep Natarajan
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Personalized Medicine, Mass General Brigham, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Samuli Ripatti
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Public Health, Clinicum, University of Helsinki, Helsinki, Finland
- Analytic & Translational Genetics Unit, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Anthony Philippakis
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lukasz Szczerbinski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Diabetes Unit, Massachusetts General Hospital, Boston, MA, USA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Endocrinology, Diabetology and Internal Medicine, Medical University of Bialystok, Bialystok, Poland
- Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Institute of Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Zoltán Kutalik
- University Center for Primary Care and Public Health, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Hamdi Mbarek
- Qatar Genome Program, Qatar Precision Health Institute, Qatar Foundation, Doha, Qatar
- Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Environmental Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Uku Vainik
- Institute of Psychology, University of Tartu, Tartu, Estonia
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Department of Neurology and Neurosurgery, McGill University, Montreal, Quebec, Canada
- Andrea Ganna
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
2
Pimplaskar A, Qiu J, Lapinska S, Tozzo V, Chiang JN, Pasaniuc B, Olde Loohuis LM. Inclusion bias affects common variant discovery and replication in a health-system linked biobank. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2025:2025.04.04.25325131. [PMID: 40236437 PMCID: PMC11998835 DOI: 10.1101/2025.04.04.25325131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2025]
Abstract
Electronic health record (EHR)-linked biobanks have emerged as promising tools for precision medicine, enabling the integration of clinical and molecular data for individual risk assessment. Association studies performed in biobanks can connect common genetic variation to clinical phenotypes, for example through polygenic scores (PGS), which are starting to have utility in aiding clinician decision making. However, while biobanks aggregate large amounts of data effectively for such studies, most employ various opt-in consent protocols and, as a result, are expected to be subject to participation and recruitment biases. The extent to which biases affect genetic analyses in biobanks remains unstudied. In this study, we quantify bias and evaluate its impact on genetic analyses, using the UCLA ATLAS Community Health Initiative as a case study. Our analyses reveal that a wide array of factors, particularly socio-demographic characteristics and healthcare utilization patterns, influence participation, effectively differentiating biobank participants from the broader patient population (AUROC = 0.85, AUPRC = 0.82). By weighting the sample using inverse probability weights derived from probabilities of enrollment, we replicated 54% more known GWAS variants than models that did not take bias into account (e.g. associations between variants in the PPARG gene and type 2 diabetes). We further show that PGS phenome-wide associations are affected by the weighting scheme, and suggest that associations corroborated by weighted analyses are more robust. Our results highlight that genetic analyses within biobanks should account for inclusion biases, and suggest inverse probability weighting as a potential approach.
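The inverse probability weighting mentioned in this abstract can be pictured with the minimal sketch below. It assumes each enrolled participant already has an estimated probability of enrollment; the function names, the toy probabilities and the binary trait are hypothetical illustrations, not values or code from the study.

```python
def inverse_probability_weights(enroll_prob):
    """Return w_i = 1 / p_i for each enrolled participant.

    enroll_prob: estimated probabilities of enrollment (assumed given).
    """
    if any(p <= 0.0 or p > 1.0 for p in enroll_prob):
        raise ValueError("enrollment probabilities must lie in (0, 1]")
    return [1.0 / p for p in enroll_prob]


def weighted_mean(values, weights):
    """Weighted average of values, normalising by the sum of the weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Toy data (hypothetical): two participants from a group that enrolls
# readily (p = 0.8) carry the trait, two from a group that rarely
# enrolls (p = 0.2) do not.
trait = [1.0, 1.0, 0.0, 0.0]
enroll_prob = [0.8, 0.8, 0.2, 0.2]

weights = inverse_probability_weights(enroll_prob)
naive_estimate = sum(trait) / len(trait)       # ignores who enrolled
ipw_estimate = weighted_mean(trait, weights)   # up-weights rare enrollees

print(naive_estimate, ipw_estimate)
```

In this toy setting the weighted estimate is lower than the unweighted one, because participants from the under-enrolled group count for more. The same weights could in principle be passed to a weighted regression, but how the study applied them is not specified here.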
Affiliation(s)
- Aditya Pimplaskar
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, UCLA, Los Angeles, CA, USA
- Junqiong Qiu
- Department of Computational Medicine, UCLA, Los Angeles, CA, USA
- Sandra Lapinska
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, UCLA, Los Angeles, CA, USA
- Veronica Tozzo
- Department of Computational Medicine, UCLA, Los Angeles, CA, USA
- Jeffrey N Chiang
- Department of Computational Medicine, UCLA, Los Angeles, CA, USA
- Department of Neurosurgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Bogdan Pasaniuc
- Department of Computational Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, UCLA, Los Angeles, CA, USA
- Loes M Olde Loohuis
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, UCLA, Los Angeles, CA, USA
3
Blair DR, Risch N. Reduced Penetrance is Common Among Predicted Loss-of-Function Variants and is Likely Driven by Residual Allelic Activity. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2025:2024.09.23.24314008. [PMID: 39399029 PMCID: PMC11469360 DOI: 10.1101/2024.09.23.24314008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Loss-of-function genetic variants (LoFs) often result in severe phenotypes, including autosomal dominant diseases driven by haploinsufficiency. Due to low carrier frequencies, their penetrance is generally unknown but typically variable. Here, we investigate the penetrance of >6,000 predicted LoFs (pLoFs) linked to 91 haploinsufficient diseases using a cohort of ≈24,000 carriers with linked electronic health record data. We find evidence for widespread reduced penetrance, which persisted after accounting for variant annotation artifacts, missed diagnoses, and incomplete clinical data. We thus hypothesized that many pLoFs have incomplete penetrance, which may be driven by residual allelic activity. To test this, we trained machine learning models to predict pLoF penetrance using variant-specific genomic features that may correlate with incomplete loss-of-function. The models were predictive of pLoF penetrance across a range of diseases and variant types, including those with prior clinical evidence for pathogenicity. This suggests that many pLoFs have incomplete penetrance due to residual allelic activity, complicating disease prognostication in asymptomatic carriers.
4
Chow RD, Nathanson KL, Parikh RB. Phenotypic evaluation of deep learning models for classifying germline variant pathogenicity. NPJ Precis Oncol 2024; 8:235. [PMID: 39427061 PMCID: PMC11490490 DOI: 10.1038/s41698-024-00710-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 09/16/2024] [Indexed: 10/21/2024] Open
Abstract
Deep learning models for predicting variant pathogenicity have not been thoroughly evaluated on real-world clinical phenotypes. Here, we apply state-of-the-art pathogenicity prediction models to hereditary breast cancer gene variants in UK Biobank participants. Model predictions for missense variants in BRCA1, BRCA2 and PALB2, but not ATM and CHEK2, were associated with breast cancer risk. However, deep learning models had limited clinical utility when specifically applied to variants of uncertain significance.
Affiliation(s)
- Ryan D Chow
- Department of Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA.
- Katherine L Nathanson
- Basser Center for BRCA, Abramson Cancer Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ravi B Parikh
- Division of Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Center for Cancer Care Innovation, Abramson Cancer Center, Philadelphia, PA, USA
- Division of Hematology and Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
5
Abramowitz SA, Boulier K, Keat K, Cardone KM, Shivakumar M, DePaolo J, Judy R, Kim D, Rader DJ, Ritchie MD, Voight BF, Pasaniuc B, Levin MG, Damrauer SM. Population Performance and Individual Agreement of Coronary Artery Disease Polygenic Risk Scores. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.07.25.24310931. [PMID: 39108513 PMCID: PMC11302700 DOI: 10.1101/2024.07.25.24310931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2024]
Abstract
Importance: Polygenic risk scores (PRSs) for coronary artery disease (CAD) are a growing clinical and commercial reality. Whether existing scores provide similar individual-level assessments of disease liability is a critical consideration for clinical implementation that remains uncharacterized. Objective: To characterize how reliably CAD PRSs that perform equivalently at the population level predict individual-level risk. Design: Cross-sectional study. Setting: All of Us Research Program (AOU), Penn Medicine Biobank (PMBB), and UCLA ATLAS Precision Health Biobank. Participants: Volunteers of diverse genetic backgrounds enrolled in AOU, PMBB, and UCLA with available electronic health record and genotyping data. Exposures: Polygenic risk for CAD from previously published PRSs and new PRSs developed separately from the testing cohorts. Main Outcomes and Measures: Sets of CAD PRSs that perform population prediction equivalently were identified by comparing calibration and discrimination (Brier score and AUROC) of generalized linear models of prevalent CAD using Bayesian analysis of variance. Among equivalently performing scores, individual-level agreement between risk estimates was tested with intraclass correlation (ICC) and Light's Kappa, measures of inter-rater reliability. Results: 50 PRSs were calculated for 171,095 AOU participants. When included in a model of prevalent CAD, 48 scores had practically equivalent Brier scores and AUROCs (region of practical equivalence = 0.02). Across these scores, 84% of participants had at least one score in both the top and bottom risk quintile. Continuous agreement of individual risk predictions from the 48 scores was poor, with an ICC of 0.351 (95% CI; 0.349, 0.352). Agreement between two statistically equivalent scores was moderate, with an ICC of 0.649 (95% CI; 0.646, 0.652). Light's Kappa, used to evaluate consistency of assignment to high-risk thresholds, did not exceed 0.56 (interpreted as 'fair') across statistically and practically equivalent scores. Repeating the analysis among 41,193 PMBB and 50,748 UCLA participants yielded different sets of statistically and practically equivalent scores which also lacked strong individual agreement. Conclusions and Relevance: Across three diverse biobanks, CAD PRSs that performed equivalently at the population level produced unreliable individual risk estimates. Approaches to clinical implementation of CAD PRSs must consider the potential for discordant individual risk estimates from otherwise indistinguishable scores.
6
Jeong M, Pazokitoroudi A, Liu Z, Sankararaman S. Scalable summary-statistics-based heritability estimation method with individual genotype level accuracy. Genome Res 2024; 34:1286-1293. [PMID: 39038848 PMCID: PMC11529871 DOI: 10.1101/gr.279207.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 07/12/2024] [Indexed: 07/24/2024]
Abstract
SNP heritability, the proportion of phenotypic variation explained by genotyped SNPs, is an important parameter in understanding the genetic architecture underlying various diseases and traits. Methods that aim to estimate SNP heritability from individual genotype and phenotype data are limited by their ability to scale to Biobank-scale data sets and by the restrictions in access to individual-level data. These limitations have motivated the development of methods that only require summary statistics. Although the availability of publicly accessible summary statistics makes them widely applicable, these methods lack the accuracy of methods that utilize individual genotypes. Here we present a SUMmary-statistics-based Randomized Haseman-Elston regression (SUM-RHE), a method that can estimate the SNP heritability of complex phenotypes with accuracies comparable to approaches that require individual genotypes, while exclusively relying on summary statistics. SUM-RHE employs Genome-Wide Association Study (GWAS) summary statistics and statistics obtained on a reference population, which can be efficiently estimated and readily shared for public use. Our results demonstrate that SUM-RHE obtains estimates of SNP heritability that are substantially more accurate compared with other summary statistic methods and on par with methods that rely on individual-level data.
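For orientation, the sketch below implements plain Haseman-Elston regression on individual-level genotypes, the classical estimator that the method's name refers to. It is not the SUM-RHE algorithm, which according to the abstract works from summary statistics and a reference population. The simulated genotypes, sample sizes, true heritability of 0.5 and the function name `he_regression` are all illustrative assumptions.

```python
import numpy as np


def he_regression(genotypes, phenotype):
    """Estimate SNP heritability by Haseman-Elston regression.

    Regresses phenotype products y_i * y_j on genetic relatedness K_ij
    over all pairs i < j (no intercept); the slope estimates h^2.
    """
    # Standardise each SNP column and the phenotype.
    X = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    y = (phenotype - phenotype.mean()) / phenotype.std()
    n, m = X.shape
    K = X @ X.T / m                      # genetic relatedness matrix
    upper = np.triu_indices(n, k=1)      # off-diagonal pairs i < j
    k = K[upper]
    yy = np.outer(y, y)[upper]
    return float(k @ yy / (k @ k))       # least-squares slope


# Simulated example (hypothetical sizes and parameters).
rng = np.random.default_rng(0)
n, m, true_h2 = 500, 200, 0.5
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)
X_std = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
beta = rng.normal(0.0, np.sqrt(true_h2 / m), size=m)
phenotype = X_std @ beta + rng.normal(0.0, np.sqrt(1.0 - true_h2), size=n)

h2_est = he_regression(genotypes, phenotype)
print(round(h2_est, 3))
```

Forming the full relatedness matrix costs on the order of n² × m operations, which is the kind of scaling limit the abstract cites as motivation for randomized and summary-statistic approaches.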
Affiliation(s)
- Moonseong Jeong
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA;
- Ali Pazokitoroudi
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts 02115, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Zhengtong Liu
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
- Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA;
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
7
Fu M, Valiente-Banuet L, Wadhwa SS, Pasaniuc B, Vossel K, Chang TS. Improving genetic risk modeling of dementia from real-world data in underrepresented populations. Commun Biol 2024; 7:1049. [PMID: 39183196 PMCID: PMC11345412 DOI: 10.1038/s42003-024-06742-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 08/16/2024] [Indexed: 08/27/2024] Open
Abstract
Genetic risk modeling for dementia offers significant benefits, but studies based on real-world data, particularly for underrepresented populations, are limited. We employ an Elastic Net model for dementia risk prediction using single-nucleotide polymorphisms prioritized by functional genomic data from multiple neurodegenerative disease genome-wide association studies. We compare this model with APOE and polygenic risk score models across genetic ancestry groups (Hispanic Latino American sample: 610 patients with 126 cases; African American sample: 440 patients with 84 cases; East Asian American sample: 673 patients with 75 cases), using electronic health records from UCLA Health for discovery and the All of Us cohort for validation. Our model significantly outperforms other models across multiple ancestries, improving the area-under-precision-recall curve by 31-84% (Wilcoxon signed-rank test p-value <0.05) and the area-under-the-receiver-operating characteristic by 11-17% (DeLong test p-value <0.05) compared to the APOE and the polygenic risk score models. We identify shared and ancestry-specific risk genes and biological pathways, reinforcing and adding to existing knowledge. Our study highlights the benefits of integrating functional mapping, multiple neurodegenerative diseases, and machine learning for genetic risk models in diverse populations. Our findings hold potential for refining precision medicine strategies in dementia diagnosis.
Affiliation(s)
- Mingzhou Fu
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, Los Angeles, CA, 90024, USA
- Leopoldo Valiente-Banuet
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Satpal S Wadhwa
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, 90095, USA
- Keith Vossel
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Timothy S Chang
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
8
Huang J, Kleman N, Basu S, Shriver MD, Zaidi AA. Interpreting SNP heritability in admixed populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.08.04.551959. [PMID: 37577588 PMCID: PMC10418213 DOI: 10.1101/2023.08.04.551959] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 08/15/2023]
Abstract
SNP heritability ($h^2_{snp}$) is defined as the proportion of phenotypic variance explained by genotyped SNPs and is believed to be a lower bound of heritability ($h^2$), being equal to it if all causal variants are known. Despite the simple intuition behind $h^2_{snp}$, its interpretation and equivalence to $h^2$ is unclear, particularly in the presence of population structure and assortative mating. It is well known that population structure can lead to inflation in $\hat{h}^2_{snp}$ estimates because of confounding due to linkage disequilibrium (LD) or shared environment. Here we use analytical theory and simulations to demonstrate that $h^2_{snp}$ estimates can be biased in admixed populations, even in the absence of confounding and even if all causal variants are known. This is because admixture generates LD, which contributes to the genetic variance, and therefore to heritability. Genome-wide restricted maximum likelihood (GREML) does not capture this contribution, leading to under- or over-estimates of $h^2_{snp}$ relative to $h^2$, depending on the genetic architecture. In contrast, Haseman-Elston (HE) regression exaggerates the LD contribution, leading to biases in the opposite direction. For the same reason, GREML and HE estimates of local ancestry heritability ($h^2_{\gamma}$) are also biased. We describe this bias in $\hat{h}^2_{snp}$ and $\hat{h}^2_{\gamma}$ as a function of admixture history and the genetic architecture of the trait and show that it can be recovered under some conditions. We clarify the interpretation of $\hat{h}^2_{snp}$ in admixed populations and discuss its implication for genome-wide association studies and polygenic prediction.
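The statement that LD contributes to the genetic variance can be made concrete with the standard variance-of-a-sum identity. The block below is a generic sketch under an additive model, not the paper's own derivation; the symbols $\beta_i$ (effect of variant $i$), $x_i$ (genotype at variant $i$), $V_g$ (genetic variance) and $V_p$ (phenotypic variance) are notation assumed here for illustration.

```latex
\begin{aligned}
V_g &= \operatorname{Var}\Big(\sum_i \beta_i x_i\Big)
     = \underbrace{\sum_i \beta_i^2 \operatorname{Var}(x_i)}_{\text{per-variant term}}
     + \underbrace{\sum_{i \neq j} \beta_i \beta_j \operatorname{Cov}(x_i, x_j)}_{\text{LD term}}, \\
h^2 &= \frac{V_g}{V_p}.
\end{aligned}
```

The LD term is positive when correlated variants tend to have effects of the same sign and negative otherwise, which is consistent with the abstract's remark that the direction of the bias depends on the genetic architecture.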
Affiliation(s)
- Jinguo Huang
- Bioinformatics and Genomics, Huck Institutes of the Life Sciences, Pennsylvania State University
- Department of Anthropology, Pennsylvania State University
- Nicole Kleman
- Department of Genetics, Cell Biology, and Development, University of Minnesota
- Saonli Basu
- Department of Biostatistics, University of Minnesota
- Arslan A. Zaidi
- Department of Genetics, Cell Biology, and Development, University of Minnesota
- Institute of Health Informatics, University of Minnesota
9
Chang T, Fu M, Valiente-Banuet L, Wadhwa S, Pasaniuc B, Vossel K. Improving genetic risk modeling of dementia from real-world data in underrepresented populations. RESEARCH SQUARE 2024:rs.3.rs-3911508. [PMID: 38410460 PMCID: PMC10896371 DOI: 10.21203/rs.3.rs-3911508/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
BACKGROUND: Genetic risk modeling for dementia offers significant benefits, but studies based on real-world data, particularly for underrepresented populations, are limited. METHODS: We employed an Elastic Net model for dementia risk prediction using single-nucleotide polymorphisms prioritized by functional genomic data from multiple neurodegenerative disease genome-wide association studies. We compared this model with APOE and polygenic risk score models across genetic ancestry groups, using electronic health records from UCLA Health for discovery and the All of Us cohort for validation. RESULTS: Our model significantly outperforms other models across multiple ancestries, improving the area-under-precision-recall curve by 21-61% and the area-under-the-receiver-operating characteristic by 10-21% compared to the APOE and the polygenic risk score models. We identified shared and ancestry-specific risk genes and biological pathways, reinforcing and adding to existing knowledge. CONCLUSIONS: Our study highlights the benefits of integrating functional mapping, multiple neurodegenerative diseases, and machine learning for genetic risk models in diverse populations. Our findings hold potential for refining precision medicine strategies in dementia diagnosis.
Affiliation(s)
- Timothy Chang
- David Geffen School of Medicine, University of California, Los Angeles
10
Fu M, Valiente-Banuet L, Wadhwa SS, UCLA Precision Health Data Discovery Repository Working Group, UCLA Precision Health ATLAS Working Group, Pasaniuc B, Vossel K, Chang TS. Improving genetic risk modeling of dementia from real-world data in underrepresented populations. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.02.05.24302355. [PMID: 38370649 PMCID: PMC10871463 DOI: 10.1101/2024.02.05.24302355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
BACKGROUND: Genetic risk modeling for dementia offers significant benefits, but studies based on real-world data, particularly for underrepresented populations, are limited. METHODS: We employed an Elastic Net model for dementia risk prediction using single-nucleotide polymorphisms prioritized by functional genomic data from multiple neurodegenerative disease genome-wide association studies. We compared this model with APOE and polygenic risk score models across genetic ancestry groups, using electronic health records from UCLA Health for discovery and the All of Us cohort for validation. RESULTS: Our model significantly outperforms other models across multiple ancestries, improving the area-under-precision-recall curve by 21-61% and the area-under-the-receiver-operating characteristic by 10-21% compared to the APOE and the polygenic risk score models. We identified shared and ancestry-specific risk genes and biological pathways, reinforcing and adding to existing knowledge. CONCLUSIONS: Our study highlights the benefits of integrating functional mapping, multiple neurodegenerative diseases, and machine learning for genetic risk models in diverse populations. Our findings hold potential for refining precision medicine strategies in dementia diagnosis.
Affiliation(s)
- Mingzhou Fu
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, United States
- Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, Los Angeles, CA, 90024, United States
| | - Leopoldo Valiente-Banuet
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, United States
| | - Satpal S. Wadhwa
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, United States
| | - Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, 90095, USA
| | - Keith Vossel
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, United States
| | - Timothy S. Chang
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, United States
|
11
|
Fu M, Tran T, Eskin E, Lajonchere C, Pasaniuc B, Geschwind DH, Vossel K, Chang TS. Multi-class Modeling Identifies Shared Genetic Risk for Late-onset Epilepsy and Alzheimer's Disease. medRxiv 2024:2024.02.05.24302353. [PMID: 38370677 PMCID: PMC10871371 DOI: 10.1101/2024.02.05.24302353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Background Previous studies have established a strong link between late-onset epilepsy (LOE) and Alzheimer's disease (AD). However, their shared genetic risk beyond the APOE gene remains unclear. Our study sought to examine the shared genetic factors of AD and LOE, interpret the biological pathways involved, and evaluate how AD onset may be mediated by LOE and shared genetic risks. Methods We defined phenotypes using phecodes mapped from diagnosis codes, using records from patients aged 60-90. A two-step Least Absolute Shrinkage and Selection Operator (LASSO) workflow was used to identify shared genetic variants based on prior AD GWAS integrated with functional genomic data. We calculated an AD-LOE shared risk score and used it as a proxy in a causal mediation analysis. We used electronic health records from an academic health center (UCLA Health) for discovery analyses and validated our findings in a multi-institutional EHR database (All of Us). Results The two-step LASSO method identified 34 shared genetic loci between AD and LOE, including the APOE region. These loci were mapped to 65 genes, which showed enrichment in molecular functions and pathways such as tau protein binding and lipoprotein metabolism. Individuals with high predicted shared risk scores have a higher risk of developing AD, LOE, or both in later life compared to those with low risk scores. LOE partially mediates the effect of AD-LOE shared genetic risk on AD (15% proportion mediated on average). Validation results from All of Us were consistent with findings from the UCLA sample. Conclusions We employed a machine learning approach to identify shared genetic risks of AD and LOE. In addition to providing substantial evidence for the significant contribution of the APOE-TOMM40-APOC1 gene cluster to shared risk, we uncovered novel genes that may contribute. Our study is one of the first to utilize All of Us genetic data to investigate AD, and provides valuable insights into the potential common and disease-specific mechanisms underlying AD and LOE, which could have profound implications for the future of disease prevention and the development of targeted treatment strategies to combat the co-occurrence of these two diseases.
Affiliation(s)
- Mingzhou Fu
- Mary S. Easton Center for Alzheimer’s Research and Care, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA 90095, USA
| | - Thai Tran
- Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA 90095, USA
| | - Eleazar Eskin
- Department of Computational Medicine, University of California, Los Angeles, CA 90095, USA
| | - Clara Lajonchere
- Institute of Precision Health, University of California, Los Angeles, CA 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
| | - Bogdan Pasaniuc
- Department of Computational Medicine, University of California, Los Angeles, CA 90095, USA
| | - Daniel H. Geschwind
- Institute of Precision Health, University of California, Los Angeles, CA 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
| | - Keith Vossel
- Mary S. Easton Center for Alzheimer’s Research and Care, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
| | - Timothy S Chang
- Mary S. Easton Center for Alzheimer’s Research and Care, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
|
12
|
Venkateswaran V, Boulier K, Ding Y, Johnson R, Bhattacharya A, Pasaniuc B. Polygenic scores for tobacco use provide insights into systemic health risks in a diverse EHR-linked biobank in Los Angeles. Transl Psychiatry 2024; 14:38. [PMID: 38238290 PMCID: PMC10796315 DOI: 10.1038/s41398-024-02743-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 12/19/2023] [Accepted: 01/08/2024] [Indexed: 01/22/2024]
Abstract
Tobacco use is a major risk factor for many diseases and is heavily influenced by environmental factors with significant underlying genetic contributions. Here, we evaluated the predictive performance, risk stratification, and potential systemic health effects of tobacco use disorder (TUD) predisposing germline variants using a European-ancestry-derived polygenic score (PGS) in 24,202 participants from the multi-ancestry, hospital-based UCLA ATLAS biobank. Among genetically inferred ancestry groups (GIAs), TUD-PGS was significantly associated with TUD in European American (EA) (OR: 1.20, CI: [1.16, 1.24]), Hispanic/Latin American (HL) (OR: 1.19, CI: [1.11, 1.28]), and East Asian American (EAA) (OR: 1.18, CI: [1.06, 1.31]) GIAs but not in African American (AA) GIA (OR: 1.04, CI: [0.93, 1.17]). Similarly, TUD-PGS offered strong risk stratification across PGS quantiles in EA and HL GIAs and inconsistently in EAA and AA GIAs. In a cross-ancestry phenome-wide association meta-analysis, TUD-PGS was associated with cardiometabolic, respiratory, and psychiatric phecodes (17 phecodes at P < 2.7E-05). In individuals with no history of smoking, the top TUD-PGS associations with obesity and alcohol-related disorders (P = 3.54E-07, 1.61E-06) persist. Mendelian Randomization (MR) analysis provides evidence of a causal association between adiposity measures and tobacco use. Inconsistent predictive performance of the TUD-PGS across GIAs motivates the inclusion of multiple ancestry populations at all levels of genetic research of tobacco use for equitable clinical translation of TUD-PGS. Phenome associations suggest that TUD-predisposed individuals may require comprehensive tobacco use prevention and management approaches to address underlying addictive tendencies.
Affiliation(s)
- Vidhya Venkateswaran
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Department of Oral Biology, School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Office of the Director and National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, 20892, USA.
| | - Kristin Boulier
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Medicine, Division of Cardiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Yi Ding
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Ruth Johnson
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Institute for Data Science in Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, 90095, USA
|
13
|
Maldonado BL, Piqué DG, Kaplan RC, Claw KG, Gignoux CR. Genetic risk prediction in Hispanics/Latinos: milestones, challenges, and social-ethical considerations. J Community Genet 2023; 14:543-553. [PMID: 37962783 PMCID: PMC10725387 DOI: 10.1007/s12687-023-00686-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 10/18/2023] [Indexed: 11/15/2023]
Abstract
Genome-wide association studies (GWAS) have allowed the identification of disease-associated variants, which can be leveraged to build polygenic scores (PGSs). Even though PGSs can be a valuable tool in personalized medicine, their predictive power is limited in populations of non-European ancestry, particularly in admixed populations. Recent efforts have focused on increasing racial and ethnic diversity in GWAS, thus addressing some of the limitations of genetic risk prediction in these populations. Even with these efforts, few studies focus exclusively on Hispanics/Latinos. Additionally, Hispanic/Latino populations are often considered a single population despite varying admixture proportions between and within ethnic groups, genetic heterogeneity, and differing demographic histories. Combined with highly heterogeneous environmental and socioeconomic exposures, this diversity can reduce the transferability of genetic risk prediction models. Given the recent increase of genomic studies that include Hispanics/Latinos, we review the milestones and efforts that focus on genetic risk prediction, summarize the potential for improving PGS transferability, and highlight the challenges yet to be addressed. Additionally, we summarize social-ethical considerations and provide ideas to promote genetic risk prediction models that can be implemented equitably.
Affiliation(s)
- Betzaida L Maldonado
- Human Medical Genetics & Genomics Graduate Program, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA.
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA.
- Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA.
| | - Daniel G Piqué
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Section of Genetics and Metabolism, Department of Pediatrics, Children's Hospital Colorado, Aurora, CO, USA
| | - Robert C Kaplan
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Katrina G Claw
- Human Medical Genetics & Genomics Graduate Program, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
| | - Christopher R Gignoux
- Human Medical Genetics & Genomics Graduate Program, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
|
14
|
Koyama S, Wang Y, Paruchuri K, Uddin MM, Cho SMJ, Urbut SM, Haidermota S, Hornsby WE, Green RC, Daly MJ, Neale BM, Ellinor PT, Smoller JW, Lebo MS, Karlson EW, Martin AR, Natarajan P. Decoding Genetics, Ancestry, and Geospatial Context for Precision Health. medRxiv 2023:2023.10.24.23297096. [PMID: 37961173 PMCID: PMC10635180 DOI: 10.1101/2023.10.24.23297096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Mass General Brigham, an integrated healthcare system based in the Greater Boston area of Massachusetts, annually serves 1.5 million patients. We established the Mass General Brigham Biobank (MGBB), encompassing 142,238 participants, to unravel the intricate relationships among genomic profiles, environmental context, and disease manifestations within clinical practice. In this study, we highlight the impact of ancestral diversity in the MGBB by employing population genetics, geospatial assessment, and association analyses of rare and common genetic variants. The population structures captured by the genetics mirror the sequential immigration to the Greater Boston area throughout American history, highlighting communities tied to shared genetic and environmental factors. Our investigation underscores the potency of unbiased, large-scale analyses in a healthcare-affiliated biobank, elucidating the dynamic interplay across genetics, immigration, structural geospatial factors, and health outcomes in one of the earliest American sites of European colonization.
Affiliation(s)
- Satoshi Koyama
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Ying Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Kaavya Paruchuri
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Md Mesbah Uddin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - So Mi J. Cho
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Integrative Research Center for Cerebrovascular and Cardiovascular Diseases, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Sarah M. Urbut
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Sara Haidermota
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Whitney E. Hornsby
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Robert C. Green
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine (Genetics), MassGeneralBrigham, Boston, MA, USA
- Broad Institute and Ariadne Labs, Boston, MA, USA
| | - Mark J. Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Finland
- University of Helsinki, Helsinki, Finland
| | - Benjamin M. Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Patrick T. Ellinor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Jordan W. Smoller
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
| | - Matthew S. Lebo
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Mass General Brigham Personalized Medicine, Cambridge, MA, USA
- Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
| | - Elizabeth W. Karlson
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Mass General Brigham Personalized Medicine, Cambridge, MA, USA
- Division of Rheumatology, Inflammation and Immunity, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
| | - Alicia R. Martin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Pradeep Natarajan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
|
15
|
Caggiano C, Boudaie A, Shemirani R, Mefford J, Petter E, Chiu A, Ercelen D, He R, Tward D, Paul KC, Chang TS, Pasaniuc B, Kenny EE, Shortt JA, Gignoux CR, Balliu B, Arboleda VA, Belbin G, Zaitlen N. Disease risk and healthcare utilization among ancestrally diverse groups in the Los Angeles region. Nat Med 2023; 29:1845-1856. [PMID: 37464048 PMCID: PMC11121511 DOI: 10.1038/s41591-023-02425-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/12/2022] [Accepted: 05/30/2023] [Indexed: 07/20/2023]
Abstract
An individual's disease risk is affected by the populations that they belong to, due to shared genetics and environmental factors. The study of fine-scale populations in clinical care is important for identifying and reducing health disparities and for developing personalized interventions. To assess patterns of clinical diagnoses and healthcare utilization by fine-scale populations, we leveraged genetic data and electronic medical records from 35,968 patients as part of the UCLA ATLAS Community Health Initiative. We defined clusters of individuals using identity by descent, a form of genetic relatedness that utilizes shared genomic segments arising due to a common ancestor. In total, we identified 376 clusters, including clusters with patients of Afro-Caribbean, Puerto Rican, Lebanese Christian, Iranian Jewish and Gujarati ancestry. Our analysis uncovered 1,218 significant associations between disease diagnoses and clusters and 124 significant associations with specialty visits. We also examined the distribution of pathogenic alleles and found 189 significant alleles at elevated frequency in particular clusters, including many that are not regularly included in population screening efforts. Overall, this work progresses the understanding of health in understudied communities and can provide the foundation for further study into health inequities.
Affiliation(s)
- Christa Caggiano
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
| | - Ruhollah Shemirani
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Joel Mefford
- Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA, USA
| | - Ella Petter
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
| | - Alec Chiu
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Defne Ercelen
- Computational and Systems Biology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
| | - Rosemary He
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Daniel Tward
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Kimberly C Paul
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
| | - Timothy S Chang
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
| | - Bogdan Pasaniuc
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Jonathan A Shortt
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Division of Bioinformatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Division of Bioinformatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Brunilda Balliu
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Valerie A Arboleda
- Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Gillian Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Noah Zaitlen
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA.
|
16
|
Affiliation(s)
- Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA.
|
17
|
Ding Y, Hou K, Xu Z, Pimplaskar A, Petter E, Boulier K, Privé F, Vilhjálmsson BJ, Olde Loohuis LM, Pasaniuc B. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 2023; 618:774-781. [PMID: 37198491 PMCID: PMC10284707 DOI: 10.1038/s41586-023-06079-4] [Citation(s) in RCA: 128] [Impact Index Per Article: 64.0] [Received: 09/28/2022] [Accepted: 04/12/2023] [Indexed: 05/19/2023]
Abstract
Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of -0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.
Affiliation(s)
- Yi Ding
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
| | - Kangcheng Hou
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Ziqi Xu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
| | - Aditya Pimplaskar
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Ella Petter
- Department of Computer Science, UCLA, Los Angeles, CA, USA
| | - Kristin Boulier
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Florian Privé
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
| | - Bjarni J Vilhjálmsson
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
| | - Loes M Olde Loohuis
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Institute for Precision Health, UCLA, Los Angeles, CA, USA.
|