1
Raben TG, Lello L, Widen E, Hsu SDH. Efficient blockLASSO for polygenic scores with applications to all of us and UK Biobank. BMC Genomics 2025; 26:302. [PMID: 40148775 PMCID: PMC11948729 DOI: 10.1186/s12864-025-11505-0] [Received: 08/01/2024] [Accepted: 03/19/2025] [Indexed: 03/29/2025]
Abstract
We develop a "block" LASSO (blockLASSO) approach for training polygenic scores (PGS) and demonstrate its use in All of Us (AoU) and the UK Biobank (UKB). blockLASSO utilizes the approximate block diagonal structure (due to chromosomal partition of the genome) of linkage disequilibrium (LD). The new implementation can be used for exploratory and methods research where repeated PGS training is necessary and expensive. For 11 different phenotypes, in two different biobanks, and across 5 different ancestry groups (African, American, East Asian, European, and South Asian), we demonstrate that blockLASSO is generally as effective for training PGS as a (global) LASSO. Previous work has shown that penalized regression methods produce PGS competitive with alternative approaches. It has been shown that some phenotypes are more/less polygenic than others. Using sparse algorithms, an accurate PGS can be trained for type 1 diabetes (T1D) using ∼100 single nucleotide variants (SNVs), but a PGS for body mass index (BMI) would need more than 10k SNVs. blockLASSO produces similar PGS for phenotypes while training with just a fraction of the variants per block. Within AoU (using only genetic information), the block PGS for T1D reaches an AUC of 0.63 ± 0.02 and for BMI a correlation of 0.21 ± 0.01, whereas a global LASSO approach reaches an AUC of 0.65 ± 0.03 for T1D and a correlation of 0.19 ± 0.03 for BMI. This new block approach is more computationally efficient and scalable than naive global machine learning approaches, which makes it ideal for exploratory methods investigations based on penalized regression.
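The block idea in this abstract can be illustrated with a minimal sketch. It assumes that each block is simply a list of column indices (for example, the variants on one chromosome), that the same L1 penalty is used in every block, and that a plain coordinate-descent LASSO stands in for whatever solver the authors use. The function names (`lasso_cd`, `block_lasso`), the toy data, and the penalty value are all hypothetical and not taken from the paper.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def block_lasso(X, y, blocks, lam):
    """Fit one LASSO per block of columns and stitch the weights together."""
    beta = [0.0] * len(X[0])
    for cols in blocks:
        X_block = [[row[j] for j in cols] for row in X]
        for j, b in zip(cols, lasso_cd(X_block, y, lam)):
            beta[j] = b
    return beta

# Toy data (hypothetical): 6 roughly centered "variants" in two blocks of 3.
rng = random.Random(0)
X = [[rng.choice([0, 1, 2]) - 1.0 for _ in range(6)] for _ in range(200)]
y = [1.0 * row[0] + 0.5 * row[4] + rng.gauss(0.0, 0.1) for row in X]
weights = block_lasso(X, y, blocks=[[0, 1, 2], [3, 4, 5]], lam=0.1)
print([round(w, 3) for w in weights])
```

Because each block is fitted independently, the per-block problems are smaller than the global one and could be run in parallel; this sketch makes no attempt to reproduce the paper's hyperparameter selection or evaluation.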
Affiliation(s)
- Timothy G Raben
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA
- Louis Lello
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA
  - Genomic Prediction, Inc., North Brunswick, NJ, USA
- Erik Widen
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA
  - Genomic Prediction, Inc., North Brunswick, NJ, USA
- Stephen D H Hsu
  - Department of Physics and Astronomy, Michigan State University, East Lansing, USA
  - Genomic Prediction, Inc., North Brunswick, NJ, USA
2
Shastry V, Berg JJ. Allele ages provide limited information about the strength of negative selection. Genetics 2025; 229:iyae211. [PMID: 39698825 PMCID: PMC11912868 DOI: 10.1093/genetics/iyae211] [Received: 08/19/2024] [Accepted: 12/12/2024] [Indexed: 12/20/2024]
Abstract
For many problems in population genetics, it is useful to characterize the distribution of fitness effects (DFE) of de novo mutations among a certain class of sites. A DFE is typically estimated by fitting an observed site frequency spectrum (SFS) to an expected SFS given a hypothesized distribution of selection coefficients and demographic history. The development of tools to infer gene trees from haplotype alignments, along with ancient DNA resources, provides us with additional information about the frequency trajectories of segregating mutations. Here, we ask how useful this additional information is for learning about the DFE, using the joint distribution on allele frequency and age to summarize information about the trajectory. To this end, we introduce an accurate and efficient numerical method for computing the density on the age of a segregating variant found at a given sample frequency, given the strength of selection and an arbitrarily complex population size history. We then use this framework to show that the unconditional age distribution of negatively selected alleles is very closely approximated by reweighting the neutral age distribution in terms of the negatively selected SFS, suggesting that allele ages provide little information about the DFE beyond that already contained in the present day frequency. To confirm this prediction, we extended the standard Poisson random field method to incorporate the joint distribution of frequency and age in estimating selection coefficients, and test its performance using simulations. We find that when the full SFS is observed and the true allele ages are known, including ages in the estimation provides only small increases in the accuracy of estimated selection coefficients. However, if only sites with frequencies above a certain threshold are observed, then the true ages can provide substantial information about the selection coefficients, especially when the selection coefficient is large. 
When ages are estimated from haplotype data using state-of-the-art tools, uncertainty about the age abrogates most of the additional information in the fully observed SFS case, while the neutral prior assumed in these tools when estimating ages induces a downward bias in the case of the thresholded SFS.
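As a rough illustration of fitting an observed SFS to an expected SFS, the sketch below tabulates an SFS from derived-allele counts and scores it with a Poisson log-likelihood. It assumes the textbook neutral, constant-population-size expectation (theta / i for frequency class i), which is only a special case: the paper's method additionally handles selection, allele ages, and arbitrary population size histories, none of which is modeled here. All function names and numbers are hypothetical.

```python
import math

def site_frequency_spectrum(derived_counts, n):
    """Count segregating sites by derived-allele count 1..n-1 in a sample of n."""
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if 0 < c < n:  # skip sites that are monomorphic in the sample
            sfs[c - 1] += 1
    return sfs

def neutral_expected_sfs(theta, n):
    """Expected SFS under neutrality and constant population size: theta / i."""
    return [theta / i for i in range(1, n)]

def poisson_log_likelihood(observed, expected):
    """Independent-Poisson log-likelihood of the observed SFS entries."""
    return sum(o * math.log(e) - e - math.lgamma(o + 1)
               for o, e in zip(observed, expected))

# Hypothetical sample of n = 4 chromosomes and six sites.
n = 4
observed = site_frequency_spectrum([1, 1, 2, 3, 0, 4], n)
# Under this neutral model the maximum-likelihood theta is S / sum(1/i).
theta_hat = sum(observed) / sum(1.0 / i for i in range(1, n))
print(observed, round(theta_hat, 4))
print(poisson_log_likelihood(observed, neutral_expected_sfs(theta_hat, n)))
```

Replacing `neutral_expected_sfs` with an expectation computed under a hypothesized distribution of selection coefficients is where a DFE would enter in a fuller treatment.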
Affiliation(s)
- Vivaswat Shastry
  - Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Jeremy J Berg
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
3
Durvasula A, Price AL. Distinct explanations underlie gene-environment interactions in the UK Biobank. Am J Hum Genet 2025; 112:644-658. [PMID: 39965571 PMCID: PMC11947178 DOI: 10.1016/j.ajhg.2025.01.014] [Received: 08/29/2024] [Revised: 01/15/2025] [Accepted: 01/16/2025] [Indexed: 02/20/2025]
Abstract
The role of gene-environment (GxE) interaction in disease and complex trait architectures is widely hypothesized but currently unknown. Here, we apply three statistical approaches to quantify and distinguish three different types of GxE interaction for a given trait and environmental (E) variable. First, we detect locus-specific GxE interaction by testing for genetic correlation (rg) < 1 across E bins. Second, we detect genome-wide effects of the E variable on genetic variance by leveraging polygenic risk scores (PRSs) to test for significant PRSxE in a regression of phenotypes on PRS, E, and PRSxE, together with differences in SNP heritability across E bins. Third, we detect genome-wide proportional amplification of genetic and environmental effects as a function of the E variable by testing for significant PRSxE with no differences in SNP heritability across E bins. We applied our framework to 33 UK Biobank traits (25 quantitative traits and 8 diseases; average n = 325,000) and 10 E variables spanning lifestyle, diet, and other environmental exposures. First, we identified 19 trait-E pairs with rg significantly <1 (false discovery rate < 5%); 28 trait-E pairs with significant PRSxE and significant SNP heritability differences across E bins; and 15 trait-E pairs with significant PRSxE but no SNP heritability differences across E bins. Across the three scenarios, eight of the trait-E pairs involved disease traits, whose interpretation is complicated by scale effects. Analyses using biological sex as the E variable produced additional significant findings in each of these scenarios. Overall, we infer a significant contribution of GxE and GxSex effects to complex trait variance.
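The regression of phenotypes on PRS, E, and PRSxE mentioned in this abstract can be sketched as an ordinary least-squares fit with an intercept. This is only a point-estimate illustration on simulated data: it computes no standard errors or significance tests, and the helper names (`solve_linear`, `fit_prs_by_e`) and all simulated values are hypothetical rather than taken from the paper.

```python
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_prs_by_e(prs, env, pheno):
    """OLS fit of pheno ~ 1 + PRS + E + PRS*E via the normal equations.

    Returns [intercept, beta_PRS, beta_E, beta_PRSxE].
    """
    rows = [[1.0, p, e, p * e] for p, e in zip(prs, env)]
    k = 4
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, pheno)) for a in range(k)]
    return solve_linear(xtx, xty)

# Simulated example (hypothetical effect sizes) with a true PRSxE term of 0.2.
rng = random.Random(1)
prs = [rng.gauss(0.0, 1.0) for _ in range(500)]
env = [rng.gauss(0.0, 1.0) for _ in range(500)]
pheno = [0.5 * p + 0.3 * e + 0.2 * p * e + rng.gauss(0.0, 1.0)
         for p, e in zip(prs, env)]
print([round(c, 3) for c in fit_prs_by_e(prs, env, pheno)])
```

A nonzero PRSxE coefficient alone does not separate the second and third scenarios in the abstract; that distinction rests on whether SNP heritability also differs across E bins, which this sketch does not estimate.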
Affiliation(s)
- Arun Durvasula
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Genetics, Harvard Medical School, Cambridge, MA, USA; Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Alkes L Price
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
4
Schwaba T, Mallard TT, Maihofer AX, Rhemtulla M, Lee PH, Smoller JW, Davis LK, Nivard MG, Grotzinger AD, Tucker-Drob EM. Comparison of the multivariate genetic architecture of eight major psychiatric disorders across sex. Nat Genet 2025; 57:583-590. [PMID: 40055480 PMCID: PMC12022846 DOI: 10.1038/s41588-025-02093-6] [Received: 05/31/2023] [Accepted: 01/20/2025] [Indexed: 03/15/2025]
Abstract
Differences in the patterning of genetic sharing between groups of individuals may arise from biological pathways, social mechanisms, phenotyping and ascertainment. We expand genomic structural equation modeling to allow for testing genomic structural invariance (GSI), that is, the formal comparison of multivariate genetic architecture across groups. We apply GSI to compare the autosomal multivariate genetic architecture of eight psychiatric disorders spanning three factors (psychotic, neurodevelopmental and internalizing) between cisgender males and females. We find that the genetic factor structure is largely similar across sex, permitting meaningful comparisons of associations at the level of the factors. However, in females, problematic alcohol use and posttraumatic stress disorder loaded more strongly on the internalizing factor, while the neurodevelopmental disorder factor exhibited weaker genetic correlations with the other factors. Four phenotypes (educational attainment, insomnia, smoking and deprivation) showed significant, albeit small, sex-differentiated associations with the psychotic factor. As genome-wide association study samples grow and diversify, GSI will become increasingly valuable for comparing multivariate genetic architecture across groups.
Affiliation(s)
- Ted Schwaba
  - Department of Psychology, Michigan State University, East Lansing, MI, USA
- Travis T Mallard
  - Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
  - Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Adam X Maihofer
  - Research Service, Veterans Affairs San Diego Healthcare System, San Diego, CA, USA
  - Department of Psychiatry, University of California, San Diego, CA, USA
  - Center of Excellence for Stress and Mental Health, Veterans Affairs San Diego Healthcare System, San Diego, CA, USA
- Mijke Rhemtulla
  - Department of Psychology, University of California, Davis, Davis, CA, USA
- Phil H Lee
  - Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Jordan W Smoller
  - Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
  - Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Lea K Davis
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY, USA
  - Division of Data Driven and Digital Medicine, Department of Medicine, Mount Sinai Hospital, New York City, NY, USA
- Michel G Nivard
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Andrew D Grotzinger
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
  - Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
- Elliot M Tucker-Drob
  - Department of Psychology, University of Texas at Austin, Austin, TX, USA
  - Population Research Center, University of Texas at Austin, Austin, TX, USA
5
Hu S, Ferreira LAF, Shi S, Hellenthal G, Marchini J, Lawson DJ, Myers SR. Fine-scale population structure and widespread conservation of genetic effect sizes between human groups across traits. Nat Genet 2025; 57:379-389. [PMID: 39901012 PMCID: PMC11821542 DOI: 10.1038/s41588-024-02035-8] [Received: 08/16/2023] [Accepted: 11/18/2024] [Indexed: 02/05/2025]
Abstract
Understanding genetic differences between populations is essential for avoiding confounding in genome-wide association studies and improving polygenic score (PGS) portability. We developed a statistical pipeline to infer fine-scale Ancestry Components and applied it to UK Biobank data. Ancestry Components identify population structure not captured by widely used principal components, improving stratification correction for geographically correlated traits. To estimate the similarity of genetic effect sizes between groups, we developed ANCHOR, which estimates changes in the predictive power of an existing PGS in distinct local ancestry segments. ANCHOR infers highly similar (estimated correlation 0.98 ± 0.07) effect sizes between UK Biobank participants of African and European ancestry for 47 of 53 quantitative phenotypes, suggesting that gene-environment and gene-gene interactions do not play major roles in poor cross-ancestry PGS transferability for these traits in the United Kingdom, and providing optimism that shared causal mutations operate similarly in different populations.
Affiliation(s)
- Sile Hu
  - Department of Statistics, University of Oxford, Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - Human Genetics Centre of Excellence, Novo Nordisk Research Centre Oxford, Oxford, UK
- Lino A F Ferreira
  - Department of Statistics, University of Oxford, Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Sinan Shi
  - Department of Statistics, University of Oxford, Oxford, UK
- Garrett Hellenthal
  - Department of Genetics, Evolution and Environment, University College London, London, UK
  - UCL Genetics Institute, University College London, London, UK
- Daniel J Lawson
  - Department of Statistical Science, School of Mathematics, University of Bristol, Bristol, UK
  - MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Simon R Myers
  - Department of Statistics, University of Oxford, Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
6
Hui D, Dudek S, Kiryluk K, Walunas TL, Kullo IJ, Wei WQ, Tiwari H, Peterson JF, Chung WK, Davis BH, Khan A, Kottyan LC, Limdi NA, Feng Q, Puckelwartz MJ, Weng C, Smith JL, Karlson EW, Regeneron Genetics Center, Penn Medicine BioBank, Jarvik GP, Ritchie MD. Risk factors affecting polygenic score performance across diverse cohorts. eLife 2025; 12:RP88149. [PMID: 39851248 PMCID: PMC11771958 DOI: 10.7554/elife.88149] [Indexed: 01/26/2025]
Abstract
Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed the effects of covariate stratification and interaction on body mass index (BMI) PGS (PGS_BMI) across four cohorts of European (N = 491,111) and African (N = 21,612) ancestry. Stratifying on binary covariates and quintiles for continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² being nearly double between best- and worst-performing quintiles for certain covariates. Twenty-eight covariates had significant PGS_BMI-covariate interaction effects, modifying PGS_BMI effects by nearly 20% per standard deviation change. We observed overlap between covariates that had significant R² differences among strata and interaction effects - across all covariates, their main effects on BMI were correlated with their maximum R² differences and interaction effects (0.56 and 0.58, respectively), suggesting high-PGS_BMI individuals have highest R² and increase in PGS effect. Using quantile regression, we show the effect of PGS_BMI increases as BMI itself increases, and that these differences in effects are directly related to differences in R² when stratifying by different covariates. Given significant and replicable evidence for context-specific PGS_BMI performance and effects, we investigated ways to increase model performance taking into account nonlinear effects. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating PGS_BMI directly from GxAge genome-wide association studies effects increased relative R² by 7.8%.
These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGS_BMI performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.
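The stratified comparison described in this abstract can be sketched as follows: sort individuals by a continuous covariate, split them into quintiles, and compute the PGS performance within each quintile. The sketch uses the squared Pearson correlation between PGS and phenotype as a stand-in for R²; the paper's actual R² definition, covariate adjustment, and significance testing may differ. The function names and toy values are hypothetical.

```python
import math

def pearson_r2(x, y):
    """Squared Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return math.nan
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def r2_by_quantile(pgs, pheno, covariate, n_bins=5):
    """R^2 of PGS vs phenotype within equal-sized bins of a sorted covariate."""
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    n = len(order)
    result = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        result.append(pearson_r2([pgs[i] for i in idx], [pheno[i] for i in idx]))
    return result

# Toy example (hypothetical values): the PGS tracks the phenotype more
# loosely as the covariate (say, age) increases.
ages = list(range(100))
pgs = [float((i * 7) % 20) for i in range(100)]
pheno = [p + 0.1 * a * (((i * 13) % 5) - 2) for i, (p, a) in enumerate(zip(pgs, ages))]
print([round(v, 3) for v in r2_by_quantile(pgs, pheno, ages)])
```

Comparing the first and last entries of the returned list gives the best-versus-worst quintile contrast that the abstract reports for certain covariates.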
Affiliation(s)
- Daniel Hui
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Scott Dudek
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Krzysztof Kiryluk
  - Division of Nephrology, Department of Medicine, Columbia University, New York, United States
- Theresa L Walunas
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, United States
- Iftikhar J Kullo
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, United States
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, United States
- Hemant Tiwari
  - Department of Pediatrics, University of Alabama at Birmingham, Birmingham, United States
- Josh F Peterson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, United States
- Wendy K Chung
  - Departments of Pediatrics and Medicine, Columbia University Irving Medical Center, Columbia University, New York, United States
- Brittney H Davis
  - Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, United States
- Atlas Khan
  - Division of Nephrology, Department of Medicine, Columbia University, New York, United States
- Leah C Kottyan
  - The Center for Autoimmune Genomics and Etiology, Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, United States
- Nita A Limdi
  - Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, United States
- Qiping Feng
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, United States
- Megan J Puckelwartz
  - Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, United States
- Chunhua Weng
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, United States
- Johanna L Smith
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, United States
- Elizabeth W Karlson
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, United States
- Gail P Jarvik
  - Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington Medical Center, Seattle, United States
- Marylyn D Ritchie
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
7
Gunn S, Wang X, Posner DC, Cho K, Huffman JE, Gaziano M, Wilson PW, Sun YV, Peloso G, Lunetta KL. Comparison of methods for building polygenic scores for diverse populations. HGG ADVANCES 2025; 6:100355. [PMID: 39323095 PMCID: PMC11532986 DOI: 10.1016/j.xhgg.2024.100355] [Received: 02/13/2024] [Revised: 09/22/2024] [Accepted: 09/22/2024] [Indexed: 09/27/2024]
Abstract
Polygenic scores (PGSs) are a promising tool for estimating individual-level genetic risk of disease based on the results of genome-wide association studies (GWASs). However, their promise has yet to be fully realized because most currently available PGSs were built with genetic data from predominantly European-ancestry populations, and PGS performance declines when scores are applied to target populations different from the populations from which they were derived. Thus, there is a great need to improve PGS performance in currently under-studied populations. In this work we leverage data from two large and diverse cohorts, the Million Veterans Program (MVP) and All of Us (AoU), which gives us the unique opportunity to compare methods for building PGSs for multi-ancestry populations across multiple traits. We build PGSs for five continuous traits and five binary traits using both multi-ancestry and single-ancestry approaches with popular Bayesian PGS methods and both MVP META GWAS results and population-specific GWAS results from the respective African, European, and Hispanic MVP populations. We evaluate these scores in three AoU populations genetically similar to the respective African, Admixed American, and European 1000 Genomes Project superpopulations. Using correlation-based tests, we make formal comparisons of the PGS performance across the multiple AoU populations. We conclude that approaches that combine GWAS data from multiple populations produce PGSs that perform better than approaches that utilize smaller single-population GWAS results matched to the target population, and specifically that multi-ancestry scores built with PRS-CSx outperform the other approaches in the three AoU populations.
Affiliation(s)
- Sophia Gunn
  - Biostatistics, Boston University School of Public Health, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA
- Xin Wang
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA; Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Daniel C Posner
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA, USA
- Kelly Cho
  - Department of Medicine, Harvard Medical School, Boston, MA, USA; MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, USA; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA 02115, USA
- Jennifer E Huffman
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Palo Alto Veterans Institute for Research (PAVIR), Palo Alto Health Care System, Palo Alto, CA, USA
- Michael Gaziano
  - Department of Medicine, Harvard Medical School, Boston, MA, USA; MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, USA; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA 02115, USA
- Peter W Wilson
  - VA Atlanta Healthcare System, Decatur, GA, USA; Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA, USA; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Yan V Sun
  - VA Atlanta Healthcare System, Decatur, GA, USA; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Gina Peloso
  - Biostatistics, Boston University School of Public Health, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA
- Kathryn L Lunetta
  - Biostatistics, Boston University School of Public Health, Boston, MA, USA
8
Hou L, Wu S, Yuan Z, Xue F, Li H. TEMR: Trans-ethnic mendelian randomization method using large-scale GWAS summary datasets. Am J Hum Genet 2025; 112:28-43. [PMID: 39689714 PMCID: PMC11739928 DOI: 10.1016/j.ajhg.2024.11.006] [Received: 07/28/2024] [Revised: 11/14/2024] [Accepted: 11/18/2024] [Indexed: 12/19/2024]
Abstract
Available large-scale genome-wide association study (GWAS) summary datasets predominantly stem from European populations, while sample sizes for other ethnicities, notably Central/South Asian, East Asian, African, Hispanic, etc., remain comparatively limited, resulting in low precision of causal effect estimations within these ethnicities when using Mendelian randomization (MR). In this paper, we propose a trans-ethnic MR method, TEMR, to improve the statistical power and estimation precision of MR in a target population that is underrepresented, using trans-ethnic large-scale GWAS summary datasets. TEMR incorporates trans-ethnic genetic correlation coefficients through a conditional likelihood-based inference framework, producing calibrated p values with substantially improved MR power. In the simulation study, compared with other existing MR methods, TEMR exhibited superior precision and statistical power in causal effect estimation within the target populations. Finally, we applied TEMR to infer causal relationships between concentrations of 16 blood biomarkers and the risk of developing five diseases (hypertension, ischemic stroke, type 2 diabetes, schizophrenia, and major depression disorder) in East Asian, African, and Hispanic/Latino populations, leveraging biobank-scale GWAS summary data obtained from individuals of European descent. We found that the causal biomarkers were mostly validated by previous MR methods, and we also discovered 17 causal relationships that were not identified using previously published MR methods.
Affiliation(s)
- Lei Hou
  - Department of Medical Data, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China; Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China
- Sijia Wu
  - Department of Medical Data, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China; Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China
- Zhongshang Yuan
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China; Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China
- Fuzhong Xue
  - Department of Medical Data, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China; Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China; Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China
- Hongkai Li
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China; Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250000, P.R. China
9
Saitou M, Dahl A, Wang Q, Liu X. Allele frequency impacts the cross-ancestry portability of gene expression prediction in lymphoblastoid cell lines. Am J Hum Genet 2024; 111:2814-2825. [PMID: 39549695 PMCID: PMC11639078 DOI: 10.1016/j.ajhg.2024.10.009] [Received: 02/07/2024] [Revised: 10/15/2024] [Accepted: 10/16/2024] [Indexed: 11/18/2024]
Abstract
Population-level genetic studies are overwhelmingly biased toward European ancestries. Transferring genetic predictions from European ancestries to other ancestries results in a substantial loss of accuracy. Yet, it remains unclear how much various genetic factors, such as causal effect differences, linkage disequilibrium (LD) differences, or allele frequency differences, contribute to the loss of prediction accuracy across ancestries. In this study, we used gene expression levels in lymphoblastoid cell lines to understand how much each genetic factor contributes to lowered portability of gene expression prediction from European to African ancestries. We found that cis-genetic effects on gene expression are highly similar between European and African individuals. However, we found that allele frequency differences of causal variants have a striking impact on prediction portability. For example, portability is reduced by more than 32% when the causal cis-variant is common (minor allele frequency, MAF >5%) in European samples (training population) but is rarer (MAF <5%) in African samples (prediction population). While large allele frequency differences can decrease portability through increasing LD differences, we also determined that causal allele frequency can significantly impact portability when the impact from LD is substantially controlled. This observation suggests that improving statistical fine-mapping alone does not overcome the loss of portability resulting from differences in causal allele frequency. We conclude that causal cis-eQTL effects are highly similar in European and African individuals, and allele frequency differences have a large impact on the accuracy of gene expression prediction.
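The frequency scenario singled out in this abstract (a causal variant common in the training population but rarer in the prediction population) can be made concrete with a short sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) for diploid individuals and uses the 5% MAF threshold quoted above; the function names and toy genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from alternate-allele counts (0, 1 or 2) of diploid individuals."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)

def is_common(genotypes, threshold=0.05):
    """True when the MAF is strictly above the threshold."""
    return minor_allele_frequency(genotypes) > threshold

def common_in_training_rare_in_target(train, target, threshold=0.05):
    """Flag variants that are common in training but not in the target sample.

    A MAF exactly equal to the threshold is treated as "not common" here,
    which is an arbitrary choice made for this sketch.
    """
    return is_common(train, threshold) and not is_common(target, threshold)

# Toy genotypes (hypothetical): MAF 0.10 in training, 0.01 in the target.
train_genotypes = [1] * 20 + [0] * 80
target_genotypes = [1] * 2 + [0] * 98
print(minor_allele_frequency(train_genotypes),
      minor_allele_frequency(target_genotypes),
      common_in_training_rare_in_target(train_genotypes, target_genotypes))
```

Flagging variants this way only identifies the scenario; quantifying the resulting loss of portability requires the expression-prediction models and evaluation described in the paper.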
Affiliation(s)
- Marie Saitou
  - Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA; Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Andy Dahl
  - Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA; Department of Human Genetics, The University of Chicago, Chicago, IL, USA
- Qingbo Wang
  - Department of Statistical Genetics, Graduate School of Medicine, Osaka University, Suita, Japan
- Xuanyao Liu
  - Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA; Department of Human Genetics, The University of Chicago, Chicago, IL, USA
10
|
Wang J, Zhang Z, Lu Z, Mancuso N, Gazal S. Genes with differential expression across ancestries are enriched in ancestry-specific disease effects likely due to gene-by-environment interactions. Am J Hum Genet 2024; 111:2117-2128. [PMID: 39191255 PMCID: PMC11480800 DOI: 10.1016/j.ajhg.2024.07.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 07/26/2024] [Accepted: 07/30/2024] [Indexed: 08/29/2024] Open
Abstract
Multi-ancestry genome-wide association studies (GWASs) have highlighted the existence of variants with ancestry-specific effect sizes. Understanding where and why these ancestry-specific effects occur is fundamental to understanding the genetic basis of human diseases and complex traits. Here, we characterized genes differentially expressed across ancestries (ancDE genes) at the cell-type level by leveraging single-cell RNA-sequencing data in peripheral blood mononuclear cells for 21 individuals with East Asian (EAS) ancestry and 23 individuals with European (EUR) ancestry (172,385 cells); then, we tested whether variants surrounding those genes were enriched in disease variants with ancestry-specific effect sizes by leveraging ancestry-matched GWASs of 31 diseases and complex traits (average n ∼ 90,000 and ∼ 267,000 in EAS and EUR, respectively). We observed that ancDE genes tended to be cell-type specific and enriched in genes interacting with the environment and in variants with ancestry-specific disease effect sizes, which suggests cell-type-specific, gene-by-environment interactions shared between regulatory and disease architectures. Finally, we illustrated how different environments might have led to ancestry-specific myeloid cell leukemia 1 (MCL1) expression in B cells and ancestry-specific allele effect sizes in lymphocyte count GWASs for variants surrounding MCL1. Our results imply that large single-cell and GWAS datasets from diverse ancestries are required to improve our understanding of human diseases.
Affiliation(s)
- Juehan Wang
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Zixuan Zhang
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Zeyun Lu
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
|
11
|
Bastos CMC, da Silva Machado LM, Crispim D, Canani LH, dos Santos KG. Association of the rs9896052 Polymorphism Upstream of GRB2 with Proliferative Diabetic Retinopathy in Patients with Less than 10 Years of Diabetes. Int J Mol Sci 2024; 25:10232. [PMID: 39408563 PMCID: PMC11477274 DOI: 10.3390/ijms251910232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 09/17/2024] [Accepted: 09/20/2024] [Indexed: 10/20/2024] Open
Abstract
Growth factor receptor-bound protein 2 (GRB2) is a negative regulator of insulin signaling and a positive regulator of angiogenesis. Its expression is increased in a mouse model of retinal neovascularization and in patients with type 2 diabetes mellitus (T2DM). This case-control study aimed to investigate the association between the rs9896052 polymorphism (A>C) upstream of GRB2 and proliferative diabetic retinopathy (PDR) in patients with T2DM from Southern Brazil, taking into consideration self-reported skin color (white or non-white) and the known duration of diabetes (<10 years or ≥10 years). Genotypes were determined by real-time PCR in 838 patients with T2DM (284 cases with PDR and 554 controls without DR). In the total study group and in the analysis stratified by skin color, the genotype and allele frequencies were similar between cases and controls. However, among patients with less than 10 years of diabetes, the C allele was more frequent in cases than in controls (63.3% versus 51.8%, p = 0.032), and the CC genotype was independently associated with an increased risk of PDR (adjusted OR = 2.82, 95% CI 1.17-6.75). In conclusion, our findings support the hypothesis that the rs9896052 polymorphism near GRB2 is associated with PDR in Brazilian patients with T2DM.
Affiliation(s)
- Caroline Moura Cardoso Bastos
  - Laboratory of Human Molecular Genetics, Lutheran University of Brazil (ULBRA), Av. Farroupilha 8001, Canoas 92425-900, RS, Brazil; (C.M.C.B.); (L.M.d.S.M.)
- Lucas Marcelo da Silva Machado
  - Laboratory of Human Molecular Genetics, Lutheran University of Brazil (ULBRA), Av. Farroupilha 8001, Canoas 92425-900, RS, Brazil; (C.M.C.B.); (L.M.d.S.M.)
- Daisy Crispim
  - Endocrine Division, Clinical Hospital of Porto Alegre (HCPA), R. Ramiro Barcelos 2350, Porto Alegre 90035-903, RS, Brazil
- Luís Henrique Canani
  - Department of Internal Medicine, Federal University of Rio Grande do Sul (UFRGS), R. Ramiro Barcelos 2400, Porto Alegre 90035-003, RS, Brazil
- Kátia Gonçalves dos Santos
  - Laboratory of Human Molecular Genetics, Lutheran University of Brazil (ULBRA), Av. Farroupilha 8001, Canoas 92425-900, RS, Brazil; (C.M.C.B.); (L.M.d.S.M.)
|
12
|
Akbari A, Barton AR, Gazal S, Li Z, Kariminejad M, Perry A, Zeng Y, Mittnik A, Patterson N, Mah M, Zhou X, Price AL, Lander ES, Pinhasi R, Rohland N, Mallick S, Reich D. Pervasive findings of directional selection realize the promise of ancient DNA to elucidate human adaptation. bioRxiv 2024:2024.09.14.613021. [PMID: 39314480 PMCID: PMC11419161 DOI: 10.1101/2024.09.14.613021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
We present a method for detecting evidence of natural selection in ancient DNA time-series data that leverages an opportunity not utilized in previous scans: testing for a consistent trend in allele frequency change over time. By applying this to 8433 West Eurasians who lived over the past 14000 years and 6510 contemporary people, we find an order of magnitude more genome-wide significant signals than previous studies: 347 independent loci with >99% probability of selection. Previous work showed that classic hard sweeps driving advantageous mutations to fixation have been rare over the broad span of human evolution, but in the last ten millennia, many hundreds of alleles have been affected by strong directional selection. Discoveries include an increase from ~0% to ~20% in 4000 years for the major risk factor for celiac disease at HLA-DQB1; a rise from ~0% to ~8% in 6000 years of blood type B; and fluctuating selection at the TYK2 tuberculosis risk allele rising from ~2% to ~9% from ~5500 to ~3000 years ago before dropping to ~3%. We identify instances of coordinated selection on alleles affecting the same trait, with the polygenic score today predictive of body fat percentage decreasing by around a standard deviation over ten millennia, consistent with the "Thrifty Gene" hypothesis that a genetic predisposition to store energy during food scarcity became disadvantageous after farming. We also identify selection for combinations of alleles that are today associated with lighter skin color, lower risk for schizophrenia and bipolar disease, slower health decline, and increased measures related to cognitive performance (scores on intelligence tests, household income, and years of schooling). These traits are measured in modern industrialized societies, so what phenotypes were adaptive in the past is unclear. 
We estimate selection coefficients at 9.9 million variants, enabling study of how Darwinian forces couple to allelic effects and shape the genetic architecture of complex traits.
Affiliation(s)
- Ali Akbari
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alison R Barton
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven Gazal
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Zheng Li
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Annabel Perry
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yating Zeng
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Alissa Mittnik
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matthew Mah
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Xiang Zhou
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Alkes L Price
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Eric S Lander
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Ron Pinhasi
  - Department of Biology, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Nadin Rohland
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Reich
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
|
13
|
IGVF Consortium, Writing group (ordered by contribution), Engreitz JM, Lawson HA, Singh H, Starita LM, Hon GC, Carter H, Sahni N, Reddy TE, Lin X, Li Y, Munshi NV, Chahrour MH, Boyle AP, Hitz BC, Mortazavi A, Craven M, Mohlke KL, Pinello L, Wang T, Steering Committee Co-Chairs (alphabetical by last name), Kundaje A, Yue F, Code of Conduct Committee (alphabetical by last name), Cody S, Farrell NP, Love MI, Muffley LA, Pazin MJ, Reese F, Van Buren E, Working Group and Focus
Group Co-Chairs (alphabetical by last name), Catalog, Dey KK, Characterization, Kircher M, Computational Analysis, Modeling, and Prediction, Ma J, Radivojac P, Project Design, Balliu B, Mapping, Williams BA, Networks, Huangfu D, Standards and Pipelines, Cardiometabolic, Park CY, Quertermous T, Cellular Programs and Networks, Das J, Coding Variants, Calderwood MA, Fowler DM, Vidal M, CRISPR, Ferreira L, Defining and Systematizing Function, Mooney SD, Pejaver V, Enumerating Variants, Zhao J, Evolution, Gazal S, Koch E, Reilly SK, Sunyaev S, Imaging, Carpenter AE, Immune, Buenrostro JD, Leslie CS, Savage RE, Impact on Diverse Populations, Giric S, iPSC, Luo C, Plath K, MPRA, Barrera A, Schubach M, Noncoding Variants, Gschwind AR, Moore JE, Neuro, Ahituv N, Phenotypic Impact and Function, Yi SS, QTL/Statgen, Hallgrimsdottir I, Gaulton KJ, Sakaue S, Single Cell, Booeshaghi S, Mattei E, Nair S, Pachter L, Wang AT, Characterization Awards (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), UM1HG011966, Shendure J, Agarwal V, Blair A, Chalkiadakis T, Chardon FM, Dash PM, Deng C, Hamazaki N, Keukeleire P, Kubo C, Lalanne JB, Maass T, Martin B, McDiarmid TA, Nobuhara M, Page NF, Regalado S, Sims J, Ushiki A, UM1HG011969, Best SM, Boyle G, Camp N, Casadei S, Da EY, Dawood M, Dawson SC, Fayer S, Hamm A, James RG, Jarvik GP, McEwen AE, Moore N, Pendyala S, Popp NA, Post M, Rubin AF, Smith NT, Stone J, Tejura M, Wang ZR, Wheelock MK, Woo I, Zapp BD, UM1HG011972, Amgalan D, Aradhana A, Arana SM, Bassik MC, Bauman JR, Bhattacharya A, Cai XS, Chen Z, Conley S, Deshpande S, Doughty BR, Du PP, Galante JA, Gifford C, Greenleaf WJ, Guo K, Gupta R, Isobe S, Jagoda E, Jain N, Jones H, Kang HY, Kim SH, Kim Y, Klemm S, Kundu R, Kundu S, Lago-Docampo M, Lee-Yow YC, Levin-Konigsberg R, Li DY, Lindenhofer D, Ma XR, Marinov GK, Martyn GE, McCreery CV, Metzl-Raz E, Monteiro JP, Montgomery MT, Mualim KS, Munger C, Munson G, Nguyen TC, Nguyen T, Palmisano 
BT, Pampari A, Rabinovitch M, Ramste M, Ray J, Roy KR, Rubio OM, Schaepe JM, Schnitzler G, Schreiber J, Sharma D, Sheth MU, Shi H, Singh V, Sinha R, Steinmetz LM, Tan J, Tan A, Tycko J, Valbuena RC, Amiri VVP, van Kooten MJFM, Vaughan-Jackson A, Venida A, Weldy CS, Worssam MD, Xia F, Yao D, Zeng T, Zhao Q, Zhou R, UM1HG011989, Chen ZS, Cimini BA, Coppin G, Coté AG, Haghighi M, Hao T, Hill DE, Lacoste J, Laval F, Reno C, Roth FP, Singh S, Spirohn-Fitzgerald K, Taipale M, Teelucksingh T, Tixhon M, Yadav A, Yang Z, UM1HG011996, Kraus WL, Armendariz DA, Dederich AE, Gogate A, El Hayek L, Goetsch SC, Kaur K, Kim HB, McCoy MK, Nzima MZ, Pinzón-Arteaga CA, Posner BA, Schmitz DA, Sivakumar S, Sundarrajan A, Wang L, Wang Y, Wu J, Xu L, Xu J, Yu L, Zhang Y, Zhao H, Zhou Q, UM1HG012003, Won H, Bell JL, Broadaway KA, Degner KN, Etheridge AS, Koller BH, Mah W, Mu W, Ritola KD, Rosen JD, Schoenrock SA, Sharp RA, UM1HG012010, Bauer D, Lettre G, Sherwood R, Becerra B, Blaine LJ, Che E, Francoeur MJ, Gibbs EN, Kim N, King EM, Kleinstiver BP, Lecluze E, Li Z, Patel ZM, Phan QV, Ryu J, Starr ML, Wu T, UM1HG012053, Gersbach CA, Crawford GE, Allen AS, Majoros WH, Iglesias N, Rai R, Venukuttan R, Li B, Anglen T, Bounds LR, Hamilton MC, Liu S, McCutcheon SR, McRoberts Amador CD, Reisman SJ, ter Weele MA, Bodle JC, Streff HL, Siklenka K, Strouse K, Mapping Awards (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), UM1HG011986, Bernstein BE, Babu J, Corona GB, Dong K, Duarte FM, Durand NC, Epstein CB, Fan K, Gaskell E, Hall AW, Ham AM, Knudson MK, Shoresh N, Wekhande S, White CM, Xi W, UM1HG012076, Satpathy AT, Corces MR, Chang SH, Chin IM, Gardner JM, Gardell ZA, Gutierrez JC, Johnson AW, Kampman L, Kasowski M, Lareau CA, Liu V, Ludwig LS, McGinnis CS, Menon S, Qualls A, Sandor K, Turner AW, Ye CJ, Yin Y, Zhang W, UM1HG012077, Wold BJ, Carilli M, Cheong D, Filibam G, Green K, Kawauchi S, Kim C, Liang H, Loving R, Luebbert L, MacGregor G, Merchan AG, 
Rebboah E, Rezaie N, Sakr J, Sullivan DK, Swarna N, Trout D, Upchurch S, Weber R, Predictive Modeling Awards (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), U01HG011952, Castro CP, Chou E, Feng F, Guerra A, Huang Y, Jiang L, Liu J, Mills RE, Qian W, Qin T, Sartor MA, Sherpa RN, Wang J, Wang Y, Welch JD, Zhang Z, Zhao N, U01HG011967, Mukherjee S, Page CD, Clarke S, Doty RW, Duan Y, Gordan R, Ko KY, Li S, Li B, Thomson A, U01HG012009, Raychaudhuri S, Price A, Ali TA, Dey KK, Durvasula A, Kellis M, U01HG012022, Iakoucheva LM, Kakati T, Chen Y, Benazouz M, Jain S, Zeiberg D, De Paolis Kaluza MC, Velyunskiy M, U01HG012039, Gasch A, Huang K, Jin Y, Lu Q, Miao J, Ohtake M, Scopel E, Steiner RD, Sverchkov Y, U01HG012064, Weng Z, Garber M, Fu Y, Haas N, Li X, Phalke N, Shan SC, Shedd N, Yu T, Zhang Y, Zhou H, U01HG012069, Battle A, Jerby L, Kotler E, Kundu S, Marderstein AR, Montgomery SB, Nigam A, Padhi EM, Patel A, Pritchard J, Raine I, Ramalingam V, Rodrigues KB, Schreiber JM, Singhal A, Sinha R, Wang AT, Network Projects (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), U01HG012041, Abundis M, Bisht D, Chakraborty T, Fan J, Hall DR, Rarani ZH, Jain AK, Kaundal B, Keshari S, McGrail D, Pease NA, Yi VF, U01HG012047, Wu H, Kannan S, Song H, Cai J, Gao Z, Kurzion R, Leu JI, Li F, Liang D, Ming GL, Musunuru K, Qiu Q, Shi J, Su Y, Tishkoff S, Xie N, Yang Q, Yang W, Zhang H, Zhang Z, U01HG012051, Beer MA, Hadjantonakis AK, Adeniyi S, Cho H, Cutler R, Glenn RA, Godovich D, Hu N, Jovanic S, Luo R, Oh JW, Razavi-Mohseni M, Shigaki D, Sidoli S, Vierbuchen T, Wang X, Williams B, Yan J, Yang D, Yang Y, U01HG012059, Sander M, Gaulton KJ, Ren B, Bartosik W, Indralingam HS, Klie A, Mummey H, Okino ML, Wang G, Zemke NR, Zhang K, Zhu H, U01HG012079, Zaitlen N, Ernst J, Langerman J, Li T, Sun Y, U01HG012103, Rudensky AY, Periyakoil PK, Gao VR, Smith MH, Thomas NM, Donlin LT, Lakhanpal A, Southard KM, Ardy 
RC, Data and Administrative Coordinating Center Awards (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), U24HG012012, Cherry JM, Gerstein MB, Andreeva K, Assis PR, Borsari B, Douglass E, Dong S, Gabdank I, Graham K, Jolanki O, Jou J, Kagda MS, Lee JW, Li M, Lin K, Miyasato SR, Rozowsky J, Small C, Spragins E, Tanaka FY, Whaling IM, Youngworth IA, Sloan CA, U24HG012070, Belter E, Chen X, Chisholm RL, Dickson P, Fan C, Fulton L, Li D, Lindsay T, Luan Y, Luo Y, Lyu H, Ma X, Macias-Velasco J, Miga KH, Quaid K, Stitziel N, Stranger BE, Tomlinson C, Wang J, Zhang W, Zhang B, Zhao G, Zhuo X, IGVF Affiliate Member Projects (contact PIs, other members (alphabetical by last name)), Brennand lab, Brennand K, Ciccia lab, Ciccia A, Hayward SB, Huang JW, Leuzzi G, Taglialatela A, Thakar T, Vaitsiankova A, Dey lab, Dey KK, Ali TA, Gazal lab, Kim A, Grimes lab, Grimes HL, Salomonis N, Gupta lab, Gupta R, Fang S, Lee-Kim V, Heinig lab, Heinig M, Losert C, Jones lab, Jones TR, Donnard E, Murphy M, Roberts E, Song S, Moore lab, Mostafavi lab, Mostafavi S, Sasse A, Spiro A, Pennacchio and Visel lab, Pennacchio LA, Kato M, Kosicki M, Mannion B, Slaven N, Visel A, Pollard lab, Pollard KS, Drusinsky S, Whalen S, Ray lab, Ray J, Harten IA, Ho CH, Reilly lab, Sanjana lab, Sanjana NE, Caragine C, Morris JA, Seruggia lab, Seruggia D, Kutschat AP, Wittibschlager S, Xu lab, Xu H, Fu R, He W, Zhang L, Yi lab, Osorio D, NHGRI Program Management (alphabetical by last name), Bly Z, Calluori S, Gilchrist DA, Hutter CM, Morris SA, Samer EK. Deciphering the impact of genomic variation on function. Nature 2024; 633:47-57. 
[PMID: 39232149 PMCID: PMC11973978 DOI: 10.1038/s41586-024-07510-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 04/11/2023] [Accepted: 05/02/2024] [Indexed: 09/06/2024]
Abstract
Our genomes influence nearly every aspect of human biology-from molecular and cellular functions to phenotypes in health and disease. Studying the differences in DNA sequence between individuals (genomic variation) could reveal previously unknown mechanisms of human biology, uncover the basis of genetic predispositions to diseases, and guide the development of new diagnostic tools and therapeutic agents. Yet, understanding how genomic variation alters genome function to influence phenotype has proved challenging. To unlock these insights, we need a systematic and comprehensive catalogue of genome function and the molecular and cellular effects of genomic variants. Towards this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations and predictive modelling to investigate the relationships among genomic variation, genome function and phenotypes. IGVF will create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how such effects connect through gene-regulatory and protein-interaction networks. These experimental data, computational predictions and accompanying standards and pipelines will be integrated into an open resource that will catalyse community efforts to explore how our genomes influence biology and disease across populations.
|
14
|
Choi KW, Tubbs JD, Lee YH, He Y, Tsuo K, Yohannes MT, Nkambule LL, Madsen E, Ghimire DJ, Hermosilla S, Ge T, Martin AR, Axinn WG, Smoller JW. Genetic architecture and socio-environmental risk factors for major depressive disorder in Nepal. Psychol Med 2024; 54:3126-3134. [PMID: 39282852 PMCID: PMC12050005 DOI: 10.1017/s0033291724001284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2024]
Abstract
BACKGROUND Major depressive disorder (MDD) is the leading cause of disability globally, with moderate heritability and well-established socio-environmental risk factors. Genetic studies have been mostly restricted to European settings, with polygenic scores (PGS) demonstrating low portability across diverse global populations. METHODS This study examines genetic architecture, polygenic prediction, and socio-environmental correlates of MDD in a family-based sample of 10 032 individuals from Nepal with array genotyping data. We used genome-based restricted maximum likelihood to estimate heritability, applied S-LDXR to estimate the cross-ancestry genetic correlation between Nepalese and European samples, and modeled PGS trained on a GWAS meta-analysis of European and East Asian ancestry samples. RESULTS We estimated the narrow-sense heritability of lifetime MDD in Nepal to be 0.26 (95% CI 0.18-0.34, p = 8.5 × 10⁻⁶). Our analysis was underpowered to estimate the cross-ancestry genetic correlation (rg = 0.26, 95% CI -0.29 to 0.81). MDD risk was associated with higher age (beta = 0.071, 95% CI 0.06-0.08), female sex (beta = 0.160, 95% CI 0.15-0.17), and childhood exposure to potentially traumatic events (beta = 0.050, 95% CI 0.03-0.07), while neither the depression PGS (beta = 0.004, 95% CI -0.004 to 0.01) nor its interaction with childhood trauma (beta = 0.007, 95% CI -0.01 to 0.03) was strongly associated with MDD. CONCLUSIONS Estimates of lifetime MDD heritability in this Nepalese sample were similar to previous European ancestry samples, but PGS trained on European data did not predict MDD in this sample. This may be due to differences in ancestry-linked causal variants, differences in depression phenotyping between the training and target data, or setting-specific environmental factors that modulate genetic effects. Additional research among under-represented global populations will ensure equitable translation of genomic findings.
Affiliation(s)
- Karmel W. Choi
  - Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston Massachusetts, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston Massachusetts, USA
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
- Justin D. Tubbs
  - Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston Massachusetts, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston Massachusetts, USA
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
- Younga H. Lee
  - Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston Massachusetts, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston Massachusetts, USA
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
- Yixuan He
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston Massachusetts, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kristin Tsuo
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston Massachusetts, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mary T. Yohannes
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston Massachusetts, USA
- Lethukuthula L. Nkambule
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston Massachusetts, USA
- Emily Madsen
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston Massachusetts, USA
- Dirgha J. Ghimire
  - Population Studies Center, Institute for Social Research, University of Michigan, Ann Arbor Michigan, USA
- Sabrina Hermosilla
  - Population Studies Center, Institute for Social Research, University of Michigan, Ann Arbor Michigan, USA
  - Department of Population and Family Health, Mailman School of Public Health, Columbia University Irving Medical Center, New York New York, USA
- Tian Ge
  - Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston Massachusetts, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston Massachusetts, USA
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
- Alicia R. Martin
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston Massachusetts, USA
- William G. Axinn
  - Population Studies Center, Institute for Social Research, University of Michigan, Ann Arbor Michigan, USA
- Jordan W. Smoller
  - Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston Massachusetts, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston Massachusetts, USA
  - Stanley Center for Psychiatric Research, Broad Institute, Boston Massachusetts, USA
|
15
|
Pozarickij A, Gan W, Lin K, Clarke R, Fairhurst-Hunter Z, Koido M, Kanai M, Okada Y, Kamatani Y, Bennett D, Du H, Chen Y, Yang L, Avery D, Guo Y, Yu M, Yu C, Schmidt Valle D, Lv J, Chen J, Peto R, Collins R, Li L, Chen Z, Millwood IY, Walters RG. Causal relevance of different blood pressure traits on risk of cardiovascular diseases: GWAS and Mendelian randomisation in 100,000 Chinese adults. Nat Commun 2024; 15:6265. [PMID: 39048560 PMCID: PMC11269703 DOI: 10.1038/s41467-024-50297-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 07/04/2024] [Indexed: 07/27/2024] Open
Abstract
Elevated blood pressure (BP) is a major risk factor for cardiovascular diseases (CVD). Genome-wide association studies (GWAS) conducted predominantly in populations of European ancestry have identified >2,000 BP-associated loci, but other ancestries have been less well-studied. We conducted GWAS of systolic, diastolic, pulse, and mean arterial BP in 100,453 Chinese adults. We identified 128 non-overlapping loci associated with one or more BP traits, including 74 newly-reported associations. Despite strong genetic correlations between populations, we identified appreciably higher heritability and larger variant effect sizes in Chinese compared with European or Japanese ancestry populations. Using instruments derived from these GWAS, multivariable Mendelian randomisation demonstrated that BP traits contribute differently to the causal associations of BP with CVD. In particular, only pulse pressure was independently causally associated with carotid plaque. These findings reinforce the need for studies in diverse populations to understand the genetic determinants of BP traits and their roles in disease risk.
Affiliation(s)
- Alfred Pozarickij
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Wei Gan
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Human Genetics Centre of Excellence, Novo Nordisk Research Centre Oxford, Innovation Building, Old Road Campus, Oxford, UK
| | - Kuang Lin
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Robert Clarke
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Zammy Fairhurst-Hunter
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Masaru Koido
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan
- Department of Genome Informatics, Graduate School of Medicine, University of Tokyo, Tokyo, 113-0033, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, 230-0045, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, 565-0871, Japan
| | - Yoichiro Kamatani
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Derrick Bennett
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Huaidong Du
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Yiping Chen
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Ling Yang
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Daniel Avery
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Yu Guo
- National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, 100037, Beijing, China
| | - Min Yu
- Zhejiang CDC, Zhejiang, China
| | - Canqing Yu
- Department of Epidemiology & Biostatistics, School of Public Health, Peking University, Xueyuan Road, Haidian District, 100191, Beijing, China
- Peking University Center for Public Health and Epidemic Preparedness and Response, 100191, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, 100191, Beijing, China
| | - Dan Schmidt Valle
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Jun Lv
- Department of Epidemiology & Biostatistics, School of Public Health, Peking University, Xueyuan Road, Haidian District, 100191, Beijing, China
- Peking University Center for Public Health and Epidemic Preparedness and Response, 100191, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, 100191, Beijing, China
| | - Junshi Chen
- China National Center For Food Safety Risk Assessment, Beijing, China
| | - Richard Peto
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Rory Collins
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Liming Li
- Department of Epidemiology & Biostatistics, School of Public Health, Peking University, Xueyuan Road, Haidian District, 100191, Beijing, China.
- Peking University Center for Public Health and Epidemic Preparedness and Response, 100191, Beijing, China.
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, 100191, Beijing, China.
| | - Zhengming Chen
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Iona Y Millwood
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Robin G Walters
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK.
|
16
|
Hou K, Xu Z, Ding Y, Mandla R, Shi Z, Boulier K, Harpak A, Pasaniuc B. Calibrated prediction intervals for polygenic scores across diverse contexts. Nat Genet 2024; 56:1386-1396. [PMID: 38886587 PMCID: PMC11465192 DOI: 10.1038/s41588-024-01792-w] [Received: 08/03/2023] [Accepted: 05/08/2024] [Indexed: 06/20/2024]
Abstract
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields. We show that PGS performance varies broadly across contexts and biobanks. Contexts such as age, sex and income can impact PGS accuracy with similar magnitudes as genetic ancestry. Here we introduce an approach (CalPred) that models all contexts jointly to produce prediction intervals that vary across contexts to achieve calibration (include the trait with 90% probability), whereas existing methods are miscalibrated. In analyses of 72 traits across large and diverse biobanks (All of Us and UK Biobank), we find that prediction intervals required adjustment by up to 80% for quantitative traits. For disease traits, PGS-based predictions were miscalibrated across socioeconomic contexts such as annual household income levels, further highlighting the need of accounting for context information in PGS-based prediction across diverse populations.
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA.
| | - Ziqi Xu
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
| | - Yi Ding
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
| | - Ravi Mandla
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
| | - Zhuozheng Shi
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
| | - Kristin Boulier
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
| | - Arbel Harpak
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA.
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA.
- Institute for Precision Health, University of California Los Angeles, Los Angeles, CA, USA.
|
17
|
Kim A, Zhang Z, Legros C, Lu Z, de Smith A, Moore JE, Mancuso N, Gazal S. Inferring causal cell types of human diseases and risk variants from candidate regulatory elements. medRxiv 2024:2024.05.17.24307556. [PMID: 38798383 PMCID: PMC11118635 DOI: 10.1101/2024.05.17.24307556] [Indexed: 05/29/2024]
Abstract
The heritability of human diseases is extremely enriched in candidate regulatory elements (cRE) from disease-relevant cell types. Critical next steps are to infer which and how many cell types are truly causal for a disease (after accounting for co-regulation across cell types), and to understand how individual variants impact disease risk through single or multiple causal cell types. Here, we propose CT-FM and CT-FM-SNP, two methods that leverage cell-type-specific cREs to fine-map causal cell types for a trait and for its candidate causal variants, respectively. We applied CT-FM to 63 GWAS summary statistics (average N = 417K) using nearly one thousand cRE annotations, primarily coming from ENCODE4. CT-FM inferred 81 causal cell types with corresponding SNP-annotations explaining a high fraction of trait SNP-heritability (~2/3 of the SNP-heritability explained by existing cREs), identified 16 traits with multiple causal cell types, highlighted cell-disease relationships consistent with known biology, and uncovered previously unexplored cellular mechanisms in psychiatric and immune-related diseases. Finally, we applied CT-FM-SNP to 39 UK Biobank traits and predicted high confidence causal cell types for 2,798 candidate causal non-coding SNPs. Our results suggest that most SNPs impact a phenotype through a single cell type, and that pleiotropic SNPs target different cell types depending on the phenotype context. Altogether, CT-FM and CT-FM-SNP shed light on how genetic variants act collectively and individually at the cellular level to impact disease risk.
Affiliation(s)
- Artem Kim
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Zixuan Zhang
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Come Legros
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Zeyun Lu
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Adam de Smith
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Jill E Moore
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Nicholas Mancuso
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Steven Gazal
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
|
18
|
Momin MM, Zhou X, Hyppönen E, Benyamin B, Lee SH. Cross-ancestry genetic architecture and prediction for cholesterol traits. Hum Genet 2024; 143:635-648. [PMID: 38536467 DOI: 10.1007/s00439-024-02660-7] [Received: 06/30/2023] [Accepted: 02/13/2024] [Indexed: 05/18/2024]
Abstract
While cholesterol is essential, a high level of cholesterol is associated with the risk of cardiovascular diseases. Genome-wide association studies (GWASs) have proven successful in identifying genetic variants that are linked to cholesterol levels, predominantly in white European populations. However, the extent to which genetic effects on cholesterol vary across different ancestries remains largely unexplored. Here, we estimate cross-ancestry genetic correlation to address questions on how genetic effects are shared across ancestries. We find significant genetic heterogeneity between ancestries for cholesterol traits. Furthermore, we demonstrate that single nucleotide polymorphisms (SNPs) with concordant effects across ancestries for cholesterol are more frequently found in regulatory regions compared to other genomic regions. Indeed, the positive genetic covariance between ancestries is mostly driven by the effects of the concordant SNPs, whereas the genetic heterogeneity is attributed to the discordant SNPs. We also show that the predictive ability of the concordant SNPs is significantly higher than the discordant SNPs in the cross-ancestry polygenic prediction. The list of concordant SNPs for cholesterol is available in GWAS Catalog. These findings have relevance for the understanding of shared genetic architecture across ancestries, contributing to the development of clinical strategies for polygenic prediction of cholesterol in cross-ancestral settings.
Affiliation(s)
- Md Moksedul Momin
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia.
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia.
- Department of Genetics and Animal Breeding, Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University (CVASU), Khulshi, Chattogram, 4225, Bangladesh.
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia.
| | - Xuan Zhou
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
| | - Elina Hyppönen
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, Australia
| | - Beben Benyamin
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
| | - S Hong Lee
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia.
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia.
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia.
|
19
|
Durvasula A, Price AL. Distinct explanations underlie gene-environment interactions in the UK Biobank. medRxiv 2024:2023.09.22.23295969. [PMID: 37790574 PMCID: PMC10543037 DOI: 10.1101/2023.09.22.23295969] [Indexed: 10/05/2023]
Abstract
The role of gene-environment (GxE) interaction in disease and complex trait architectures is widely hypothesized, but currently unknown. Here, we apply three statistical approaches to quantify and distinguish three different types of GxE interaction for a given trait and E variable. First, we detect locus-specific GxE interaction by testing for genetic correlation r_g < 1 across E bins. Second, we detect genome-wide effects of the E variable on genetic variance by leveraging polygenic risk scores (PRS) to test for significant PRSxE in a regression of phenotypes on PRS, E, and PRSxE, together with differences in SNP-heritability across E bins. Third, we detect genome-wide proportional amplification of genetic and environmental effects as a function of the E variable by testing for significant PRSxE with no differences in SNP-heritability across E bins. Simulations show that these approaches achieve high sensitivity and specificity in distinguishing these three GxE scenarios. We applied our framework to 33 UK Biobank traits (25 quantitative traits and 8 diseases; average N = 325K) and 10 E variables spanning lifestyle, diet, and other environmental exposures. First, we identified 19 trait-E pairs with r_g significantly < 1 (FDR<5%) (average r_g = 0.95); for example, white blood cell count had r_g = 0.95 (s.e. 0.01) between smokers and non-smokers. Second, we identified 28 trait-E pairs with significant PRSxE and significant SNP-heritability differences across E bins; for example, BMI had a significant PRSxE for physical activity (P=4.6e-5) with 5% larger SNP-heritability in the largest versus smallest quintiles of physical activity (P=7e-4). Third, we identified 15 trait-E pairs with significant PRSxE with no SNP-heritability differences across E bins; for example, waist-hip ratio adjusted for BMI had a significant PRSxE effect for time spent watching television (P=5e-3) with no SNP-heritability differences. Across the three scenarios, 8 of the trait-E pairs involved disease traits, whose interpretation is complicated by scale effects. Analyses using biological sex as the E variable produced additional significant findings in each of the three scenarios. Overall, we infer a significant contribution of GxE and GxSex effects to complex trait and disease variance.
Affiliation(s)
- Arun Durvasula
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Genetics, Harvard Medical School, Cambridge, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Alkes L Price
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
|
20
|
Lu Z, Wang X, Carr M, Kim A, Gazal S, Mohammadi P, Wu L, Gusev A, Pirruccello J, Kachuri L, Mancuso N. Improved multi-ancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk. medRxiv 2024:2024.04.15.24305836. [PMID: 38699369 PMCID: PMC11065034 DOI: 10.1101/2024.04.15.24305836] [Indexed: 05/05/2024]
Abstract
Multi-ancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. However, existing approaches fail to reflect shared genetic architectures. To solve this limitation, we present the Sum of Shared Single Effects (SuShiE) model, which leverages LD heterogeneity to improve fine-mapping precision, infer cross-ancestry effect size correlations, and estimate ancestry-specific expression prediction weights. We apply SuShiE to mRNA expression measured in PBMCs (n=956) and LCLs (n=814) together with plasma protein levels (n=854) from individuals of diverse ancestries in the TOPMed MESA and GENOA studies. We find SuShiE fine-maps cis-molQTLs for 16% more genes compared with baselines while prioritizing fewer variants with greater functional enrichment. SuShiE infers highly consistent cis-molQTL architectures across ancestries on average; however, we also find evidence of heterogeneity at genes with predicted loss-of-function intolerance, suggesting that environmental interactions may partially explain differences in cis-molQTL effect sizes across ancestries. Lastly, we leverage estimated cis-molQTL effect-sizes to perform individual-level TWAS and PWAS on six white blood cell-related traits in AOU Biobank individuals (n=86k), and identify 44 more genes compared with baselines, further highlighting its benefits in identifying genes relevant for complex disease risk. Overall, SuShiE provides new insights into the cis-genetic architecture of molecular traits.
Affiliation(s)
- Zeyun Lu
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Xinran Wang
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Matthew Carr
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Artem Kim
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Steven Gazal
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
| | - Pejman Mohammadi
- Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaiʻi Cancer Center, University of Hawaiʻi at Mānoa, Honolulu, HI, USA
| | - Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
| | - James Pirruccello
- Division of Cardiology, University of California San Francisco, San Francisco, CA, USA
| | - Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Nicholas Mancuso
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
|
21
|
Hui D, Dudek S, Kiryluk K, Walunas TL, Kullo IJ, Wei WQ, Tiwari HK, Peterson JF, Chung WK, Davis B, Khan A, Kottyan L, Limdi NA, Feng Q, Puckelwartz MJ, Weng C, Smith JL, Karlson EW, Regeneron Genetics Center, Jarvik GP, Ritchie MD. Risk factors affecting polygenic score performance across diverse cohorts. medRxiv 2024:2023.05.10.23289777. [PMID: 38645167 PMCID: PMC11030495 DOI: 10.1101/2023.05.10.23289777] [Indexed: 04/23/2024]
Abstract
Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed effects of covariate stratification and interaction on body mass index (BMI) PGS (PGS_BMI) across four cohorts of European (N=491,111) and African (N=21,612) ancestry. Stratifying on binary covariates and quintiles for continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² being nearly double between best and worst performing quintiles for certain covariates. 28 covariates had significant PGS_BMI-covariate interaction effects, modifying PGS_BMI effects by nearly 20% per standard deviation change. We observed overlap between covariates that had significant R² differences among strata and interaction effects - across all covariates, their main effects on BMI were correlated with their maximum R² differences and interaction effects (0.56 and 0.58, respectively), suggesting high-PGS_BMI individuals have highest R² and increase in PGS effect. Using quantile regression, we show the effect of PGS_BMI increases as BMI itself increases, and that these differences in effects are directly related to differences in R² when stratifying by different covariates. Given significant and replicable evidence for context-specific PGS_BMI performance and effects, we investigated ways to increase model performance taking into account non-linear effects. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating PGS_BMI directly from GxAge GWAS effects increased relative R² by 7.8%. These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGS_BMI performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.
Affiliation(s)
- Daniel Hui
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Scott Dudek
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Krzysztof Kiryluk
- Division of Nephrology, Department of Medicine, Columbia University, NY, New York
| | - Theresa L. Walunas
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | | | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
| | - Hemant K. Tiwari
- Department of Pediatrics, University of Alabama at Birmingham, Birmingham, AL
| | - Josh F. Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
| | - Wendy K. Chung
- Departments of Pediatrics and Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY
| | - Brittney Davis
- Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL
| | - Atlas Khan
- Division of Nephrology, Department of Medicine, Columbia University, NY, New York
| | - Leah Kottyan
- The Center for Autoimmune Genomics and Etiology, Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH
| | - Nita A. Limdi
- Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL
| | - Qiping Feng
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Megan J. Puckelwartz
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
| | - Chunhua Weng
- Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY
| | - Johanna L. Smith
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
| | - Elizabeth W. Karlson
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
| | | | - Gail P. Jarvik
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington Medical Center, Seattle, WA
| | - Marylyn D. Ritchie
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
|
22
|
Timmins IR, The PRACTICAL Consortium, Dudbridge F. Bayesian approach to assessing population differences in genetic risk of disease with application to prostate cancer. PLoS Genet 2024; 20:e1011212. [PMID: 38630784 PMCID: PMC11023298 DOI: 10.1371/journal.pgen.1011212] [Received: 12/12/2022] [Accepted: 03/07/2024] [Indexed: 04/19/2024] Open
Abstract
Population differences in risk of disease are common, but the potential genetic basis for these differences is not well understood. A standard approach is to compare genetic risk across populations by testing for mean differences in polygenic scores, but existing studies that use this approach do not account for statistical noise in effect estimates (i.e., the GWAS betas) that arise due to the finite sample size of GWAS training data. Here, we show using Bayesian polygenic score methods that the level of uncertainty in estimates of genetic risk differences across populations is highly dependent on the GWAS training sample size, the polygenicity (number of causal variants), and genetic distance (FST) between the populations considered. We derive a Wald test for formally assessing the difference in genetic risk across populations, which we show to have calibrated type 1 error rates under a simplified assumption that all SNPs are independent, which we achieve in practise using linkage disequilibrium (LD) pruning. We further provide closed-form expressions for assessing the uncertainty in estimates of relative genetic risk across populations under the special case of an infinitesimal genetic architecture. We suggest that for many complex traits and diseases, particularly those with more polygenic architectures, current GWAS sample sizes are insufficient to detect moderate differences in genetic risk across populations, though more substantial differences in relative genetic risk (relative risk > 1.5) can be detected. We show that conventional approaches that do not account for sampling error from the training sample, such as using a simple t-test, have very high type 1 error rates. When applying our approach to prostate cancer, we demonstrate a higher genetic risk in African Ancestry men, with lower risk in men of European followed by East Asian ancestry.
Affiliation(s)
- Iain R. Timmins
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, United Kingdom
- Statistical Innovation, AstraZeneca, Cambridge, United Kingdom
| | | | - Frank Dudbridge
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
| |
|
23
|
Hatton AA, Cheng FF, Lin T, Shen RJ, Chen J, Zheng Z, Qu J, Lyu F, Harris SE, Cox SR, Jin ZB, Martin NG, Fan D, Montgomery GW, Yang J, Wray NR, Marioni RE, Visscher PM, McRae AF. Genetic control of DNA methylation is largely shared across European and East Asian populations. Nat Commun 2024; 15:2713. [PMID: 38548728 PMCID: PMC10978881 DOI: 10.1038/s41467-024-47005-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 03/15/2024] [Indexed: 04/01/2024] Open
Abstract
DNA methylation is an ideal trait to study the extent of the shared genetic control across ancestries, effectively providing hundreds of thousands of model molecular traits with large QTL effect sizes. We investigate cis DNAm QTLs in three European (n = 3701) and two East Asian (n = 2099) cohorts to quantify the similarities and differences in the genetic architecture across populations. We observe 80,394 associated mQTLs (62.2% of DNAm probes with significant mQTL) to be significant in both ancestries, while 28,925 mQTLs (22.4%) are identified in only a single ancestry. mQTL effect sizes are highly conserved across populations, with differences in mQTL discovery likely due to differences in allele frequency of associated variants and differing linkage disequilibrium between causal variants and assayed SNPs. This study highlights the overall similarity of genetic control across ancestries and the value of ancestral diversity in increasing the power to detect associations and enhancing fine mapping resolution.
Affiliation(s)
- Alesha A Hatton
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Fei-Fei Cheng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- School of Life Sciences, Westlake University, Hangzhou, 310030, Zhejiang, China
| | - Tian Lin
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Ren-Juan Shen
- Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital Medical University, 100008, Beijing, China
- School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
| | - Jie Chen
- School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
| | - Zhili Zheng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Jia Qu
- School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
| | - Fan Lyu
- School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
| | - Sarah E Harris
- Lothian Birth Cohorts, Department of Psychology, University of Edinburgh, Edinburgh, EH8 9JZ, UK
| | - Simon R Cox
- Lothian Birth Cohorts, Department of Psychology, University of Edinburgh, Edinburgh, EH8 9JZ, UK
| | - Zi-Bing Jin
- Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital Medical University, 100008, Beijing, China
- School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
| | - Nicholas G Martin
- Queensland Institute of Medical Research Berghofer, Brisbane, QLD, 4006, Australia
| | - Dongsheng Fan
- Department of Neurology, Peking University Third Hospital, 100191, Beijing, China
| | - Grant W Montgomery
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Jian Yang
- School of Life Sciences, Westlake University, Hangzhou, 310030, Zhejiang, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, 310024, Zhejiang, China
| | - Naomi R Wray
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Riccardo E Marioni
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
| | - Peter M Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Allan F McRae
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia.
| |
|
24
|
Stassen HH, Bachmann S, Bridler R, Cattapan K, Hartmann AM, Rujescu D, Seifritz E, Weisbrod M, Scharfetter C. Analysis of genetic diversity in patients with major psychiatric disorders versus healthy controls: A molecular-genetic study of 1698 subjects genotyped for 100 candidate genes (549 SNPs). Psychiatry Res 2024; 333:115720. [PMID: 38224633 DOI: 10.1016/j.psychres.2024.115720] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/15/2023] [Revised: 12/11/2023] [Accepted: 01/03/2024] [Indexed: 01/17/2024]
Abstract
BACKGROUND: This study analyzed the extent to which irregularities in genetic diversity separate psychiatric patients from healthy controls. METHODS: Genetic diversity was quantified through multidimensional "gene vectors" assembled from 4 to 8 polymorphic SNPs located within each of 100 candidate genes. The number of different genotypic patterns observed per gene was called the gene's "diversity index". RESULTS: The diversity indices were found to be only weakly correlated with their constituent number of SNPs (20.5% explained variance), thus suggesting that genetic diversity is an intrinsic gene property that has evolved over the course of evolution. Significant deviations from "normal" diversity values were found for (1) major depression; (2) Alzheimer's disease; and (3) schizoaffective disorders. Almost one third of the genes were correlated with each other, with correlations ranging from 0.0303 to 0.7245. The central finding of this study was the discovery of "singular genes" characterized by distinctive genotypic patterns that appeared exclusively in patients but not in healthy controls. Neural Nets yielded nonlinear classifiers that correctly identified up to 90% of patients. Overlaps between diagnostic subgroups on the genotype level suggested that (1) diagnoses-crossing vulnerabilities are likely involved in the pathogenesis of major psychiatric disorders; (2) clinically defined diagnoses may not constitute etiological entities. CONCLUSION: Detailed analyses of the variation of genotypic patterns in genes along with the correlation between genes lead to nonlinear classifiers that enable very robust separation between psychiatric patients and healthy controls on the genotype level.
Affiliation(s)
- H H Stassen
- Institute for Response-Genetics, Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric University Hospital, Zurich CH-8032, Switzerland.
| | - S Bachmann
- Department of Psychiatry, Psychotherapy, and Psychosomatics, University of Halle, Halle D-06112, Germany; Clienia AG, Psychiatric Hospital, Littenheid CH-9573, Switzerland; Department of Psychiatry, Geneva University Hospitals, Thônex CH-1226, Switzerland
| | - R Bridler
- Sanatorium Kilchberg, Kilchberg CH-8802, Switzerland
| | - K Cattapan
- Sanatorium Kilchberg, Kilchberg CH-8802, Switzerland; University Hospital of Psychiatry and Psychotherapy, University of Bern, Bern CH-3012, Switzerland
| | - A M Hartmann
- Clinical Division of General Psychiatry, Medical University of Vienna, Wien A-1090, Austria
| | - D Rujescu
- Clinical Division of General Psychiatry, Medical University of Vienna, Wien A-1090, Austria
| | - E Seifritz
- Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric University Hospital, Zurich CH-8032, Switzerland
| | - M Weisbrod
- Department of General Psychiatry, Center of Psychosocial Medicine, University of Heidelberg, Heidelberg D-69115, Germany; SRH Hospital Karlsbad-Langensteinbach, Karlsbad D-76307, Germany
| | - Chr Scharfetter
- Institute for Response-Genetics, Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric University Hospital, Zurich CH-8032, Switzerland
| |
|
25
|
Chen TT, Kim J, Lam M, Chuang YF, Chiu YL, Lin SC, Jung SH, Kim B, Kim S, Cho C, Shim I, Park S, Ahn Y, Okbay A, Jang H, Kim HJ, Seo SW, Park WY, Ge T, Huang H, Feng YCA, Lin YF, Myung W, Chen CY, Won HH. Shared genetic architectures of educational attainment in East Asian and European populations. Nat Hum Behav 2024; 8:562-575. [PMID: 38182883 PMCID: PMC10963262 DOI: 10.1038/s41562-023-01781-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Accepted: 11/09/2023] [Indexed: 01/07/2024]
Abstract
Educational attainment (EduYears), a heritable trait often used as a proxy for cognitive ability, is associated with various health and social outcomes. Previous genome-wide association studies (GWASs) on EduYears have been focused on samples of European (EUR) genetic ancestries. Here we present the first large-scale GWAS of EduYears in people of East Asian (EAS) ancestry (n = 176,400) and conduct a cross-ancestry meta-analysis with EduYears GWAS in people of EUR ancestry (n = 766,345). EduYears showed a high genetic correlation and power-adjusted transferability ratio between EAS and EUR. We also found similar functional enrichment, gene expression enrichment and cross-trait genetic correlations between two populations. Cross-ancestry fine-mapping identified refined credible sets with a higher posterior inclusion probability than single population fine-mapping. Polygenic prediction analysis in four independent EAS and EUR cohorts demonstrated transferability between populations. Our study supports the need for further research on diverse ancestries to increase our understanding of the genetic basis of educational attainment.
Affiliation(s)
- Tzu-Ting Chen
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
| | - Jaeyoung Kim
- Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea
- Department of Neuropsychiatry, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Max Lam
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Human Genetics, Genome Institute of Singapore, Singapore, Singapore
- Division of Psychiatry Research, the Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, USA
- Research Division Institute of Mental Health Singapore, Singapore, Singapore
| | - Yi-Fang Chuang
- Institute of Public Health and International Health Program, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Yen-Ling Chiu
- Graduate Institute of Medicine, Yuan Ze University, Taoyuan City, Taiwan
- Department of Medical Research, Far Eastern Memorial Hospital, New Taipei City, Taiwan
| | - Shu-Chin Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
| | - Sang-Hyuk Jung
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Beomsu Kim
- Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea
| | - Soyeon Kim
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Chamlee Cho
- Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea
| | - Injeong Shim
- Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea
| | - Sanghyeon Park
- Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea
| | - Yeeun Ahn
- Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea
| | - Aysu Okbay
- Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Hyemin Jang
- Departments of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, South Korea
| | - Hee Jin Kim
- Departments of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, South Korea
| | - Sang Won Seo
- Departments of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, South Korea
| | - Woong-Yang Park
- Samsung Genome Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
| | - Tian Ge
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yen-Chen Anne Feng
- Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei City, Taiwan
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei City, Taiwan
| | - Yen-Feng Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan.
- Department of Public Health and Medical Humanities, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan.
- Institute of Behavioral Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan.
| | - Woojae Myung
- Department of Neuropsychiatry, Seoul National University Bundang Hospital, Seongnam, South Korea.
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, South Korea.
| | | | - Hong-Hee Won
- Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea.
- Samsung Genome Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea.
| |
|
26
|
Kolobkov D, Mishra Sharma S, Medvedev A, Lebedev M, Kosaretskiy E, Vakhitov R. Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project. Front Big Data 2024; 7:1266031. [PMID: 38487517 PMCID: PMC10937521 DOI: 10.3389/fdata.2024.1266031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 01/31/2024] [Indexed: 03/17/2024] Open
Abstract
Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner thus reducing the risks of data leakage. Although there is increasing utilization of federated learning on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to centralized models, even in the presence of significant inter-node heterogeneity. Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.
Affiliation(s)
- Dmitry Kolobkov
- GENXT, Hinxton, United Kingdom
- Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Moscow, Russia
| | - Satyarth Mishra Sharma
- GENXT, Hinxton, United Kingdom
- Center for Artificial Intelligence Technology, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Aleksandr Medvedev
- GENXT, Hinxton, United Kingdom
- Center for Artificial Intelligence Technology, Skolkovo Institute of Science and Technology, Moscow, Russia
| |
|
27
|
Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, Kenny EE, Pasaniuc B, Witte JS, Ge T. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024; 25:8-25. [PMID: 37620596 PMCID: PMC10961971 DOI: 10.1038/s41576-023-00637-2] [Citation(s) in RCA: 117] [Impact Index Per Article: 117.0] [Accepted: 07/11/2023] [Indexed: 08/26/2023]
Abstract
Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
Affiliation(s)
- Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Jibril Hirbo
- Department of Medicine Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Daniel J Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Iman Martin
- Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - John S Witte
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA.
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Department of Genetics, Stanford University, Stanford, CA, USA.
| | - Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
28
|
Chen CY, Chen TT, Feng YCA, Yu M, Lin SC, Longchamps RJ, Wang SH, Hsu YH, Yang HI, Kuo PH, Daly MJ, Chen WJ, Huang H, Ge T, Lin YF. Analysis across Taiwan Biobank, Biobank Japan, and UK Biobank identifies hundreds of novel loci for 36 quantitative traits. CELL GENOMICS 2023; 3:100436. [PMID: 38116116 PMCID: PMC10726425 DOI: 10.1016/j.xgen.2023.100436] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/05/2021] [Revised: 11/21/2021] [Accepted: 10/09/2023] [Indexed: 12/21/2023]
Abstract
Genome-wide association studies (GWASs) have identified tens of thousands of genetic loci associated with human complex traits. However, the majority of GWASs were conducted in individuals of European ancestries. Failure to capture global genetic diversity has limited genomic discovery and has impeded equitable delivery of genomic knowledge to diverse populations. Here we report findings from 102,900 individuals across 36 human quantitative traits in the Taiwan Biobank (TWB), a major biobank effort that broadens the population diversity of genetic studies in East Asia. We identified 968 novel genetic loci, pinpointed novel causal variants through statistical fine-mapping, compared the genetic architecture across TWB, Biobank Japan, and UK Biobank, and evaluated the utility of cross-phenotype, cross-population polygenic risk scores in disease risk prediction. These results demonstrated the potential to advance discovery through diversifying GWAS populations and provided insights into the common genetic basis of human complex traits in East Asia.
Affiliation(s)
- Chia-Yen Chen
- Biogen, Cambridge, MA 02142, USA; Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| | - Tzu-Ting Chen
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli 35053, Taiwan
| | - Yen-Chen Anne Feng
- Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Public Health & Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei 100025, Taiwan; Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei 100025, Taiwan.
| | - Mingrui Yu
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Shu-Chin Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli 35053, Taiwan
| | - Ryan J Longchamps
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Shi-Heng Wang
- National Center for Geriatrics and Welfare Research, National Health Research Institutes, Miaoli 35053, Taiwan; Department of Public Health, College of Public Health, China Medical University, Taichung 40678, Taiwan
| | - Yi-Hsiang Hsu
- Marcus Institute for Aging Research and Harvard Medical School, Boston, MA 02131, USA; Beth Israel Deaconess Medical Center, Boston, MA 02215, USA; Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Hwai-I Yang
- Genomics Research Center, Academia Sinica, Taipei 115201, Taiwan; Institute of Clinical Medicine, National Yang-Ming University, Taipei 112304, Taiwan; Doctoral Program of Clinical and Experimental Medicine, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan; Biomedical Translation Research Center, Academia Sinica, Taipei 115021, Taiwan
| | - Po-Hsiu Kuo
- Department of Public Health & Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei 100025, Taiwan; Department of Psychiatry, College of Medicine and National Taiwan University Hospital, Taipei 106319, Taiwan
| | - Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Institute for Molecular Medicine Finland FIMM, University of Helsinki, 00014 Helsinki, Finland
| | - Wei J Chen
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli 35053, Taiwan; Department of Public Health & Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei 100025, Taiwan; Department of Psychiatry, College of Medicine and National Taiwan University Hospital, Taipei 106319, Taiwan
| | - Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA.
| | - Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.
| | - Yen-Feng Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli 35053, Taiwan; Department of Public Health & Medical Humanities, School of Medicine, National Yang Ming Chiao Tung University, Taipei 112304, Taiwan; Institute of Behavioral Medicine, College of Medicine, National Cheng Kung University, Tainan 70101, Taiwan.
| |
|
29
|
Hassanin E, Lee KH, Hsieh TC, Aldisi R, Lee YL, Bobbili D, Krawitz P, May P, Chen CY, Maj C. Trans-ancestry polygenic models for the prediction of LDL blood levels: an analysis of the United Kingdom Biobank and Taiwan Biobank. Front Genet 2023; 14:1286561. [PMID: 38075701 PMCID: PMC10704094 DOI: 10.3389/fgene.2023.1286561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 10/31/2023] [Indexed: 10/16/2024] Open
Abstract
Polygenic risk score (PRS) predictions often show bias toward the population of available genome-wide association studies (GWASs), which is typically of European ancestry. This study aimed to assess the performance differences of ancestry-specific PRS and test the implementation of multi-ancestry PRS to enhance the generalizability of low-density lipoprotein (LDL) cholesterol predictions in the East Asian (EAS) population. In this study, we computed ancestry-specific and multi-ancestry PRSs for LDL using data obtained from the Global Lipid Genetics Consortium, while accounting for population-specific linkage disequilibrium patterns using the PRS-CSx method in the United Kingdom Biobank dataset (UKB, n = 423,596) and Taiwan Biobank dataset (TWB, n = 68,978). Population-specific PRSs were able to predict LDL levels better within the target population, whereas multi-ancestry PRSs were more generalizable. In the TWB dataset, covariate-adjusted R² values were 9.3% for ancestry-specific PRS, 6.7% for multi-ancestry PRS, and 4.5% for European-specific PRS. Similar trends (8.6%, 7.8%, and 6.2%) were observed in the smaller EAS population of the UKB (n = 1,480). Consistent with R² values, PRS stratification in EAS regions (TWB) effectively captured a heterogeneous variability in LDL blood cholesterol levels across PRS strata. The mean difference in LDL levels between the lowest and highest EAS-specific PRS (EAS_PRS) deciles was 0.82, compared to 0.59 for European-specific PRS (EUR_PRS) and 0.76 for multi-ancestry PRS. Notably, the mean LDL values in the top decile of multi-ancestry PRS were comparable to those of EAS_PRS (3.543 vs. 3.541, p = 0.86). Our analysis of the PRS prediction model for LDL cholesterol further supports the issue of PRS generalizability across populations. Our targeted analysis of the EAS population revealed that integrating non-European genotyping data with a powerful European-based GWAS can enhance the generalizability of LDL PRS.
Affiliation(s)
- Emadeldin Hassanin
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Sur-Alzette, Luxembourg
- Institute for Genomic Statistics and Bioinformatics, University of Bonn, Bonn, Germany
| | - Ko-Han Lee
- Taiwan AI Labs and Foundation, Taipei, Taiwan
| | - Tzung-Chien Hsieh
- Institute for Genomic Statistics and Bioinformatics, University of Bonn, Bonn, Germany
| | - Rana Aldisi
- Institute for Genomic Statistics and Bioinformatics, University of Bonn, Bonn, Germany
| | - Yi-Lun Lee
- Taiwan AI Labs and Foundation, Taipei, Taiwan
| | - Dheeraj Bobbili
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Sur-Alzette, Luxembourg
| | - Peter Krawitz
- Institute for Genomic Statistics and Bioinformatics, University of Bonn, Bonn, Germany
| | - Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Sur-Alzette, Luxembourg
| | - Chien-Yu Chen
- Taiwan AI Labs and Foundation, Taipei, Taiwan
- Center for Computational and Systems Biology, National Taiwan University, Taipei, Taiwan
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Center for Advanced Computing and Imaging in Biomedicine, National Taiwan University, Taipei, Taiwan
| | - Carlo Maj
- Centre for Human Genetics, University of Marburg, Marburg, Germany
|
30
|
Tanigawa Y, Kellis M. Power of inclusion: Enhancing polygenic prediction with admixed individuals. Am J Hum Genet 2023; 110:1888-1902. [PMID: 37890495 PMCID: PMC10645553 DOI: 10.1016/j.ajhg.2023.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 09/22/2023] [Accepted: 09/22/2023] [Indexed: 10/29/2023] Open
Abstract
Admixed individuals offer unique opportunities for addressing limited transferability in polygenic scores (PGSs), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training, given the challenges in representing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans by 48.9% on average across 60 quantitative traits and up to 50-fold improvements for some traits (neutrophil count, R2 = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and -dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R2 = 0.115), which exceeds the best predictive performance for the White British group (R2 = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Our results indicate the power of including diverse individuals for developing more equitable PGS models.
Affiliation(s)
- Yosuke Tanigawa
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
31
|
Wang J, Gazal S. Ancestry-specific regulatory and disease architectures are likely due to cell-type-specific gene-by-environment interactions. medRxiv 2023:2023.10.20.23297214. [PMID: 37905038 PMCID: PMC10615008 DOI: 10.1101/2023.10.20.23297214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Multi-ancestry genome-wide association studies (GWAS) have highlighted the existence of variants with ancestry-specific effect sizes. Understanding where and why these ancestry-specific effects occur is fundamental to understanding the genetic basis of human diseases and complex traits. Here, we characterized genes differentially expressed across ancestries (ancDE genes) at the cell-type level by leveraging single-cell RNA-seq data in peripheral blood mononuclear cells for 21 individuals with East Asian (EAS) ancestry and 23 individuals with European (EUR) ancestry (172K cells); then, we tested if variants surrounding those genes were enriched in disease variants with ancestry-specific effect sizes by leveraging ancestry-matched GWAS of 31 diseases and complex traits (average N = 90K and 267K in EAS and EUR, respectively). We observed that ancDE genes tend to be cell-type-specific, to be enriched in genes interacting with the environment, and in variants with ancestry-specific disease effect sizes, suggesting the impact of shared cell-type-specific gene-by-environment (GxE) interactions between regulatory and disease architectures. Finally, we illustrated how GxE interactions might have led to ancestry-specific MCL1 expression in B cells, and ancestry-specific allele effect sizes in lymphocyte count GWAS for variants surrounding MCL1. Our results imply that large single-cell and GWAS datasets in diverse populations are required to improve our understanding of the effect of genetic variants on human diseases.
Affiliation(s)
- Juehan Wang
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Steven Gazal
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
|
32
|
Zhang H, Zhan J, Jin J, Zhang J, Lu W, Zhao R, Ahearn TU, Yu Z, O'Connell J, Jiang Y, Chen T, Okuhara D, Garcia-Closas M, Lin X, Koelsch BL, Chatterjee N. A new method for multiancestry polygenic prediction improves performance across diverse populations. Nat Genet 2023; 55:1757-1768. [PMID: 37749244 PMCID: PMC10923245 DOI: 10.1038/s41588-023-01501-z] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 03/31/2022] [Accepted: 08/16/2023] [Indexed: 09/27/2023]
Abstract
Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank, involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across 13 complex traits. Results demonstrated that CT-SLEB significantly improves PRS performance in non-European populations compared with simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offered insights into sample size requirements and SNP density effects on multiancestry risk prediction.
Affiliation(s)
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| | | | - Jin Jin
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Jingning Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Wenxuan Lu
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
| | - Ruzhang Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Thomas U Ahearn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Zhi Yu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | | | - Tony Chen
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | | | - Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Division of Genetics and Epidemiology, Institute of Cancer Research, London, UK
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
| | | | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA.
|
33
|
Corte L, Liou L, O’Reilly PF, García-González J. Trumpet plots: visualizing the relationship between allele frequency and effect size in genetic association studies. GigaByte 2023; 2023:gigabyte89. [PMID: 37711278 PMCID: PMC10498096 DOI: 10.46471/gigabyte.89] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 08/29/2023] [Indexed: 09/16/2023] Open
Abstract
Recent advances in genome-wide association and sequencing studies have shown that the genetic architecture of complex traits and diseases involves a combination of rare and common genetic variants distributed throughout the genome. One way to better understand this architecture is to visualize genetic associations across a wide range of allele frequencies. However, there is currently no standardized or consistent graphical representation for effectively illustrating these results. Here we propose a standardized approach for visualizing the effect size of risk variants across the allele frequency spectrum. The proposed plots have a distinctive trumpet shape: with the majority of variants having high frequency and small effects, and a small number of variants having lower frequency and larger effects. To demonstrate the utility of trumpet plots in illustrating the relationship between the number of variants, their frequency, and the magnitude of their effects in shaping the genetic architecture of complex traits and diseases, we generated trumpet plots for more than one hundred traits in the UK Biobank. To facilitate their broader use, we developed an R package, 'TrumpetPlots' (available at the Comprehensive R Archive Network) and R Shiny application, 'Shiny Trumpets' (available at https://juditgg.shinyapps.io/shinytrumpets/) that allows users to explore these results and submit their own data.
Affiliation(s)
- Lucia Corte
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
- Center for Excellence in Youth Education, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
| | - Lathan Liou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
| | - Paul F. O’Reilly
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
| | - Judit García-González
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
|
34
|
Abstract
Admixed populations constitute a large portion of global human genetic diversity, yet they are often left out of genomics analyses. This exclusion is problematic, as it leads to disparities in the understanding of the genetic structure and history of diverse cohorts and the performance of genomic medicine across populations. Admixed populations have particular statistical challenges, as they inherit genomic segments from multiple source populations, which is the primary reason they have historically been excluded from genetic studies. In recent years, however, an increasing number of statistical methods and software tools have been developed to account for and leverage admixture in the context of genomics analyses. Here, we provide a survey of such computational strategies for the informed consideration of admixture to allow for the well-calibrated inclusion of mixed ancestry populations in large-scale genomics studies, and we detail persisting gaps in existing tools.
Affiliation(s)
- Taotao Tan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA;
| | - Elizabeth G Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA;
|
35
|
Hou K, Xu Z, Ding Y, Harpak A, Pasaniuc B. Calibrated prediction intervals for polygenic scores across diverse contexts. medRxiv 2023:2023.07.24.23293056. [PMID: 37546999 PMCID: PMC10402211 DOI: 10.1101/2023.07.24.23293056] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 08/08/2023]
Abstract
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields from agriculture to personalized medicine. We analyze data from two large biobanks in the US (All of Us) and the UK (UK Biobank) to find widespread variability in PGS performance across contexts. Many contexts, including age, sex, and income, impact PGS accuracies with similar magnitudes as genetic ancestry. PGSs trained in single versus multi-ancestry cohorts show similar context-specificity in their accuracies. We introduce trait prediction intervals that are allowed to vary across contexts as a principled approach to account for context-specific PGS accuracy in genomic prediction. We model the impact of all contexts in a joint framework to enable PGS-based trait predictions that are well-calibrated (contain the trait value with 90% probability in all contexts), whereas methods that ignore context are mis-calibrated. We show that prediction intervals need to be adjusted for all considered traits ranging from 10% for diastolic blood pressure to 80% for waist circumference. Adjustment of prediction intervals depends on the dataset; for example, prediction intervals for education years need to be adjusted by 90% in All of Us versus 8% in UK Biobank. Our results provide a path forward towards utilization of PGS as a prediction tool across all individuals regardless of their contexts while highlighting the importance of comprehensive profile of context information in study design and data collection.
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
| | - Ziqi Xu
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
| | - Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
| | - Arbel Harpak
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles
|
36
|
The Impact of Genomic Variation on Function (IGVF) Consortium. arXiv 2023:arXiv:2307.13708v1. [PMID: 37547663 PMCID: PMC10402186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Our genomes influence nearly every aspect of human biology from molecular and cellular functions to phenotypes in health and disease. Human genetics studies have now associated hundreds of thousands of differences in our DNA sequence ("genomic variation") with disease risk and other phenotypes, many of which could reveal novel mechanisms of human biology and uncover the basis of genetic predispositions to diseases, thereby guiding the development of new diagnostics and therapeutics. Yet, understanding how genomic variation alters genome function to influence phenotype has proven challenging. To unlock these insights, we need a systematic and comprehensive catalog of genome function and the molecular and cellular effects of genomic variants. Toward this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations, and predictive modeling to investigate the relationships among genomic variation, genome function, and phenotypes. Through systematic comparisons and benchmarking of experimental and computational methods, we aim to create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how both coding and noncoding variants may connect through gene regulatory and protein interaction networks. These experimental data, computational predictions, and accompanying standards and pipelines will be integrated into an open resource that will catalyze community efforts to explore genome function and the impact of genetic variation on human biology and disease across populations.
|
37
|
Bahda M, Ricard J, Girard SL, Maziade M, Isabelle M, Bureau A. Multivariate extension of penalized regression on summary statistics to construct polygenic risk scores for correlated traits. HGG Advances 2023; 4:100209. [PMID: 37333772 PMCID: PMC10276147 DOI: 10.1016/j.xhgg.2023.100209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 05/17/2023] [Indexed: 06/20/2023] Open
Abstract
Genetic correlations between human traits and disorders such as schizophrenia (SZ) and bipolar disorder (BD) diagnoses are well established. Improved prediction of individual traits has been obtained by combining predictors of multiple genetically correlated traits derived from summary statistics produced by genome-wide association studies, compared with single trait predictors. We extend this idea to penalized regression on summary statistics in Multivariate Lassosum, expressing regression coefficients for the multiple traits on single nucleotide polymorphisms (SNPs) as correlated random effects, similarly to multi-trait summary statistic best linear unbiased predictors (MT-SBLUPs). We also allow the SNP contributions to genetic covariance and heritability to depend on genomic annotations. We conducted simulations with two dichotomous traits having polygenic architecture similar to SZ and BD, using genotypes from 29,330 subjects from the CARTaGENE cohort. Multivariate Lassosum produced polygenic risk scores (PRSs) more strongly correlated with the true genetic risk predictor and had better discrimination power between affected and non-affected subjects than previously published sparse multi-trait (PANPRS) and univariate (Lassosum, sparse LDpred2, and the standard clumping and thresholding) methods in most simulation settings. Application of Multivariate Lassosum to predict SZ, BD, and related psychiatric traits in the Eastern Quebec SZ and BD kindred study revealed associations with every trait stronger than those obtained with univariate sparse PRSs, particularly when heritability and genetic covariance depended on genomic annotations. Multivariate Lassosum thus appears promising to improve prediction of genetically correlated traits with summary statistics for a selected subset of SNPs.
Affiliation(s)
- Meriem Bahda
- Department of Mathematics and Statistics, Laval University, Québec, QC G1V 0A6, Canada
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
| | - Jasmin Ricard
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
| | - Simon L. Girard
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Department of Fundamental Sciences, University of Quebec in Chicoutimi, Chicoutimi, QC G7H 2B1, Canada
| | - Michel Maziade
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Department of Psychiatry and Neurosciences, Laval University, Québec, QC G1V 0A6, Canada
| | - Maripier Isabelle
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Department of Economics, Laval University, Québec, QC G1V 0A6, Canada
| | - Alexandre Bureau
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Department of Social and Preventive Medicine, Laval University, Québec, QC G1V 0A6, Canada
|
38
|
Yuan K, Longchamps RJ, Pardiñas AF, Yu M, Chen TT, Lin SC, Chen Y, Lam M, Liu R, Xia Y, Guo Z, Shi W, Shen C, Daly MJ, Neale BM, Feng YCA, Lin YF, Chen CY, O'Donovan M, Ge T, Huang H. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. medRxiv 2023:2023.01.07.23284293. [PMID: 36711496 PMCID: PMC9882563 DOI: 10.1101/2023.01.07.23284293] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 06/18/2023]
Abstract
Genome-wide association studies (GWAS) of human complex traits or diseases often implicate genetic loci that span hundreds or thousands of genetic variants, many of which have similar statistical significance. While statistical fine-mapping in individuals of European ancestries has made important discoveries, cross-population fine-mapping has the potential to improve power and resolution by capitalizing on the genomic diversity across ancestries. Here we present SuSiEx, an accurate and computationally efficient method for cross-population fine-mapping, which builds on the single-population fine-mapping framework, Sum of Single Effects (SuSiE). SuSiEx integrates data from an arbitrary number of ancestries, explicitly models population-specific allele frequencies and LD patterns, accounts for multiple causal variants in a genomic region, and can be applied to GWAS summary statistics. We comprehensively evaluated SuSiEx using simulations, a range of quantitative traits measured in both UK Biobank and Taiwan Biobank, and schizophrenia GWAS across East Asian and European ancestries. In all evaluations, SuSiEx fine-mapped more association signals, produced smaller credible sets and higher posterior inclusion probability (PIP) for putative causal variants, and captured population-specific causal variants.
Affiliation(s)
- Kai Yuan
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Ryan J Longchamps
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Antonio F Pardiñas
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, UK
| | - Mingrui Yu
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Tzu-Ting Chen
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
| | - Shu-Chin Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
| | - Yu Chen
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Max Lam
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Human Genetics, Genome Institute of Singapore, Singapore, Singapore
- Division of Psychiatry Research, the Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, USA
- Research Division Institute of Mental Health Singapore, Singapore, Singapore
| | - Ruize Liu
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Yan Xia
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Zhenglin Guo
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Wenzhao Shi
- Digital Health China Technologies Corp. Ltd., Beijing, China
| | - Chengguo Shen
- Digital Health China Technologies Corp. Ltd., Beijing, China
| | - Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Yen-Chen A Feng
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
| | - Yen-Feng Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
- Department of Public Health & Medical Humanities, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Institute of Behavioral Medicine, College of Medicine, National Cheng Kung University
| | | | - Michael O'Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, UK
| | - Tian Ge
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
|
39
|
Fiziev PP, McRae J, Ulirsch JC, Dron JS, Hamp T, Yang Y, Wainschtein P, Ni Z, Schraiber JG, Gao H, Cable D, Field Y, Aguet F, Fasnacht M, Metwally A, Rogers J, Marques-Bonet T, Rehm HL, O'Donnell-Luria A, Khera AV, Farh KKH. Rare penetrant mutations confer severe risk of common diseases. Science 2023; 380:eabo1131. [PMID: 37262146 DOI: 10.1126/science.abo1131] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 01/14/2022] [Accepted: 03/16/2023] [Indexed: 06/03/2023]
Abstract
We examined 454,712 exomes for genes associated with a wide spectrum of complex traits and common diseases and observed that rare, penetrant mutations in genes implicated by genome-wide association studies confer ~10-fold larger effects than common variants in the same genes. Consequently, an individual at the phenotypic extreme and at the greatest risk for severe, early-onset disease is better identified by a few rare penetrant variants than by the collective action of many common variants with weak effects. By combining rare variants across phenotype-associated genes into a unified genetic risk model, we demonstrate superior portability across diverse global populations compared with common-variant polygenic risk scores, greatly improving the clinical utility of genetic-based risk prediction.
Affiliation(s)
- Petko P Fiziev
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Jeremy McRae
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Jacob C Ulirsch
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Jacqueline S Dron
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Tobias Hamp
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Yanshen Yang
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Pierrick Wainschtein
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
| | - Zijian Ni
- Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Joshua G Schraiber
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Hong Gao
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Dylan Cable
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology (MIT), Cambridge, MA 02142, USA
| | - Yair Field
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Francois Aguet
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Marc Fasnacht
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Ahmed Metwally
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI 53715, USA
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), 08003 Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, 08193 Barcelona, Spain
| | - Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
| | - Amit V Khera
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Verve Therapeutics, Cambridge, MA 02215, USA
| | - Kyle Kai-How Farh
- Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA 92122, USA
|
40
|
Mester R, Hou K, Ding Y, Meeks G, Burch KS, Bhattacharya A, Henn BM, Pasaniuc B. Impact of cross-ancestry genetic architecture on GWASs in admixed populations. Am J Hum Genet 2023; 110:927-939. [PMID: 37224807 PMCID: PMC10257009 DOI: 10.1016/j.ajhg.2023.05.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/27/2023] [Revised: 05/04/2023] [Accepted: 05/04/2023] [Indexed: 05/26/2023] Open
Abstract
Genome-wide association studies (GWASs) have identified thousands of variants for disease risk. These studies have predominantly been conducted in individuals of European ancestries, which raises questions about their transferability to individuals of other ancestries. Of particular interest are admixed populations, usually defined as populations with recent ancestry from two or more continental sources. Admixed genomes contain segments of distinct ancestries that vary in composition across individuals in the population, allowing for the same allele to induce risk for disease on different ancestral backgrounds. This mosaicism raises unique challenges for GWASs in admixed populations, such as the need to correctly adjust for population stratification. In this work we quantify the impact of differences in estimated allelic effect sizes for risk variants between ancestry backgrounds on association statistics. Specifically, while the possibility of estimated allelic effect-size heterogeneity by ancestry (HetLanc) can be modeled when performing a GWAS in admixed populations, the extent of HetLanc needed to overcome the penalty from an additional degree of freedom in the association statistic has not been thoroughly quantified. Using extensive simulations of admixed genotypes and phenotypes, we find that controlling for and conditioning effect sizes on local ancestry can reduce statistical power by up to 72%. This finding is especially pronounced in the presence of allele frequency differentiation. We replicate simulation results using 4,327 African-European admixed genomes from the UK Biobank for 12 traits to find that for most significant SNPs, HetLanc is not large enough for GWASs to benefit from modeling heterogeneity in this way.
Affiliation(s)
- Rachel Mester
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA.
| | - Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Gillian Meeks
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, Davis, CA 95616, USA
| | - Kathryn S Burch
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Brenna M Henn
- Department of Anthropology, Center for Population Biology and the Genome Center, University of California, Davis, Davis, CA 95616, USA
| | - Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute of Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA.
|
41
|
Fiziev P, McRae J, Ulirsch JC, Dron JS, Hamp T, Yang Y, Wainschtein P, Ni Z, Schraiber JG, Gao H, Cable D, Field Y, Aguet F, Fasnacht M, Metwally A, Rogers J, Marques-Bonet T, Rehm HL, O’Donnell-Luria A, Khera AV, Kai-How Farh K. Rare penetrant mutations confer severe risk of common diseases. medRxiv 2023:2023.05.01.23289356. [PMID: 37205493 PMCID: PMC10187340 DOI: 10.1101/2023.05.01.23289356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
We examined 454,712 exomes for genes associated with a wide spectrum of complex traits and common diseases and observed that rare, penetrant mutations in genes implicated by genome-wide association studies confer ∼10-fold larger effects than common variants in the same genes. Consequently, an individual at the phenotypic extreme and at the greatest risk for severe, early-onset disease is better identified by a few rare penetrant variants than by the collective action of many common variants with weak effects. By combining rare variants across phenotype-associated genes into a unified genetic risk model, we demonstrate superior portability across diverse global populations compared to common variant polygenic risk scores, greatly improving the clinical utility of genetic-based risk prediction.
One-sentence summary: Rare variant polygenic risk scores identify individuals with outlier phenotypes in common human diseases and complex traits.
Affiliation(s)
- Petko Fiziev
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Jeremy McRae
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Jacob C. Ulirsch
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Jacqueline S. Dron
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Cambridge, Massachusetts 02142, USA
| | - Tobias Hamp
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Yanshen Yang
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Pierrick Wainschtein
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Zijian Ni
- Department of Statistics, UW Madison; Madison, Wisconsin 53706, USA
| | - Joshua G. Schraiber
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Hong Gao
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Dylan Cable
- Department of Electrical Engineering and Computer Science, MIT; Cambridge, Massachusetts 02142, USA
| | - Yair Field
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Francois Aguet
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Marc Fasnacht
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Ahmed Metwally
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine; Houston, Texas 77030, USA
- Wisconsin National Primate Research Center, University of Wisconsin; Madison 53715, USA
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC); 08003 Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA); 08010 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona; 08193 Barcelona, Spain
| | - Heidi L. Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital; Boston, Massachusetts 02114, USA
| | - Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital; Boston, Massachusetts 02114, USA
- Division of Genetics and Genomics, Boston Children’s Hospital; Boston, Massachusetts 02115, USA
| | - Amit V. Khera
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Cambridge, Massachusetts 02142, USA
- Verve Therapeutics, Cambridge, Massachusetts 02215, USA
| | - Kyle Kai-How Farh
- Artificial Intelligence Laboratory, Illumina, Inc.; San Diego, California 92122, USA
|
42
|
Liu Z, Liu R, Gao H, Jung S, Gao X, Sun R, Liu X, Kim Y, Lee HS, Kawai Y, Nagasaki M, Umeno J, Tokunaga K, Kinouchi Y, Masamune A, Shi W, Shen C, Guo Z, Yuan K, Zhu S, Li D, Liu J, Ge T, Cho J, Daly MJ, McGovern DPB, Ye BD, Song K, Kakuta Y, Li M, Huang H. Genetic architecture of the inflammatory bowel diseases across East Asian and European ancestries. Nat Genet 2023; 55:796-806. [PMID: 37156999 PMCID: PMC10290755 DOI: 10.1038/s41588-023-01384-0] [Citation(s) in RCA: 90] [Impact Index Per Article: 45.0] [Received: 06/14/2022] [Accepted: 03/27/2023] [Indexed: 05/10/2023]
Abstract
Inflammatory bowel diseases (IBDs) are chronic disorders of the gastrointestinal tract with the following two subtypes: Crohn's disease (CD) and ulcerative colitis (UC). To date, most IBD genetic associations were derived from individuals of European (EUR) ancestries. Here we report the largest IBD study of individuals of East Asian (EAS) ancestries, including 14,393 cases and 15,456 controls. We found 80 IBD loci in EAS alone and 320 when meta-analyzed with ~370,000 EUR individuals (~30,000 cases), among which 81 are new. EAS-enriched coding variants implicate many new IBD genes, including ADAP1 and GIT2. Although IBD genetic effects are generally consistent across ancestries, genetics underlying CD appears more ancestry dependent than UC, driven by allele frequency (NOD2) and effect (TNFSF15). We extended the IBD polygenic risk score (PRS) by incorporating both ancestries, greatly improving its accuracy and highlighting the importance of diversity for the equitable deployment of PRS.
Affiliation(s)
- Zhanju Liu
- Center for IBD Research, Department of Gastroenterology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China.
| | - Ruize Liu
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Han Gao
- Center for IBD Research, Department of Gastroenterology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
| | - Seulgi Jung
- Department of Biochemistry and Molecular Biology, University of Ulsan College of Medicine, Seoul, Korea
| | - Xiang Gao
- Center for IBD Research, Department of Gastroenterology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
| | - Ruicong Sun
- Center for IBD Research, Department of Gastroenterology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
| | - Xiaoming Liu
- Inflammatory Bowel Diseases Research Center, Department of Gastroenterology, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
| | - Yongjae Kim
- Department of Biochemistry and Molecular Biology, University of Ulsan College of Medicine, Seoul, Korea
| | - Ho-Su Lee
- Department of Biochemistry and Molecular Biology, University of Ulsan College of Medicine, Seoul, Korea
| | - Yosuke Kawai
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
| | - Masao Nagasaki
- Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research, Kyoto University, Kyoto, Japan
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | - Junji Umeno
- Department of Medicine and Clinical Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
| | - Katsushi Tokunaga
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
| | - Yoshitaka Kinouchi
- Student Healthcare Center, Institute for Excellence in Higher Education, Tohoku University, Sendai, Japan
| | - Atsushi Masamune
- Division of Gastroenterology, Tohoku University Graduate School of Medicine, Sendai, Japan
| | - Wenzhao Shi
- Digital Health China Technologies Corp Ltd., Beijing, China
| | - Chengguo Shen
- Digital Health China Technologies Corp Ltd., Beijing, China
| | - Zhenglin Guo
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Kai Yuan
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Shu Zhu
- Institute of Immunology, the CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Dalin Li
- Widjaja Inflammatory Bowel Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Jianjun Liu
- Genome Institute of Singapore, Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Tian Ge
- Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
| | - Judy Cho
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Mark J Daly
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Dermot P B McGovern
- Widjaja Inflammatory Bowel Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Byong Duk Ye
- Department of Gastroenterology and Inflammatory Bowel Disease Center, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
| | - Kyuyoung Song
- Department of Biochemistry and Molecular Biology, University of Ulsan College of Medicine, Seoul, Korea.
| | - Yoichi Kakuta
- Division of Gastroenterology, Tohoku University Graduate School of Medicine, Sendai, Japan.
| | - Mingsong Li
- Inflammatory Bowel Diseases Research Center, Department of Gastroenterology, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
| | - Hailiang Huang
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
|
43
|
Sullivan PF, Meadows JRS, Gazal S, Phan BN, Li X, Genereux DP, Dong MX, Bianchi M, Andrews G, Sakthikumar S, Nordin J, Roy A, Christmas MJ, Marinescu VD, Wang C, Wallerman O, Xue J, Yao S, Sun Q, Szatkiewicz J, Wen J, Huckins LM, Lawler A, Keough KC, Zheng Z, Zeng J, Wray NR, Li Y, Johnson J, Chen J, Paten B, Reilly SK, Hughes GM, Weng Z, Pollard KS, Pfenning AR, Forsberg-Nilsson K, Karlsson EK, Lindblad-Toh K, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science 2023; 380:eabn2937. [PMID: 37104612 PMCID: PMC10259825 DOI: 10.1126/science.abn2937] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 11/23/2021] [Accepted: 02/09/2023] [Indexed: 04/29/2023]
Abstract
Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.
Affiliation(s)
- Patrick F Sullivan
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 17177 Stockholm, Sweden
| | - Jennifer R S Meadows
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Steven Gazal
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - BaDoi N Phan
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Xue Li
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Diane P Genereux
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Michael X Dong
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Matteo Bianchi
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Gregory Andrews
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Sharadha Sakthikumar
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Jessika Nordin
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Ananya Roy
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 75185 Uppsala, Sweden
| | - Matthew J Christmas
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Voichita D Marinescu
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Chao Wang
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - James Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Shuyang Yao
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 17177 Stockholm, Sweden
| | - Quan Sun
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | - Jin Szatkiewicz
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | - Jia Wen
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | - Laura M Huckins
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Alyssa Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kathleen C Keough
- Gladstone Institutes, San Francisco, CA 94158, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94158, USA
| | - Zhili Zheng
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
| | - Jian Zeng
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
| | - Naomi R Wray
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
| | - Yun Li
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | - Jessica Johnson
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Jiawen Chen
- Department of Biostatistics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, Santa Cruz, CA 95064, USA
| | - Steven K Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
| | - Graham M Hughes
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Katherine S Pollard
- Gladstone Institutes, San Francisco, CA 94158, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - Andreas R Pfenning
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Karin Forsberg-Nilsson
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 75185 Uppsala, Sweden
- Biodiscovery Institute, University of Nottingham, Nottingham NG7 2RD, UK
| | - Elinor K Karlsson
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
|
44
|
Hou K, Ding Y, Xu Z, Wu Y, Bhattacharya A, Mester R, Belbin GM, Buyske S, Conti DV, Darst BF, Fornage M, Gignoux C, Guo X, Haiman C, Kenny EE, Kim M, Kooperberg C, Lange L, Manichaikul A, North KE, Peters U, Rasmussen-Torvik LJ, Rich SS, Rotter JI, Wheeler HE, Wojcik GL, Zhou Y, Sankararaman S, Pasaniuc B. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat Genet 2023; 55:549-558. [PMID: 36941441 PMCID: PMC11120833 DOI: 10.1038/s41588-023-01338-6] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Received: 08/10/2022] [Accepted: 02/16/2023] [Indexed: 03/23/2023]
Abstract
Individuals of admixed ancestries (for example, African Americans) inherit a mosaic of ancestry segments (local ancestry) originating from multiple continental ancestral populations. This offers the unique opportunity of investigating the similarity of genetic effects on traits across ancestries within the same population. Here we introduce an approach to estimate correlation of causal genetic effects (r_admix) across local ancestries and analyze 38 complex traits in African-European admixed individuals (N = 53,001) to observe very high correlations (meta-analysis r_admix = 0.95, 95% credible interval 0.93-0.97), much higher than correlation of causal effects across continental ancestries. We replicate our results using regression-based methods from marginal genome-wide association study summary statistics. We also report realistic scenarios where regression-based methods yield inflated heterogeneity-by-ancestry due to ancestry-specific tagging of causal effects, and/or polygenicity. Our results motivate genetic analyses that assume minimal heterogeneity in causal effects by ancestry, with implications for the inclusion of ancestry-diverse individuals in studies.
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
| | - Yi Ding
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Ziqi Xu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
| | - Yue Wu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Rachel Mester
- Graduate Program in Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Gillian M Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Steve Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ, USA
| | - David V Conti
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Burcu F Darst
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Myriam Fornage
- Brown Foundation Institute for Molecular Medicine, The University of Texas Health Science Center, Houston, TX, USA
| | - Chris Gignoux
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO, USA
| | - Xiuqing Guo
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Christopher Haiman
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Michelle Kim
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Charles Kooperberg
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Leslie Lange
- Department of Medicine, University of Colorado, Aurora, CO, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Kari E North
- Department of Statistics, Rutgers University, Piscataway, NJ, USA
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Ulrike Peters
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Laura J Rasmussen-Torvik
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Stephen S Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Heather E Wheeler
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
| | - Genevieve L Wojcik
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
| | - Ying Zhou
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Sriram Sankararaman
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
|
45
|
Atkinson EG. Estimation of cross-ancestry genetic correlations within ancestry tracts of admixed samples. Nat Genet 2023; 55:527-529. [PMID: 36941440 DOI: 10.1038/s41588-023-01325-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2023]
Affiliation(s)
- Elizabeth G Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
|
46
|
Sullivan PF, Meadows JRS, Gazal S, Phan BN, Li X, Genereux DP, Dong MX, Bianchi M, Andrews G, Sakthikumar S, Nordin J, Roy A, Christmas MJ, Marinescu VD, Wallerman O, Xue JR, Li Y, Yao S, Sun Q, Szatkiewicz J, Wen J, Huckins LM, Lawler AJ, Keough KC, Zheng Z, Zeng J, Wray NR, Johnson J, Chen J, Zoonomia Consortium, Paten B, Reilly SK, Hughes GM, Weng Z, Pollard KS, Pfenning AR, Forsberg-Nilsson K, Karlsson EK, Lindblad-Toh K. Leveraging Base Pair Mammalian Constraint to Understand Genetic Variation and Human Disease. bioRxiv 2023:2023.03.10.531987. [PMID: 36945512 PMCID: PMC10028973 DOI: 10.1101/2023.03.10.531987] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 03/13/2023]
Abstract
Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single base phyloP scores from the whole genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained, and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants explaining common disease heritability (more than any other functional annotation). Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.
Affiliation(s)
- Patrick F. Sullivan
- Department of Genetics, University of North Carolina Medical School; Chapel Hill, NC 27599, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet; Stockholm, Sweden
| | - Jennifer R. S. Meadows
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University; Uppsala, 751 32, Sweden
| | - Steven Gazal
- Keck School of Medicine, University of Southern California; Los Angeles, CA 90033, USA
| | - BaDoi N. Phan
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine; Pittsburgh, PA 15261, USA
- Neuroscience Institute, Carnegie Mellon University; Pittsburgh, PA 15213, USA
| | - Xue Li
- Broad Institute of MIT and Harvard; Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School; Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School; Worcester, MA 01605, USA
| | | | - Michael X. Dong
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University; Uppsala, 751 32, Sweden
| | - Matteo Bianchi
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University; Uppsala, 751 32, Sweden
| | - Gregory Andrews
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School; Worcester, MA 01605, USA
| | - Sharadha Sakthikumar
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University; Uppsala, 751 32, Sweden
- Broad Institute of MIT and Harvard; Cambridge, MA 02139, USA
| | - Jessika Nordin
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University; Uppsala, 751 85, Sweden
| | - Ananya Roy
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University; Uppsala, 751 85, Sweden
| | - Matthew J. Christmas
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University; Uppsala, 751 32, Sweden
| | - Voichita D. Marinescu
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University; Uppsala, 751 32, Sweden
| | - Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University; Uppsala, 751 32, Sweden
| | - James R. Xue
- Broad Institute of MIT and Harvard; Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University; Cambridge, MA 02138, USA
| | - Yun Li
- Department of Genetics, University of North Carolina Medical School; Chapel Hill, NC 27599, USA
| | - Shuyang Yao
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet; Stockholm, Sweden
| | - Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Jin Szatkiewicz
- Department of Genetics, University of North Carolina Medical School; Chapel Hill, NC 27599, USA
| | - Jia Wen
- Department of Genetics, University of North Carolina Medical School; Chapel Hill, NC 27599, USA
| | - Laura M. Huckins
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
| | - Alyssa J. Lawler
- Neuroscience Institute, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Broad Institute of MIT and Harvard; Cambridge, MA 02139, USA
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University; Pittsburgh, PA 15213, USA
| | - Kathleen C. Keough
- Department of Epidemiology & Biostatistics, University of California San Francisco; San Francisco, CA 94158, USA
- Fauna Bio Incorporated; Emeryville, CA 94608, USA
- Gladstone Institutes; San Francisco, CA 94158, USA
| | - Zhili Zheng
- Institute for Molecular Bioscience, University of Queensland; Brisbane, Queensland, Australia
| | - Jian Zeng
- Institute for Molecular Bioscience, University of Queensland; Brisbane, Queensland, Australia
| | - Naomi R. Wray
- Institute for Molecular Bioscience, University of Queensland; Brisbane, Queensland, Australia
- Queensland Brain Institute, University of Queensland; Brisbane, Queensland, Australia
| | - Jessica Johnson
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
| | - Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | | | - Benedict Paten
- Genomics Institute, University of California Santa Cruz; Santa Cruz, CA 95064, USA
| | - Steven K. Reilly
- Department of Genetics, Yale School of Medicine; New Haven, CT 06510, USA
| | - Graham M. Hughes
- School of Biology and Environmental Science, University College Dublin; Belfield, Dublin 4, Ireland
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School; Worcester, MA 01605, USA
| | - Katherine S. Pollard
- Department of Epidemiology & Biostatistics, University of California San Francisco; San Francisco, CA 94158, USA
- Gladstone Institutes; San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub; San Francisco, CA 94158, USA
| | - Andreas R. Pfenning
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University; Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University; Pittsburgh, PA 15213, USA
| | - Karin Forsberg-Nilsson
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University; Uppsala, 751 85, Sweden
- Biodiscovery Institute, University of Nottingham; Nottingham, UK
| | - Elinor K. Karlsson
- Broad Institute of MIT and Harvard; Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School; Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School; Worcester, MA 01605, USA
| | - Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University; Uppsala, 751 32, Sweden
- Broad Institute of MIT and Harvard; Cambridge, MA 02139, USA
|
47
|
Zhou G, Chen T, Zhao H. SDPRX: A statistical method for cross-population prediction of complex traits. Am J Hum Genet 2023; 110:13-22. [PMID: 36460009 PMCID: PMC9892700 DOI: 10.1016/j.ajhg.2022.11.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/13/2022] [Accepted: 11/08/2022] [Indexed: 12/03/2022] Open
Abstract
Polygenic risk score (PRS) has demonstrated its great utility in biomedical research through identifying high-risk individuals for different diseases from their genotypes. However, the broader application of PRS to the general population is hindered by the limited transferability of PRS developed in Europeans to non-European populations. To improve PRS prediction accuracy in non-European populations, we develop a statistical method called SDPRX that can effectively integrate genome wide association study summary statistics from different populations. SDPRX automatically adjusts for linkage disequilibrium differences between populations and characterizes the joint distribution of the effect sizes of a variant in two populations to be both null, population specific, or shared with correlation. Through simulations and applications to real traits, we show that SDPRX improves the prediction performance over existing methods in non-European populations.
Affiliation(s)
- Geyu Zhou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | - Tianqi Chen
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | - Hongyu Zhao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA.
|
48
|
Hui D, Xiao B, Dikilitas O, Freimuth RR, Irvin MR, Jarvik GP, Kottyan L, Kullo I, Limdi NA, Liu C, Luo Y, Namjou B, Puckelwartz MJ, Schaid D, Tiwari H, Wei WQ, Verma S, Kim D, Ritchie MD. Quantifying factors that affect polygenic risk score performance across diverse ancestries and age groups for body mass index. Pac Symp Biocomput 2023; 28:437-448. [PMID: 36540998 PMCID: PMC10018532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2022]
Abstract
Polygenic risk scores (PRS) have led to enthusiasm for precision medicine. However, it is well documented that PRS do not generalize across groups differing in ancestry or sample characteristics, e.g., age. Quantifying performance of PRS across different groups of study participants, using genome-wide association study (GWAS) summary statistics from multiple ancestry groups and sample sizes, and using different linkage disequilibrium (LD) reference panels may clarify which factors are limiting PRS transferability. To evaluate these factors in the PRS generation process, we generated body mass index (BMI) PRS (PRS_BMI) in the Electronic Medical Records and Genomics (eMERGE) network (N = 75,661). Analyses were conducted in two ancestry groups (European and African) and three age ranges (adult, teenagers, and children). For PRS_BMI calculations, we evaluated five LD reference panels and three sets of GWAS summary statistics of varying sample size and ancestry. PRS_BMI performance increased for both African and European ancestry individuals using cross-ancestry GWAS summary statistics compared to European-only summary statistics (6.3% and 3.7% relative R² increase, respectively; p_African = 0.038, p_European = 6.26 × 10⁻⁴). The effects of LD reference panels were more pronounced in African ancestry study datasets. PRS_BMI performance degraded in children; R² was less than half of teenagers or adults. The effect of GWAS summary statistics sample size was small when modeled with the other factors. Additionally, the potential of using a PRS generated for one trait to predict risk for comorbid diseases is not well understood especially in the context of cross-ancestry analyses - we explored clinical comorbidities from the electronic health record associated with PRS_BMI and identified significant associations with type 2 diabetes and coronary atherosclerosis.
In summary, this study quantifies the effects that ancestry, GWAS summary statistic sample size, and LD reference panel have on PRS performance, especially in cross-ancestry and age-specific analyses.
Affiliation(s)
- Daniel Hui
- Graduate Program in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
| | - Brenda Xiao
- Graduate Program in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
| | - Ozan Dikilitas
- Department of Internal Medicine, Department of Cardiovascular Medicine, Clinician-Investigator Training Program, Mayo Clinic, Rochester MN
| | - Robert R. Freimuth
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
| | - Marguerite R. Irvin
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Gail P. Jarvik
- Departments of Medicine and Genome Sciences, University of Washington, Seattle WA, USA
| | - Leah Kottyan
- Center for Autoimmune Genomics and Etiology, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
| | - Iftikhar Kullo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN 55905, USA
| | - Nita A. Limdi
- Department of Neurology & Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Yuan Luo
- Department of Preventive Medicine (Health and Biomedical Informatics), Northwestern University, Chicago, IL USA
| | - Bahram Namjou
- Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
| | | | - Daniel Schaid
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
| | - Hemant Tiwari
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Shefali Verma
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Dokyoon Kim
- Department of Biostatistics, Epidemiology and Informatics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Marylyn D. Ritchie
- Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
49
|
Novembre J, Stein C, Asgari S, Gonzaga-Jauregui C, Landstrom A, Lemke A, Li J, Mighton C, Taylor M, Tishkoff S. Addressing the challenges of polygenic scores in human genetic research. Am J Hum Genet 2022; 109:2095-2100. [PMID: 36459976 PMCID: PMC9808501 DOI: 10.1016/j.ajhg.2022.10.012] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/04/2022] Open
Abstract
The genotyping of millions of human samples has made it possible to evaluate variants across the human genome for their possible association with risks for numerous diseases and other traits by using genome-wide association studies (GWASs). The associations between phenotype and genotype found in GWASs make possible the construction of polygenic scores (PGSs), which aim to predict a trait or disease outcome in an individual on the basis of their genotype (in the disease case, the term polygenic risk score [PRS] is often used). PGSs have shown promise for studying the biology of complex traits and as a tool for evaluating individual disease risks in clinical settings. Although the quantity and quality of data to compute PGSs are increasing, challenges remain in the technical aspects of developing PGSs and in the ethical and social issues that might arise from their use. This ASHG Guidance emphasizes three major themes for researchers working with or interested in the application of PGSs in their own research: (1) developing diverse research cohorts; (2) fostering robustness in the development, application, and interpretation of PGSs; and (3) improving the communication of PGS results and their implications to broad audiences.
Affiliation(s)
- John Novembre
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA; Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA; Corresponding author
| | - Catherine Stein
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA; Corresponding author
| | - Samira Asgari
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Claudia Gonzaga-Jauregui
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
| | - Andrew Landstrom
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; Department of Pediatrics, Division of Cardiology, Duke University School of Medicine, Durham, NC, USA
| | - Amy Lemke
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; Norton Children’s Research Institute, affiliated with the University of Louisville School of Medicine, Louisville, KY, USA
| | - Jun Li
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Chloe Mighton
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; Genomics Health Services Research Program, St. Michael’s Hospital, Unity Health Toronto, Toronto, ON, Canada; Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Matthew Taylor
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; Adult Medical Genetics Program, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Sarah Tishkoff
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA; Department of Genetics, Center for Global Genomics and Health Equity, University of Pennsylvania, Philadelphia, PA, USA; Department of Biology, Center for Global Genomics and Health Equity, University of Pennsylvania, Philadelphia, PA, USA
|
50
|
Abraham A, LaBella AL, Capra JA, Rokas A. Mosaic patterns of selection in genomic regions associated with diverse human traits. PLoS Genet 2022; 18:e1010494. [PMID: 36342969 PMCID: PMC9671423 DOI: 10.1371/journal.pgen.1010494] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2022] [Revised: 11/17/2022] [Accepted: 10/21/2022] [Indexed: 11/09/2022] Open
Abstract
Natural selection shapes the genetic architecture of many human traits. However, the prevalence of different modes of selection on genomic regions associated with variation in traits remains poorly understood. To address this, we developed an efficient computational framework to calculate positive and negative enrichment of different evolutionary measures among regions associated with complex traits. We applied the framework to summary statistics from >900 genome-wide association studies (GWASs) and 11 evolutionary measures of sequence constraint, population differentiation, and allele age while accounting for linkage disequilibrium, allele frequency, and other potential confounders. We demonstrate that this framework yields consistent results across GWASs with variable sample sizes, numbers of trait-associated SNPs, and analytical approaches. The resulting evolutionary atlas maps diverse signatures of selection on genomic regions associated with complex human traits on an unprecedented scale. We detected positive enrichment for sequence conservation among trait-associated regions for the majority of traits (>77% of 290 high power GWASs), which included reproductive traits. Many traits also exhibited substantial positive enrichment for population differentiation, especially among hair, skin, and pigmentation traits. In contrast, we detected widespread negative enrichment for signatures of balancing selection (51% of GWASs) and absence of enrichment for evolutionary signals in regions associated with late-onset Alzheimer's disease. These results support a pervasive role for negative selection on regions of the human genome that contribute to variation in complex traits, but also demonstrate that diverse modes of evolution are likely to have shaped trait-associated loci. 
This atlas of evolutionary signatures across the diversity of available GWASs will enable exploration of the relationship between the genetic architecture and evolutionary processes in the human genome.
Affiliation(s)
- Abin Abraham
- Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Abigail L. LaBella
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, North Carolina, United States of America
- North Carolina Research Center, Kannapolis, North Carolina, United States of America
| | - John A. Capra
- Bakar Computational Health Sciences Institute, University of California, San Francisco, California, United States of America
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, United States of America
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
|