1
Fu Y, Kenttämies A, Ruotsalainen S, Pirinen M, Tukiainen T. Role of X chromosome and dosage-compensation mechanisms in complex trait genetics. Am J Hum Genet 2025:S0002-9297(25)00145-4. [PMID: 40359939 DOI: 10.1016/j.ajhg.2025.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Revised: 04/16/2025] [Accepted: 04/16/2025] [Indexed: 05/15/2025] Open
Abstract
The X chromosome (chrX) is often excluded from genome-wide association studies due to its unique biology complicating the analysis and interpretation of genetic data. Consequently, the influence of chrX on human complex traits remains debated. Here, we systematically assessed the relevance of chrX and the effect of its biology on complex traits by analyzing 48 quantitative traits in 343,695 individuals in UK Biobank with replication in 412,181 individuals from FinnGen. We show that, in the general population, chrX contributes to complex trait heritability at a rate of 3% of the autosomal heritability, consistent with the amount of genetic variation observed in chrX. We find that a pronounced male bias in chrX heritability supports the presence of near-complete dosage compensation between sexes through X chromosome inactivation (XCI). However, we also find subtle yet plausible evidence of escape from XCI contributing to human height. Assuming full XCI, the observed chrX contribution to complex trait heritability in both sexes is greater than expected given the presence of only a single active copy of chrX, mirroring potential dosage compensation between chrX and the autosomes. We find this enhanced contribution attributable to systematically larger active allele effects from chrX compared to autosomes in both sexes, independent of allele frequency and variant deleteriousness. Together, these findings support a model in which the two dosage-compensation mechanisms work in concert to balance the influence of chrX across the population while preserving sex-specific differences at a manageable level. Overall, our study advocates for more comprehensive locus discovery efforts in chrX.
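The link between full XCI and a male bias in chrX heritability can be illustrated with a small calculation. The sketch below assumes the commonly used full-dosage-compensation coding (hemizygous males coded 0/2, females coded 0/1/2 under Hardy-Weinberg proportions) and an equal per-allele effect in both sexes; these are illustrative assumptions, not details taken from the abstract. Under them, the genotypic variance at a variant is twice as large in males as in females.

```python
def dosage_variance(dist):
    """Variance of a dosage variable given a {dosage: probability} mapping."""
    mean = sum(d * pr for d, pr in dist.items())
    return sum(pr * (d - mean) ** 2 for d, pr in dist.items())


def chrx_dosage_variances(p):
    """Return (female_variance, male_variance) for allele frequency p.

    Assumed coding (hypothetical, for illustration): females 0/1/2 under
    Hardy-Weinberg proportions, males 0/2 (full dosage compensation).
    """
    q = 1.0 - p
    female = {0: q * q, 1: 2 * p * q, 2: p * p}
    male = {0: q, 2: p}
    return dosage_variance(female), dosage_variance(male)


var_female, var_male = chrx_dosage_variances(0.3)
male_to_female_ratio = var_male / var_female
print(var_female, var_male, male_to_female_ratio)
```

The ratio does not depend on the allele frequency chosen: the female variance is 2p(1-p) and the male variance is 4p(1-p).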
Affiliation(s)
- Yu Fu
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland
- Aino Kenttämies
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland
- Sanni Ruotsalainen
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland
- Matti Pirinen
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland; Department of Public Health, University of Helsinki, 00014 Helsinki, Finland; Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland
- Taru Tukiainen
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland
2
Fu B, Anand P, Anand A, Mefford J, Sankararaman S. A scalable adaptive quadratic kernel method for interpretable epistasis analysis in complex traits. Genome Res 2024; 34:1294-1303. [PMID: 39209554 PMCID: PMC11529862 DOI: 10.1101/gr.279140.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024]
Abstract
Our knowledge of the contribution of genetic interactions (epistasis) to variation in human complex traits remains limited, partly due to the lack of efficient, powerful, and interpretable algorithms to detect interactions. Recently proposed approaches for set-based association tests show promise in improving the power to detect epistasis by examining the aggregated effects of multiple variants. Nevertheless, these methods either do not scale to large Biobank data sets or lack interpretability. We propose QuadKAST, a scalable algorithm focused on testing pairwise interaction effects (quadratic effects) within small to medium-sized sets of genetic variants (window size ≤100) on a trait and provide quantified interpretation of these effects. Comprehensive simulations show that QuadKAST is well-calibrated. Additionally, QuadKAST is highly sensitive in detecting loci with epistatic signals and accurate in its estimation of quadratic effects. We applied QuadKAST to 52 quantitative phenotypes measured in ≈300,000 unrelated white British individuals in the UK Biobank to test for quadratic effects within each of 9515 protein-coding genes. We detect 32 trait-gene pairs across 17 traits and 29 genes that demonstrate statistically significant signals of quadratic effects (accounting for the number of genes and traits tested). Across these trait-gene pairs, the proportion of trait variance explained by quadratic effects is comparable to additive effects, with five pairs having a ratio >1. Our method enables the detailed investigation of epistasis on a large scale, offering new insights into its role and importance.
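To make "pairwise interaction effects (quadratic effects) within a set of variants" concrete, the sketch below builds the pairwise product features for one individual's genotype window and counts how many pairs a window of a given size produces. The function names are hypothetical, and this is not the QuadKAST implementation: whether the method also includes squared terms or standardizes genotypes is not stated in the abstract.

```python
from itertools import combinations


def pairwise_products(genotypes):
    """Return g_i * g_j for every pair i < j in one individual's window."""
    return [genotypes[i] * genotypes[j]
            for i, j in combinations(range(len(genotypes)), 2)]


def n_pairs(window_size):
    """Number of distinct variant pairs in a window."""
    return window_size * (window_size - 1) // 2


example = pairwise_products([0, 1, 2])  # pairs (0,1), (0,2), (1,2)
max_pairs = n_pairs(100)                # window size at the stated upper bound
print(example, max_pairs)
```

The quadratic growth in the number of pairs suggests why restricting tests to small and medium-sized windows keeps the feature count manageable.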
Affiliation(s)
- Boyang Fu
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
- Prateek Anand
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
- Aakarsh Anand
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
- Joel Mefford
  - Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, California 90024, USA
- Sriram Sankararaman
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
3
Capalbo A, de Wert G, Mertes H, Klausner L, Coonen E, Spinella F, Van de Velde H, Viville S, Sermon K, Vermeulen N, Lencz T, Carmi S. Screening embryos for polygenic disease risk: a review of epidemiological, clinical, and ethical considerations. Hum Reprod Update 2024; 30:529-557. [PMID: 38805697 PMCID: PMC11369226 DOI: 10.1093/humupd/dmae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 03/25/2024] [Indexed: 05/30/2024] Open
Abstract
BACKGROUND: The genetic composition of embryos generated by in vitro fertilization (IVF) can be examined with preimplantation genetic testing (PGT). Until recently, PGT was limited to detecting single-gene, high-risk pathogenic variants, large structural variants, and aneuploidy. Recent advances have made genome-wide genotyping of IVF embryos feasible and affordable, raising the possibility of screening embryos for their risk of polygenic diseases such as breast cancer, hypertension, diabetes, or schizophrenia. Despite a heated debate around this new technology, called polygenic embryo screening (PES; also PGT-P), it is already available to IVF patients in some countries. Several articles have studied epidemiological, clinical, and ethical perspectives on PES; however, a comprehensive, principled review of this emerging field is missing.

OBJECTIVE AND RATIONALE: This review has four main goals. First, given the interdisciplinary nature of PES studies, we aim to provide a self-contained educational background about PES to reproductive specialists interested in the subject. Second, we provide a comprehensive and critical review of arguments for and against the introduction of PES, crystallizing and prioritizing the key issues. We also cover the attitudes of IVF patients, clinicians, and the public towards PES. Third, we distinguish between possible future groups of PES patients, highlighting the benefits and harms pertaining to each group. Finally, our review, which is supported by ESHRE, is intended to aid healthcare professionals and policymakers in decision-making regarding whether to introduce PES in the clinic, and if so, how, and to whom.

SEARCH METHODS: We searched for PubMed-indexed articles published between 1/1/2003 and 1/3/2024 using the terms 'polygenic embryo screening', 'polygenic preimplantation', and 'PGT-P'. We limited the review to primary research papers in English whose main focus was PES for medical conditions. We also included papers that did not appear in the search but were deemed relevant.

OUTCOMES: The main theoretical benefit of PES is a reduction in lifetime polygenic disease risk for children born after screening. The magnitude of the risk reduction has been predicted based on statistical modelling, simulations, and sibling pair analyses. Results based on all methods suggest that under the best-case scenario, large relative risk reductions are possible for one or more diseases. However, as these models abstract several practical limitations, the realized benefits may be smaller, particularly due to a limited number of embryos and unclear future accuracy of the risk estimates. PES may negatively impact patients and their future children, as well as society. The main personal harms are an unindicated IVF treatment, a possible reduction in IVF success rates, and patient confusion, incomplete counselling, and choice overload. The main possible societal harms include discarded embryos, an increasing demand for 'designer babies', overemphasis of the genetic determinants of disease, unequal access, and lower utility in people of non-European ancestries. Benefits and harms will vary across the main potential patient groups, comprising patients already requiring IVF, fertile people with a history of a severe polygenic disease, and fertile healthy people. In the United States, the attitudes of IVF patients and the public towards PES seem positive, while healthcare professionals are cautious, sceptical about clinical utility, and concerned about patient counselling.

WIDER IMPLICATIONS: The theoretical potential of PES to reduce risk across multiple polygenic diseases requires further research into its benefits and harms. Given the large number of practical limitations and possible harms, particularly unnecessary IVF treatments and discarded viable embryos, PES should be offered only within a research context before further clarity is achieved regarding its balance of benefits and harms. The gap in attitudes between healthcare professionals and the public needs to be narrowed by expanding public and patient education and providing resources for informative and unbiased genetic counselling.
Affiliation(s)
- Antonio Capalbo
  - Juno Genetics, Department of Reproductive Genetics, Rome, Italy
  - Center for Advanced Studies and Technology (CAST), Department of Medical Genetics, “G. d’Annunzio” University of Chieti-Pescara, Chieti, Italy
- Guido de Wert
  - Department of Health, Ethics & Society, CAPHRI-School for Public Health and Primary Care and GROW School for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Heidi Mertes
  - Department of Philosophy and Moral Sciences, Ghent University, Ghent, Belgium
  - Department of Public Health and Primary Care, Ghent University, Ghent, Belgium
- Liraz Klausner
  - Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Edith Coonen
  - Departments of Clinical Genetics and Reproductive Medicine, Maastricht University Medical Centre, Maastricht, The Netherlands
  - School for Oncology and Developmental Biology, GROW, Maastricht University, Maastricht, The Netherlands
- Francesca Spinella
  - Eurofins GENOMA Group Srl, Molecular Genetics Laboratories, Department of Scientific Communication, Rome, Italy
- Hilde Van de Velde
  - Research Group Genetics Reproduction and Development (GRAD), Vrije Universiteit Brussel, Brussel, Belgium
  - Brussels IVF, UZ Brussel, Brussel, Belgium
- Stephane Viville
  - Laboratoire de Génétique Médicale LGM, Institut de Génétique Médicale d’Alsace IGMA, INSERM UMR 1112, Université de Strasbourg, France
  - Laboratoire de Diagnostic Génétique, Unité de Génétique de l’infertilité (UF3472), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Karen Sermon
  - Research Group Genetics Reproduction and Development (GRAD), Vrije Universiteit Brussel, Brussel, Belgium
- Todd Lencz
  - Institute of Behavioral Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA
  - Departments of Psychiatry and Molecular Medicine, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY 11549, USA
- Shai Carmi
  - Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
4
Bao J, Lee BN, Wen J, Kim M, Mu S, Yang S, Davatzikos C, Long Q, Ritchie MD, Shen L. Employing Informatics Strategies in Alzheimer's Disease Research: A Review from Genetics, Multiomics, and Biomarkers to Clinical Outcomes. Annu Rev Biomed Data Sci 2024; 7:391-418. [PMID: 38848574 PMCID: PMC11525791 DOI: 10.1146/annurev-biodatasci-102423-121021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2024]
Abstract
Alzheimer's disease (AD) is a critical national concern, affecting 5.8 million people and costing more than $250 billion annually. However, there is no available cure. Thus, effective strategies are in urgent need to discover AD biomarkers for disease early detection and drug development. In this review, we study AD from a biomedical data scientist perspective to discuss the four fundamental components in AD research: genetics (G), molecular multiomics (M), multimodal imaging biomarkers (B), and clinical outcomes (O) (collectively referred to as the GMBO framework). We provide a comprehensive review of common statistical and informatics methodologies for each component within the GMBO framework, accompanied by the major findings from landmark AD studies. Our review highlights the potential of multimodal biobank data in addressing key challenges in AD, such as early diagnosis, disease heterogeneity, and therapeutic development. We identify major hurdles in AD research, including data scarcity and complexity, and advocate for enhanced collaboration, data harmonization, and advanced modeling techniques. This review aims to be an essential guide for understanding current biomedical data science strategies in AD research, emphasizing the need for integrated, multidisciplinary approaches to advance our understanding and management of AD.
Affiliation(s)
- Jingxuan Bao
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Brian N Lee
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Junhao Wen
  - Laboratory of AI and Biomedical Science (LABS), Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
- Mansu Kim
  - AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, South Korea
- Shizhuo Mu
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Shu Yang
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Christos Davatzikos
  - Center for Biomedical Image Computing and Analytics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Marylyn D Ritchie
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Li Shen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
5
Pazokitoroudi A, Liu Z, Dahl A, Zaitlen N, Rosset S, Sankararaman S. A scalable and robust variance components method reveals insights into the architecture of gene-environment interactions underlying complex traits. Am J Hum Genet 2024; 111:1462-1480. [PMID: 38866020 PMCID: PMC11267529 DOI: 10.1016/j.ajhg.2024.05.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/14/2024] Open
Abstract
Understanding the contribution of gene-environment interactions (GxE) to complex trait variation can provide insights into disease mechanisms, explain sources of heritability, and improve genetic risk prediction. While large biobanks with genetic and deep phenotypic data hold promise for obtaining novel insights into GxE, our understanding of GxE architecture in complex traits remains limited. We introduce a method to estimate the proportion of trait variance explained by GxE (GxE heritability) and additive genetic effects (additive heritability) across the genome and within specific genomic annotations. We show that our method is accurate in simulations and computationally efficient for biobank-scale datasets. We applied our method to common array SNPs (MAF ≥1%), fifty quantitative traits, and four environmental variables (smoking, sex, age, and statin usage) in unrelated white British individuals in the UK Biobank. We found 68 trait-E pairs with significant genome-wide GxE heritability (p<0.05/200) with a ratio of GxE to additive heritability of ≈6.8% on average. Analyzing ≈8 million imputed SNPs (MAF ≥0.1%), we documented an approximate 28% increase in genome-wide GxE heritability compared to array SNPs. We partitioned GxE heritability across minor allele frequency (MAF) and local linkage disequilibrium (LD) values, revealing that, like additive allelic effects, GxE allelic effects tend to increase with decreasing MAF and LD. Analyzing GxE heritability near genes highly expressed in specific tissues, we find significant brain-specific enrichment for body mass index (BMI) and basal metabolic rate in the context of smoking and adipose-specific enrichment for waist-hip ratio (WHR) in the context of sex.
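The significance threshold quoted above, p<0.05/200, is consistent with a Bonferroni correction over every trait-environment pair: fifty traits times four environmental variables gives 200 tests. The short sketch below reproduces that arithmetic; reading the denominator this way is an inference from the numbers in the abstract, and the variable names are illustrative.

```python
n_traits = 50          # "fifty quantitative traits"
n_environments = 4     # smoking, sex, age, statin usage
alpha = 0.05

n_tests = n_traits * n_environments
bonferroni_threshold = alpha / n_tests

# Share of tested trait-E pairs reported as significant (68 of 200).
significant_pairs = 68
fraction_significant = significant_pairs / n_tests

print(n_tests, bonferroni_threshold, fraction_significant)
```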
Affiliation(s)
- Ali Pazokitoroudi
  - Department of Computer Science, UCLA, Los Angeles, CA, USA; Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zhengtong Liu
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
- Andrew Dahl
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Noah Zaitlen
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Department of Neurology, UCLA, Los Angeles, CA, USA
- Saharon Rosset
  - Department of Statistics, Tel-Aviv University, Tel-Aviv, Israel
- Sriram Sankararaman
  - Department of Computer Science, UCLA, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
6
Pattillo Smith S, Darnell G, Udwin D, Stamp J, Harpak A, Ramachandran S, Crawford L. Discovering non-additive heritability using additive GWAS summary statistics. eLife 2024; 13:e90459. [PMID: 38913556 PMCID: PMC11196113 DOI: 10.7554/elife.90459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 04/22/2024] [Indexed: 06/26/2024] Open
Abstract
LD score regression (LDSC) is a method to estimate narrow-sense heritability from genome-wide association study (GWAS) summary statistics alone, making it a fast and popular approach. In this work, we present interaction-LD score (i-LDSC) regression: an extension of the original LDSC framework that accounts for interactions between genetic variants. By studying a wide range of generative models in simulations, and by re-analyzing 25 well-studied quantitative phenotypes from 349,468 individuals in the UK Biobank and up to 159,095 individuals in BioBank Japan, we show that the inclusion of a cis-interaction score (i.e. interactions between a focal variant and proximal variants) recovers genetic variance that is not captured by LDSC. For each of the 25 traits analyzed in the UK Biobank and BioBank Japan, i-LDSC detects additional variation contributed by genetic interactions. The i-LDSC software and its application to these biobanks represent a step towards resolving further genetic contributions of sources of non-additive genetic effects to complex trait variation.
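For orientation, the standard LDSC relationship that i-LDSC extends is E[chi2_j] = 1 + N*a + (N*h2/M)*l_j, where l_j is the LD score of variant j, N the sample size, M the number of variants, and a a confounding term. The sketch below sets a to zero, generates noise-free expected statistics, and recovers h2 from the regression slope. All numbers are made up for illustration, the helper names are hypothetical, and the additional cis-interaction score regressor that i-LDSC introduces is not modeled here.

```python
def expected_chi2(ld_score, h2, n, m, intercept=1.0):
    """Expected GWAS chi-square statistic under the basic LDSC model."""
    return intercept + n * h2 * ld_score / m


def ols_fit(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    k = len(x)
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative values only (not taken from the study).
N, M, TRUE_H2 = 100_000, 1_000_000, 0.4
ld_scores = [10.0, 50.0, 120.0, 300.0]
chi2 = [expected_chi2(l, TRUE_H2, N, M) for l in ld_scores]

slope, intercept = ols_fit(ld_scores, chi2)
h2_estimate = slope * M / N   # invert slope = N * h2 / M
print(h2_estimate, intercept)
```

With real summary statistics the chi-square values are noisy and the regression is typically weighted; the noise-free version only shows how the slope maps back to heritability.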
Affiliation(s)
- Samuel Pattillo Smith
  - Center for Computational Molecular Biology, Brown University, Providence, United States
  - Department of Ecology and Evolutionary Biology, Brown University, Providence, United States
  - Department of Integrative Biology, The University of Texas at Austin, Austin, United States
  - Department of Population Health, The University of Texas at Austin, Austin, United States
- Gregory Darnell
  - Center for Computational Molecular Biology, Brown University, Providence, United States
  - Institute for Computational and Experimental Research in Mathematics, Brown University, Providence, United States
- Dana Udwin
  - Department of Biostatistics, Brown University, Providence, United States
- Julian Stamp
  - Center for Computational Molecular Biology, Brown University, Providence, United States
- Arbel Harpak
  - Department of Integrative Biology, The University of Texas at Austin, Austin, United States
  - Department of Population Health, The University of Texas at Austin, Austin, United States
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, United States
  - Department of Ecology and Evolutionary Biology, Brown University, Providence, United States
  - Data Science Institute, Brown University, Providence, United States
- Lorin Crawford
  - Center for Computational Molecular Biology, Brown University, Providence, United States
  - Department of Biostatistics, Brown University, Providence, United States
  - Microsoft, Cambridge, United States
7
Ohta R, Tanigawa Y, Suzuki Y, Kellis M, Morishita S. A polygenic score method boosted by non-additive models. Nat Commun 2024; 15:4433. [PMID: 38811555 PMCID: PMC11522481 DOI: 10.1038/s41467-024-48654-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 05/08/2024] [Indexed: 05/31/2024] Open
Abstract
Dominance heritability in complex traits has received increasing recognition. However, most polygenic score (PGS) approaches do not incorporate non-additive effects. Here, we present GenoBoost, a flexible PGS modeling framework capable of considering both additive and non-additive effects, specifically focusing on genetic dominance. Building on statistical boosting theory, we derive provably optimal GenoBoost scores and provide its efficient implementation for analyzing large-scale cohorts. We benchmark it against seven commonly used PGS methods and demonstrate its competitive predictive performance. GenoBoost is ranked the best for four traits and second-best for three traits among twelve tested disease outcomes in UK Biobank. We reveal that GenoBoost improves prediction for autoimmune diseases by incorporating non-additive effects localized in the MHC locus and, more broadly, works best in less polygenic traits. We further demonstrate that GenoBoost can infer the mode of genetic inheritance without requiring prior knowledge. For example, GenoBoost finds non-zero genetic dominance effects for 602 of 900 selected genetic variants, resulting in 2.5% improvements in predicting psoriasis cases. Lastly, we show that GenoBoost can prioritize genetic loci with genetic dominance not previously reported in the GWAS catalog. Our results highlight the increased accuracy and biological insights from incorporating non-additive effects in PGS models.
Affiliation(s)
- Rikifumi Ohta
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Yosuke Tanigawa
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yuta Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Manolis Kellis
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Shinichi Morishita
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
8
Tang D, Freudenberg J, Dahl A. Factorizing polygenic epistasis improves prediction and uncovers biological pathways in complex traits. Am J Hum Genet 2023; 110:1875-1887. [PMID: 37922884 PMCID: PMC10645564 DOI: 10.1016/j.ajhg.2023.10.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/05/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 11/07/2023] Open
Abstract
Epistasis is central in many domains of biology, but it has not yet been proven useful for understanding the etiology of complex traits. This is partly because complex-trait epistasis involves polygenic interactions that are poorly captured in current models. To address this gap, we developed a model called Epistasis Factor Analysis (EFA). EFA assumes that polygenic epistasis can be factorized into interactions between a few epistasis factors (EFs), which represent latent polygenic components of the observed complex trait. The statistical goals of EFA are to improve polygenic prediction and to increase power to detect epistasis, while the biological goal is to unravel genetic effects into more-homogeneous units. We mathematically characterize EFA and use simulations to show that EFA outperforms current epistasis models when its assumptions approximately hold. Applied to predicting yeast growth rates, EFA outperforms the additive model for several traits with large epistasis heritability and uniformly outperforms the standard epistasis model. We replicate these prediction improvements in a second dataset. We then apply EFA to four previously characterized traits in the UK Biobank and find statistically significant epistasis in all four, including two that are robust to scale transformation. Moreover, we find that the inferred EFs partly recover pre-defined biological pathways for two of the traits. Our results demonstrate that more realistic models can identify biologically and statistically meaningful epistasis in complex traits, indicating that epistasis has potential for precision medicine and characterizing the biology underlying GWAS results.
Affiliation(s)
- David Tang
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
- Jerome Freudenberg
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Andy Dahl
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
9
Cui L, Yang B, Xiao S, Gao J, Baud A, Graham D, McBride M, Dominiczak A, Schafer S, Aumatell RL, Mont C, Teruel AF, Hübner N, Flint J, Mott R, Huang L. Dominance is common in mammals and is associated with trans-acting gene expression and alternative splicing. Genome Biol 2023; 24:215. [PMID: 37773188 PMCID: PMC10540365 DOI: 10.1186/s13059-023-03060-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/18/2023] [Indexed: 10/01/2023] Open
Abstract
BACKGROUND: Dominance and other non-additive genetic effects arise from the interaction between alleles, and historically these phenomena play a major role in quantitative genetics. However, most genome-wide association studies (GWAS) assume alleles act additively.

RESULTS: We systematically investigate both dominance (here representing any non-additive within-locus interaction) and additivity across 574 physiological and gene expression traits in three mammalian stocks: F2 intercross pigs, rat heterogeneous stock, and mice heterogeneous stock. Dominance accounts for about one quarter of heritable variance across all physiological traits in all species. Hematological and immunological traits exhibit the highest dominance variance, possibly reflecting balancing selection in response to pathogens. Although most quantitative trait loci (QTLs) are detectable as additive QTLs, we identify 154, 64, and 62 novel dominance QTLs in pigs, rats, and mice respectively that are undetectable as additive QTLs. Similarly, even though most cis-acting expression QTLs are additive, gene expression exhibits a large fraction of dominance variance, and trans-acting eQTLs are enriched for dominance. Genes causal for dominance physiological QTLs are less likely to be physically linked to their QTLs but instead act via trans-acting dominance eQTLs. In addition, thousands of eQTLs are associated with alternatively spliced isoforms with complex additive and dominant architectures in heterogeneous stock rats, suggesting a possible mechanism for dominance.

CONCLUSIONS: Although heritability is predominantly additive, many mammalian genetic effects are dominant and likely arise through distinct mechanisms. It is therefore advantageous to consider both additive and dominance effects in GWAS to improve power and uncover causality.
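As background on how additive and dominance variance are separated at a single locus, the sketch below uses the textbook (Falconer) parameterization: genotypic values -a, d, +a for the two homozygotes and the heterozygote, allele frequencies p and q = 1 - p, and Hardy-Weinberg proportions. This is a standard illustration under those assumptions, not the estimation procedure used in the study, and the function names are hypothetical.

```python
def variance_components(p, a, d):
    """Additive and dominance variance at one biallelic locus (Falconer)."""
    q = 1.0 - p
    alpha = a + d * (q - p)            # average effect of allele substitution
    var_additive = 2 * p * q * alpha ** 2
    var_dominance = (2 * p * q * d) ** 2
    return var_additive, var_dominance


def total_genotypic_variance(p, a, d):
    """Direct variance of genotypic values under Hardy-Weinberg proportions."""
    q = 1.0 - p
    values_and_freqs = [(a, p * p), (d, 2 * p * q), (-a, q * q)]
    mean = sum(v * f for v, f in values_and_freqs)
    return sum(f * (v - mean) ** 2 for v, f in values_and_freqs)


var_a, var_d = variance_components(p=0.3, a=1.0, d=0.5)
var_total = total_genotypic_variance(p=0.3, a=1.0, d=0.5)
dominance_share = var_d / (var_a + var_d)
print(var_a, var_d, var_total, dominance_share)
```

The two components sum to the total genotypic variance, and much of the heterozygote deviation d is absorbed into the additive term through the average effect, so a locus with clear dominance can still show mostly additive variance.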
Collapse
Affiliation(s)
- Leilei Cui
  - National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China
  - UCL Genetics Institute, University College London, London, WC1E 6BT, UK
  - Human Aging Research Institute and School of Life Science, Nanchang University, and Jiangxi Key Laboratory of Human Aging, Jiangxi, China
  - School of Life Sciences, Nanchang University, Nanchang, China
- Bin Yang
  - National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China
- Shijun Xiao
  - National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China
- Jun Gao
  - National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China
- Amelie Baud
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Delyth Graham
  - BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow, G12 8TA, UK
- Martin McBride
  - BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow, G12 8TA, UK
- Anna Dominiczak
  - BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow, G12 8TA, UK
- Sebastian Schafer
  - Cardiovascular and Metabolic Disorders Program, Duke-National University of Singapore Medical School, Singapore, Singapore
- Regina Lopez Aumatell
  - Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Carme Mont
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Albert Fernandez Teruel
  - Departamento de Psiquiatría y Medicina Legal, Universitat Autonoma de Barcelona, Barcelona, Spain
- Norbert Hübner
  - Genetics and Genomics of Cardiovascular Diseases Research Group, Max Delbrück Center (MDC) for Molecular Medicine in the Helmholtz Association, Berlin, Germany
  - DZHK (German Center for Cardiovascular Research) Partner Site Berlin, Berlin, Germany
  - Charité Universitätsmedizin Berlin, Berlin, Germany
- Jonathan Flint
  - Department of Psychiatry and Behavioral Sciences, Brain Research Institute, University of California, Los Angeles, CA, USA
- Richard Mott
  - UCL Genetics Institute, University College London, London, WC1E 6BT, UK
- Lusheng Huang
  - National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China
10
Fu B, Pazokitoroudi A, Sudarshan M, Liu Z, Subramanian L, Sankararaman S. Fast kernel-based association testing of non-linear genetic effects for biobank-scale data. Nat Commun 2023; 14:4936. [PMID: 37582955 PMCID: PMC10427662 DOI: 10.1038/s41467-023-40346-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 07/18/2023] [Indexed: 08/17/2023] Open
Abstract
Our knowledge of non-linear genetic effects on complex traits remains limited, in part due to the modest power to detect such effects. While kernel-based tests offer a versatile approach to test for non-linear relationships between sets of genetic variants and traits, current approaches cannot be applied to Biobank-scale datasets containing hundreds of thousands of individuals. We propose FastKAST, a kernel-based approach that can test for non-linear effects of a set of variants on a quantitative trait. FastKAST provides calibrated hypothesis tests while enabling analysis of Biobank-scale datasets with hundreds of thousands of unrelated individuals from a homogeneous population. We apply FastKAST to 53 quantitative traits measured across ≈300K unrelated white British individuals in the UK Biobank to detect sets of variants with non-linear effects at genome-wide significance.
Affiliation(s)
- Boyang Fu
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
- Mukund Sudarshan
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Zhengtong Liu
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
- Lakshminarayanan Subramanian
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
  - Department of Population Health, NYU Grossman School of Medicine, New York, NY, USA
- Sriram Sankararaman
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
11
Stamp J, DenAdel A, Weinreich D, Crawford L. Leveraging the genetic correlation between traits improves the detection of epistasis in genome-wide association studies. G3 (Bethesda, Md.) 2023; 13:jkad118. [PMID: 37243672 PMCID: PMC10484060 DOI: 10.1093/g3journal/jkad118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 01/11/2023] [Accepted: 05/23/2023] [Indexed: 05/29/2023]
Abstract
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this study, we present the "multivariate MArginal ePIstasis Test" (mvMAPIT), a multioutcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis, or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multitrait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. With simulations, we illustrate the benefits of mvMAPIT over univariate (or single-trait) epistatic mapping strategies. We also apply the mvMAPIT framework to protein sequence data from two broadly neutralizing anti-influenza antibodies and approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics.
The mvMAPIT R package can be downloaded at https://github.com/lcrawlab/mvMAPIT.
Affiliation(s)
- Julian Stamp
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Alan DenAdel
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Daniel Weinreich
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
  - Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02906, USA
- Lorin Crawford
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
  - Department of Biostatistics, Brown University, Providence, RI 02903, USA
  - Microsoft Research New England, Cambridge, MA 02142, USA
12
Ren J, Lin Z, He R, Shen X, Pan W. Using GWAS summary data to impute traits for genotyped individuals. HGG Advances 2023; 4:100197. [PMID: 37181332 PMCID: PMC10173780 DOI: 10.1016/j.xhgg.2023.100197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 04/07/2023] [Indexed: 05/16/2023] Open
Abstract
Genome-wide association study (GWAS) summary data have become extremely useful in daily routine data analysis, largely facilitating new methods development and new applications. However, a severe limitation with the current use of GWAS summary data is its exclusive restriction to only linear single nucleotide polymorphism (SNP)-trait association analyses. To further expand the use of GWAS summary data, along with a large sample of individual-level genotypes, we propose a nonparametric method for large-scale imputation of the genetic component of the trait for the given genotypes. The imputed individual-level trait values, along with the individual-level genotypes, make it possible to conduct any analysis as with individual-level GWAS data, including nonlinear SNP-trait associations and predictions. We use the UK Biobank data to highlight the usefulness and effectiveness of the proposed method in three applications that currently cannot be done with only GWAS summary data (for SNP-trait associations): marginal SNP-trait association analysis under non-additive genetic models, detection of SNP-SNP interactions, and genetic prediction of a trait using a nonlinear model of SNPs.
Affiliation(s)
- Jingchen Ren
  - School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Zhaotong Lin
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Ruoyu He
  - School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Xiaotong Shen
  - School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
13
Palmer DS, Zhou W, Abbott L, Wigdor EM, Baya N, Churchhouse C, Seed C, Poterba T, King D, Kanai M, Bloemendal A, Neale BM. Analysis of genetic dominance in the UK Biobank. Science 2023; 379:1341-1348. [PMID: 36996212 PMCID: PMC10345642 DOI: 10.1126/science.abn8455] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 01/12/2022] [Accepted: 02/15/2023] [Indexed: 04/01/2023]
Abstract
Classical statistical genetics theory defines dominance as any deviation from a purely additive, or dosage, effect of a genotype on a trait, which is known as the dominance deviation. Dominance is well documented in plant and animal breeding. Outside of rare monogenic traits, however, evidence in humans is limited. We systematically examined common genetic variation across 1060 traits in a large population cohort (UK Biobank, N = 361,194 samples analyzed) for evidence of dominance effects. We then developed a computationally efficient method to rapidly assess the aggregate contribution of dominance deviations to heritability. Lastly, observing that dominance associations are inherently less correlated between sites at a genomic locus than their additive counterparts, we explored whether they may be leveraged to identify causal variants more confidently.
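The notion of a dominance deviation in the abstract above can be illustrated with a short sketch. The coding below is one commonly used additive/dominance genotype encoding that is orthogonal under Hardy-Weinberg equilibrium; it is an assumption chosen for illustration and is not claimed to be the encoding used in the study. The function names are hypothetical.

```python
def hwe_frequencies(p):
    """Genotype frequencies for 0, 1, 2 copies of an allele with frequency p (HWE assumed)."""
    q = 1.0 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def additive_dominance_codes(p):
    """Return {genotype: (x_add, x_dom)}, both centred to mean zero under HWE.

    Assumed coding: additive = allele count; dominance = (0, 2p, 4p - 2),
    which has mean 2p^2 and zero covariance with the additive code.
    """
    raw_dom = {0: 0.0, 1: 2 * p, 2: 4 * p - 2}
    return {g: (g - 2 * p, raw_dom[g] - 2 * p * p) for g in (0, 1, 2)}


def code_covariance(p):
    """Covariance between the centred additive and dominance codes under HWE."""
    freqs = hwe_frequencies(p)
    codes = additive_dominance_codes(p)
    return sum(freqs[g] * codes[g][0] * codes[g][1] for g in (0, 1, 2))


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, code_covariance(p))  # numerically zero for each p
```

Because the two codes are uncorrelated under these assumptions, a regression coefficient on the dominance code captures only the deviation from a purely additive (dosage) effect.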
Affiliation(s)
- Duncan S. Palmer
  - Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Wei Zhou
  - Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Liam Abbott
  - Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Nikolas Baya
  - Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Claire Churchhouse
  - Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Cotton Seed
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Tim Poterba
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Daniel King
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Masahiro Kanai
  - Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alex Bloemendal
  - Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Benjamin M. Neale
  - Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
14
Shi G. Genome-wide variance quantitative trait locus analysis suggests small interaction effects in blood pressure traits. Sci Rep 2022; 12:12649. [PMID: 35879408 PMCID: PMC9314370 DOI: 10.1038/s41598-022-16908-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 07/18/2022] [Indexed: 11/09/2022] Open
Abstract
Genome-wide variance quantitative trait loci (vQTL) analysis complements genome-wide association study (GWAS) and has the potential to identify novel variants associated with the trait, explain additional trait variance and lead to the identification of factors that modulate the genetic effects. I conducted genome-wide analysis of the UK Biobank data and identified 27 vQTLs associated with systolic blood pressure (SBP), diastolic blood pressure (DBP) and pulse pressure (PP). The top single-nucleotide polymorphisms (SNPs) are enriched for expression QTLs (eQTLs) or splicing QTLs (sQTLs) annotated by GTEx, suggesting their regulatory roles in mediating the associations with blood pressure (BP). Of the 27 vQTLs, 14 are known BP-associated QTLs discovered by GWASs. The heteroscedasticity effects of the 13 novel vQTLs are larger than their genetic main effects, which were not detected by existing GWASs. The total R-squared of the 27 top SNPs due to variance heteroscedasticity is 0.28%, compared with 0.50% owing to their main effects. The overall effect size of the variance heteroscedasticity is small in GWAS SNPs compared with their main effects. For the 411, 384 and 285 GWAS SNPs associated with SBP, DBP and PP, respectively, their heteroscedasticity effects were 0.52%, 0.43%, and 0.16%, and their main effects were 5.13%, 5.61%, and 3.75%, respectively. The number and effects of the vQTLs are small, which suggests that the effects of gene-environment and gene-gene interactions are small. The main effects of the SNPs remain the major source of genetic variance for BP, which would probably be true for other complex traits as well.
Affiliation(s)
- Gang Shi
  - School of Telecommunications Engineering, Xidian University, 2 South Taibai Road, Xi'an, 710071, Shaanxi, China
15
From Mendel to quantitative genetics in the genome era: the scientific legacy of W. G. Hill. Nat Genet 2022; 54:934-939. [PMID: 35817969 DOI: 10.1038/s41588-022-01103-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 05/18/2022] [Indexed: 11/08/2022]
Abstract
The quantitative geneticist W. G. ('Bill') Hill, awardee of the 2018 Darwin Medal of the Royal Society and the 2019 Mendel Medal of the Genetics Society (United Kingdom), died on 17 December 2021 at the age of 81 years. Here, we pay tribute to his multiple key scientific contributions, which span population and evolutionary genetics, animal and plant breeding and human genetics. We discuss his theoretical research on the role of linkage disequilibrium (LD) and mutational variance in the response to selection, the origin of the widely used LD metric r² in genomic association studies, the genetic architecture of complex traits, the quantification of the variation in realized relationships given a pedigree relationship and much more. We demonstrate that basic theoretical research in quantitative and statistical genetics has led to profound insights into the genetics and evolution of complex traits and made predictions that were subsequently empirically validated, often decades later.
16
Deng WQ, Sun L. gJLS2: an R package for generalized joint location and scale analysis in X-inclusive genome-wide association studies. G3 Genes|Genomes|Genetics 2022; 12:6535712. [PMID: 35201341 PMCID: PMC8982384 DOI: 10.1093/g3journal/jkac049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Accepted: 02/17/2022] [Indexed: 11/12/2022]
Abstract
A joint analysis of location and scale can be a powerful tool in genome-wide association studies to uncover previously overlooked markers that influence a quantitative trait through both mean and variance, as well as to prioritize candidates for gene–environment interactions. This approach has recently been generalized to handle related samples, dosage data, and the analytically challenging X-chromosome. We disseminate the latest advances in methodology through a user-friendly R software package with added functionalities to support genome-wide analysis on individual-level or summary-level data. The implemented R package can be called from PLINK or directly in a scripting environment, to enable a streamlined genome-wide analysis for biobank-scale data. Application results on individual-level and summary-level data highlight the advantage of the joint test to discover more genome-wide signals as compared to a location or scale test alone. We hope the availability of gJLS2 software package will encourage more scale and/or joint analyses in large-scale datasets, and promote the standardized reporting of their P-values to be shared with the scientific community.
Affiliation(s)
- Wei Q Deng
  - Department of Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton, ON L8P 3R2, Canada
  - Peter Boris Centre for Addictions Research, St. Joseph’s Healthcare Hamilton, McMaster University, Hamilton, ON L8P 3R2, Canada
- Lei Sun
  - Department of Statistical Sciences, University of Toronto, Toronto, ON M5G 1Z5, Canada
  - Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada
17
Okbay A, Wu Y, Wang N, Jayashankar H, Bennett M, Nehzati SM, Sidorenko J, Kweon H, Goldman G, Gjorgjieva T, Jiang Y, Hicks B, Tian C, Hinds DA, Ahlskog R, Magnusson PKE, Oskarsson S, Hayward C, Campbell A, Porteous DJ, Freese J, Herd P, Watson C, Jala J, Conley D, Koellinger PD, Johannesson M, Laibson D, Meyer MN, Lee JJ, Kong A, Yengo L, Cesarini D, Turley P, Visscher PM, Beauchamp JP, Benjamin DJ, Young AI. Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat Genet 2022; 54:437-449. [PMID: 35361970 PMCID: PMC9005349 DOI: 10.1038/s41588-022-01016-z] [Citation(s) in RCA: 296] [Impact Index Per Article: 98.7] [Received: 05/14/2021] [Accepted: 01/20/2022] [Indexed: 12/14/2022]
Abstract
We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12-16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI's magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57.
Affiliation(s)
- Aysu Okbay
  - Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Yeda Wu
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Nancy Wang
  - National Bureau of Economic Research, Cambridge, MA, USA
- Julia Sidorenko
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Hyeokmoon Kweon
  - Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Grant Goldman
  - National Bureau of Economic Research, Cambridge, MA, USA
- Rafael Ahlskog
  - Department of Government, Uppsala University, Uppsala, Sweden
- Patrik K E Magnusson
  - Swedish Twin Registry, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Sven Oskarsson
  - Department of Government, Uppsala University, Uppsala, Sweden
- Caroline Hayward
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Archie Campbell
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, UK
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- David J Porteous
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, UK
  - Usher Institute, University of Edinburgh, Edinburgh, UK
  - Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK
- Jeremy Freese
  - Department of Sociology, Stanford University, Stanford, CA, USA
- Pamela Herd
  - McCourt School of Public Policy, Georgetown University, Washington, DC, USA
- Chelsea Watson
  - UCLA Anderson School of Management, Los Angeles, CA, USA
- Jonathan Jala
  - UCLA Anderson School of Management, Los Angeles, CA, USA
- Dalton Conley
  - Department of Sociology, Princeton University, Princeton, NJ, USA
- Philipp D Koellinger
  - Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
  - Robert M. La Follette School of Public Affairs, University of Wisconsin-Madison, Madison, WI, USA
- Magnus Johannesson
  - Department of Economics, Stockholm School of Economics, Stockholm, Sweden
- David Laibson
  - Department of Economics, Harvard University, Cambridge, MA, USA
- Michelle N Meyer
  - Center for Translational Bioethics and Health Care Policy, Geisinger Health System, Danville, PA, USA
- James J Lee
  - Department of Psychology, University of Minnesota Twin Cities, Minneapolis, MN, USA
- Augustine Kong
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Loic Yengo
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- David Cesarini
  - National Bureau of Economic Research, Cambridge, MA, USA
  - Department of Economics, New York University, New York, NY, USA
  - Center for Experimental Social Science, New York University, New York, NY, USA
- Patrick Turley
  - Department of Economics, University of Southern California, Los Angeles, CA, USA
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA
- Peter M Visscher
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Jonathan P Beauchamp
  - Interdisciplinary Center for Economic Science and Department of Economics, George Mason University, Fairfax, VA, USA
- Daniel J Benjamin
  - National Bureau of Economic Research, Cambridge, MA, USA
  - UCLA Anderson School of Management, Los Angeles, CA, USA
  - Human Genetics Department, UCLA David Geffen School of Medicine, Los Angeles, CA, USA
- Alexander I Young
  - UCLA Anderson School of Management, Los Angeles, CA, USA
  - Human Genetics Department, UCLA David Geffen School of Medicine, Los Angeles, CA, USA
18
Balagué-Dobón L, Cáceres A, González JR. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief Bioinform 2022; 23:bbac043. [PMID: 35211719 PMCID: PMC8921734 DOI: 10.1093/bib/bbac043] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/19/2021] [Revised: 01/25/2022] [Accepted: 01/28/2022] [Indexed: 12/12/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.
19
Willoughby EA, McGue M, Iacono WG, Lee JJ. Genetic and environmental contributions to IQ in adoptive and biological families with 30-year-old offspring. Intelligence 2021; 88. [PMID: 34658462 DOI: 10.1016/j.intell.2021.101579] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
While adoption studies have provided key insights into the influence of the familial environment on IQ scores of adolescents and children, few have followed adopted offspring long past the time spent living in the family home. To improve confidence about the extent to which shared environment exerts enduring effects on IQ, we estimated genetic and environmental effects on adulthood IQ in a unique sample of 486 biological and adoptive families. These families, tested previously on measures of IQ when offspring averaged age 15, were assessed a second time nearly two decades later (M offspring age = 32 years). We estimated the proportions of the variance in IQ attributable to environmentally mediated effects of parental IQs, sibling-specific shared environment, and gene-environment covariance to be .01 [95% CI .00, .02], .04 [95% CI .00, .15], and .03 [95% CI .00, .07], respectively; these components jointly accounted for 8 percent of the IQ variance in adulthood. The heritability was estimated to be .42 [95% CI .21, .64]. Together, these findings provide further evidence for the predominance of genetic influences on adult intelligence over any other systematic source of variation.
Affiliation(s)
- Emily A Willoughby
  - University of Minnesota Twin Cities, Department of Psychology, 75 E River Rd, Minneapolis, Minnesota 55455
- Matt McGue
  - University of Minnesota Twin Cities, Department of Psychology, 75 E River Rd, Minneapolis, Minnesota 55455
- William G Iacono
  - University of Minnesota Twin Cities, Department of Psychology, 75 E River Rd, Minneapolis, Minnesota 55455
- James J Lee
  - University of Minnesota Twin Cities, Department of Psychology, 75 E River Rd, Minneapolis, Minnesota 55455
20
Chen B, Craiu RV, Strug LJ, Sun L. The X factor: A robust and powerful approach to X-chromosome-inclusive whole-genome association studies. Genet Epidemiol 2021; 45:694-709. [PMID: 34224641 PMCID: PMC9292551 DOI: 10.1002/gepi.22422] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 03/03/2021] [Revised: 05/14/2021] [Accepted: 05/28/2021] [Indexed: 12/17/2022]
Abstract
The X-chromosome is often excluded from genome-wide association studies because of analytical challenges. Some of the problems, such as the random, skewed, or no X-inactivation model uncertainty, have been investigated. Other considerations have received little to no attention, such as the value in considering nonadditive and gene–sex interaction effects, and the inferential consequence of choosing different baseline alleles (i.e., the reference vs. the alternative allele). Here we propose a unified and flexible regression-based association test for X-chromosomal variants. We provide theoretical justifications for its robustness in the presence of various model uncertainties, as well as for its improved power when compared with the existing approaches under certain scenarios. For completeness, we also revisit the autosomes and show that the proposed framework leads to a more robust approach than the standard method. Finally, we provide supporting evidence by revisiting several published association studies.
Affiliation(s)
- Bo Chen
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Radu V Craiu
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Lisa J Strug
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
  - Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Lei Sun
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
  - Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
|
21
|
Abstract
Fisher's partitioning of genotypic values and genetic variance is highly relevant in the current era of genome-wide association studies (GWASs). However, although this partitioning is more than a century old, a number of misconceptions about nonadditive genetic effects persist. We developed a user-friendly web tool, the Falconer ShinyApp, to show how the combination of gene action and allele frequencies at causal loci translates to genetic variance and genetic variance components for a complex trait. The app can be used to demonstrate the relationship between a SNP effect size estimated from GWAS and the variation the SNP generates in the population, i.e., how locus-specific effects lead to individual differences in traits. In addition, it can be used to demonstrate that within- and between-locus interactions (dominance and epistasis, respectively) usually do not lead to a large amount of nonadditive variance relative to additive variance, and therefore that these interactions usually do not explain individual differences in a population.
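The relationship the abstract describes can be illustrated with the standard single-locus model from quantitative genetics. The sketch below is not the Falconer ShinyApp's code; it assumes a biallelic locus in Hardy-Weinberg equilibrium with genotypic values -a, d, and +a, and the function names `variance_components` and `total_variance_direct` are hypothetical, chosen only for this illustration.

```python
def variance_components(p, a, d):
    """Additive and dominance variance at one biallelic locus.

    Assumptions (illustrative, not taken from the app):
    p is the frequency of the trait-increasing allele, genotypic values
    are -a, d, +a, and genotypes are in Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    alpha = a + d * (q - p)           # average effect of allele substitution
    v_a = 2.0 * p * q * alpha ** 2    # additive genetic variance
    v_d = (2.0 * p * q * d) ** 2      # dominance variance
    return v_a, v_d


def total_variance_direct(p, a, d):
    """Total genotypic variance computed directly from genotype frequencies."""
    q = 1.0 - p
    freqs = [q * q, 2.0 * p * q, p * p]
    values = [-a, d, a]
    mean = sum(f * g for f, g in zip(freqs, values))
    return sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))


if __name__ == "__main__":
    # Complete dominance (d = a) at a rare increasing allele
    v_a, v_d = variance_components(p=0.1, a=1.0, d=1.0)
    print(f"V_A = {v_a:.4f}, V_D = {v_d:.4f}, V_D share = {v_d / (v_a + v_d):.3f}")
```

Under these assumptions, even complete dominance at the locus leaves most of the genetic variance additive when the increasing allele is rare, which is the kind of pattern the abstract refers to.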
Affiliation(s)
- Valentin Hivert
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Naomi R. Wray
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
  - Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
- Peter M. Visscher
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
|