1
Zouaghi Y, Alpern D, Gardeux V, Russeil J, Deplancke B, Santoni F, Pitteloud N, Messina A. Transcriptomic profiling of murine GnRH neurons reveals developmental trajectories linked to human reproduction and infertility. Theranostics 2025; 15:3673-3692. [PMID: 40093908 PMCID: PMC11905127 DOI: 10.7150/thno.91873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 01/22/2025] [Indexed: 03/19/2025]
Abstract
Rationale: Neurons producing Gonadotropin-Releasing Hormone (GnRH) are essential for human reproduction and must migrate from the nose to the brain during prenatal life. Impaired GnRH neuron biology results in alterations of the reproductive axis, including delayed puberty and infertility, with considerable effects on quality of life and metabolic health. Although various genes have been implicated, the molecular causes of these conditions remain elusive, with most patients lacking a genetic diagnosis. Methods: GnRH neurons and non-GnRH cells were FACS-isolated from mouse embryo microdissections to perform high-resolution transcriptomic profiling during mouse embryonic development. We analyzed our dataset to reveal GnRH neuron molecular identity, gene expression dynamics, and cell-to-cell communication. The spatial context of candidate genes was validated using in situ hybridization and spatial transcriptomic analysis. The possible links with human reproduction in health and disease were explored using enrichment analysis on GWAS data and by analyzing the genetic burden of patients with congenital GnRH deficiency. Results: GnRH neurons undergo a profound transcriptional shift as they migrate from the nose to the brain and display expression trajectories associated with distinct biological processes, including cell migration, neuronal projections, and synapse formation. We revealed a temporally and spatially restricted modulation of signaling pathways involving known and novel molecules, including Semaphorins and Neurexins, respectively. A particular set of genes, whose expression in GnRH neurons rises in late developmental stages, showed a strong association with GWAS genes linked with human reproductive onset. Finally, some of the identified trajectories harbor a diagnostic potential for congenital hypogonadism. 
This is supported by genetic analysis in a large cohort of patients affected by congenital GnRH deficiency, revealing a high mutation burden in patients compared to healthy controls. Conclusion: We charted the landscape of gene expression dynamics underlying murine GnRH neuron embryonic development. Our study highlights new genes in GnRH neuron development and provides novel insights linking those genes with human reproduction.
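The abstract reports an enrichment analysis linking GnRH-neuron gene sets to GWAS genes without naming the test. One common way to test such an overlap is a one-sided hypergeometric test; the sketch below assumes that test, and the function name `hypergeom_overlap_p` and the gene counts in the usage line are hypothetical, not taken from the paper.

```python
from math import comb


def hypergeom_overlap_p(n_universe: int, n_gwas: int, n_set: int, n_overlap: int) -> float:
    """One-sided P(X >= n_overlap) for the overlap of a gene set with GWAS genes.

    X ~ Hypergeometric(population=n_universe, successes=n_gwas, draws=n_set),
    i.e. the number of GWAS genes in a random set of n_set genes.
    (Hypothetical helper for illustration; the paper's test may differ.)
    """
    total = comb(n_universe, n_set)
    upper = min(n_gwas, n_set)
    # Sum the upper tail exactly with integer arithmetic, then divide once.
    tail = sum(
        comb(n_gwas, k) * comb(n_universe - n_gwas, n_set - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 500 GWAS genes,
# a 200-gene trajectory cluster, 15 genes shared (about 5 expected by chance).
print(hypergeom_overlap_p(20_000, 500, 200, 15))
```

Exact integer arithmetic keeps the tail sum precise even for large backgrounds; a statistics library's survival function would give the same value faster.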
Affiliation(s)
- Yassine Zouaghi
- Department of Endocrinology, Diabetes and Metabolism, Centre Hospitalier Universitaire Vaudois (CHUV), 1011 Lausanne, Switzerland
- Faculty of Biology and Medicine, University of Lausanne, 1011 Lausanne, Switzerland
- Daniel Alpern
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Vincent Gardeux
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Julie Russeil
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Federico Santoni
- Department of Endocrinology, Diabetes and Metabolism, Centre Hospitalier Universitaire Vaudois (CHUV), 1011 Lausanne, Switzerland
- Faculty of Biology and Medicine, University of Lausanne, 1011 Lausanne, Switzerland
- Nelly Pitteloud
- Department of Endocrinology, Diabetes and Metabolism, Centre Hospitalier Universitaire Vaudois (CHUV), 1011 Lausanne, Switzerland
- Faculty of Biology and Medicine, University of Lausanne, 1011 Lausanne, Switzerland
- Andrea Messina
- Department of Endocrinology, Diabetes and Metabolism, Centre Hospitalier Universitaire Vaudois (CHUV), 1011 Lausanne, Switzerland
- Faculty of Biology and Medicine, University of Lausanne, 1011 Lausanne, Switzerland
2
Townsend HA, Rosenberger KJ, Vanderlinden LA, Inamo J, Zhang F. Evaluating methods for integrating single-cell data and genetics to understand inflammatory disease complexity. Front Immunol 2024; 15:1454263. [PMID: 39703500 PMCID: PMC11655331 DOI: 10.3389/fimmu.2024.1454263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Accepted: 11/07/2024] [Indexed: 12/21/2024]
Abstract
Background Understanding the genetic underpinnings of immune-mediated inflammatory diseases is crucial to improve treatments. Single-cell RNA sequencing (scRNA-seq) identifies cell states expanded in disease, but often overlooks genetic causality due to cost and small genotyping cohorts. Conversely, large genome-wide association studies (GWAS) are commonly accessible. Methods We present a 3-step robust benchmarking analysis of integrating GWAS and scRNA-seq to identify genetically relevant cell states and genes in inflammatory diseases. First, we applied and compared the results of three recent algorithms, based on pathways (scGWAS), single-cell disease scores (scDRS), or both (scPagwas), according to accuracy/sensitivity and interpretability. While previous studies focused on coarse cell types, we used disease-specific, fine-grained single-cell atlases (183,742 and 228,211 cells) and GWAS data (Ns of 97,173 and 45,975) for rheumatoid arthritis (RA) and ulcerative colitis (UC). Second, given the lack of scRNA-seq for many diseases with GWAS, we further tested the tools' resolution limits by differentiating between similar diseases with only one fine-grained scRNA-seq atlas. Lastly, we provide a novel evaluation of noncoding SNP incorporation methods by testing which enabled the highest sensitivity/accuracy of known cell-state calls. Results We first found that the single-cell-based tools scDRS and scPagwas called more supported cell states that were overlooked by scGWAS. While scGWAS and scPagwas were advantageous for gene exploration, scDRS effectively accounted for batch effects and captured cellular heterogeneity of disease relevance without single-cell genotyping. For noncoding SNP integration, we found a key trade-off between statistical power and confidence with positional (e.g. MAGMA) and non-positional approaches (e.g. chromatin-interaction, eQTL). 
Even when directly incorporating noncoding SNPs through 5' scRNA-seq measures of regulatory elements, non-disease-specific atlases gave misleading results because they lacked disease-tissue-specific transcriptomic patterns. Despite this critical dependence on tissue-specific scRNA-seq, we showed that scDRS enabled deconvolution of two similar diseases with a single fine-grained scRNA-seq atlas and separate GWAS. Indeed, we identified supported and novel genetic-phenotype linkages separating RA and ankylosing spondylitis, and UC and Crohn's disease. Overall, while noting evolving single-cell technologies, our study provides key findings for integrating expanding fine-grained scRNA-seq, GWAS, and noncoding SNP resources to unravel the complexities of inflammatory diseases.
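scDRS scores each cell by comparing its aggregate expression of GWAS-prioritized genes against control gene sets. The sketch below is a deliberately simplified, unweighted version of that idea and is not the scDRS implementation: scDRS weights genes using GWAS-derived (MAGMA) z-scores and draws control genes matched on expression mean and variance, whereas here control genes are drawn uniformly at random. The function name `cell_disease_pvalues` and the simulated data are hypothetical.

```python
import numpy as np


def cell_disease_pvalues(expr: np.ndarray, disease_idx, n_ctrl: int = 200, seed: int = 0) -> np.ndarray:
    """Empirical per-cell p-values for a putative disease gene set.

    expr: cells x genes matrix of normalized expression.
    disease_idx: column indices of the disease genes (e.g. top GWAS-prioritized genes).
    Simplified and scDRS-inspired: unweighted genes, unmatched random control sets.
    """
    rng = np.random.default_rng(seed)
    # Standardize each gene across cells so all genes contribute on the same scale.
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-8)
    disease_idx = np.asarray(disease_idx)
    raw = z[:, disease_idx].mean(axis=1)

    # Control sets of the same size, drawn at random from the non-disease genes.
    pool = np.setdiff1d(np.arange(expr.shape[1]), disease_idx)
    ctrl = np.empty((n_ctrl, expr.shape[0]))
    for b in range(n_ctrl):
        genes = rng.choice(pool, size=disease_idx.size, replace=False)
        ctrl[b] = z[:, genes].mean(axis=1)

    # Per-cell empirical p-value; the +1 terms keep it strictly positive.
    return (1 + (ctrl >= raw).sum(axis=0)) / (1 + n_ctrl)


# Simulated data: the first 10 of 100 cells over-express the 20 "disease" genes.
sim_rng = np.random.default_rng(1)
expr = sim_rng.normal(size=(100, 500))
expr[:10, :20] += 3.0
p = cell_disease_pvalues(expr, np.arange(20))
print(p[:10].max(), p[10:].mean())
```

Cells whose disease-gene expression exceeds every control set reach the smallest attainable p-value, 1 / (1 + n_ctrl); matching controls on mean and variance, as scDRS does, guards against genes that are simply more highly or more variably expressed.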
Affiliation(s)
- Hope A. Townsend
- Biofrontiers Institute, University of Colorado Boulder, Boulder, CO, United States
- Department of Molecular, Cellular, Developmental Biology, University of Colorado Boulder, Boulder, CO, United States
- Kaylee J. Rosenberger
- Biofrontiers Institute, University of Colorado Boulder, Boulder, CO, United States
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, United States
- Lauren A. Vanderlinden
- Department of Medicine, Division of Rheumatology, University of Colorado Anschutz Medical Campus, Denver, CO, United States
- Department of Biomedical Informatics, Center for Health AI, University of Colorado Anschutz Medical Campus, Denver, CO, United States
- Jun Inamo
- Department of Medicine, Division of Rheumatology, University of Colorado Anschutz Medical Campus, Denver, CO, United States
- Department of Biomedical Informatics, Center for Health AI, University of Colorado Anschutz Medical Campus, Denver, CO, United States
- Fan Zhang
- Biofrontiers Institute, University of Colorado Boulder, Boulder, CO, United States
- Department of Medicine, Division of Rheumatology, University of Colorado Anschutz Medical Campus, Denver, CO, United States
- Department of Biomedical Informatics, Center for Health AI, University of Colorado Anschutz Medical Campus, Denver, CO, United States
3
Wang L, Zhang S. Investigating the Causal Effects of Exercise-Induced Genes on Sarcopenia. Int J Mol Sci 2024; 25:10773. [PMID: 39409102 PMCID: PMC11476887 DOI: 10.3390/ijms251910773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 09/29/2024] [Accepted: 09/30/2024] [Indexed: 10/20/2024]
Abstract
Exercise is increasingly recognized as an effective strategy to counteract skeletal muscle aging and conditions such as sarcopenia. However, the specific exercise-induced genes responsible for these protective effects remain unclear. To address this, we conducted an eight-week aerobic exercise regimen on late-middle-aged mice and developed an integrated approach that combines mouse exercise-induced genes with human GWAS datasets to identify causal genes for sarcopenia. This approach led to significant improvements in the skeletal muscle phenotype of the mice and the identification of exercise-induced genes and miRNAs. By constructing a miRNA regulatory network enriched with transcription factors and GWAS signals related to muscle function and traits, we focused on 896 exercise-induced genes. Using human skeletal muscle cis-eQTLs as instrumental variables, 250 of these exercise-induced genes underwent two-sample Mendelian randomization analysis, identifying 40, 68, and 62 causal genes associated with sarcopenia and its clinical indicators, appendicular lean mass (ALM) and hand grip strength (HGS), respectively. Sensitivity analyses and cross-phenotype validation confirmed the robustness of our findings. Consistently across the three outcomes, RXRA, MDM1, RBL2, KCNJ2, and ADHFE1 were identified as risk factors, while NMB, TECPR2, MGAT3, ECHDC2, and GINM1 were identified as protective factors, all with potential as biomarkers for sarcopenia progression. Biological activity and disease association analyses suggested that exercise exerts its anti-sarcopenia effects primarily through the regulation of fatty acid oxidation. Based on available drug-gene interaction data, 21 of the causal genes are druggable, offering potential therapeutic targets. 
Our findings highlight key genes and molecular pathways potentially responsible for the anti-sarcopenia benefits of exercise, offering insights into future therapeutic strategies that could mimic the safe and mild protective effects of exercise on age-related skeletal muscle degeneration.
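The abstract uses skeletal-muscle cis-eQTLs as instruments for two-sample Mendelian randomization but does not name the estimator. A common primary estimator is the fixed-effect inverse-variance-weighted (IVW) combination of per-SNP Wald ratios; the sketch below assumes that estimator, independent instruments, and first-order weights that ignore uncertainty in the eQTL effects. The function name `ivw_estimate` and the summary statistics in the usage lines are hypothetical.

```python
import numpy as np


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance-weighted (IVW) Mendelian randomization estimate.

    beta_exp: instrument effects on the exposure (e.g. cis-eQTL effects on expression)
    beta_out: instrument effects on the outcome (e.g. from a sarcopenia GWAS)
    se_out:   standard errors of beta_out
    Assumes independent instruments and first-order weights (exposure SEs ignored).
    """
    bx, by, sy = (np.asarray(v, dtype=float) for v in (beta_exp, beta_out, se_out))
    wald = by / bx              # per-instrument Wald ratio estimates
    w = bx**2 / sy**2           # first-order inverse variance of each Wald ratio
    est = np.sum(w * wald) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))
    return est, se


# Hypothetical summary statistics for three cis-eQTL instruments of one gene.
est, se = ivw_estimate([0.20, 0.40, -0.30], [0.10, 0.20, -0.15], [0.05, 0.10, 0.08])
print(est, se)
```

The weighted mean of Wald ratios is algebraically the same as a no-intercept weighted regression of outcome effects on exposure effects; sensitivity analyses such as MR-Egger or weighted-median estimators relax the assumption that all instruments are valid.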
Affiliation(s)
- Li Wang
- Institute of Sports Medicine and Health, Chengdu Sport University, Chengdu 610041, China
- Song Zhang
- College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
4
Fu S, Wheeler W, Wang X, Hua X, Godbole D, Duan J, Zhu B, Deng L, Qin F, Zhang H, Shi J, Yu K. A comprehensive framework for trans-ancestry pathway analysis using GWAS summary data from diverse populations. PLoS Genet 2024; 20:e1011322. [PMID: 39441834 PMCID: PMC11534268 DOI: 10.1371/journal.pgen.1011322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 11/04/2024] [Accepted: 10/07/2024] [Indexed: 10/25/2024]
Abstract
As more multi-ancestry GWAS summary data become available, we have developed a comprehensive trans-ancestry pathway analysis framework that effectively utilizes this diverse genetic information. Within this framework, we evaluated various strategies for integrating genetic data at different levels (SNP, gene, and pathway) from multiple ancestry groups. Through extensive simulation studies, we have identified robust strategies that demonstrate superior performance across diverse scenarios. Applying these methods, we analyzed 6,970 pathways for their association with schizophrenia, incorporating data from African, East Asian, and European populations. Our analysis identified over 200 pathways significantly associated with schizophrenia, even after excluding genes near genome-wide significant loci. This approach substantially enhances detection efficiency compared to traditional single-ancestry pathway analysis and the conventional approach that amalgamates single-ancestry pathway analysis results across different ancestry groups. Our framework provides a flexible and effective tool for leveraging the expanding pool of multi-ancestry GWAS summary data, thereby improving our ability to identify biologically relevant pathways that contribute to disease susceptibility.
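The abstract contrasts its framework with the conventional approach of combining single-ancestry pathway results across ancestry groups. As an illustration of that baseline only (not of the paper's proposed framework), one standard way to merge per-ancestry pathway p-values is the Cauchy combination test (ACAT). The sketch assumes p-values strictly between 0 and 1; the function name `cauchy_combine` and the example p-values are hypothetical.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Cauchy combination (ACAT) of p-values, e.g. one pathway p-value per ancestry.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to sum to 1,
    is approximately standard Cauchy under the null, even when tests are dependent.
    Assumes every p-value lies strictly between 0 and 1.
    """
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k
    total_w = sum(weights)
    # Dividing by total_w normalizes the weights so they sum to 1.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)) / total_w
    if t > 1e15:
        # Tail approximation avoids cancellation in 0.5 - atan(t)/pi for huge t.
        return 1.0 / (t * math.pi)
    return 0.5 - math.atan(t) / math.pi


# Hypothetical pathway p-values from African, East Asian, and European GWAS.
print(cauchy_combine([0.04, 0.01, 1e-4]))
```

Because the Cauchy tail is dominated by the smallest p-value, one strongly associated ancestry can drive the combined result; this is one reason methods that integrate at the SNP or gene level can gain power over combining finished pathway results.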
Affiliation(s)
- Sheng Fu
- School of Statistics and Data Science, Nankai University, Tianjin, China
- Key Laboratory of Pure Mathematics and Combinatorics, Nankai University, Tianjin, China
- William Wheeler
- Information Management Services, Inc, Bethesda, Maryland, United States of America
- Xiaoyu Wang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, Rockville, Maryland, United States of America
- Xing Hua
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, Rockville, Maryland, United States of America
- Devika Godbole
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, Rockville, Maryland, United States of America
- Jubao Duan
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, United States of America
- Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, Illinois, United States of America
- Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Lu Deng
- School of Statistics and Data Science, Nankai University, Tianjin, China
- Fei Qin
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Kai Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
5
Pattillo Smith S, Darnell G, Udwin D, Stamp J, Harpak A, Ramachandran S, Crawford L. Discovering non-additive heritability using additive GWAS summary statistics. eLife 2024; 13:e90459. [PMID: 38913556 PMCID: PMC11196113 DOI: 10.7554/elife.90459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 04/22/2024] [Indexed: 06/26/2024]
Abstract
LD score regression (LDSC) is a method to estimate narrow-sense heritability from genome-wide association study (GWAS) summary statistics alone, making it a fast and popular approach. In this work, we present interaction-LD score (i-LDSC) regression: an extension of the original LDSC framework that accounts for interactions between genetic variants. By studying a wide range of generative models in simulations, and by re-analyzing 25 well-studied quantitative phenotypes from 349,468 individuals in the UK Biobank and up to 159,095 individuals in BioBank Japan, we show that the inclusion of a cis-interaction score (i.e. interactions between a focal variant and proximal variants) recovers genetic variance that is not captured by LDSC. For each of the 25 traits analyzed in the UK Biobank and BioBank Japan, i-LDSC detects additional variation contributed by genetic interactions. The i-LDSC software and its application to these biobanks represent a step towards resolving further genetic contributions of sources of non-additive genetic effects to complex trait variation.
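LDSC rests on the relationship E[chi2_j] = N * h2 * l_j / M + N * a + 1, where l_j is the LD score of SNP j, N the GWAS sample size, M the number of SNPs, h2 the SNP heritability, and a a confounding term; regressing chi-square statistics on LD scores therefore gives h2 from the slope. i-LDSC adds a cis-interaction score to this regression. The sketch below covers only the additive LDSC part, fits it by ordinary least squares rather than the weighted regression and block jackknife used by the real software, and uses a crude simulated noise model. The function name `ldsc_h2` and all simulation values are hypothetical.

```python
import numpy as np


def ldsc_h2(chi2: np.ndarray, ld_scores: np.ndarray, n: int, m: int):
    """Unweighted LD score regression of chi-square statistics on LD scores.

    Model: E[chi2_j] = N * h2 * l_j / M + N * a + 1, so the fitted slope
    equals N * h2 / M and h2 = slope * M / N. Returns (h2, intercept).
    """
    X = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return slope * m / n, intercept


# Simulated example (hypothetical values): 10,000 SNPs, N = 100,000, true h2 = 0.3.
rng = np.random.default_rng(0)
m, n, h2_true = 10_000, 100_000, 0.3
ld = rng.uniform(1, 200, size=m)
expected = n * h2_true * ld / m + 1           # no confounding, so N * a = 0
chi2 = expected * rng.chisquare(1, size=m)    # crude noise model with the right mean
h2_hat, intercept_hat = ldsc_h2(chi2, ld, n, m)
print(round(h2_hat, 3), round(intercept_hat, 2))
```

An intercept near 1 indicates little confounding from population stratification; the real software also reports the intercept as a diagnostic.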
Affiliation(s)
- Samuel Pattillo Smith
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Department of Ecology and Evolutionary Biology, Brown University, Providence, United States
- Department of Integrative Biology, The University of Texas at Austin, Austin, United States
- Department of Population Health, The University of Texas at Austin, Austin, United States
- Gregory Darnell
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Institute for Computational and Experimental Research in Mathematics, Brown University, Providence, United States
- Dana Udwin
- Department of Biostatistics, Brown University, Providence, United States
- Julian Stamp
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Arbel Harpak
- Department of Integrative Biology, The University of Texas at Austin, Austin, United States
- Department of Population Health, The University of Texas at Austin, Austin, United States
- Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Department of Ecology and Evolutionary Biology, Brown University, Providence, United States
- Data Science Institute, Brown University, Providence, United States
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Department of Biostatistics, Brown University, Providence, United States
- Microsoft, Cambridge, United States
6
Frei O, Hindley G, Shadrin AA, van der Meer D, Akdeniz BC, Hagen E, Cheng W, O'Connell KS, Bahrami S, Parker N, Smeland OB, Holland D, de Leeuw C, Posthuma D, Andreassen OA, Dale AM. Improved functional mapping of complex trait heritability with GSA-MiXeR implicates biologically specific gene sets. Nat Genet 2024; 56:1310-1318. [PMID: 38831010 PMCID: PMC11759099 DOI: 10.1038/s41588-024-01771-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 04/24/2024] [Indexed: 06/05/2024]
Abstract
While genome-wide association studies are increasingly successful in discovering genomic loci associated with complex human traits and disorders, the biological interpretation of these findings remains challenging. Here we developed the GSA-MiXeR analytical tool for gene set analysis (GSA), which fits a model for the heritability of individual genes, accounting for linkage disequilibrium across variants and allowing the quantification of partitioned heritability and fold enrichment for small gene sets. We validated the method using extensive simulations and sensitivity analyses. When applied to a diverse selection of complex traits and disorders, including schizophrenia, GSA-MiXeR prioritizes gene sets with greater biological specificity compared to standard GSA approaches, implicating voltage-gated calcium channel function and dopaminergic signaling for schizophrenia. Such biologically relevant gene sets, often with fewer than ten genes, are more likely to provide insights into the pathobiology of complex diseases and highlight potential drug targets.
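GSA-MiXeR reports partitioned heritability and fold enrichment for gene sets. As a rough illustration of what a fold-enrichment number means, the sketch below computes a set's share of heritability divided by its share of genes. This baseline is an assumption for illustration: GSA-MiXeR's own expected value may be defined differently (for example from SNP counts, gene length, or an LD-aware model), so treat this as a simplified reading rather than the tool's formula. The function name `fold_enrichment` and the numbers are hypothetical.

```python
def fold_enrichment(h2_set: float, h2_total: float, n_set: int, n_total: int) -> float:
    """Share of heritability captured by a gene set divided by its share of genes.

    Assumption for illustration: the expected share is the set's fraction of genes;
    real tools may instead use SNP counts, gene length, or an LD-aware baseline.
    """
    if h2_total <= 0 or n_set <= 0 or n_total <= 0:
        raise ValueError("h2_total, n_set and n_total must be positive")
    return (h2_set / h2_total) / (n_set / n_total)


# Hypothetical: a 9-gene set holding 0.3% of SNP heritability among 18,000 genes.
print(fold_enrichment(0.003, 1.0, 9, 18_000))
```

Small sets make the denominator tiny, so even a modest heritability share can yield a large fold enrichment; this is why the uncertainty of such estimates matters for sets with fewer than ten genes.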
Affiliation(s)
- Oleksandr Frei
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Guy Hindley
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Alexey A Shadrin
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Dennis van der Meer
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, the Netherlands
- Bayram C Akdeniz
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Espen Hagen
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Weiqiu Cheng
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Kevin S O'Connell
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Shahram Bahrami
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Nadine Parker
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Olav B Smeland
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Dominic Holland
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA, USA
- Christiaan de Leeuw
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, the Netherlands
- Danielle Posthuma
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, the Netherlands
- Ole A Andreassen
- Centre for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Anders M Dale
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA, USA
7
Shen S, Sobczyk MK, Paternoster L, Brown SJ. From GWASs toward Mechanistic Understanding with Case Studies in Dermatogenetics. J Invest Dermatol 2024; 144:1189-1199.e8. [PMID: 38782533 DOI: 10.1016/j.jid.2024.03.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 02/13/2024] [Accepted: 03/06/2024] [Indexed: 05/25/2024]
Abstract
Many human skin diseases result from the complex interplay of genetic and environmental mechanisms that are largely unknown. GWASs have yielded insight into the genetic aspect of complex disease by highlighting regions of the genome or specific genetic variants associated with disease. Leveraging this information to identify causal genes and cell types will provide insight into fundamental biology, inform diagnostics, and aid drug discovery. However, the etiological mechanisms from genetic variant to disease are still unestablished in most cases. There now exists an unprecedented wealth of data and computational methods for variant interpretation in a functional context. It can be challenging to decide where to start owing to a lack of consensus on the best way to identify causal genetic mechanisms. This article highlights 3 key aspects of genetic variant interpretation: prioritizing causal genes, cell types, and pathways. We provide a practical overview of the main methods and datasets, giving examples from recent atopic dermatitis studies to provide a blueprint for variant interpretation. A collection of resources, including brief descriptions and links to the packages and web tools, is provided for researchers looking to start in silico follow-up genetic analysis of associated genetic variants.
Affiliation(s)
- Silvia Shen
- Centre for Genomic & Experimental Medicine, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
- Institute for Evolution and Ecology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
- Maria K Sobczyk
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Lavinia Paternoster
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Sara J Brown
- Centre for Genomic & Experimental Medicine, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
- Department of Dermatology, NHS Lothian, Edinburgh, United Kingdom
8
Dorans E, Jagadeesh K, Dey K, Price AL. Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance. medRxiv 2024:2024.05.24.24307813. [PMID: 38826240 PMCID: PMC11142273 DOI: 10.1101/2024.05.24.24307813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Methods that analyze single-cell paired RNA-seq and ATAC-seq multiome data have shown great promise in linking regulatory elements to genes. However, existing methods differ in their modeling assumptions and approaches to account for biological and technical noise (leading to low concordance in their linking scores) and do not capture the effects of genomic distance. We propose pgBoost, an integrative modeling framework that trains a non-linear combination of existing linking strategies (including genomic distance) on fine-mapped eQTL data to assign a probabilistic score to each candidate SNP-gene link. We applied pgBoost to single-cell multiome data from 85k cells representing 6 major immune/blood cell types. pgBoost attained higher enrichment for fine-mapped eSNP-eGene pairs (e.g. 21x at distance >10kb) than existing methods (1.2-10x; p-value for difference = 5e-13 vs. distance-based method and < 4e-35 for each other method), with larger improvements at larger distances (e.g. 35x vs. 0.89-6.6x at distance >100kb; p-value for difference < 0.002 vs. each other method). pgBoost also outperformed existing methods in enrichment for CRISPR-validated links (e.g. 4.8x vs. 1.6-4.1x at distance >10kb; p-value for difference = 0.25 vs. distance-based method and < 2e-5 for each other method), with larger improvements at larger distances (e.g. 15x vs. 1.6-2.5x at distance >100kb; p-value for difference < 0.009 for each other method). Similar improvements in enrichment were observed for links derived from Activity-By-Contact (ABC) scores and GWAS data. We further determined that restricting pgBoost to features from a focal cell type improved the identification of SNP-gene links relevant to that cell type. We highlight several examples where pgBoost linked fine-mapped GWAS variants to experimentally validated or biologically plausible target genes that were not implicated by other methods. 
In conclusion, a non-linear combination of linking strategies, including genomic distance, improves power to identify target genes underlying GWAS associations.
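The pgBoost results are reported as enrichments of fine-mapped eSNP-eGene pairs or CRISPR-validated links among predicted SNP-gene links. A minimal sketch of one such enrichment, assuming it is the fraction of validated pairs among the top-scoring candidate links divided by the fraction among all candidate links; the paper's exact definition, score thresholds, and distance bins may differ, and the function name `link_enrichment` and its `top_frac` parameter are hypothetical.

```python
import numpy as np


def link_enrichment(scores, is_validated, top_frac: float = 0.05) -> float:
    """Enrichment of validated SNP-gene pairs among the top-scoring candidate links.

    enrichment = P(validated | link in top `top_frac` by score) / P(validated overall)
    """
    scores = np.asarray(scores, dtype=float)
    validated = np.asarray(is_validated, dtype=bool)
    if not validated.any():
        raise ValueError("need at least one validated pair")
    k = max(1, int(round(top_frac * scores.size)))
    top = np.argsort(-scores)[:k]      # indices of the k highest-scoring links
    return validated[top].mean() / validated.mean()


# Toy example: 100 candidate links, 10 validated, 5 of them ranked at the top.
toy_scores = np.arange(100.0)
toy_validated = np.zeros(100, dtype=bool)
toy_validated[95:] = True   # the five highest scores
toy_validated[:5] = True    # five validated pairs with the lowest scores
print(link_enrichment(toy_scores, toy_validated))
```

Computing the same ratio separately within distance bins (for example, candidate links more than 10 kb or 100 kb apart) is how enrichment can be compared across methods at larger distances.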
9
Zhu X, Ma S, Wong WH. Genetic effects of sequence-conserved enhancer-like elements on human complex traits. Genome Biol 2024; 25:1. [PMID: 38167462 PMCID: PMC10759394 DOI: 10.1186/s13059-023-03142-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/02/2022] [Accepted: 12/08/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND The vast majority of findings from human genome-wide association studies (GWAS) map to non-coding sequences, complicating their mechanistic interpretations and clinical translations. Non-coding sequences that are evolutionarily conserved and biochemically active could offer clues to the mechanisms underpinning GWAS discoveries. However, genetic effects of such sequences have not been systematically examined across a wide range of human tissues and traits, hampering progress to fully understand regulatory causes of human complex traits. RESULTS Here we develop a simple yet effective strategy to identify functional elements exhibiting high levels of human-mouse sequence conservation and enhancer-like biochemical activity, which scales well to 313 epigenomic datasets across 106 human tissues and cell types. Combined with 468 GWAS of European (EUR) and East Asian (EAS) ancestries, these elements show tissue-specific enrichments of heritability and causal variants for many traits, which are significantly stronger than enrichments based on enhancers without sequence conservation. These elements also help prioritize candidate genes that are functionally relevant to body mass index (BMI) and schizophrenia but were not reported in previous GWAS with large sample sizes. CONCLUSIONS Our findings provide a comprehensive assessment of how sequence-conserved enhancer-like elements affect complex traits in diverse tissues and demonstrate a generalizable strategy of integrating evolutionary and biochemical data to elucidate human disease genetics.
Affiliation(s)
- Xiang Zhu
- Department of Statistics, The Pennsylvania State University, 326 Thomas Building, University Park, 16802, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, 201 Huck Life Sciences Building, University Park, 16802, PA, USA
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA
- Shining Ma
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA
- Wing Hung Wong
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA.
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA.
10
Azoury D, von Hoegen A, Su Y, Oh KH, Holder T, Tan H, Ortiz BR, Capa Salinas A, Wilson SD, Yan B, Gedik N. Direct observation of the collective modes of the charge density wave in the kagome metal CsV3Sb5. Proc Natl Acad Sci U S A 2023; 120:e2308588120. [PMID: 37748057 PMCID: PMC10556638 DOI: 10.1073/pnas.2308588120] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/23/2023] [Accepted: 07/31/2023] [Indexed: 09/27/2023]
Abstract
A recently discovered group of kagome metals, AV3Sb5 (A = K, Rb, Cs), exhibits a variety of intertwined unconventional electronic phases, which emerge from a puzzling charge density wave phase. Understanding this charge-ordered parent phase is crucial for deciphering the entire phase diagram. However, the mechanism of the charge density wave is still controversial, and its primary source of fluctuations, the collective modes, has not been experimentally observed. Here, we use ultrashort laser pulses to melt the charge order in CsV3Sb5 and record the resulting dynamics using femtosecond angle-resolved photoemission. We resolve the melting time of the charge order and directly observe its amplitude mode, imposing a fundamental limit on the fastest possible lattice rearrangement time. These observations, together with ab initio calculations, provide clear evidence for a structural rather than electronic mechanism of the charge density wave. Our findings pave the way for a better understanding of the unconventional phases hosted on the kagome lattice.
Affiliation(s)
- Doron Azoury
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Alexander von Hoegen
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Yifan Su
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Kyoung Hun Oh
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Tobias Holder
- Department of Condensed Matter Physics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Hengxin Tan
- Department of Condensed Matter Physics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Brenden R. Ortiz
- Materials Department, University of California, Santa Barbara, CA 93106
- Stephen D. Wilson
- Materials Department, University of California, Santa Barbara, CA 93106
- Binghai Yan
- Department of Condensed Matter Physics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Nuh Gedik
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
11
Salehi Nowbandegani P, Wohns AW, Ballard JL, Lander ES, Bloemendal A, Neale BM, O'Connor LJ. Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies. Nat Genet 2023; 55:1494-1502. [PMID: 37640881 DOI: 10.1038/s41588-023-01487-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/29/2022] [Accepted: 07/24/2023] [Indexed: 08/31/2023]
Abstract
Linkage disequilibrium (LD) is the correlation among nearby genetic variants. In genetic association studies, LD is often modeled using large correlation matrices, but this approach is inefficient, especially in ancestrally diverse studies. In the present study, we introduce LD graphical models (LDGMs), which are an extremely sparse and efficient representation of LD. LDGMs are derived from genome-wide genealogies; statistical relationships among alleles in the LDGM correspond to genealogical relationships among haplotypes. We published LDGMs and ancestry-specific LDGM precision matrices for 18 million common variants (minor allele frequency >1%) in five ancestry groups, validated their accuracy and demonstrated order-of-magnitude improvements in runtime for commonly used LD matrix computations. We implemented an extremely fast multiancestry polygenic prediction method, BLUPx-ldgm, which performs better than a similar method based on the reference LD correlation matrix. LDGMs will enable sophisticated methods that scale to ancestrally diverse genetic association data across millions of variants and individuals.
Affiliation(s)
- Pouria Salehi Nowbandegani
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Anthony Wilder Wohns
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Stanford University School of Medicine, Stanford, CA, USA.
- Jenna L Ballard
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
- Eric S Lander
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Alex Bloemendal
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Luke J O'Connor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
12
Singh R, He X, Park AK, Hardison RC, Zhu X, Li Q. RETROFIT: Reference-free deconvolution of cell-type mixtures in spatial transcriptomics. bioRxiv 2023:2023.06.07.544126. [PMID: 37333291 PMCID: PMC10274808 DOI: 10.1101/2023.06.07.544126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Spatial transcriptomics (ST) profiles gene expression in intact tissues. However, ST data measured at each spatial location may represent gene expression of multiple cell types, making it difficult to identify cell-type-specific transcriptional variation across spatial contexts. Existing cell-type deconvolutions of ST data often require single-cell transcriptomic references, which can be limited by the availability, completeness and platform effects of such references. We present RETROFIT, a reference-free Bayesian method that produces sparse and interpretable solutions to deconvolve the cell types underlying each location independently of single-cell transcriptomic references. Results from synthetic and real ST datasets acquired by Slide-seq and Visium platforms demonstrate that RETROFIT outperforms existing reference-based and reference-free methods in estimating cell-type composition and reconstructing gene expression. Applying RETROFIT to human intestinal development ST data reveals spatiotemporal patterns of cellular composition and transcriptional specificity. RETROFIT is available at https://bioconductor.org/packages/release/bioc/html/retrofit.html.
Affiliation(s)
- Roopali Singh
- The Pennsylvania State University, University Park, PA 16802
- Xi He
- The Pennsylvania State University, University Park, PA 16802
- Xiang Zhu
- The Pennsylvania State University, University Park, PA 16802
- Qunhua Li
- The Pennsylvania State University, University Park, PA 16802
13
Zabad S, Gravel S, Li Y. Fast and accurate Bayesian polygenic risk modeling with variational inference. Am J Hum Genet 2023; 110:741-761. [PMID: 37030289 PMCID: PMC10183379 DOI: 10.1016/j.ajhg.2023.03.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/21/2022] [Accepted: 03/13/2023] [Indexed: 04/10/2023]
Abstract
The advent of large-scale genome-wide association studies (GWASs) has motivated the development of statistical methods for phenotype prediction with single-nucleotide polymorphism (SNP) array data. These polygenic risk score (PRS) methods use a multiple linear regression framework to infer joint effect sizes of all genetic variants on the trait. Among the subset of PRS methods that operate on GWAS summary statistics, sparse Bayesian methods have shown competitive predictive ability. However, most existing Bayesian approaches use Markov chain Monte Carlo (MCMC) algorithms for posterior inference, and these algorithms are computationally inefficient and do not scale favorably to higher dimensions. Here, we introduce variational inference of polygenic risk scores (VIPRS), a Bayesian summary statistics-based PRS method that uses variational inference techniques to approximate the posterior distribution for the effect sizes. Our experiments with 36 simulation configurations and 12 real phenotypes from the UK Biobank dataset demonstrated that VIPRS is consistently competitive with the state of the art in prediction accuracy while being more than twice as fast as popular MCMC-based approaches. This performance advantage is robust across a variety of genetic architectures, SNP heritabilities, and independent GWAS cohorts. In addition to its competitive accuracy on the "White British" samples, VIPRS showed improved transferability when applied to other ethnic groups, with up to a 1.7-fold increase in R² among individuals of Nigerian ancestry for low-density lipoprotein (LDL) cholesterol. To illustrate its scalability, we applied VIPRS to a dataset of 9.6 million genetic markers, which conferred further improvements in prediction accuracy for highly polygenic traits, such as height.
Affiliation(s)
- Shadi Zabad
- School of Computer Science, McGill University, Montreal, QC, Canada
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, QC, Canada.
- Yue Li
- School of Computer Science, McGill University, Montreal, QC, Canada.
14
Zhao K, Rhee SY. Interpreting omics data with pathway enrichment analysis. Trends Genet 2023; 39:308-319. [PMID: 36750393 DOI: 10.1016/j.tig.2023.01.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/25/2022] [Revised: 11/24/2022] [Accepted: 01/13/2023] [Indexed: 02/09/2023]
Abstract
Pathway enrichment analysis is indispensable for interpreting omics datasets and generating hypotheses. However, the foundations of enrichment analysis remain elusive to many biologists. Here, we discuss best practices in interpreting different types of omics data using pathway enrichment analysis and highlight the importance of considering intrinsic features of various types of omics data. We further explain major components that influence the outcomes of a pathway enrichment analysis, including defining background sets and choosing reference annotation databases. To improve reproducibility, we describe how to standardize the reporting of methodological details in publications. This article aims to serve as a primer for biologists to leverage the wealth of omics resources and motivate bioinformatics tool developers to enhance the power of pathway enrichment analysis.
Affiliation(s)
- Kangmei Zhao
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
- Seung Yon Rhee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
15
Geiger-Schuller K, Eraslan B, Kuksenko O, Dey KK, Jagadeesh KA, Thakore PI, Karayel O, Yung AR, Rajagopalan A, Meireles AM, Yang KD, Amir-Zilberstein L, Delorey T, Phillips D, Raychowdhury R, Moussion C, Price AL, Hacohen N, Doench JG, Uhler C, Rozenblatt-Rosen O, Regev A. Systematically characterizing the roles of E3-ligase family members in inflammatory responses with massively parallel Perturb-seq. bioRxiv 2023:2023.01.23.525198. [PMID: 36747789 PMCID: PMC9900845 DOI: 10.1101/2023.01.23.525198] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/26/2023]
Abstract
E3 ligases regulate key processes, but many of their roles remain unknown. Using Perturb-seq, we interrogated the function of 1,130 E3 ligases, partners and substrates in the inflammatory response in primary dendritic cells (DCs). Dozens of them affected the balance of DC1, DC2, migratory DC and macrophage states and a gradient of DC maturation. Family members grouped into co-functional modules that were enriched for physical interactions and affected specific programs through substrate transcription factors. E3s and their adaptors co-regulated the same processes, but partnered with different substrate recognition adaptors to affect distinct aspects of the DC life cycle. Genetic interactions were more prevalent within than between modules, and a deep learning model, comβVAE, predicts the outcome of new combinations by leveraging modularity. The E3 regulatory network was associated with heritable variation and aberrant gene expression in immune cells in human inflammatory diseases. Our study provides a general approach to dissect gene function.
16
Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat Genet 2022; 54:1479-1492. [PMID: 36175791 PMCID: PMC9910198 DOI: 10.1038/s41588-022-01187-9] [Citation(s) in RCA: 105] [Impact Index Per Article: 35.0] [Received: 05/17/2021] [Accepted: 08/18/2022] [Indexed: 12/13/2022]
Abstract
Genome-wide association studies provide a powerful means of identifying loci and genes contributing to disease, but in many cases, the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. In the present study, we introduce sc-linker, a framework for integrating single-cell RNA-sequencing, epigenomic SNP-to-gene maps and genome-wide association study summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. The inferred disease enrichments recapitulated known biology and highlighted notable cell-disease relationships, including γ-aminobutyric acid-ergic neurons in major depressive disorder, a disease-dependent M-cell program in ulcerative colitis and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease-dependent immune cell-type programs were associated, whereas only disease-dependent epithelial cell programs were prominent, suggesting a role in disease response rather than initiation. Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.
17
Tcheandjieu C, Zhu X, Hilliard AT, Clarke SL, Napolioni V, Ma S, Lee KM, Fang H, Chen F, Lu Y, Tsao NL, Raghavan S, Koyama S, Gorman BR, Vujkovic M, Klarin D, Levin MG, Sinnott-Armstrong N, Wojcik GL, Plomondon ME, Maddox TM, Waldo SW, Bick AG, Pyarajan S, Huang J, Song R, Ho YL, Buyske S, Kooperberg C, Haessler J, Loos RJF, Do R, Verbanck M, Chaudhary K, North KE, Avery CL, Graff M, Haiman CA, Le Marchand L, Wilkens LR, Bis JC, Leonard H, Shen B, Lange LA, Giri A, Dikilitas O, Kullo IJ, Stanaway IB, Jarvik GP, Gordon AS, Hebbring S, Namjou B, Kaufman KM, Ito K, Ishigaki K, Kamatani Y, Verma SS, Ritchie MD, Kember RL, Baras A, Lotta LA, Kathiresan S, Hauser ER, Miller DR, Lee JS, Saleheen D, Reaven PD, Cho K, Gaziano JM, Natarajan P, Huffman JE, Voight BF, Rader DJ, Chang KM, Lynch JA, Damrauer SM, Wilson PWF, Tang H, Sun YV, Tsao PS, O'Donnell CJ, Assimes TL. Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nat Med 2022; 28:1679-1692. [PMID: 35915156 PMCID: PMC9419655 DOI: 10.1038/s41591-022-01891-3] [Citation(s) in RCA: 184] [Impact Index Per Article: 61.3] [Received: 02/25/2021] [Accepted: 06/08/2022] [Indexed: 02/03/2023]
Abstract
We report a genome-wide association study (GWAS) of coronary artery disease (CAD) incorporating nearly a quarter of a million cases, in which existing studies are integrated with data from cohorts of white, Black and Hispanic individuals from the Million Veteran Program. We document near equivalent heritability of CAD across multiple ancestral groups, identify 95 novel loci, including nine on the X chromosome, detect eight loci of genome-wide significance in Black and Hispanic individuals, and demonstrate that two common haplotypes at the 9p21 locus are responsible for risk stratification in all populations except those of African origin, in which these haplotypes are virtually absent. Moreover, in the largest GWAS for angiographically derived coronary atherosclerosis performed to date, we find 15 loci of genome-wide significance that robustly overlap with established loci for clinical CAD. Phenome-wide association analyses of novel loci and polygenic risk scores (PRSs) augment signals related to insulin resistance, extend pleiotropic associations of these loci to include smoking and family history, and precisely document the markedly reduced transferability of existing PRSs to Black individuals. Downstream integrative analyses reinforce the critical roles of vascular endothelial, fibroblast, and smooth muscle cells in CAD susceptibility, but also point to a shared biology between atherosclerosis and oncogenesis. This study highlights the value of diverse populations in further characterizing the genetic architecture of CAD.
Affiliation(s)
- Catherine Tcheandjieu
- VA Palo Alto Health Care System, Palo Alto, CA, USA.
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA, USA.
- Gladstone Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA.
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA.
- Xiang Zhu
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
- Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Shoa L Clarke
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Valerio Napolioni
- School of Biosciences and Veterinary Medicine, University of Camerino, Camerino, Italy
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Shining Ma
- Department of Statistics, Stanford University, Stanford, CA, USA
- Kyung Min Lee
- VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, USA
- Huaying Fang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Fei Chen
- Department of Preventive Medicine, Center for Genetic Epidemiology, University of Southern California, Los Angeles, CA, USA
- Yingchang Lu
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Noah L Tsao
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Sridharan Raghavan
- Medicine Service, VA Eastern Colorado Health Care System, Aurora, CO, USA
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Satoshi Koyama
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Bryan R Gorman
- VA Boston Healthcare System, Boston, MA, USA
- Booz Allen Hamilton, McLean, VA, USA
- Marijana Vujkovic
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Derek Klarin
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- VA Boston Healthcare System, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Vascular Surgery and Endovascular Therapy, University of Florida School of Medicine, Gainesville, FL, USA
- Stanford University School of Medicine, Stanford, CA, USA
- Michael G Levin
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Nasa Sinnott-Armstrong
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Genevieve L Wojcik
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Mary E Plomondon
- Department of Medicine, Rocky Mountain Regional VA Medical Center, Aurora, CO, USA
- CART Program, VHA Office of Quality and Patient Safety, Washington, DC, USA
- Thomas M Maddox
- Healthcare Innovation Lab, BJC HealthCare/Washington University School of Medicine, St Louis, MO, USA
- Division of Cardiology, Washington University School of Medicine, St Louis, MO, USA
- Stephen W Waldo
- Department of Medicine, Rocky Mountain Regional VA Medical Center, Aurora, CO, USA
- CART Program, VHA Office of Quality and Patient Safety, Washington, DC, USA
- Division of Cardiology, University of Colorado School of Medicine, Aurora, CO, USA
- Alexander G Bick
- Department of Biomedical Informatics, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Saiju Pyarajan
- VA Boston Healthcare System, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Jie Huang
- VA Boston Healthcare System, Boston, MA, USA
- Department of Global Health, Peking University School of Public Health, Beijing, China
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, China
- Yuk-Lam Ho
- VA Boston Healthcare System, Boston, MA, USA
- Steven Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Jeffrey Haessler
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Ruth J F Loos
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ron Do
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Marie Verbanck
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- EA 7537 BioSTM, Université de Paris, Paris, France
- Kumardeep Chaudhary
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kari E North
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Christy L Avery
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Mariaelisa Graff
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Christopher A Haiman
- Department of Preventive Medicine, Center for Genetic Epidemiology, University of Southern California, Los Angeles, CA, USA
- Loïc Le Marchand
- Cancer Epidemiology Program, University of Hawaii Cancer Center, University of Hawaii, Honolulu, HI, USA
- Lynne R Wilkens
- Cancer Epidemiology Program, University of Hawaii Cancer Center, University of Hawaii, Honolulu, HI, USA
- Joshua C Bis
- Department of Medicine, Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Hampton Leonard
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Data Tecnica Int'l, LLC, Glen Echo, MD, USA
- Botong Shen
- Health Disparities Research Section, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Leslie A Lange
- Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, Aurora, CO, USA
- Lifecourse Epidemiology of Adiposity and Diabetes (LEAD) Center, Aurora, CO, USA
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ayush Giri
- Department of Medicine, Division of Epidemiology, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Obstetrics and Gynecology, Division of Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Ozan Dikilitas
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Ian B Stanaway
- Department of Medicine, Division of Nephrology, University of Washington, Seattle, WA, USA
- Gail P Jarvik
- Department of Medicine, Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam S Gordon
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Scott Hebbring
- Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, WI, USA
- Bahram Namjou
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Kenneth M Kaufman
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Kaoru Ito
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Kazuyoshi Ishigaki
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Yoichiro Kamatani
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences - The University of Tokyo, Tokyo, Japan
- Shefali S Verma
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Marylyn D Ritchie
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Rachel L Kember
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
- Sekar Kathiresan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
- Verve Therapeutics, Cambridge, MA, USA
- Elizabeth R Hauser
- Cooperative Studies Program Epidemiology Center-Durham, Durham VA Health Care System, Durham, NC, USA
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, USA
- Donald R Miller
- Center for Healthcare Organization and Implementation Research, Bedford VA Healthcare System, Bedford, MA, USA
- Center for Population Health, Department of Biomedical and Nutritional Sciences, University of Massachusetts, Lowell, MA, USA
- Jennifer S Lee
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Danish Saleheen
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- Department of Medicine, Division of Cardiology, Columbia University, New York, NY, USA
- Peter D Reaven
- Phoenix VA Health Care System, Phoenix, AZ, USA
- College of Medicine, University of Arizona, Phoenix, AZ, USA
- Kelly Cho
- VA Boston Healthcare System, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- J Michael Gaziano
- VA Boston Healthcare System, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Pradeep Natarajan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | | | - Benjamin F Voight
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Institute of Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Daniel J Rader
- Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Kyong-Mi Chang
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Julie A Lynch
- VA Salt Lake City Health Care System, Salt Lake City, UT, USA
- College of Nursing and Health Sciences, University of Massachusetts, Boston, MA, USA
| | - Scott M Damrauer
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Peter W F Wilson
- Atlanta VA Medical Center, Atlanta, GA, USA
- Division of Cardiology, Emory University School of Medicine, Atlanta, GA, USA
| | - Hua Tang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Yan V Sun
- Atlanta VA Health Care System, Atlanta, GA, USA
- Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Philip S Tsao
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Christopher J O'Donnell
- VA Boston Healthcare System, Boston, MA, USA
- Department of Medicine, Brigham Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Themistocles L Assimes
- VA Palo Alto Health Care System, Palo Alto, CA, USA.
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA, USA.
- Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA.
18
Dey KK, Gazal S, van de Geijn B, Kim SS, Nasser J, Engreitz JM, Price AL. SNP-to-gene linking strategies reveal contributions of enhancer-related and candidate master-regulator genes to autoimmune disease. Cell Genom 2022; 2:100145. [PMID: 35873673 PMCID: PMC9306342 DOI: 10.1016/j.xgen.2022.100145] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/23/2020] [Revised: 04/03/2021] [Accepted: 05/27/2022] [Indexed: 12/11/2022]
Abstract
We assess the contributions to autoimmune disease of genes whose regulation is driven by enhancer regions (enhancer-related genes) and of genes that regulate other genes in trans (candidate master-regulator genes). We link these genes to SNPs using several SNP-to-gene (S2G) strategies and apply heritability analyses to draw three conclusions about 11 autoimmune/blood-related diseases and traits. First, several characterizations of enhancer-related genes based on functional genomics data are informative for autoimmune disease heritability after conditioning on a broad set of regulatory annotations. Second, candidate master-regulator genes defined using trans-eQTLs in blood are also conditionally informative for autoimmune disease heritability. Third, integrating the enhancer-related and master-regulator gene sets with protein-protein interaction (PPI) network information magnified their disease signal. The resulting PPI-enhancer gene score produced a >2-fold stronger heritability signal and >2-fold stronger enrichment for drug targets than the recently proposed enhancer domain score. In each case, functionally informed S2G strategies produced 4.1- to 13-fold stronger disease signals than conventional window-based strategies.
Affiliation(s)
- Kushal K. Dey
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Bryce van de Geijn
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Genentech, South San Francisco, CA 94080, USA
- Samuel Sungil Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Joseph Nasser
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jesse M. Engreitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- BASE Initiative, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Stanford University School of Medicine, Stanford, CA 94304, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alkes L. Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
19
Xiao J, Cai M, Yu X, Hu X, Chen G, Wan X, Yang C. Leveraging the local genetic structure for trans-ancestry association mapping. Am J Hum Genet 2022; 109:1317-1337. [PMID: 35714612 PMCID: PMC9300880 DOI: 10.1016/j.ajhg.2022.05.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/30/2022] [Accepted: 05/23/2022] [Indexed: 01/09/2023]
Abstract
Over the past two decades, genome-wide association studies (GWASs) have substantially advanced our understanding of the genetic basis of complex traits. Despite these discoveries, most GWAS samples have been collected from European populations, and GWASs are often criticized for their lack of ancestral diversity. Trans-ancestry association mapping (TRAM) offers an opportunity to narrow the disparities in genetic studies between non-European and European populations. Here, we propose a statistical method, LOG-TRAM, that leverages the local genetic architecture for TRAM. Using biobank-scale datasets, we show that LOG-TRAM can greatly improve the statistical power to identify risk variants in under-represented populations while producing well-calibrated p values. We applied LOG-TRAM to the GWAS summary statistics of various complex traits and diseases from BioBank Japan, UK Biobank, and African populations, obtaining substantial gains in power and effective correction of confounding biases in TRAM. Finally, we show that LOG-TRAM can be applied to identify ancestry-specific loci and that its output can be used to construct more accurate polygenic risk scores in under-represented populations.
Affiliation(s)
- Jiashun Xiao
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Mingxuan Cai
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xinyi Yu
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xianghong Hu
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Gang Chen
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiang Wan
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China; Pazhou Lab, Guangzhou 510330, China
- Can Yang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
20
Mizikovsky D, Naval Sanchez M, Nefzger CM, Cuellar Partida G, Palpant NJ. Organization of gene programs revealed by unsupervised analysis of diverse gene-trait associations. Nucleic Acids Res 2022; 50:e87. [PMID: 35716123 PMCID: PMC9410900 DOI: 10.1093/nar/gkac413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2022] [Revised: 04/28/2022] [Accepted: 05/09/2022] [Indexed: 12/28/2022]
Abstract
Genome-wide association studies provide statistical measures of gene–trait associations that reveal how genetic variation influences phenotypes. This study develops an unsupervised dimensionality reduction method called UnTANGLeD (Unsupervised Trait Analysis of Networks from Gene Level Data), which organizes 16,849 genes into discrete gene programs by measuring the statistical association between genetic variants and 1,393 diverse complex traits. UnTANGLeD reveals 173 gene clusters enriched for protein–protein interactions and for highly distinct biological processes governing development, signalling, disease, and homeostasis. We also identify diverse gene networks with robust interactions that are not associated with known biological processes. Analysis of independent disease traits shows that UnTANGLeD gene clusters are conserved across all complex traits, providing a simple and powerful framework to predict novel gene candidates and programs influencing orthogonal disease phenotypes. Collectively, this study demonstrates that gene programs that coordinately orchestrate cell functions can be identified without reliance on prior knowledge, providing a method for use in functional annotation, hypothesis generation, machine learning and prediction algorithms, and the interpretation of diverse genomic data.
Affiliation(s)
- Dalia Mizikovsky
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Marina Naval Sanchez
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Christian M Nefzger
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Nathan J Palpant
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
21
Smith SP, Shahamatdar S, Cheng W, Zhang S, Paik J, Graff M, Haiman C, Matise TC, North KE, Peters U, Kenny E, Gignoux C, Wojcik G, Crawford L, Ramachandran S. Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries. Am J Hum Genet 2022; 109:871-884. [PMID: 35349783 PMCID: PMC9118115 DOI: 10.1016/j.ajhg.2022.03.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/01/2021] [Accepted: 03/02/2022] [Indexed: 12/12/2022]
Abstract
Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling individuals of European ancestry, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales (from genes to genomic regions and pathways) have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan, as well as in 44,348 admixed individuals from the PAGE consortium, including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. We identify 1,000 gene-level associations that are genome-wide significant in at least two ancestry cohorts across these 25 traits, as well as highly conserved pathway associations with triglyceride levels in European, East Asian, and Native Hawaiian cohorts.
Affiliation(s)
- Samuel Pattillo Smith
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Sahar Shahamatdar
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Wei Cheng
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Selena Zhang
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Joseph Paik
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Misa Graff
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Christopher Haiman
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA
- T C Matise
- Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
- Kari E North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Eimear Kenny
- The Center for Genomic Health, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
- Chris Gignoux
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO 80204, USA
- Genevieve Wojcik
- Department of Epidemiology, Johns Hopkins University, Baltimore, MD 21287, USA
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Biostatistics, Brown University, Providence, RI 02906, USA; Microsoft Research New England, Cambridge, MA 02142, USA
- Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA; Data Science Initiative, Brown University, Providence, RI 02912, USA
22
Particle-based hematite crystallization is invariant to initial particle morphology. Proc Natl Acad Sci U S A 2022; 119:e2112679119. [PMID: 35275793 PMCID: PMC8931245 DOI: 10.1073/pnas.2112679119] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
Abstract
Many crystallization processes occurring in nature produce highly ordered hierarchical architectures whose formation cannot be explained by classical models of monomer-by-monomer growth. One possible pathway is crystallization through the attachment of oriented nanocrystals, which requires a detailed understanding of the particle dynamics that lead to precise crystallographic alignment along specific faces. In this study, we discover a particle-morphology–independent oriented attachment mechanism for hematite nanocrystals. Regardless of crystal morphology, particles always align along the [001] direction, driven by aligning interactions between (001) faces and repulsive interactions between other pairs of hematite faces. These results highlight that strong face specificity along one crystallographic direction can make oriented attachment independent of initial particle morphology.
Understanding the mechanism of particle-based crystallization is a formidable problem due to the complexity of the macroscopic and interfacial forces driving particle dynamics. The oriented attachment (OA) pathway is a particularly challenging phenomenon because it occurs only under select conditions and involves precise crystallographic alignment of particle faces, often from distances of several nanometers. Despite recent progress in understanding the driving forces for particle face selectivity and alignment, questions remain about the competition between ion-by-ion crystallization, near-surface nucleation, and OA. This study examines hydrothermal conditions leading to apparent OA for hematite using three initial particle morphologies with various exposed faces. All three particle types formed single-crystal or twinned one-dimensional (1D) chain-like structures along the [001] direction, driven by attractive interactions between (001) faces and repulsive interactions between other pairs of hematite faces. Moreover, simulations of the potential of mean force for iron species and scanning transmission electron microscopy (S/TEM) imaging confirm that the 1D chains form through the attachment of independently nucleated particles rather than through the near-surface nucleation or ion-by-ion crystallization pathways. These results highlight that strong face specificity along one crystallographic direction can make OA independent of initial particle morphology.
23
Weiner DJ, Gazal S, Robinson EB, O'Connor LJ. Partitioning gene-mediated disease heritability without eQTLs. Am J Hum Genet 2022; 109:405-416. [PMID: 35143757 PMCID: PMC8948166 DOI: 10.1016/j.ajhg.2022.01.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/20/2021] [Accepted: 01/13/2022] [Indexed: 12/30/2022]
Abstract
Unknown SNP-to-gene regulatory architecture complicates efforts to link noncoding GWAS associations with genes implicated by sequencing or functional studies. eQTLs are often used to link SNPs to genes, but expression in bulk tissue explains a small fraction of disease heritability. A simple but successful approach has been to link SNPs with nearby genes via base pair windows, but genes may often be regulated by SNPs outside their window. We propose the abstract mediation model (AMM) to estimate (1) the fraction of heritability mediated by the closest or kth-closest gene to each SNP and (2) the mediated heritability enrichment of a gene set (e.g., genes with rare-variant associations). AMM jointly estimates these quantities by matching the decay in SNP enrichment with distance from genes in the gene set. Across 47 complex traits and diseases, we estimate that the closest gene to each SNP mediates 27% (SE: 6%) of heritability and that a substantial fraction is mediated by genes outside the ten closest. Mendelian disease genes are strongly enriched for common-variant heritability; for example, just 21 dyslipidemia genes mediate 25% of LDL heritability (211× enrichment, p = 0.01). Among brain-related traits, genes involved in neurodevelopmental disorders are only about 4× enriched, but gene expression patterns are highly informative, as they have detectable differences in per-gene heritability even among weakly brain-expressed genes.
Affiliation(s)
- Daniel J Weiner
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Steven Gazal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Elise B Robinson
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
- Luke J O'Connor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
24
Siewert-Rocks KM, Kim SS, Yao DW, Shi H, Price AL. Leveraging gene co-regulation to identify gene sets enriched for disease heritability. Am J Hum Genet 2022; 109:393-404. [PMID: 35108496 PMCID: PMC8948163 DOI: 10.1016/j.ajhg.2022.01.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/23/2021] [Accepted: 01/04/2022] [Indexed: 12/15/2022]
Abstract
Identifying gene sets that are associated with disease can provide valuable biological knowledge, but a fundamental challenge of gene set analyses of GWAS data is linking disease-associated SNPs to genes. Transcriptome-wide association studies (TWASs) detect associations between the genetically predicted expression of a gene and disease risk, thus implicating candidate disease genes. However, causal disease genes at TWAS-associated loci generally remain unknown due to gene co-regulation, which leads to correlations across genes in predicted expression. We developed a method, gene co-regulation score (GCSC) regression, to identify gene sets that are enriched for disease heritability explained by predicted expression. GCSC regresses TWAS chi-square statistics on gene co-regulation scores reflecting correlations in predicted gene expression; a gene set is enriched for heritability if genes with high co-regulation to the set have higher TWAS chi-square statistics than genes with low co-regulation to the set, beyond what is expected based on co-regulation to all genes. We verified via simulations that GCSC is well calibrated and well powered. We applied GCSC to gene expression data from GTEx (48 tissues) and GWAS summary statistics for 43 independent diseases and complex traits, analyzing a broad set of biological pathways and specifically expressed gene sets. We identified many enriched sets, recapitulating known biology. For Alzheimer disease, we detected evidence of an immune basis, and specifically a role for antigen presentation, in analyses of both biological pathways and specifically expressed gene sets. Our results highlight the advantages of leveraging gene co-regulation within the TWAS framework to identify enriched gene sets.
Affiliation(s)
- Katherine M Siewert-Rocks
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Samuel S Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Douglas W Yao
- Program in Systems, Synthetic, and Quantitative Biology, Harvard University, Cambridge, MA 02138, USA
- Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
25
An effector index to predict target genes at GWAS loci. Hum Genet 2022; 141:1431-1447. [PMID: 35147782 DOI: 10.1007/s00439-022-02434-z] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 06/15/2021] [Accepted: 01/15/2022] [Indexed: 11/04/2022]
Abstract
Drug development and biological discovery require effective strategies to map existing genetic associations to causal genes. To approach this problem, we selected 12 common diseases and quantitative traits for which highly powered genome-wide association studies (GWAS) were available. For each disease or trait, we systematically curated positive control gene sets from Mendelian forms of the disease and from targets of medicines used for disease treatment. We found that these positive control genes were highly enriched in the proximity of GWAS-associated single-nucleotide variants (SNVs). We then quantitatively assessed the contribution of commonly used genomic features, including open chromatin maps, expression quantitative trait loci (eQTL), and chromatin conformation data. Using these features, we trained and validated an Effector Index (Ei) to map target genes for these 12 common diseases and traits. Ei demonstrated high predictive performance both in cross-validation on the training set and on an independently derived set for type 2 diabetes. Key predictive features included coding or transcript-altering SNVs, distance to gene, and open chromatin-based metrics. This work outlines a simple, understandable approach to prioritizing genes at GWAS loci for functional follow-up and drug development, providing a systematic strategy for the prioritization of GWAS target genes.
26
Leveraging cell-type-specific regulatory networks to interpret genetic variants in abdominal aortic aneurysm. Proc Natl Acad Sci U S A 2022; 119:e2115601119. [PMID: 34930827 PMCID: PMC8740683 DOI: 10.1073/pnas.2115601119] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Accepted: 11/22/2021] [Indexed: 12/17/2022]
Abstract
Abdominal aortic aneurysm (AAA) is a common and severe disease with major genetic risk factors. In this study, we generated enhancer-promoter contact data to identify regulatory elements in AAA-relevant cell types and identified changes in their predicted chromatin accessibility between AAA patients and controls. We integrated this information with disease-associated variants in regulatory elements and gene bodies to better understand the etiology and pathogenetic mechanisms of AAA. By combining whole-genome sequencing data with gene regulatory relations in disease-relevant cell types, our study reveals important roles for the interleukin-6 pathway and for ERG and KLF regulation in AAA pathogenesis.
Abdominal aortic aneurysm (AAA) is a common degenerative cardiovascular disease whose pathobiology is not clearly understood. The cellular heterogeneity and cell-type-specific gene regulation of vascular cells in human AAA have not been well characterized. Here, we analyzed whole-genome sequencing data from AAA patients versus controls with the aim of detecting disease-associated variants that may affect gene regulation in human aortic smooth muscle cells (AoSMC) and human aortic endothelial cells (HAEC), two cell types of high relevance to AAA. To support this analysis, we generated H3K27ac HiChIP data for these cell types and inferred cell-type-specific gene regulatory networks. We observed that AAA-associated variants were most enriched in regulatory regions in AoSMC, compared with HAEC and CD4+ cells. The cell-type-specific regulation defined by these HiChIP data supported the importance of ERG and the KLF family of transcription factors in AAA. Analysis of regulatory elements that contain noncoding variants and are also differentially open between AAA patients and controls revealed the significance of the interleukin-6-mediated signaling pathway. This finding was further validated by incorporating the predicted deleteriousness of nonsynonymous single-nucleotide variants in AAA patients and additional control data from the Medical Genome Reference Bank dataset. These results provide important insights into AAA pathogenesis and a model for cell-type-specific analysis of disease-associated variants.
27
Gholizadeh M, Esmaeili-Fard SM. Meta-analysis of genome-wide association studies for litter size in sheep. Theriogenology 2021; 180:103-112. [PMID: 34968818 DOI: 10.1016/j.theriogenology.2021.12.025] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/28/2021] [Revised: 12/19/2021] [Accepted: 12/19/2021] [Indexed: 01/01/2023]
Abstract
Litter size and ovulation rate are important reproductive traits in sheep and have a major impact on the profitability of farm animals. To investigate the genetic architecture of litter size, we report the first meta-analysis of genome-wide association studies (GWAS) using 522 ewes and 564,377 SNPs from six sheep breeds. We identified 29 significant associations for litter size, 27 of which had not been reported in the individual GWAS for each population; we also confirmed the role of BMPR1B in prolificacy. Our gene set analysis discovered biological pathways related to cell signaling, communication, and adhesion. Functional clustering and enrichment using protein databases identified the epidermal growth factor-like domain as affecting litter size. By analyzing protein-protein interaction data, we identified hub genes, such as CASK, PLCB4, RPTOR, GRIA2, and PLCB1, that were enriched in most of the significant pathways. These genes have roles in cell proliferation, cell adhesion, cell growth and survival, and autophagy. Notably, the identified SNPs were scattered across several chromosomes, implying different genetic mechanisms underlying variation in prolificacy in each breed. Given the different layers that make up the follicles and the need for communication and transfer of hormones and nutrients through these layers to the oocyte, the significance of pathways related to cell signaling and communication is plausible. Our results provide genetic insights into litter size variation across sheep breeds.
Affiliation(s)
- Mohsen Gholizadeh
- Department of Animal Science, Faculty of Animal Science and Fisheries, Sari Agricultural Sciences and Natural Resources University, Sari, Iran.
- Seyed Mehdi Esmaeili-Fard
- Department of Animal Science, Faculty of Animal Science and Fisheries, Sari Agricultural Sciences and Natural Resources University, Sari, Iran
28
Jagadeesh KA, Dey KK, Montoro DT, Mohan R, Gazal S, Engreitz JM, Xavier RJ, Price AL, Regev A. Identifying disease-critical cell types and cellular processes across the human body by integration of single-cell profiles and human genetics. bioRxiv 2021:2021.03.19.436212. [PMID: 34845454 PMCID: PMC8629197 DOI: 10.1101/2021.03.19.436212] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/01/2023]
Abstract
Genome-wide association studies (GWAS) provide a powerful means to identify loci and genes contributing to disease, but in many cases the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. Here, we introduce sc-linker, a framework for integrating single-cell RNA-seq (scRNA-seq), epigenomic maps and GWAS summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. We analyzed 1.6 million scRNA-seq profiles from 209 individuals spanning 11 tissue types and 6 disease conditions, and constructed gene programs capturing cell types, disease progression, and cellular processes both within and across cell types. We evaluated these gene programs for disease enrichment by transforming them to SNP annotations with tissue-specific epigenomic maps and computing enrichment scores across 60 diseases and complex traits (average N = 297K). Cell type, disease progression, and cellular process programs captured distinct heritability signals even within the same cell type, as we show in multiple complex diseases that affect the brain (Alzheimer’s disease, multiple sclerosis), colon (ulcerative colitis) and lung (asthma, idiopathic pulmonary fibrosis, severe COVID-19). The inferred disease enrichments recapitulated known biology and highlighted novel cell-disease relationships, including GABAergic neurons in major depressive disorder (MDD), a disease progression M cell program in ulcerative colitis, and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease progression immune cell type programs were associated, whereas for epithelial cells, disease progression programs were most prominent, perhaps suggesting a role in disease progression over initiation. 
Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.
29
Insight into the Candidate Genes and Enriched Pathways Associated with Height, Length, Length to Height Ratio and Body-Weight of Korean Indigenous Breed, Jindo Dog Using Gene Set Enrichment-Based GWAS Analysis. Animals (Basel) 2021; 11:ani11113136. [PMID: 34827868 PMCID: PMC8614278 DOI: 10.3390/ani11113136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/08/2021] [Revised: 10/21/2021] [Accepted: 10/28/2021] [Indexed: 12/14/2022]
Abstract
Height, length, length to height ratio (LHR) and body weight are vital economic traits for the Jindo dog, a companion and hunting breed. Human selection and targeted breeding have produced an extraordinary diversity in these traits. Therefore, identifying the causative markers, genes and pathways underlying this variability is essential for selection purposes. Here, we performed a genome-wide association study (GWAS) combined with enrichment analysis on 757 dogs using 118,879 SNPs. The genomic heritability (h²) was 0.33 for height and 0.28 for weight in Jindo. At p < 5 × 10⁻⁵, ten, six, thirteen and eleven SNPs on different chromosomes were significantly associated with height, length, LHR and body weight, respectively. Based on our results, HHIP, LCORL and NCAPG for height, IGFI and FGFR3 for length, DLK1 and EFEMP1 for LHR, and PTPN2, IGFI and RASAL2 for weight may be potential candidate genes, because significant SNPs are located in their intronic or upstream regions. The gene-set enrichment analysis highlighted nine overlapping significant (p < 0.05) gene ontology (GO) terms and seven overlapping significant pathways among traits. Interestingly, the highlighted pathways, related to hormone synthesis, secretion and signalling, were generally involved in metabolism, growth and developmental processes. If verified further, these genes and pathways could have a significant impact on breeding of the Jindo dog population.
30
Nguyen TH, He X, Brown RC, Webb BT, Kendler KS, Vladimirov VI, Riley BP, Bacanu SA. DECO: a framework for jointly analyzing de novo and rare case/control variants, and biological pathways. Brief Bioinform 2021; 22:bbab067. [PMID: 33791774 PMCID: PMC8425460 DOI: 10.1093/bib/bbab067] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/25/2020] [Revised: 01/25/2021] [Accepted: 02/09/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION Rare variant-based analyses are beginning to identify risk genes for neuropsychiatric disorders and other diseases. However, the identified genes account for only a fraction of predicted causal genes. Recent studies have shown that rare damaging variants are significantly enriched in specific gene-sets. Methods that jointly model rare variants and gene-sets to identify enriched gene-sets, and that use these enriched gene-sets to prioritize additional risk genes, could improve understanding of the genetic architecture of diseases. RESULTS We propose DECO (Integrated analysis of de novo mutations, rare case/control variants and omics information via gene-sets), an integrated method for rare-variant and gene-set analysis. The method can (i) test the enrichment of gene-sets directly within the statistical model, and (ii) use enriched gene-sets to rank existing genes and prioritize additional risk genes for tested disorders. In simulations, DECO performs better than a homologous method that uses only variant data. To demonstrate the application of the proposed protocol, we applied this approach to rare-variant datasets of schizophrenia. Compared with a method that uses only variant information, DECO is able to prioritize additional risk genes. AVAILABILITY DECO can be used to analyze rare variants and biological pathways or cell types for any disease. The package is available on GitHub at https://github.com/hoangtn/DECO.
Affiliation(s)
- Tan-Hoang Nguyen
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Xin He
- The Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Grossman Institute for Neuroscience, Quantitative Biology and Human Behavior, University of Chicago, Chicago, IL 60637, USA
- Ruth C Brown
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Bradley T Webb
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Kenneth S Kendler
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Vladimir I Vladimirov
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA; Department of Psychiatry & Behavioral Sciences, College of Medicine, Texas A&M University, College Station, TX, USA; and the Lieber Institute for Brain Development, Baltimore, MD, USA
- Brien P Riley
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Silviu-Alin Bacanu
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
31
Qi G, Chatterjee N. A comprehensive evaluation of methods for Mendelian randomization using realistic simulations and an analysis of 38 biomarkers for risk of type 2 diabetes. Int J Epidemiol 2021; 50:1335-1349. [PMID: 33393617 PMCID: PMC8562333 DOI: 10.1093/ije/dyaa262] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/09/2020] [Accepted: 12/03/2020] [Indexed: 01/09/2023]
Abstract
BACKGROUND Previous studies have often evaluated methods for Mendelian randomization (MR) analysis based on simulations that do not adequately reflect the data-generating mechanisms in genome-wide association studies (GWAS), and there are often discrepancies between the performance of MR methods in simulations and in real data sets. METHODS We use a simulation framework that generates data on full GWAS for two traits under a realistic model for effect-size distribution coherent with the heritability, co-heritability and polygenicity typically observed for complex traits. We further use recent data generated from GWAS of 38 biomarkers in the UK Biobank and perform down-sampling to investigate trends in estimates of causal effects of these biomarkers on the risk of type 2 diabetes (T2D). RESULTS Simulation studies show that weighted mode and MRMix are the only two methods that maintain the correct type I error rate in a diverse set of scenarios. Between the two methods, MRMix tends to be more powerful for larger GWAS whereas the opposite is true for smaller sample sizes. Among the other methods, random-effect IVW (inverse-variance weighted method), MR-Robust and MR-RAPS (robust adjusted profile score) tend to perform best in maintaining a low mean-squared error when the InSIDE (Instrument Strength Independent of Direct Effect) assumption is satisfied, but can produce large bias when InSIDE is violated. In real-data analysis, some biomarkers showed major heterogeneity in estimates of their causal effects on the risk of T2D across the different methods, and estimates from many methods trended in one direction with increasing sample size, following patterns similar to those observed in simulation studies. CONCLUSION The relative performance of different MR methods depends heavily on the sample sizes of the underlying GWAS, the proportion of valid instruments and the validity of the InSIDE assumption. Down-sampling analysis can be used in large GWAS for the possible detection of bias in the MR methods.
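IVW is the baseline that several of the compared methods build on. The sketch below shows a generic fixed-effect IVW estimate and the commonly used multiplicative random-effects standard error, computed from per-SNP summary statistics. It assumes independent instruments and first-order weights (ignoring uncertainty in the SNP-exposure effects); it is a textbook version, not the code used in this evaluation, and like any IVW variant it can be biased when InSIDE fails. The function name and numbers are hypothetical.

```python
import math


def ivw_mr(bx, by, se_by):
    """IVW Mendelian randomization estimate from SNP-level summary statistics.

    bx:    SNP effects on the exposure (e.g. a biomarker).
    by:    SNP effects on the outcome (e.g. T2D log-odds).
    se_by: standard errors of by (first-order weights ignore the SE of bx).
    Returns (estimate, se_fixed, se_random), where se_random applies the
    multiplicative random-effects correction.
    """
    k = len(bx)
    if k < 2 or len(by) != k or len(se_by) != k:
        raise ValueError("need >= 2 instruments with matching inputs")
    w = [1.0 / s ** 2 for s in se_by]
    denom = sum(wi * x * x for wi, x in zip(w, bx))
    # Weighted regression of by on bx through the origin
    estimate = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / denom
    se_fixed = math.sqrt(1.0 / denom)
    # Cochran's Q: residual heterogeneity across instruments
    q = sum(wi * (y - estimate * x) ** 2 for wi, x, y in zip(w, bx, by))
    phi = q / (k - 1)
    # Inflate the SE only when there is over-dispersion (phi > 1)
    se_random = se_fixed * math.sqrt(max(1.0, phi))
    return estimate, se_fixed, se_random


# Hypothetical instruments (not the UK Biobank biomarker data)
est, se_f, se_r = ivw_mr(
    bx=[0.10, 0.20, 0.15, 0.30],
    by=[0.020, 0.050, 0.020, 0.070],
    se_by=[0.010, 0.012, 0.011, 0.015],
)
print(f"IVW estimate={est:.3f}, fixed SE={se_f:.3f}, random SE={se_r:.3f}")
```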
Affiliation(s)
- Guanghao Qi
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
32
Demetci P, Cheng W, Darnell G, Zhou X, Ramachandran S, Crawford L. Multi-scale inference of genetic trait architecture using biologically annotated neural networks. PLoS Genet 2021; 17:e1009754. [PMID: 34411094 PMCID: PMC8407593 DOI: 10.1371/journal.pgen.1009754] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/23/2020] [Revised: 08/31/2021] [Accepted: 07/31/2021] [Indexed: 01/01/2023]
Abstract
In this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects, and the hidden layer models the aggregated effects among SNP-sets. We treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses variational inference to provide posterior summaries which allow researchers to simultaneously perform (i) mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art association mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a random subset of individuals of European ancestry from the UK Biobank, we show that BANNs is able to replicate known associations in high- and low-density lipoprotein cholesterol content.
Affiliation(s)
- Pinar Demetci
- Department of Computer Science, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Wei Cheng
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
- Gregory Darnell
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Sohini Ramachandran
- Department of Computer Science, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Microsoft Research New England, Cambridge, Massachusetts, United States of America
- Department of Biostatistics, Brown University, Providence, Rhode Island, United States of America
33
The distribution of common-variant effect sizes. Nat Genet 2021; 53:1243-1249. [PMID: 34326547 DOI: 10.1038/s41588-021-00901-3] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 09/16/2020] [Accepted: 06/23/2021] [Indexed: 01/08/2023]
Abstract
The genetic effect-size distribution of a disease describes the number of risk variants, the range of their effect sizes, and the sample sizes that will be required to discover them. Accurate estimation has been a challenge. Here I propose Fourier Mixture Regression (FMR) and validate that it accurately estimates real and simulated effect-size distributions. Applied to summary statistics for ten diseases (average [Formula: see text]), FMR estimates that 100,000-1,000,000 cases will be required for genome-wide significant SNPs to explain 50% of SNP heritability. In such large studies, genome-wide significance becomes increasingly conservative, and less stringent thresholds achieve high true positive rates if confounding is controlled. Across traits, polygenicity varies, but the range of effect sizes is similar. Compared with effect sizes in the top 10% of heritability, including most discovered thus far, those in the bottom 10-50% are orders of magnitude smaller and more numerous, spanning a large fraction of the genome.
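FMR itself is not reproduced here, but the link it draws between effect size and required sample size can be illustrated with a standard approximation, which is an assumption of this sketch rather than part of the paper: for a quantitative trait, a variant explaining a fraction q² of phenotypic variance yields a 1-df association chi-square statistic with non-centrality of roughly N·q². The helper names are hypothetical, and the sketch ignores case-control ascertainment, LD and winner's curse.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gwas_power(n, var_explained, alpha=5e-8):
    """Approximate power of a 1-df association test for one variant.

    Assumes the chi-square statistic has non-centrality n * var_explained,
    i.e. the z-score is Normal(sqrt(n * var_explained), 1).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)    # about 5.45 for alpha = 5e-8
    shift = (n * var_explained) ** 0.5
    # Probability that |Z| exceeds the threshold in either tail
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def min_sample_size(var_explained, target_power=0.8, alpha=5e-8):
    """Smallest n with gwas_power(n) >= target_power (target_power > alpha)."""
    lo, hi = 0, 1                                 # power(lo) < target holds at n = 0
    while gwas_power(hi, var_explained, alpha) < target_power:
        lo, hi = hi, hi * 2                       # doubling search for an upper bound
    while hi - lo > 1:                            # bisection; power increases with n
        mid = (lo + hi) // 2
        if gwas_power(mid, var_explained, alpha) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi


for q2 in (1e-3, 1e-4, 1e-5):
    print(q2, min_sample_size(q2))
```

Under these assumptions, a variant explaining 0.01% of variance needs roughly 400,000 samples for 80% power at p < 5 × 10⁻⁸, and because the non-centrality scales as N·q², each tenfold drop in variance explained raises the required N tenfold.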
34
Albiñana C, Grove J, McGrath JJ, Agerbo E, Wray NR, Bulik CM, Nordentoft M, Hougaard DM, Werge T, Børglum AD, Mortensen PB, Privé F, Vilhjálmsson BJ. Leveraging both individual-level genetic data and GWAS summary statistics increases polygenic prediction. Am J Hum Genet 2021; 108:1001-1011. [PMID: 33964208 PMCID: PMC8206385 DOI: 10.1016/j.ajhg.2021.04.014] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 01/04/2021] [Accepted: 04/20/2021] [Indexed: 12/12/2022]
Abstract
The accuracy of polygenic risk scores (PRSs) in predicting complex diseases increases with the training sample size. PRSs are generally derived from summary statistics of large meta-analyses of multiple genome-wide association studies (GWASs). However, researchers now commonly have access to large individual-level datasets as well, such as the UK Biobank. To the best of our knowledge, how best to combine both types of data (summary statistics and individual-level data) to optimize polygenic prediction has not yet been explored. The most widely used approach is the meta-analysis of GWAS summary statistics (meta-GWAS), but we show that it does not always provide the most accurate PRS. Through simulations and 12 real case-control and quantitative traits from both iPSYCH and UK Biobank, together with external GWAS summary statistics, we compare meta-GWAS with two alternative data-combining approaches: stacked clumping and thresholding (SCT) and meta-PRS. We find that, when large individual-level data are available, the linear combination of PRSs (meta-PRS) is both a simple alternative to meta-GWAS and often more accurate.
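The abstract describes meta-PRS as a linear combination of PRSs but does not say how the weights are estimated. A minimal sketch, assuming a quantitative trait and ordinary least squares on a held-out tuning set (for a binary trait, logistic regression would be the analogous choice), is shown below; the function names and simulated data are hypothetical.

```python
import numpy as np


def fit_meta_prs(prs, y):
    """Fit a linear combination of PRSs on a held-out tuning set.

    prs: (n_individuals, n_scores) array, one column per PRS
         (e.g. one from external summary statistics, one from individual-level data).
    y:   (n_individuals,) quantitative phenotype.
    Returns (intercept, weights) estimated by ordinary least squares.
    """
    design = np.column_stack([np.ones(len(y)), prs])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


def score_meta_prs(prs, intercept, weights):
    """Apply the fitted combination to new individuals."""
    return intercept + prs @ weights


# Simulated example: two correlated PRSs and a phenotype (hypothetical data)
rng = np.random.default_rng(0)
n = 5_000
prs_sumstats = rng.normal(size=n)
prs_individual = 0.5 * prs_sumstats + rng.normal(size=n)
y = 0.3 * prs_sumstats + 0.2 * prs_individual + rng.normal(size=n)
prs = np.column_stack([prs_sumstats, prs_individual])

intercept, weights = fit_meta_prs(prs, y)
print("weights:", np.round(weights, 2))
```

The weights must be fitted on individuals not used to build either PRS; otherwise the score trained on the individual-level data would be over-weighted.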
Affiliation(s)
- Clara Albiñana
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark.
- Jakob Grove
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000 Aarhus C, Denmark; Center for Genomics and Personalized Medicine, CGPM, Aarhus University, 8000 Aarhus C, Denmark; Bioinformatics Research Centre, Aarhus University, 8000 Aarhus C, Denmark
- John J McGrath
- National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark; Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Brisbane, QLD 4076, Australia; Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia
- Esben Agerbo
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark
- Naomi R Wray
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia; Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia
- Cynthia M Bulik
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 171 77 Stockholm, Sweden; Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Merete Nordentoft
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Copenhagen University Hospital, Mental Health Centre Copenhagen Mental Health Services in the Capital Region of Denmark, 2100 Copenhagen Ø, Denmark; Department of Clinical Medicine, University of Copenhagen, 2200 Copenhagen N, Denmark
- David M Hougaard
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, 2300 Copenhagen S, Denmark
- Thomas Werge
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Institute of Biological Psychiatry, MHC Sct. Hans, Mental Health Services Copenhagen, 4000 Roskilde, Denmark; Department of Clinical Medicine, University of Copenhagen, 2200 Copenhagen N, Denmark; Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
- Anders D Børglum
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000 Aarhus C, Denmark; Center for Genomics and Personalized Medicine, CGPM, Aarhus University, 8000 Aarhus C, Denmark
- Preben Bo Mortensen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark
- Florian Privé
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark
- Bjarni J Vilhjálmsson
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark; Bioinformatics Research Centre, Aarhus University, 8000 Aarhus C, Denmark.
35
Tang H, He Z. Advances and challenges in quantitative delineation of the genetic architecture of complex traits. Quantitative Biology 2021; 9:168-184. [PMID: 35492964 PMCID: PMC9053444 DOI: 10.15302/j-qb-021-0249] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Background Genome-wide association studies (GWAS) have been widely adopted in studies of human complex traits and diseases. Results This review surveys areas of active research: quantifying and partitioning trait heritability, fine mapping functional variants and integrative analysis, genetic risk prediction of phenotypes, and the analysis of sequencing studies that have identified millions of rare variants. Current challenges and opportunities are highlighted. Conclusion GWAS have fundamentally transformed the field of human complex trait genetics. Novel statistical and computational methods have expanded the scope of GWAS and have provided valuable insights into the genetic architecture underlying complex phenotypes.
Affiliation(s)
- Hua Tang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305, USA
36
Wang B, Sudijono T, Kirveslahti H, Gao T, Boyer DM, Mukherjee S, Crawford L. A statistical pipeline for identifying physical features that differentiate classes of 3D shapes. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1430] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Affiliation(s)
- Bruce Wang
- Lewis-Sigler Institute for Integrative Genomics, Princeton University
- Tingran Gao
- Committee on Computational and Applied Mathematics, Department of Statistics, University of Chicago
- Sayan Mukherjee
- Department of Statistical Science, Department of Computer Science, Department of Mathematics, and Department of Bioinformatics & Biostatistics, Duke University
37
Zhu X, Duren Z, Wong WH. Modeling regulatory network topology improves genome-wide analyses of complex human traits. Nat Commun 2021; 12:2851. [PMID: 33990562 PMCID: PMC8121952 DOI: 10.1038/s41467-021-22588-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/08/2020] [Accepted: 03/03/2021] [Indexed: 01/22/2023]
Abstract
Genome-wide association studies (GWAS) have cataloged many significant associations between genetic variants and complex traits. However, most of these findings have unclear biological significance, because they often have small effects and occur in non-coding regions. Integration of GWAS with gene regulatory networks addresses both issues by aggregating weak genetic signals within regulatory programs. Here we develop a Bayesian framework that integrates GWAS summary statistics with regulatory networks to infer genetic enrichments and associations simultaneously. Our method improves upon existing approaches by explicitly modeling network topology to assess enrichments, and by automatically leveraging enrichments to identify associations. Applying this method to 18 human traits and 38 regulatory networks shows that genetic signals of complex traits are often enriched in interconnections specific to trait-relevant cell types or tissues. Prioritizing variants within enriched networks identifies known and previously undescribed trait-associated genes revealing biological and therapeutic insights.
Affiliation(s)
- Xiang Zhu
- Department of Statistics, The Pennsylvania State University, University Park, PA, 16802, USA; Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA; Department of Statistics, Stanford University, Stanford, CA, 94305, USA.
- Zhana Duren
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA; Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, 29646, USA
- Wing Hung Wong
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA; Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, 94305, USA.
38
Silberstein M, Nesbit N, Cai J, Lee PH. Pathway analysis for genome-wide genetic variation data: Analytic principles, latest developments, and new opportunities. J Genet Genomics 2021; 48:173-183. [PMID: 33896739 PMCID: PMC8286309 DOI: 10.1016/j.jgg.2021.01.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/04/2020] [Revised: 01/24/2021] [Accepted: 01/25/2021] [Indexed: 12/23/2022]
Abstract
Pathway analysis, also known as gene-set enrichment analysis, is a multilocus analytic strategy that integrates a priori biological knowledge into the statistical analysis of high-throughput genetics data. Originally developed for studies of gene expression data, it has become a powerful procedure for in-depth mining of genome-wide genetic variation data. Striking discoveries have been made in recent years, uncovering genes and biological mechanisms underlying common and complex disorders. However, as massive amounts of diverse functional genomics data accrue, there is a pressing need for newer generations of pathway analysis methods that can utilize multiple layers of high-throughput genomics data. In this review, we provide an intellectual foundation for this powerful analytic strategy, as well as an update on the state of the art in recent method development. The goal of this review is threefold: (1) introduce the motivation and basic steps of pathway analysis for genome-wide genetic variation data; (2) review the merits and shortcomings of classic and newly emerging integrative pathway analysis tools; and (3) discuss remaining challenges and future directions for method development.
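One classic, self-contained form of pathway analysis is an over-representation test on a list of significant genes. The sketch below assumes genes have already been called significant (for example, by a gene-level GWAS test) and ignores gene length and LD, which more recent methods are designed to handle; it illustrates the basic idea only, and the function name and numbers are hypothetical.

```python
from math import comb


def overrepresentation_p(n_background, n_in_set, n_hits, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_background: genes tested (the background universe).
    n_in_set:     background genes annotated to the pathway.
    n_hits:       genes called significant.
    n_overlap:    significant genes that fall in the pathway.
    """
    total = comb(n_background, n_hits)
    upper = min(n_in_set, n_hits)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total          # exact integer counts, divided once at the end


# Hypothetical numbers: 18,000 genes tested, 150 in a pathway,
# 400 significant genes, 12 of which are in the pathway
print(f"{overrepresentation_p(18_000, 150, 400, 12):.2e}")
```

Here about 3.3 pathway genes would be expected among the hits by chance, so observing 12 yields a small p-value; in practice such p-values are then corrected for the number of pathways tested.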
Affiliation(s)
- Micah Silberstein
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Nicholas Nesbit
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Jacquelyn Cai
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Phil H Lee
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
39
Shadrin AA, Frei O, Smeland OB, Bettella F, O'Connell KS, Gani O, Bahrami S, Uggen TKE, Djurovic S, Holland D, Andreassen OA, Dale AM. Phenotype-specific differences in polygenicity and effect size distribution across functional annotation categories revealed by AI-MiXeR. Bioinformatics 2021; 36:4749-4756. [PMID: 32539089 PMCID: PMC7750998 DOI: 10.1093/bioinformatics/btaa568] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/11/2020] [Revised: 06/03/2020] [Accepted: 06/09/2020] [Indexed: 11/30/2022]
Abstract
Motivation Determining the relative contributions of functional genetic categories is fundamental to understanding the genetic etiology of complex human traits and diseases. Here, we present Annotation Informed-MiXeR, a likelihood-based method for estimating the number of variants influencing a phenotype and their effect sizes across different functional annotation categories of the genome using summary statistics from genome-wide association studies. Results Extensive simulations demonstrate that the model is valid for a broad range of genetic architectures. The model suggests that complex human phenotypes substantially differ in the number of causal variants, their localization in the genome and their effect sizes. Specifically, the exons of protein-coding genes harbor more than 90% of variants influencing type 2 diabetes and inflammatory bowel disease, making them good candidates for whole-exome studies. In contrast, <10% of the causal variants for schizophrenia, bipolar disorder and attention-deficit/hyperactivity disorder are located in protein-coding exons, indicating a more substantial role of regulatory mechanisms in the pathogenesis of these disorders. Availability and implementation The software is available at: https://github.com/precimed/mixer. Supplementary information Supplementary data are available at Bioinformatics online.
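To make the quantities in this abstract concrete, the sketch below assumes a point-normal causal mixture with category-specific parameters: within annotation category c, each variant is causal with probability π_c and, if causal, has an effect drawn from Normal(0, σ²_c). It only computes the implied shares of causal variants and of heritability per category (with standardized genotypes, expected heritability is M_c·π_c·σ²_c); it does not perform AI-MiXeR's likelihood fitting, and the class, function and parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    name: str
    n_variants: int   # M_c: variants in this functional category
    pi: float         # probability that a variant in the category is causal
    sigma2: float     # effect-size variance of a causal variant (standardized genotypes)


def summarize_architecture(categories):
    """Expected causal-variant and heritability shares per category.

    Assumes a point-normal model: in category c each variant is causal with
    probability pi_c and then has effect ~ Normal(0, sigma2_c), else 0.
    """
    causal = {c.name: c.n_variants * c.pi for c in categories}
    h2 = {c.name: c.n_variants * c.pi * c.sigma2 for c in categories}
    total_causal, total_h2 = sum(causal.values()), sum(h2.values())
    return {
        name: {
            "causal_share": causal[name] / total_causal,
            "h2": h2[name],
            "h2_share": h2[name] / total_h2,
        }
        for name in causal
    }


# Hypothetical parameters, chosen only to illustrate the calculation
summary = summarize_architecture([
    Annotation("coding_exon", 200_000, 0.02, 2e-5),
    Annotation("other", 9_800_000, 0.0002, 1e-5),
])
for name, stats in summary.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```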
Affiliation(s)
- Alexey A Shadrin
- NORMENT, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.,Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0424, Norway
- Oleksandr Frei
- NORMENT, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.,Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0424, Norway.,Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
| | - Olav B Smeland
- NORMENT, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.,Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0424, Norway
| | - Francesco Bettella
- NORMENT, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.,Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0424, Norway
| | - Kevin S O'Connell
- NORMENT, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.,Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0424, Norway
| | - Osman Gani
- NORMENT, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.,Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0424, Norway
| | - Shahram Bahrami
- NORMENT, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.,Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0424, Norway
| | - Tea K E Uggen
- NORMENT, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.,Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0424, Norway
| | - Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital, Oslo 0424, Norway.,NORMENT, Department of Clinical Science, University of Bergen, Bergen 5020, Norway
| | - Dominic Holland
- Center for Multimodal Imaging and Genetics, University of California, San Diego, La Jolla, CA, 92037, USA.,Department of Neurosciences, University of California, San Diego, La Jolla, CA 92093, USA
| | - Ole A Andreassen
- NORMENT, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.,Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0424, Norway.,Department of Neurosciences, University of California, San Diego, La Jolla, CA 92093, USA
| | - Anders M Dale
- Center for Multimodal Imaging and Genetics, University of California, San Diego, La Jolla, CA, 92037, USA.,Department of Neurosciences, University of California, San Diego, La Jolla, CA 92093, USA.,Department of Radiology, University of California, San Diego, La Jolla, CA 92093, USA.,Department of Psychiatry, University of California, San Diego, La Jolla, CA 92093, USA
40
Widespread signatures of natural selection across human complex traits and functional genomic categories. Nat Commun 2021; 12:1164. [PMID: 33608517 PMCID: PMC7896067 DOI: 10.1038/s41467-021-21446-3] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 11/01/2020] [Accepted: 01/27/2021] [Indexed: 01/16/2023]
Abstract
Understanding how natural selection has shaped the genetic architecture of complex traits is important in medical and evolutionary genetics. Bayesian methods have been developed that use individual-level GWAS data to estimate multiple genetic architecture parameters, including the selection signature. Here, we present a method (SBayesS) that requires only GWAS summary statistics. We analyse data for 155 complex traits (n = 27k–547k) and project the estimates onto those obtained from evolutionary simulations. We estimate that, on average across traits, about 1% of the human genome sequence consists of mutational targets, with a mean selection coefficient of ~0.001. On average, common diseases show a smaller number of mutational targets and have been under stronger selection than other traits. SBayesS analyses incorporating functional annotations reveal that selection signatures vary across genomic regions. Among these, coding regions have the strongest selection signature and are enriched for both the number of associated variants and the magnitude of effect sizes.
41
Koomar T, Thomas TR, Pottschmidt NR, Lutter M, Michaelson JJ. Estimating the Prevalence and Genetic Risk Mechanisms of ARFID in a Large Autism Cohort. Front Psychiatry 2021; 12:668297. [PMID: 34177659 PMCID: PMC8221394 DOI: 10.3389/fpsyt.2021.668297] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 02/15/2021] [Accepted: 05/07/2021] [Indexed: 12/27/2022]
Abstract
This study is the first genetically informed investigation of avoidant/restrictive food intake disorder (ARFID), an eating disorder that profoundly impacts quality of life for those affected. ARFID is highly comorbid with autism, and we provide the first estimate of its prevalence in a large and phenotypically diverse autism cohort (a subsample of the SPARK study, N = 5,157 probands). This estimate, 21% (at a balanced accuracy of 80%), is at the upper end of previous estimates from studies based on clinical samples. This suggests under-diagnosis and potentially a lack of awareness among caretakers and clinicians. Some studies suggest a decrease of disordered eating symptoms by age 6. However, our estimates indicate that up to 17% (at a balanced accuracy of 87%) of parents of autistic children are also at heightened risk for ARFID, suggesting a lifelong risk for disordered eating. We were also able to provide the first estimate of narrow-sense heritability (h²) for ARFID risk, at 0.45. Genome-wide association revealed a single hit near ZSWIM6, a gene previously implicated in neurodevelopmental conditions. The current sample was not well powered for GWAS. Even so, the effect size and heritability estimates allowed us to project the sample sizes necessary to discover ARFID-linked loci more robustly via common variants. Further genetic analysis using polygenic risk scores (PRS) affirmed genetic links to autism as well as to neuroticism and metabolic syndrome.
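The prevalence figures above are each reported at a given balanced accuracy. The minimal Python sketch below shows what that number measures, using the standard definition: the mean of sensitivity and specificity. The function name and the confusion-matrix counts in the example are hypothetical and are not taken from the study.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts: 3 of 4 cases detected, 1 of 2 non-cases correctly rejected.
print(balanced_accuracy(tp=3, fn=1, tn=1, fp=1))  # 0.625
```

Plain accuracy is inflated when unaffected individuals heavily outnumber cases. Balanced accuracy avoids this by weighting both classes equally.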
Affiliation(s)
- Tanner Koomar
- Department of Psychiatry, The University of Iowa, Iowa City, IA, United States
- Taylor R Thomas
- Department of Psychiatry, The University of Iowa, Iowa City, IA, United States
- Natalie R Pottschmidt
- Department of Psychology, Pennsylvania State University, State College, PA, United States
- Michael Lutter
- Eating Recovery Center of San Antonio, San Antonio, TX, United States
- Jacob J Michaelson
- Department of Psychiatry, The University of Iowa, Iowa City, IA, United States
42
Kim SS, Dey KK, Weissbrod O, Márquez-Luna C, Gazal S, Price AL. Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease. Nat Commun 2020; 11:6258. [PMID: 33288751 PMCID: PMC7721881 DOI: 10.1038/s41467-020-20087-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/09/2020] [Accepted: 11/09/2020] [Indexed: 02/08/2023]
Abstract
Despite considerable progress on pathogenicity scores prioritizing variants for Mendelian disease, little is known about the utility of these scores for common disease. Here, we assess the informativeness of Mendelian disease-derived pathogenicity scores for common disease and improve upon existing scores. We first apply stratified linkage disequilibrium (LD) score regression to evaluate published pathogenicity scores across 41 common diseases and complex traits (average N = 320K). Several of the resulting annotations are informative for common disease, even after conditioning on a broad set of functional annotations. We then improve upon published pathogenicity scores by developing AnnotBoost, a machine learning framework to impute and denoise pathogenicity scores using a broad set of functional annotations. AnnotBoost substantially increases the informativeness for common disease of both previously uninformative and previously informative pathogenicity scores. This implies that Mendelian and common disease variants share similar properties. The boosted scores also improve heritability model fit and the classification of disease-associated, fine-mapped SNPs. Our boosted scores may improve fine-mapping and candidate gene discovery for common disease.
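Stratified LD score regression is used in this abstract and in entry 43 below, but neither abstract defines it. For reference, a sketch of the standard model follows; consult the method's own documentation for the exact formulation. The model expresses the expected chi-square association statistic of SNP j in terms of that SNP's LD with each annotation C:

```latex
\mathbb{E}\left[\chi^2_j\right] = N \sum_{C} \tau_C \, \ell(j, C) + N a + 1,
\qquad
\ell(j, C) = \sum_{k \in C} r_{jk}^2
```

The terms are as follows:

- N is the GWAS sample size.
- τ_C is the per-SNP heritability contribution of annotation C, conditional on the other annotations in the model.
- ℓ(j, C) is the LD score of SNP j with respect to C.
- r_jk is the correlation between SNPs j and k.
- a captures confounding such as population stratification.

Roughly, an annotation is "informative" in the sense used above when its coefficient τ_C (often reported in standardized form) remains significantly non-zero after conditioning on the baseline annotations.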
Affiliation(s)
- Samuel S Kim
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
- Kushal K Dey
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Carla Márquez-Luna
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
43
Hao X, Wang K, Dai C, Ding Z, Yang W, Wang C, Cheng S. Integrative analysis of scRNA-seq and GWAS data pinpoints periportal hepatocytes as the relevant liver cell types for blood lipids. Hum Mol Genet 2020; 29:3145-3153. [PMID: 32821946 DOI: 10.1093/hmg/ddaa188] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/18/2020] [Revised: 08/10/2020] [Accepted: 08/18/2020] [Indexed: 12/22/2022]
Abstract
The liver, a heterogeneous tissue consisting of various cell types, is known to be relevant for blood lipid traits. By integrating summary statistics from genome-wide association studies (GWAS) of lipid traits with single-cell transcriptome data of the liver, we sought to identify the specific liver cell types most relevant for blood lipid levels. We conducted differential expression analyses for 40 cell types from human and mouse livers to construct cell-type specifically expressed gene sets (CT-SEGS). Under the assumption that CT-SEGS represent specific functions of each cell type, we applied stratified linkage disequilibrium score regression to determine the cell types most relevant for complex traits and diseases. We first confirmed the validity of this method for delineating functionally relevant cell types by identifying immune cell types as relevant for autoimmune diseases. We further showed that lipid GWAS signals were enriched in human and mouse periportal hepatocytes. Our results provide important information to facilitate future cellular studies of the metabolic mechanisms affecting blood lipid levels.
Affiliation(s)
- Xingjie Hao
- Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
- Kai Wang
- Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
- Chengguqiu Dai
- Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
- Wei Yang
- Department of Nutrition and Food Hygiene, School of Public Health
- Chaolong Wang
- Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
- Department of Orthopedic Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Shanshan Cheng
- Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
44
Timshel PN, Thompson JJ, Pers TH. Genetic mapping of etiologic brain cell types for obesity. eLife 2020; 9:55851. [PMID: 32955435 PMCID: PMC7505664 DOI: 10.7554/elife.55851] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 02/07/2020] [Accepted: 09/04/2020] [Indexed: 12/11/2022]
Abstract
The underlying cell types mediating predisposition to obesity remain largely obscure. Here, we integrated recently published single-cell RNA-sequencing (scRNA-seq) data from 727 peripheral and nervous system cell types spanning 17 mouse organs with body mass index (BMI) genome-wide association study (GWAS) data from >457,000 individuals. Using a novel strategy for integrating scRNA-seq data with GWAS data, we identified 26 cell types, all of them neuronal, that were significantly enriched for BMI heritability (p < 1.6×10⁻⁴). These cell types come from the hypothalamus, subthalamus, midbrain, hippocampus, thalamus, cortex, pons, medulla and pallidum. Using genes harboring coding mutations associated with obesity, we replicated midbrain cell types from the anterior pretectal nucleus and periaqueductal gray (p < 1.2×10⁻⁴). Together, our results suggest that brain nuclei regulating the integration of sensory stimuli, learning and memory are likely to play a key role in obesity, and they provide testable hypotheses for mechanistic follow-up studies.
Affiliation(s)
- Pascal N Timshel
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Jonatan J Thompson
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Tune H Pers
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
45
Yang Y, Shi X, Jiao Y, Huang J, Chen M, Zhou X, Sun L, Lin X, Yang C, Liu J. CoMM-S2: a collaborative mixed model using summary statistics in transcriptome-wide association studies. Bioinformatics 2020; 36:2009-2016. [PMID: 31755899 DOI: 10.1093/bioinformatics/btz880] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/16/2019] [Revised: 09/25/2019] [Accepted: 11/21/2019] [Indexed: 12/23/2022]
Abstract
Motivation: Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links underlying how genetic variants cause complex traits remain elusive. To advance our understanding of these links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) was proposed to jointly interrogate the genome on complex traits by integrating a GWAS dataset with an expression quantitative trait loci (eQTL) dataset. CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset. However, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Statistically efficient methods that leverage transcriptome information using only GWAS summary statistics are therefore required. Results: In this study, we propose a novel probabilistic model, CoMM-S2, that examines the mechanistic role of genetic variants using only GWAS summary statistics instead of individual-level GWAS data. Like CoMM, CoMM-S2 combines two models: the first examines the relationship between gene expression and genotype, while the second examines the relationship between the phenotype and the gene expression predicted by the first model. Unlike CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that although CoMM-S2 uses only GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data. Availability and implementation: CoMM-S2 is implemented in the CoMM package, which can be downloaded from https://github.com/gordonliu810822/CoMM.
Supplementary information: Supplementary data are available at Bioinformatics online.
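CoMM-S2 fits a joint probabilistic model, which is not reproduced here. For orientation, the sketch below computes a simpler and widely used summary-statistics TWAS quantity instead: the weighted z-score used in FUSION-style analyses, which is closely related to S-PrediXcan. It takes three inputs for a gene: eQTL weights w for the cis-SNPs, GWAS z-scores z for the same SNPs, and their LD (correlation) matrix R. The gene-level statistic is wᵀz / √(wᵀRw). Standardized genotypes are assumed. The function name and all numbers are illustrative only.

```python
import math


def twas_z(weights, z_scores, ld):
    """Weighted z-score TWAS statistic: (w' z) / sqrt(w' R w).

    weights:  eQTL weights for the gene's cis-SNPs (standardized genotypes assumed)
    z_scores: GWAS z-scores for the same SNPs, in the same order
    ld:       SNP-by-SNP correlation matrix R as a list of rows
    """
    m = len(weights)
    if len(z_scores) != m or len(ld) != m or any(len(row) != m for row in ld):
        raise ValueError("weights, z_scores and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    # w' R w is the null variance of w' z given the LD among the SNPs.
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    if variance <= 0:
        raise ValueError("w' R w must be positive")
    return numerator / math.sqrt(variance)


# Hypothetical gene with two cis-SNPs in moderate LD.
print(round(twas_z([0.5, 0.3], [3.0, 2.0], [[1.0, 0.4], [0.4, 1.0]]), 3))  # 3.096
```

The denominator keeps the statistic approximately standard normal under the null. Ignoring LD (treating R as the identity) would overstate significance whenever the weighted SNPs are positively correlated.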
Affiliation(s)
- Yi Yang
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China; Centre for Quantitative Medicine, Program in Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore
- Xingjie Shi
- Centre for Quantitative Medicine, Program in Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore; Department of Statistics, Nanjing University of Finance and Economics, Nanjing 210046, China
- Yuling Jiao
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China
- Jian Huang
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA
- Min Chen
- Academy of Mathematics and Systems Science, The Chinese Academy of Sciences, Beijing 100190, China
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Lei Sun
- Cardiovascular and Metabolic Disorders Program, Duke-NUS Medical School, 169857, Singapore
- Xinyi Lin
- Centre for Quantitative Medicine, Program in Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore; Singapore Clinical Research Institute, 138669, Singapore; Singapore Institute for Clinical Sciences, A*STAR, 117609, Singapore
- Can Yang
- Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong 999077, China
- Jin Liu
- Centre for Quantitative Medicine, Program in Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore
46
Identification of therapeutic targets from genetic association studies using hierarchical component analysis. BioData Min 2020; 13:6. [PMID: 32565911 PMCID: PMC7301559 DOI: 10.1186/s13040-020-00216-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/18/2020] [Accepted: 05/29/2020] [Indexed: 01/08/2023]
Abstract
Background: Mapping disease-associated genetic variants to complex disease pathophysiology is a major challenge in translating findings from genome-wide association studies into novel therapeutic opportunities. The difficulty lies in our limited understanding of how phenotypic traits arise from non-coding genetic variants in highly organized biological systems with heterogeneous gene expression across cells and tissues. Results: We present a novel strategy, called GWAS component analysis, for transferring disease associations from single-nucleotide polymorphisms to co-expression modules by stacking models trained using reference genome and tissue-specific gene expression data. Application of this method to genome-wide association studies of blood cell counts confirmed that it could detect gene sets enriched in the expected cell types. In addition, coupling our method with Bayesian networks enables GWAS components to be used to discover drug targets. Conclusions: We tested genome-wide associations of four disease phenotypes (age-related macular degeneration, Crohn’s disease, ulcerative colitis and rheumatoid arthritis). We demonstrated that the proposed method could select more functional genes than S-PrediXcan, the previous single-step model for predicting gene-level associations from SNP-level associations.
47
A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk. Proc Natl Acad Sci U S A 2020; 117:15028-15035. [PMID: 32522875 PMCID: PMC7334489 DOI: 10.1073/pnas.1918862117] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/17/2022]
Abstract
Variation is rampant throughout human genomes: some of it affects disease risk, but most does not. Separating the two requires a plethora of hypothesis tests. This challenge of multiple testing (limiting false positives while maximizing power) arises in many “omics” studies and sciences. One approach is to control the false discovery rate (FDR). A recent selective inference method for controlling FDR, adaptive P-value thresholding (AdaPT), facilitates the incorporation of auxiliary information (covariates) related to each hypothesis test. How AdaPT performs on real data is an open question. We apply AdaPT to results from genomic association studies and include many covariates. This adaptive search discovers a more complex and interpretable model with far greater power than classic multiple testing procedures.
To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet new methodologies of selective inference could improve power while retaining statistical guarantees. This is especially true of methods that explore test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, AdaPT, in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene–gene coexpression, captured by subnetwork (module) membership.
In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.
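AdaPT learns covariate-dependent rejection thresholds (here with gradient boosted trees), which is beyond a short example. For contrast, the minimal Python sketch below implements the Benjamini-Hochberg step-up procedure. This is a standard covariate-free FDR method of the "classic multiple testing procedure" kind mentioned above. It is not AdaPT, and the p-values in the example are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ranks by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the LARGEST rank k with p_(k) <= k * alpha / m.
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    return sorted(order[:k_max])


p = [0.010, 0.030, 0.026, 0.500]
print(benjamini_hochberg(p))  # [0, 1, 2]
```

In the example, 0.026 fails its own threshold (2 × 0.05 / 4 = 0.025). It is still rejected because a larger p-value (0.030) passes at rank 3. This step-up behaviour is what distinguishes BH from a simple per-rank cutoff. Every hypothesis is also held to the same threshold schedule regardless of any covariate, which is the limitation AdaPT is designed to relax.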
48
Shi H, Burch KS, Johnson R, Freund MK, Kichaev G, Mancuso N, Manuel AM, Dong N, Pasaniuc B. Localizing Components of Shared Transethnic Genetic Architecture of Complex Traits from GWAS Summary Data. Am J Hum Genet 2020; 106:805-817. [PMID: 32442408 PMCID: PMC7273527 DOI: 10.1016/j.ajhg.2020.04.012] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 12/13/2019] [Accepted: 04/20/2020] [Indexed: 12/19/2022]
Abstract
Strong transethnic genetic correlations have been reported in the literature for many complex traits. Despite this, polygenic risk scores do not transfer well across populations, which suggests the presence of population-specific components of genetic architecture. We propose an approach that models GWAS summary data for one trait in two populations to estimate genome-wide proportions of population-specific and shared causal SNPs. In simulations across various genetic architectures, we show that our approach yields approximately unbiased estimates with in-sample LD and slightly upward-biased estimates with out-of-sample LD. We analyze nine complex traits in individuals of East Asian and European ancestry, restricting the analysis to common SNPs (MAF > 5%), and find that most common causal SNPs are shared by both populations. Using the genome-wide estimates as priors in an empirical Bayes framework, we perform fine-mapping. We observe that high-posterior SNPs (for both the population-specific and shared causal configurations) have highly correlated effects in East Asians and Europeans. In population-specific GWAS risk regions, we observe a 2.8× enrichment of shared high-posterior SNPs. This suggests that population-specific GWAS risk regions harbor shared causal SNPs that go undetected in the other population's GWAS because of differences in LD, allele frequencies, and/or sample size. Finally, we report enrichments of shared high-posterior SNPs in 53 tissue-specific functional categories and find evidence that SNP-heritability enrichments are driven largely by many low-effect common SNPs.
Affiliation(s)
- Huwenbo Shi
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Kathryn S Burch
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA.
- Ruth Johnson
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Malika K Freund
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Gleb Kichaev
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nicholas Mancuso
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Astrid M Manuel
- Department of Biological Sciences, Florida International University, Miami, FL 33199, USA
- Natalie Dong
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
49
Cheng W, Ramachandran S, Crawford L. Estimation of non-null SNP effect size distributions enables the detection of enriched genes underlying complex traits. PLoS Genet 2020; 16:e1008855. [PMID: 32542026 PMCID: PMC7316356 DOI: 10.1371/journal.pgen.1008855] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/13/2019] [Revised: 06/25/2020] [Accepted: 05/13/2020] [Indexed: 12/22/2022]
Abstract
Traditional univariate genome-wide association studies generate false positives and false negatives. This is because it is difficult to distinguish associated variants from variants with spurious nonzero effects that do not directly influence the trait. Recent efforts have been directed at identifying genes or signaling pathways enriched for mutations in quantitative traits or case-control studies, but these can be computationally costly and hampered by strict model assumptions. Here, we present gene-ε, a new approach for identifying statistical associations between sets of variants and quantitative traits. Our key insight is that gene-level enrichment studies are improved when we reformulate the genome-wide SNP-level null hypothesis to identify spurious small-to-intermediate SNP effects and classify them as non-causal. gene-ε efficiently identifies enriched genes under a variety of simulated genetic architectures, achieving a true positive rate greater than 90% at a 1% false positive rate for polygenic traits. Lastly, we apply gene-ε to summary statistics derived from six quantitative traits in European-ancestry individuals from the UK Biobank, and identify enriched genes that are in biologically relevant pathways.
Affiliation(s)
- Wei Cheng
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Sohini Ramachandran
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Biostatistics, Brown University, Providence, Rhode Island, United States of America
- Center for Statistical Sciences, Brown University, Providence, Rhode Island, United States of America
50
McGuirl MR, Smith SP, Sandstede B, Ramachandran S. Detecting Shared Genetic Architecture Among Multiple Phenotypes by Hierarchical Clustering of Gene-Level Association Statistics. Genetics 2020; 215:511-529. [PMID: 32245788 PMCID: PMC7268989 DOI: 10.1534/genetics.120.303096] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/31/2020] [Accepted: 03/31/2020] [Indexed: 12/31/2022]
Abstract
Emerging large-scale biobanks pairing genotype data with phenotype data present new opportunities to prioritize shared genetic associations across multiple phenotypes for molecular validation. Past research, by our group and others, has shown that gene-level tests of association produce a biologically interpretable characterization of the genetic architecture of a given phenotype. Here, we present a new method, Ward clustering to identify Internal Node branch length outliers using Gene Scores (WINGS), for identifying shared genetic architecture among multiple phenotypes. The objective of WINGS is to identify groups of phenotypes, or "clusters," sharing a core set of genes enriched for mutations in cases. We validate WINGS using extensive simulation studies. We then combine gene-level association tests with WINGS to identify shared genetic architecture among 81 case-control and seven quantitative phenotypes in 349,468 European-ancestry individuals from the UK Biobank. We identify eight prioritized phenotype clusters and recover multiple published gene-level associations within prioritized clusters.
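WINGS builds on Ward hierarchical clustering of phenotypes by their gene scores and then looks for outlying internal branch lengths. Only the clustering step is sketched here, and this is not the authors' implementation. The Python below is a plain agglomerative Ward procedure: each merge chooses the pair of clusters whose union adds the least within-cluster sum of squares. It assumes each phenotype is represented by a vector of gene-level scores. The function names and the example profiles are hypothetical.

```python
from itertools import combinations


def _centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = _centroid(a), _centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_cluster(profiles, n_clusters):
    """Greedy agglomerative Ward clustering.

    profiles: one vector of gene-level scores per phenotype.
    Returns the clusters as sorted lists of phenotype (row) indices.
    """
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of profiles")
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        # Choose the pair whose merge adds the least within-cluster variance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_merge_cost(
                [profiles[k] for k in clusters[pair[0]]],
                [profiles[k] for k in clusters[pair[1]]],
            ),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # safe: combinations() yields i < j
    return sorted(clusters)


# Hypothetical gene-score profiles for four phenotypes over three genes.
profiles = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 1.0],
    [0.1, 0.9, 1.0],
]
print(ward_cluster(profiles, 2))  # [[0, 1], [2, 3]]
```

This brute-force version recomputes every pairwise cost at each step, so it is only suitable for small inputs. Library implementations use the Lance-Williams update to avoid that.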
Affiliation(s)
- Melissa R McGuirl
- Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912
- Samuel Pattillo Smith
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912
- Björn Sandstede
- Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912
- Data Science Initiative, Brown University, Providence, Rhode Island 02912
- Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912