1
Guo S, Yang J. Bayesian genome-wide TWAS with reference transcriptomic data of brain and blood tissues identified 141 risk genes for Alzheimer's disease dementia. Alzheimers Res Ther 2024; 16:120. [PMID: 38824563 PMCID: PMC11144322 DOI: 10.1186/s13195-024-01488-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 05/27/2024] [Indexed: 06/03/2024]
Abstract
BACKGROUND Transcriptome-wide association study (TWAS) is an influential tool for identifying genes associated with complex diseases whose genetic effects are likely mediated through the transcriptome. TWAS utilizes reference genetic and transcriptomic data to estimate effect sizes of genetic variants on gene expression (i.e., effect sizes of a broad sense of expression quantitative trait loci, eQTL). These estimated effect sizes are employed as variant weights in gene-based association tests, facilitating the mapping of risk genes with genome-wide association study (GWAS) data. However, most existing TWAS of Alzheimer's disease (AD) dementia are limited to studying only cis-eQTL proximal to the test gene. To overcome this limitation, we applied the Bayesian Genome-wide TWAS (BGW-TWAS) method to leverage both cis- and trans-eQTL of brain and blood tissues, in order to enhance mapping risk genes for AD dementia. METHODS We first applied BGW-TWAS to the Genotype-Tissue Expression (GTEx) V8 dataset to estimate cis- and trans-eQTL effect sizes of the prefrontal cortex, cortex, and whole blood tissues. Estimated eQTL effect sizes were integrated with the summary data of the most recent GWAS of AD dementia to obtain BGW-TWAS (i.e., gene-based association test) p-values of AD dementia per gene per tissue type. Then we used the aggregated Cauchy association test to combine TWAS p-values across three tissues to obtain omnibus TWAS p-values per gene. RESULTS We identified 85 genes in prefrontal cortex, 82 in cortex, and 76 in whole blood that were significantly associated with AD dementia. By combining BGW-TWAS p-values across these three tissues, we obtained 141 significant risk genes including 34 genes primarily due to trans-eQTL and 35 mapped risk genes in GWAS Catalog.
With these 141 significant risk genes, we detected functional clusters comprising both known mapped GWAS risk genes of AD in GWAS Catalog and our identified TWAS risk genes by protein-protein interaction network analysis, as well as several enriched phenotypes related to AD. CONCLUSION We applied BGW-TWAS and aggregated Cauchy test methods to integrate both cis- and trans-eQTL data of brain and blood tissues with GWAS summary data, identifying 141 TWAS risk genes of AD dementia. These identified risk genes provide novel insights into the underlying biological mechanisms of AD dementia and potential gene targets for therapeutics development.
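The omnibus step in this abstract (combining per-tissue TWAS p-values with the aggregated Cauchy association test) can be illustrated with a minimal sketch. It uses the standard Cauchy combination rule; the function name `acat_combine`, the equal default weights, and the example p-values are assumptions for illustration and are not taken from the paper.

```python
import math


def acat_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Assumes every p-value lies strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(p_values)  # equal weights (an assumption)
    total_weight = sum(weights)
    # Map each p-value to a standard Cauchy variate and take the weighted mean.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    # The combined p-value is the upper tail of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Made-up per-tissue p-values for one gene (illustrative only).
tissue_p = {"prefrontal_cortex": 1e-6, "cortex": 0.5, "whole_blood": 0.5}
omnibus_p = acat_combine(list(tissue_p.values()))
```

With equal weights, one very small p-value dominates the combined result, so a gene that is significant in only one tissue can still reach omnibus significance.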
Affiliation(s)
- Shuyi Guo
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
2
Zhu X, Ma S, Wong WH. Genetic effects of sequence-conserved enhancer-like elements on human complex traits. Genome Biol 2024; 25:1. [PMID: 38167462 PMCID: PMC10759394 DOI: 10.1186/s13059-023-03142-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/02/2022] [Accepted: 12/08/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND The vast majority of findings from human genome-wide association studies (GWAS) map to non-coding sequences, complicating their mechanistic interpretations and clinical translations. Non-coding sequences that are evolutionarily conserved and biochemically active could offer clues to the mechanisms underpinning GWAS discoveries. However, genetic effects of such sequences have not been systematically examined across a wide range of human tissues and traits, hampering progress to fully understand regulatory causes of human complex traits. RESULTS Here we develop a simple yet effective strategy to identify functional elements exhibiting high levels of human-mouse sequence conservation and enhancer-like biochemical activity, which scales well to 313 epigenomic datasets across 106 human tissues and cell types. Combined with 468 GWAS of European (EUR) and East Asian (EAS) ancestries, these elements show tissue-specific enrichments of heritability and causal variants for many traits, which are significantly stronger than enrichments based on enhancers without sequence conservation. These elements also help prioritize candidate genes that are functionally relevant to body mass index (BMI) and schizophrenia but were not reported in previous GWAS with large sample sizes. CONCLUSIONS Our findings provide a comprehensive assessment of how sequence-conserved enhancer-like elements affect complex traits in diverse tissues and demonstrate a generalizable strategy of integrating evolutionary and biochemical data to elucidate human disease genetics.
Affiliation(s)
- Xiang Zhu
- Department of Statistics, The Pennsylvania State University, 326 Thomas Building, University Park, 16802, PA, USA.
- Huck Institutes of the Life Sciences, The Pennsylvania State University, 201 Huck Life Sciences Building, University Park, 16802, PA, USA.
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA.
- Shining Ma
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA
- Wing Hung Wong
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA.
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA.
3
Gao B, Zhou X. MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies. Nat Genet 2024; 56:170-179. [PMID: 38168930 PMCID: PMC11849347 DOI: 10.1038/s41588-023-01604-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/21/2022] [Accepted: 10/30/2023] [Indexed: 01/05/2024]
Abstract
Fine-mapping in genome-wide association studies attempts to identify causal SNPs from a set of candidate SNPs in a local genomic region of interest and is commonly performed in one genetic ancestry at a time. Here, we present multi-ancestry sum of the single effects model (MESuSiE), a probabilistic multi-ancestry fine-mapping method, to improve the accuracy and resolution of fine-mapping by leveraging association information across ancestries. MESuSiE uses summary statistics as input, accounts for the diverse linkage disequilibrium pattern observed in different ancestries, explicitly models both shared and ancestry-specific causal SNPs, and relies on a variational inference algorithm for scalable computation. We evaluated the performance of MESuSiE through comprehensive simulations and multi-ancestry fine-mapping of four lipid traits with both European and African samples. In the real data, MESuSiE improves fine-mapping resolution by 19.0% to 72.0% compared to existing approaches, is an order of magnitude faster, and captures and categorizes shared and ancestry-specific causal signals with enhanced functional enrichment.
Affiliation(s)
- Boran Gao
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
4
Jung S, Lee CH, Sul JH, Han B. Building an optimal predictive model for imputing tissue-specific gene expression by combining genotype and whole-blood transcriptome data. HGG Adv 2023; 4:100223. [PMID: 37576186 PMCID: PMC10413136 DOI: 10.1016/j.xhgg.2023.100223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 05/04/2023] [Indexed: 08/15/2023]
Abstract
Accurate imputation of tissue-specific gene expression can be a powerful tool for understanding the biological mechanisms underlying human complex traits. Existing imputation methods can be grouped into two categories according to the types of predictors used. The first category uses genotype data, while the second category uses whole-blood expression data. Both data types can be easily collected from blood, avoiding invasive tissue biopsies. In this study, we attempted to build an optimal predictive model for imputing tissue-specific gene expression by combining the genotype and whole-blood expression data. We first evaluated the imputation performance of each standalone model (using genotype data [GEN model] and using whole-blood expression data [WBE model]) using their respective data types across 47 human tissues. The WBE model outperformed the GEN model in most tissues by a large margin. Then, we developed several combined models that leverage both types of predictors to further improve imputation performance. We tried various strategies, including utilizing a merged dataset of the two data types (MERGED models) and integrating the imputation outcomes of the two standalone models (inverse variance-weighted [IVW] models). We found that one of the MERGED models noticeably outperformed the standalone models. This model involved a fixed ratio between the two regularization penalty factors for the two predictor types so that the contribution of the whole-blood transcriptome is upweighted compared with the genotype. Our study suggests that one can improve the imputation of tissue-specific gene expression by combining the genotype and whole-blood expression, but the improvement can be largely dependent on the combination strategy chosen.
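The inverse variance-weighted (IVW) strategy mentioned in this abstract can be illustrated generically. The sketch below applies the textbook IVW average to two imputed expression values; the function name, the example numbers, and the choice of variance estimates are assumptions, and the authors' actual IVW models may differ in how the variances are obtained.

```python
def ivw_combine(predictions, variances):
    """Inverse variance-weighted average of several predictions.

    Each prediction is weighted by 1 / variance, so the less noisy
    model contributes more to the combined value.
    """
    weights = [1.0 / v for v in variances]
    weighted_sum = sum(w * y for w, y in zip(weights, predictions))
    return weighted_sum / sum(weights)


# Hypothetical imputed expression of one gene in one tissue:
# first value from a genotype-based model, second from a whole-blood model.
gen_prediction, wbe_prediction = 1.0, 3.0
gen_variance, wbe_variance = 3.0, 1.0  # made-up error variances
combined = ivw_combine(
    [gen_prediction, wbe_prediction], [gen_variance, wbe_variance]
)
```

Here the whole-blood prediction has the smaller assumed variance, so the combined value lies closer to it than to the genotype-based prediction.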
Affiliation(s)
- Sunwoo Jung
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea
- Cue Hyunkyu Lee
- Department of Biostatistics, Columbia University, New York, NY, USA
- Jae Hoon Sul
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Buhm Han
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea
- Department of Biomedical Sciences, BK21 Plus Biomedical Science Project, Seoul National University College of Medicine, Seoul, Republic of Korea
5
Guo S, Yang J. Bayesian genome-wide TWAS with reference transcriptomic data of brain and blood tissues identified 93 risk genes for Alzheimer's disease dementia. medRxiv 2023:2023.07.06.23292336. [PMID: 37503151 PMCID: PMC10370241 DOI: 10.1101/2023.07.06.23292336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Background Transcriptome-wide association study (TWAS) is an influential tool for identifying novel genes associated with complex diseases, where their genetic effects may be mediated through the transcriptome. TWAS utilizes reference genetic and transcriptomic data to estimate genetic effect sizes on expression quantitative traits of target genes (i.e., effect sizes of a broad sense of expression quantitative trait loci, eQTL). These estimated effect sizes are then employed as variant weights in burden gene-based association test statistics, facilitating the mapping of risk genes for complex diseases with genome-wide association study (GWAS) data. However, most existing TWAS of Alzheimer's disease (AD) dementia have primarily focused on cis-eQTL, disregarding potential trans-eQTL. To overcome this limitation, we applied the Bayesian Genome-wide TWAS (BGW-TWAS) method, which incorporates both cis- and trans-eQTL of brain and blood tissues, to enhance mapping risk genes for AD dementia. Methods We first applied BGW-TWAS to the Genotype-Tissue Expression (GTEx) V8 dataset to estimate cis- and trans-eQTL effect sizes of the prefrontal cortex, cortex, and whole blood tissues. Subsequently, estimated eQTL effect sizes were integrated with the summary data of the most recent GWAS of AD dementia to obtain BGW-TWAS (i.e., gene-based association test) p-values of AD dementia per tissue type. Finally, we used the aggregated Cauchy association test to combine TWAS p-values across three tissues to obtain omnibus TWAS p-values per gene. Results We identified 37 genes in prefrontal cortex, 55 in cortex, and 51 in whole blood that were significantly associated with AD dementia. By combining BGW-TWAS p-values across these three tissues, we obtained 93 significant risk genes including 29 genes primarily due to trans-eQTL and 50 novel genes.
Utilizing protein-protein interaction network and phenotype enrichment analyses with these 93 significant risk genes, we detected 5 functional clusters comprising both known and novel AD risk genes and 7 enriched phenotypes. Conclusion We applied BGW-TWAS and aggregated Cauchy test methods to integrate both cis- and trans-eQTL data of brain and blood tissues with GWAS summary data to identify risk genes of AD dementia. The risk genes we identified provide novel insights into the underlying biological pathways implicated in AD dementia.
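The idea of using eQTL effect sizes as variant weights in a burden-type gene-based test can be sketched with one common summary-statistic form: the gene-level Z-score is the weighted sum of GWAS Z-scores divided by its standard deviation under the null, which depends on the linkage disequilibrium (LD) correlation matrix. This is a generic illustration assuming standardized genotypes; it is not claimed to be the exact BGW-TWAS statistic, and all names and numbers are hypothetical.

```python
import math


def burden_z(weights, gwas_z, ld):
    """Gene-level Z-score: (w' z) / sqrt(w' R w).

    weights: eQTL effect sizes used as variant weights
    gwas_z:  per-variant GWAS Z-scores
    ld:      LD correlation matrix R as a list of rows
    """
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Two hypothetical variants in partial LD (made-up numbers).
z_gene = burden_z([0.4, -0.1], [3.0, 1.0], [[1.0, 0.2], [0.2, 1.0]])
p_gene = two_sided_p(z_gene)
```

Variants with zero weight drop out of both the numerator and the variance, so trans-eQTL enter the test simply as additional nonzero weights.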
6
McManus JN, Lovelett RJ, Lowengrub D, Christensen S. A unifying statistical framework to discover disease genes from GWASs. Cell Genom 2023; 3:100264. [PMID: 36950381 PMCID: PMC10025450 DOI: 10.1016/j.xgen.2023.100264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/30/2022] [Revised: 09/07/2022] [Accepted: 01/19/2023] [Indexed: 03/10/2023]
Abstract
Genome-wide association studies (GWASs) identify genomic loci associated with complex traits, but it remains a challenge to identify the genes affected by causal genetic variants in these loci. Attempts to solve this challenge are frustrated by a number of compounding problems. Here, we show how to combine solutions to these problems into a unified mathematical framework. From this synthesis, it becomes possible to compute the probability that each gene in the genome is affected by a causal variant, given a particular trait, without making assumptions about the relevant cell types or tissues. We validate each component of the framework individually and in combination. When applied to large GWASs of human disease, the resulting paradigm can rediscover the majority of well-known disease genes. Moreover, it establishes human genetics support for many genes previously implicated only by clinical or preclinical evidence, and it uncovers a plethora of novel disease genes with compelling biological rationale.
7
Li Q, Perera D, Cao C, He J, Bian J, Chen X, Azeem F, Howe A, Au B, Wu J, Yan J, Long Q. Interaction-integrated linear mixed model reveals 3D-genetic basis underlying Autism. Genomics 2023; 115:110575. [PMID: 36758877 DOI: 10.1016/j.ygeno.2023.110575] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/22/2022] [Revised: 01/16/2023] [Accepted: 02/03/2023] [Indexed: 02/10/2023]
Abstract
Genetic interactions play critical roles in genotype-phenotype associations. We developed a novel interaction-integrated linear mixed model (ILMM) that integrates a priori knowledge into linear mixed models. ILMM enables statistical integration of genetic interactions upfront and overcomes the problems of searching for combinations. To demonstrate its utility, with 3D genomic interactions (assessed by Hi-C experiments) as a priori knowledge, we applied ILMM to whole-genome sequencing data for Autism Spectrum Disorders (ASD) and brain transcriptome data, revealing the 3D-genetic basis of ASD and 3D-expression quantitative trait loci (3D-eQTLs) for brain tissues. Notably, we reported a potential mechanism involving distal regulation between FOXP2 and DNMT3A, conferring the risk of ASD.
Affiliation(s)
- Qing Li
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Deshan Perera
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Chen Cao
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Jingni He
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Jiayi Bian
- Department of Mathematics and Statistics, University of Calgary, Alberta T2N 1N4, Canada
- Xingyu Chen
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Feeha Azeem
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Aaron Howe
- Heritage Youth Researcher Summer Program, University of Calgary, Alberta T2N 1N4, Canada
- Billie Au
- Department of Medical Genetics, University of Calgary, Alberta T2N 1N4, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Alberta T2N 1N4, Canada
- Jingjing Wu
- Department of Mathematics and Statistics, University of Calgary, Alberta T2N 1N4, Canada
- Jun Yan
- Department of Physiology and Pharmacology, University of Calgary, Alberta T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, Alberta T2N 1N4, Canada.
- Quan Long
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada; Department of Medical Genetics, University of Calgary, Alberta T2N 1N4, Canada; Department of Mathematics and Statistics, University of Calgary, Alberta T2N 1N4, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Alberta T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, Alberta T2N 1N4, Canada.
8
Xia X, Zhang Y, Wei Y, Wang MH. Statistical Methods for Disease Risk Prediction with Genotype Data. Methods Mol Biol 2023; 2629:331-347. [PMID: 36929084 DOI: 10.1007/978-1-0716-2986-4_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
The single-nucleotide polymorphism (SNP) is the basic unit for understanding the heritability of complex traits. One attractive application of susceptibility SNPs is to construct prediction models for assessing disease risk. Here, we introduce prediction methods for human traits using SNP data, including the polygenic risk score (PRS), linear mixed models (LMMs), penalized regressions, and methods for controlling population stratification.
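Of the methods listed in this abstract, the polygenic risk score is the simplest to illustrate. In its basic additive form it is the sum of risk-allele dosages weighted by per-SNP effect sizes. The sketch below assumes that form; the SNP identifiers and effect sizes are hypothetical, and treating a missing genotype as dosage 0 is an assumption made only to keep the example short.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over SNPs of effect size times allele dosage.

    dosages:      dict of SNP id -> risk-allele count (0, 1, or 2)
    effect_sizes: dict of SNP id -> estimated per-allele effect
    SNPs missing from `dosages` are treated as dosage 0 (an assumption).
    """
    return sum(
        beta * dosages.get(snp, 0.0) for snp, beta in effect_sizes.items()
    )


# Hypothetical SNP ids and effect sizes (illustrative only).
effects = {"snp_a": 0.10, "snp_b": -0.20, "snp_c": 0.50}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
score = polygenic_risk_score(person, effects)
```

In practice the effect sizes would come from a GWAS or from one of the shrinkage methods the chapter covers, and scores are usually standardized against a reference population before interpretation.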
Affiliation(s)
- Xiaoxuan Xia
- JC School of Public Health and Primary Care, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
- Department of Statistics, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
- Yingying Wei
- Department of Statistics, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
- Maggie Haitian Wang
- JC School of Public Health and Primary Care, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong.
- CUHK Shenzhen Institute, Shenzhen, China.
9
Chen J, Wang L, De Jager PL, Bennett DA, Buchman AS, Yang J. A scalable Bayesian functional GWAS method accounting for multivariate quantitative functional annotations with applications for studying Alzheimer disease. HGG Adv 2022; 3:100143. [PMID: 36204489 PMCID: PMC9530673 DOI: 10.1016/j.xhgg.2022.100143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Accepted: 09/14/2022] [Indexed: 11/30/2022]
Abstract
Existing methods for integrating functional annotations in genome-wide association studies (GWASs) to fine-map and prioritize potential causal variants are limited to using non-overlapped categorical annotations or limited by the computation burden of modeling genome-wide variants. To overcome these limitations, we propose a scalable Bayesian functional GWAS method to account for multivariate quantitative functional annotations (BFGWAS_QUANT), accompanied by a scalable computation algorithm enabling joint modeling of genome-wide variants. Simulation studies validated the performance of BFGWAS_QUANT for accurately quantifying annotation enrichment and improving GWAS power. Applying BFGWAS_QUANT to study five Alzheimer disease (AD)-related phenotypes using individual-level GWAS data (n = ∼1,000), we found that histone modification annotations have higher enrichment than expression quantitative trait locus (eQTL) annotations for all considered phenotypes, with the highest enrichment in H3K27me3 (polycomb repression). We also found that cis-eQTLs in microglia had higher enrichment than eQTLs of bulk brain frontal cortex tissue for all considered phenotypes. A similar enrichment pattern was also identified using the International Genomics of Alzheimer's Project (IGAP) summary-level GWAS data of AD (n = ∼54,000). The strongest known APOE E4 risk allele was identified for all five phenotypes, and the APOE locus was validated using the IGAP data. BFGWAS_QUANT fine-mapped 32 significant variants from 1,073 genome-wide significant variants in the IGAP data. We also demonstrated that the polygenic risk scores (PRSs) using effect size estimates by BFGWAS_QUANT had a similar prediction accuracy as other methods assuming a sparse causal model. Overall, BFGWAS_QUANT is a useful GWAS tool for quantifying annotation enrichment and prioritizing potential causal variants.
Affiliation(s)
- Junyu Chen
- Department of Epidemiology, Emory University School of Public Health, Atlanta, GA 30322, USA
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Lei Wang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Philip L. De Jager
- Center for Translational and Computational Neuroimmunology, Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, USA
- David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- Aron S. Buchman
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
10
Ros-Freixedes R, Johnsson M, Whalen A, Chen CY, Valente BD, Herring WO, Gorjanc G, Hickey JM. Genomic prediction with whole-genome sequence data in intensely selected pig lines. Genet Sel Evol 2022; 54:65. [PMID: 36153511 PMCID: PMC9509613 DOI: 10.1186/s12711-022-00756-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/28/2022] [Accepted: 09/05/2022] [Indexed: 12/03/2022]
Abstract
Background Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage. Methods We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests. Results The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected. Conclusions Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. 
Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00756-0.
11
Yuan Z, Liu L, Guo P, Yan R, Xue F, Zhou X. Likelihood-based Mendelian randomization analysis with automated instrument selection and horizontal pleiotropic modeling. Sci Adv 2022; 8:eabl5744. [PMID: 35235357 PMCID: PMC8890724 DOI: 10.1126/sciadv.abl5744] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/24/2021] [Accepted: 01/05/2022] [Indexed: 05/03/2023]
Abstract
Mendelian randomization (MR) is a common tool for identifying causal risk factors underlying diseases. Here, we present a method, MR with automated instrument determination (MRAID), for effective MR analysis. MRAID borrows ideas from fine-mapping analysis to model an initial set of candidate single-nucleotide polymorphisms that are in potentially high linkage disequilibrium with each other and automatically selects among them the suitable instruments for causal inference. MRAID also explicitly models both uncorrelated and correlated horizontal pleiotropic effects that are widespread for complex trait analysis. MRAID achieves both tasks through a joint likelihood framework and relies on a scalable sampling-based algorithm to compute calibrated P values. Comprehensive and realistic simulations show that MRAID can provide calibrated type I error control and reduce false positives while being more powerful than existing approaches. We illustrate the benefits of MRAID for an MR screening analysis across 645 trait pairs in U.K. Biobank, identifying multiple lifestyle causal risk factors of cardiovascular disease-related traits.
Affiliation(s)
- Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Lu Liu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Ping Guo
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Ran Yan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
12
Yang Y, Sun Q, Huang L, Broome JG, Correa A, Reiner A, Raffield LM, Yang Y, Li Y. eSCAN: scan regulatory regions for aggregate association testing using whole-genome sequencing data. Brief Bioinform 2022; 23:bbab497. [PMID: 34882196 PMCID: PMC8898002 DOI: 10.1093/bib/bbab497] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Revised: 10/25/2021] [Accepted: 10/30/2021] [Indexed: 02/07/2023]
Abstract
Multiple statistical methods for aggregate association testing have been developed for whole-genome sequencing (WGS) data. Many of these methods aggregate variants in a given genomic window and ignore existing knowledge when defining test regions, so many identified regions are not clearly linked to genes, which limits biological understanding. Functional information from new technologies (such as Hi-C and its derivatives), which can help link enhancers to their effector genes, can be leveraged to predefine variant sets for aggregate testing in WGS data. Here, we propose the eSCAN (scan the enhancers) method for genome-wide assessment of enhancer regions in sequencing studies, combining the advantages of dynamic window selection in SCANG (SCAN the Genome), a previously developed method, with the advantages of incorporating putative regulatory regions from annotation. eSCAN, by searching in putative enhancers, increases statistical power and aids mechanistic interpretation, as demonstrated by extensive simulation studies. We also apply eSCAN for blood cell traits using NHLBI Trans-Omics for Precision Medicine WGS data. Results from real data analysis show that eSCAN is able to capture more significant signals, and these signals are of shorter length (indicating higher resolution fine-mapping capability) and drive association of larger regions detected by other methods.
Affiliation(s)
- Yingxi Yang
- Department of Statistics and Data Science, Yale University, New Haven, CT, 06511, USA
| | - Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Le Huang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Jai G Broome
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
| | - Adolfo Correa
- Department of Medicine and Population Health Science, University of Mississippi Medical Center, Jackson, MS, 39216, USA
| | - Alexander Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, 98195, USA
- Fred Hutchinson Cancer Research Center, University of Washington, Seattle, WA, 98195, USA
| | | | - Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Yuchen Yang
- State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, 510275 Guangzhou, China
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
13
|
Demetci P, Cheng W, Darnell G, Zhou X, Ramachandran S, Crawford L. Multi-scale inference of genetic trait architecture using biologically annotated neural networks. PLoS Genet 2021; 17:e1009754. [PMID: 34411094 PMCID: PMC8407593 DOI: 10.1371/journal.pgen.1009754] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/23/2020] [Revised: 08/31/2021] [Accepted: 07/31/2021] [Indexed: 01/01/2023] Open
Abstract
In this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects, and the hidden layer models the aggregated effects among SNP-sets. We treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses variational inference to provide posterior summaries which allow researchers to simultaneously perform (i) mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art association mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a random subset of individuals of European ancestry from the UK Biobank, we show that BANNs is able to replicate known associations in high- and low-density lipoprotein cholesterol content.
Affiliation(s)
- Pinar Demetci
- Department of Computer Science, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
| | - Wei Cheng
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
| | - Gregory Darnell
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Sohini Ramachandran
- Department of Computer Science, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
| | - Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Microsoft Research New England, Cambridge, Massachusetts, United States of America
- Department of Biostatistics, Brown University, Providence, Rhode Island, United States of America
14
|
Liu L, Chandrashekar P, Zeng B, Sanderford MD, Kumar S, Gibson G. TreeMap: a structured approach to fine mapping of eQTL variants. Bioinformatics 2021; 37:1125-1134. [PMID: 33135051 PMCID: PMC8150140 DOI: 10.1093/bioinformatics/btaa927] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/20/2020] [Revised: 10/01/2020] [Accepted: 10/20/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Expression quantitative trait loci (eQTL) harbor genetic variants modulating gene transcription. Fine mapping of regulatory variants at these loci is a daunting task due to the juxtaposition of causal and linked variants at a locus as well as the likelihood of interactions among multiple variants. This problem is exacerbated in genes with multiple cis-acting eQTL, where superimposed effects of adjacent loci further distort the association signals. RESULTS We developed a novel algorithm, TreeMap, that identifies putative causal variants in cis-eQTL accounting for multisite effects and genetic linkage at a locus. Guided by the hierarchical structure of linkage disequilibrium, TreeMap performs an organized search for individual and multiple causal variants. Via extensive simulations, we show that TreeMap detects co-regulating variants more accurately than current methods. Furthermore, its high computational efficiency enables genome-wide analysis of long-range eQTL. We applied TreeMap to GTEx data of brain hippocampus samples and transverse colon samples to search for eQTL in gene bodies and in 4 Mbps gene-flanking regions, discovering numerous distal eQTL. Furthermore, we found concordant distal eQTL that were present in both brain and colon samples, implying long-range regulation of gene expression. AVAILABILITY AND IMPLEMENTATION TreeMap is available as an R package enabled for parallel processing at https://github.com/liliulab/treemap. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li Liu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
| | - Pramod Chandrashekar
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
| | - Biao Zeng
- Center for Integrative Genomics, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Maxwell D Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA; Department of Biology, Temple University, Philadelphia, PA 19122, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Greg Gibson
- Center for Integrative Genomics, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
15
|
Ruffieux H, Fairfax BP, Nassiri I, Vigorito E, Wallace C, Richardson S, Bottolo L. EPISPOT: An epigenome-driven approach for detecting and interpreting hotspots in molecular QTL studies. Am J Hum Genet 2021; 108:983-1000. [PMID: 33909991 PMCID: PMC8206410 DOI: 10.1016/j.ajhg.2021.04.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/13/2020] [Accepted: 04/08/2021] [Indexed: 12/27/2022] Open
Abstract
We present EPISPOT, a fully joint framework which exploits large panels of epigenetic annotations as variant-level information to enhance molecular quantitative trait locus (QTL) mapping. Thanks to a purpose-built Bayesian inferential algorithm, EPISPOT accommodates functional information for both cis and trans actions, including QTL hotspot effects. It effectively couples simultaneous QTL analysis of thousands of genetic variants and molecular traits with hypothesis-free selection of biologically interpretable annotations which directly contribute to the QTL effects. This unified, epigenome-aided learning boosts statistical power and sheds light on the regulatory basis of the uncovered hits; EPISPOT therefore marks an essential step toward improving the challenging detection and functional interpretation of trans-acting genetic variants and hotspots. We illustrate the advantages of EPISPOT in simulations emulating real-data conditions and in a monocyte expression QTL study, which confirms known hotspots and finds other signals, as well as plausible mechanisms of action. In particular, by highlighting the role of monocyte DNase-I sensitivity sites from >150 epigenetic annotations, we clarify the mediation effects and cell-type specificity of major hotspots close to the lysozyme gene. Our approach forgoes the daunting and underpowered task of one-annotation-at-a-time enrichment analyses for prioritizing cis and trans QTL hits and is tailored to any transcriptomic, proteomic, or metabolomic QTL problem. By enabling principled epigenome-driven QTL mapping transcriptome-wide, EPISPOT helps progress toward a better functional understanding of genetic regulation.
Affiliation(s)
- Hélène Ruffieux
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK.
| | - Benjamin P Fairfax
- Department of Oncology, MRC Weatherall Institute for Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK
| | - Isar Nassiri
- Department of Oncology, MRC Weatherall Institute for Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK
| | - Elena Vigorito
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK
| | - Chris Wallace
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK; Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge CB2 0AW, UK
| | - Sylvia Richardson
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK; The Alan Turing Institute, London NW1 2DB, UK
| | - Leonardo Bottolo
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK; The Alan Turing Institute, London NW1 2DB, UK; Department of Medical Genetics, University of Cambridge, Cambridge CB2 0QQ, UK
16
|
Palmer RHC, Johnson EC, Won H, Polimanti R, Kapoor M, Chitre A, Bogue MA, Benca‐Bachman CE, Parker CC, Verma A, Reynolds T, Ernst J, Bray M, Kwon SB, Lai D, Quach BC, Gaddis NC, Saba L, Chen H, Hawrylycz M, Zhang S, Zhou Y, Mahaffey S, Fischer C, Sanchez‐Roige S, Bandrowski A, Lu Q, Shen L, Philip V, Gelernter J, Bierut LJ, Hancock DB, Edenberg HJ, Johnson EO, Nestler EJ, Barr PB, Prins P, Smith DJ, Akbarian S, Thorgeirsson T, Walton D, Baker E, Jacobson D, Palmer AA, Miles M, Chesler EJ, Emerson J, Agrawal A, Martone M, Williams RW. Integration of evidence across human and model organism studies: A meeting report. Genes Brain Behav 2021; 20:e12738. [PMID: 33893716 PMCID: PMC8365690 DOI: 10.1111/gbb.12738] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/20/2021] [Revised: 04/11/2021] [Accepted: 04/21/2021] [Indexed: 12/13/2022]
Abstract
The National Institute on Drug Abuse and Joint Institute for Biological Sciences at the Oak Ridge National Laboratory hosted a meeting attended by a diverse group of scientists with expertise in substance use disorders (SUDs), computational biology, and FAIR (Findability, Accessibility, Interoperability, and Reusability) data sharing. The meeting's objective was to discuss and evaluate better strategies to integrate genetic, epigenetic, and 'omics data across human and model organisms to achieve deeper mechanistic insight into SUDs. Specific topics were to (a) evaluate the current state of substance use genetics and genomics research and fundamental gaps, (b) identify opportunities and challenges of integration and sharing across species and data types, (c) identify current tools and resources for integration of genetic, epigenetic, and phenotypic data, (d) discuss steps and impediments related to data integration, and (e) outline future steps to support more effective collaboration, particularly between animal model research communities and human genetics and clinical research teams. This review summarizes key facets of this catalytic discussion with a focus on new opportunities and gaps in resources and knowledge on SUDs.
Affiliation(s)
- Rohan H. C. Palmer
- Behavioral Genetics of Addiction Laboratory, Department of Psychology, Emory University, Atlanta, Georgia, USA
| | - Emma C. Johnson
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Hyejung Won
- Department of Genetics and Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Renato Polimanti
- Department of Psychiatry, Yale University School of Medicine, West Haven, Connecticut, USA
| | - Manav Kapoor
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Apurva Chitre
- Department of Psychiatry, University of California, San Diego, La Jolla, California, USA
| | | | - Chelsie E. Benca‐Bachman
- Behavioral Genetics of Addiction Laboratory, Department of Psychology, Emory University, Atlanta, Georgia, USA
| | - Clarissa C. Parker
- Department of Psychology and Program in Neuroscience, Middlebury College, Middlebury, Vermont, USA
| | - Anurag Verma
- Biomedical and Translational Informatics Laboratory, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | | | - Jason Ernst
- Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California, USA
| | - Michael Bray
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Soo Bin Kwon
- Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California, USA
| | - Dongbing Lai
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Bryan C. Quach
- GenOmics, Bioinformatics, and Translational Research Center, Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, USA
| | - Nathan C. Gaddis
- GenOmics, Bioinformatics, and Translational Research Center, Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, USA
| | - Laura Saba
- Department of Pharmaceutical Sciences, University of Colorado, Anschutz Medical Campus, Aurora, Colorado, USA
| | - Hao Chen
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | | | - Shan Zhang
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
| | - Yuan Zhou
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| | - Spencer Mahaffey
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Colorado Denver, Aurora, Colorado, USA
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Sandra Sanchez‐Roige
- Department of Psychiatry, University of California, San Diego, La Jolla, California, USA
| | - Anita Bandrowski
- Department of Neuroscience, University of California, San Diego, La Jolla, California, USA
| | - Qing Lu
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| | - Li Shen
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | | | - Joel Gelernter
- Department of Psychiatry, Yale University School of Medicine, West Haven, Connecticut, USA
| | - Laura J. Bierut
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Dana B. Hancock
- GenOmics, Bioinformatics, and Translational Research Center, Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, USA
| | - Howard J. Edenberg
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Eric O. Johnson
- GenOmics, Bioinformatics, and Translational Research Center, Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, USA
| | - Eric J. Nestler
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Peter B. Barr
- Department of Psychology, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Desmond J. Smith
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Schahram Akbarian
- Friedman Brain Institute and Departments of Psychiatry and Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | | | | | - Erich Baker
- Department of Computer Science, Baylor University, Waco, Texas, USA
| | - Daniel Jacobson
- Computational and Predictive Biology, Biosciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Department of Psychology, University of Tennessee Knoxville, Knoxville, Tennessee, USA
| | - Abraham A. Palmer
- Department of Psychiatry, University of California, San Diego, La Jolla, California, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California, USA
| | - Michael Miles
- Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, Virginia, USA
| | | | | | - Arpana Agrawal
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Maryann Martone
- Department of Neuroscience, University of California, San Diego, La Jolla, California, USA
| | - Robert W. Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
17
|
Ainsworth HC, Howard TD, Langefeld CD. Intrinsic DNA topology as a prioritization metric in genomic fine-mapping studies. Nucleic Acids Res 2020; 48:11304-11321. [PMID: 33084892 PMCID: PMC7672465 DOI: 10.1093/nar/gkaa877] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/18/2020] [Revised: 08/23/2020] [Accepted: 09/25/2020] [Indexed: 12/15/2022] Open
Abstract
In genomic fine-mapping studies, some approaches leverage annotation data to prioritize likely functional polymorphisms. However, existing annotation resources can present challenges as many lack information for novel variants and/or may be uninformative for non-coding regions. We propose a novel annotation source, sequence-dependent DNA topology, as a prioritization metric for fine-mapping. DNA topology and function are well-intertwined, and as an intrinsic DNA property, it is readily applicable to any genomic region. Here, we constructed and applied Minor Groove Width (MGW) as a prioritization metric. Using an established MGW-prediction method, we generated a MGW census for 199 038 197 SNPs across the human genome. Summarizing a SNP's change in MGW (ΔMGW) as a Euclidean distance, ΔMGW exhibited a strongly right-skewed distribution, highlighting the infrequency of SNPs that generate dissimilar shape profiles. We hypothesized that phenotypically-associated SNPs can be prioritized by ΔMGW. We tested this hypothesis in 116 regions analyzed by a Massively Parallel Reporter Assay and observed enrichment of large ΔMGW for functional polymorphisms (P = 0.0007). To illustrate application in fine-mapping studies, we applied our MGW-prioritization approach to three non-coding regions associated with systemic lupus erythematosus. Together, this study presents the first usage of sequence-dependent DNA topology as a prioritization metric in genomic association studies.
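The abstract summarizes a SNP's change in minor groove width (ΔMGW) as a Euclidean distance between shape profiles. A minimal Python sketch of that summary follows, assuming each allele has a predicted MGW profile over the same window of positions; the function name `delta_mgw`, the window size, and the example values are hypothetical and are not taken from the paper.

```python
import math


def delta_mgw(ref_profile, alt_profile):
    """Euclidean distance between two equal-length MGW profiles.

    Each profile is a list of predicted minor groove widths for the
    positions around a SNP, one list per allele (hypothetical inputs).
    """
    if len(ref_profile) != len(alt_profile):
        raise ValueError("profiles must cover the same positions")
    return math.sqrt(
        sum((r - a) ** 2 for r, a in zip(ref_profile, alt_profile))
    )


# Made-up profiles for the reference and alternate allele.
ref = [5.0, 4.8, 4.2, 4.9, 5.1]
alt = [5.0, 4.5, 3.8, 4.9, 5.1]
print(round(delta_mgw(ref, alt), 3))
```

Under this reading, SNPs whose two alleles give nearly identical profiles score close to zero, which is consistent with the strongly right-skewed ΔMGW distribution the abstract reports.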
Affiliation(s)
- Hannah C Ainsworth
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
| | - Timothy D Howard
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
| | - Carl D Langefeld
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Comprehensive Cancer Center of Wake Forest Baptist Medical Center, Winston-Salem, NC 27157, USA
18
|
Luningham JM, Chen J, Tang S, De Jager PL, Bennett DA, Buchman AS, Yang J. Bayesian Genome-wide TWAS Method to Leverage both cis- and trans-eQTL Information through Summary Statistics. Am J Hum Genet 2020; 107:714-726. [PMID: 32961112 PMCID: PMC7536614 DOI: 10.1016/j.ajhg.2020.08.022] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 03/05/2020] [Accepted: 08/24/2020] [Indexed: 12/17/2022] Open
Abstract
Transcriptome-wide association studies (TWASs) have been widely used to integrate gene expression and genetic data for studying complex traits. Due to the computational burden, existing TWAS methods do not assess distant trans-expression quantitative trait loci (eQTL) that are known to explain important expression variation for most genes. We propose a Bayesian genome-wide TWAS (BGW-TWAS) method that leverages both cis- and trans-eQTL information for a TWAS. Our BGW-TWAS method is based on Bayesian variable selection regression, which not only accounts for cis- and trans-eQTL of the target gene but also enables efficient computation by using summary statistics from standard eQTL analyses. Our simulation studies illustrated that BGW-TWAS achieved higher power compared to existing TWAS methods that do not assess trans-eQTL information. We further applied BGW-TWAS to individual-level GWAS data (N = ∼3.3K), which identified significant associations between the genetically regulated gene expression (GReX) of ZC3H12B and Alzheimer dementia (AD) (p value = 5.42 × 10⁻¹³), neurofibrillary tangle density (p value = 1.89 × 10⁻⁶), and global measure of AD pathology (p value = 9.59 × 10⁻⁷). These associations for ZC3H12B were completely driven by trans-eQTL. Additionally, the GReX of KCTD12 was found to be significantly associated with β-amyloid (p value = 3.44 × 10⁻⁸), which was driven by both cis- and trans-eQTL. Four of the top driven trans-eQTL of ZC3H12B are located within APOC1, a known major risk gene of AD and blood lipids. Additionally, by applying BGW-TWAS with summary-level GWAS data of AD (N = ∼54K), we identified 13 significant genes including known GWAS risk genes HLA-DRB1 and APOC1, as well as ZC3H12B.
Affiliation(s)
- Justin M Luningham
- Department of Population Health Sciences, Georgia State University School of Public Health, Atlanta, GA 30303, USA; Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Junyu Chen
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Shizhen Tang
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA 30322, USA; Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Philip L De Jager
- Center for Translational and Computational Neuroimmunology, Department of Neurology and Taub Institute for Research on Alzheimer disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - David A Bennett
- Rush Alzheimer disease Center, Rush University Medical Center, Chicago, IL 60612, USA
| | - Aron S Buchman
- Rush Alzheimer disease Center, Rush University Medical Center, Chicago, IL 60612, USA
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA.
19
|
Biswas S, Pal S, Majumder PP, Bhattacharjee S. A framework for pathway knowledge driven prioritization in genome-wide association studies. Genet Epidemiol 2020; 44:841-853. [PMID: 32779262 PMCID: PMC7116354 DOI: 10.1002/gepi.22345] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/07/2020] [Revised: 06/18/2020] [Accepted: 07/10/2020] [Indexed: 12/27/2022]
Abstract
Many variants with low frequencies or with low to modest effects likely remain unidentified in genome-wide association studies (GWAS) because of stringent genome-wide thresholds for detection. To improve the power of detection, variant prioritization based on their functional annotations and epigenetic landmarks has been used successfully. Here, we propose a novel method of prioritization of a GWAS by exploiting gene-level knowledge (e.g., annotations to pathways and ontologies) and show that it further improves power. Often, disease-associated variants are found near genes that are co-involved in specific biological pathways relevant to the disease process. Utilization of this knowledge to conduct a prioritized scan increases the power to detect loci that map to genes clustered in a few specific pathways. We have developed a computationally scalable framework based on penalized logistic regression (termed GKnowMTest, for Genomic Knowledge-guided Multiple Testing) to enable a prioritized pathway-guided GWAS scan with a very large number of gene-level annotations. We demonstrate that the proposed strategy improves overall power and maintains the Type 1 error globally. Our method works on genome-wide summary-level data and a user-specified list of pathways (e.g., those extracted from large pathway databases without reference to the biology of a specific disease). It automatically reweights the input p values by incorporating the pathway enrichments as "adaptively learned" from the data using a cross-validation technique to avoid overfitting. We used whole-genome simulations and some publicly available GWAS data sets to illustrate the application of our method. The GKnowMTest framework has been implemented as a user-friendly open-source R package.
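The abstract describes reweighting input p values according to learned pathway enrichment. The Python sketch below illustrates only the generic weighted multiple-testing idea (divide each p value by a weight, with weights normalized to average 1 so the overall error budget is unchanged). It is an assumption-laden illustration, not the GKnowMTest implementation: the function name, the example weights, and the way the weights are obtained are all hypothetical.

```python
def reweight_p_values(p_values, weights):
    """Return weighted p values p_i / w_i, capped at 1.

    Weights are rescaled to have mean 1, a standard condition in
    weighted multiple testing. How the weights are learned (e.g. from
    pathway enrichment with cross-validation) is outside this sketch.
    """
    if len(p_values) != len(weights):
        raise ValueError("need one weight per p value")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    mean_w = sum(weights) / len(weights)
    normalized = [w / mean_w for w in weights]
    return [min(1.0, p / w) for p, w in zip(p_values, normalized)]


# Hypothetical example: the first variant maps to a prioritized pathway.
p = [0.01, 0.02, 0.5]
w = [2.0, 1.0, 1.0]
print(reweight_p_values(p, w))
```

Variants with above-average weight see their p values shrink, while the rest are penalized slightly, which is the trade-off any prioritized scan of this kind makes.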
Affiliation(s)
| | - Soumen Pal
- National Institute of Biomedical Genomics, Kalyani, India
20
|
Testing and controlling for horizontal pleiotropy with probabilistic Mendelian randomization in transcriptome-wide association studies. Nat Commun 2020; 11:3861. [PMID: 32737316 PMCID: PMC7395774 DOI: 10.1038/s41467-020-17668-6] [Citation(s) in RCA: 91] [Impact Index Per Article: 18.2] [Received: 03/19/2019] [Accepted: 07/10/2020] [Indexed: 02/06/2023] Open
Abstract
Integrating results from genome-wide association studies (GWASs) and gene expression studies through transcriptome-wide association study (TWAS) has the potential to shed light on the causal molecular mechanisms underlying disease etiology. Here, we present a probabilistic Mendelian randomization (MR) method, PMR-Egger, for TWAS applications. PMR-Egger relies on a MR likelihood framework that unifies many existing TWAS and MR methods, accommodates multiple correlated instruments, tests the causal effect of gene on trait in the presence of horizontal pleiotropy, and is scalable to hundreds of thousands of individuals. In simulations, PMR-Egger provides calibrated type I error control for causal effect testing in the presence of horizontal pleiotropic effects, is reasonably robust under various types of model misspecifications, is more powerful than existing TWAS/MR approaches, and can directly test for horizontal pleiotropy. We illustrate the benefits of PMR-Egger in applications to 39 diseases and complex traits obtained from three GWASs including the UK Biobank. Transcriptome-wide association studies integrate GWAS and transcriptome data to examine the molecular mechanisms underlying disease etiology. Here the authors present PMR-Egger, a powerful TWAS method based on probabilistic Mendelian Randomization.
21
|
Wei Z, Ren Z, Hu S, Gao Y, Sun R, Lv S, Yang G, Yu Z, Kan Q. Development and validation of a simple risk model to predict major cancers for patients with nonalcoholic fatty liver disease. Cancer Med 2020; 9:1254-1262. [PMID: 31860170 PMCID: PMC6997093 DOI: 10.1002/cam4.2777] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/25/2019] [Revised: 10/29/2019] [Accepted: 12/01/2019] [Indexed: 02/06/2023] Open
Abstract
OBJECTIVE To identify risk factors and to build and validate a simple risk model predicting 8-year cancer events after nonalcoholic fatty liver disease (NAFLD). METHODS This was a retrospective cohort study. Patients with NAFLD (n = 5561) were randomly divided into groups: training (n = 1254), test (n = 627), evaluation (n = 627), and validation (n = 3053). Risk factors were identified using a Cox model with Markov chain Monte Carlo (MCMC) simulation. The prediction score was established based on the training group and was further validated based on the test and evaluation groups from January 1, 2007 to December 31, 2009 and another 3053 independent cases from January 1, 2010 to February 13, 2014. RESULTS The main outcomes were NAFLD-related cancer events, including those of the liver, breast, esophagus, stomach, pancreas, prostate and colon, within 8 years after hospitalization for NAFLD diagnosis. Seven risk factors (age (every 5 years), LDL, smoking, BMI, diabetes, OSAS, and aspartate aminotransferase (every 5 units)) were identified as independent indicators of cancer events. The risk model had a predictive range of 0.4%-37.7%, 0.3%-39.6%, and 0.4%-39.3% in the training, test, and evaluation groups, respectively, with a range of 0.4%-30.4% for the validation group. In the training group, 12.6%, 76.9%, and 10.5% of patients, which corresponded to the low-, moderate-, and high-risk groups, had probabilities of <0.01, <0.1, and 0.23 for 8-year events. CONCLUSIONS Seven risk factors were identified, and a simple risk model was developed and validated to predict the 8-year risk of cancer events after NAFLD. This simple risk score system may help identify high-risk patients and reduce cancer incidence.
Affiliation(s)
- Zihan Wei
- Department of Geriatrics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Department of Infectious Diseases, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Gene Hospital of Henan Province, Precision Medicine Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Department of Pharmacy, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zhigang Ren
- Department of Infectious Diseases, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Gene Hospital of Henan Province, Precision Medicine Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Department of Pharmacy, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Shuang Hu
- National Clinical Research Center of Cardiovascular Diseases, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Beijing, China
| | - Yan Gao
- National Clinical Research Center of Cardiovascular Diseases, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Beijing, China
| | - Ranran Sun
- Department of Infectious Diseases, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Gene Hospital of Henan Province, Precision Medicine Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Department of Pharmacy, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Shuai Lv
- Department of Gastroenterology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Guojie Yang
- Department of Geriatrics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zujiang Yu
- Department of Infectious Diseases, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Gene Hospital of Henan Province, Precision Medicine Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Department of Pharmacy, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Quancheng Kan
- Department of Infectious Diseases, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Gene Hospital of Henan Province, Precision Medicine Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Department of Pharmacy, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| |
|
22
|
Duggal P, Ladd-Acosta C, Ray D, Beaty TH. The Evolving Field of Genetic Epidemiology: From Familial Aggregation to Genomic Sequencing. Am J Epidemiol 2019; 188:2069-2077. [PMID: 31509181 PMCID: PMC7036654 DOI: 10.1093/aje/kwz193] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/28/2019] [Revised: 08/15/2019] [Accepted: 08/19/2019] [Indexed: 12/21/2022]
Abstract
The field of genetic epidemiology is relatively young and brings together genetics, epidemiology, and biostatistics to identify and implement the best study designs and statistical analyses for identifying genes controlling risk for complex and heterogeneous diseases (i.e., those where genes and environmental risk factors both contribute to etiology). The field has moved quickly over the past 40 years partly because the technology of genotyping and sequencing has forced it to adapt while adhering to the fundamental principles of genetics. In the last two decades, the available tools for genetic epidemiology have expanded from a genetic focus (considering 1 gene at a time) to a genomic focus (considering the entire genome), and now they must further expand to integrate information from other “-omics” (e.g., epigenomics, transcriptomics as measured by RNA expression) at both the individual and the population levels. Additionally, we can now also evaluate gene and environment interactions across populations to better understand exposure and the heterogeneity in disease risk. The future challenges facing genetic epidemiology are considerable both in scale and techniques, but the importance of the field will not diminish because by design it ties scientific goals with public health applications.
Affiliation(s)
- Priya Duggal
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
| | - Christine Ladd-Acosta
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
| | - Debashree Ray
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
| | - Terri H Beaty
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
| |
|
23
|
Moura EG, Pamplona AKA, Balestre M. Functional models in genome-wide selection. PLoS One 2019; 14:e0222699. [PMID: 31644532 PMCID: PMC6808424 DOI: 10.1371/journal.pone.0222699] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/14/2018] [Accepted: 09/05/2019] [Indexed: 11/29/2022]
Abstract
The development of sequencing technologies has enabled the discovery of markers that are abundantly distributed over the whole genome. Knowledge about the marker locations in reference genomes provides further insights in the search for causal regions and the prediction of genomic values. The present study proposes a Bayesian functional approach for incorporating the marker locations into genomic analysis, using stochastic methods to search for causal regions and predict genotypic values. For this, three scenarios were analyzed: an F2 population with 300 individuals and three different heritability levels (0.2, 0.5, and 0.8), along with 12,150 SNP markers distributed across ten linkage groups; F∞ populations with 320 individuals and three different heritability levels (0.2, 0.5, and 0.8), along with 10,020 SNP markers distributed across ten linkage groups; and data related to Eucalyptus spp., used to measure the model performance in a real LD setting, with 611 individuals whose phenotypes were simulated from QTLs distributed through a panel of 36,812 SNPs with known positions. The performance of the proposed method was compared with that of other genomic selection models, namely RR-BLUP, Bayes B, and Bayesian Lasso. The Bayesian functional model showed higher or similar predictive ability compared with those classical regression methods in simulated and real scenarios with different LD structures. In general, the Bayesian functional model also achieved higher computational efficiency, using 12 SNPs per MCMC round. The model was efficient in identifying causal regions and showed high flexibility of analysis, as it is easily adaptable to any genomic selection model.
Affiliation(s)
- Ernandes Guedes Moura
- Federal Institute of Maranhão - Campus São João dos Patos, São João dos Patos, Maranhão, Brasil
| | | | - Marcio Balestre
- Department of Statistics - Federal University of Lavras, Lavras, Minas Gerais, Brazil
| |
|
24
|
Abstract
Inflammation of the blood vessels that serve the central nervous system has been increasingly identified as an early and possibly initiating event among neurodegenerative conditions such as Alzheimer's disease and related dementias. However, the causal relevance of vascular inflammation to major retinal degenerative diseases is unresolved. Here, we describe how genetics, aging-associated changes, and environmental factors contribute to vascular inflammation in age-related macular degeneration, diabetic retinopathy, and glaucoma. We highlight the importance of mouse models in studying the underlying mechanisms and possible treatments for these diseases. We conclude that data support vascular inflammation playing a central if not primary role in retinal degenerative diseases, and this association should be a focus of future research.
Affiliation(s)
- Ileana Soto
- Department of Molecular and Cellular Biosciences, Rowan University, Glassboro, New Jersey 08028, USA
| | - Mark P Krebs
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
| | | | - Gareth R Howell
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA; Sackler School of Graduate Biomedical Sciences, Tufts University School of Medicine, Boston, Massachusetts 02111, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine 04469, USA
| |
|
25
|
Le Clerc S, Limou S, Zagury JF. Large-Scale "OMICS" Studies to Explore the Physiopatholgy of HIV-1 Infection. Front Genet 2019; 10:799. [PMID: 31572435 PMCID: PMC6754074 DOI: 10.3389/fgene.2019.00799] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/30/2019] [Accepted: 07/30/2019] [Indexed: 12/23/2022]
Abstract
In this review, we present the main large-scale experimental studies that have been performed in the HIV/AIDS field. These “omics” studies are based on several technologies including genotyping, RNA interference, and transcriptome or epigenome analysis. Due to the direct connection with disease evolution, there has been a large focus on genotyping cohorts of well-characterized patients through genome-wide association studies (GWASs), but there have also been several in vitro studies such as small interfering RNA (siRNA) interference or transcriptome analyses of HIV-1–infected cells. After describing the major results obtained with these omics technologies (including some with a high relevance for HIV-1 treatment), we discuss the next steps that the community needs to embrace in order to derive new actionable therapeutic or diagnostic targets. Only integrative approaches that combine all big data results and consider their complex interactions will allow us to capture the global picture of HIV molecular pathogenesis. This novel challenge will require large collaborative efforts and represents a huge open field for innovative bioinformatics approaches.
Affiliation(s)
- Sigrid Le Clerc
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, HESAM Université, Paris, France
| | - Sophie Limou
- Centre de Recherche en Transplantation et Immunologie UMR1064, INSERM, Université de Nantes, Nantes, France; Institut de Transplantation en Urologie et Néphrologie (ITUN), CHU de Nantes, Nantes, France; Computer Sciences and Mathematics Department, Ecole Centrale de Nantes, Nantes, France
| | - Jean-François Zagury
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, HESAM Université, Paris, France
| |
|
26
|
O'Connor LJ, Schoech AP, Hormozdiari F, Gazal S, Patterson N, Price AL. Extreme Polygenicity of Complex Traits Is Explained by Negative Selection. Am J Hum Genet 2019; 105:456-476. [PMID: 31402091 PMCID: PMC6732528 DOI: 10.1016/j.ajhg.2019.07.003] [Citation(s) in RCA: 147] [Impact Index Per Article: 24.5] [Received: 12/19/2018] [Accepted: 07/03/2019] [Indexed: 12/16/2022]
Abstract
Complex traits and common diseases are extremely polygenic, their heritability spread across thousands of loci. One possible explanation is that thousands of genes and loci have similarly important biological effects when mutated. However, we hypothesize that for most complex traits, relatively few genes and loci are critical, and negative selection, by purging large-effect mutations in these regions, leaves behind common-variant associations in thousands of less critical regions instead. We refer to this phenomenon as flattening. To quantify its effects, we introduce a mathematical definition of polygenicity, the effective number of independently associated SNPs (M_e), which describes how evenly the heritability of a trait is spread across the genome. We developed a method, stratified LD fourth moments regression (S-LD4M), to estimate M_e, validating that it produces robust estimates in simulations. Analyzing 33 complex traits (average N = 361k), we determined that heritability is spread ∼4× more evenly among common SNPs than among low-frequency SNPs. This difference, together with evolutionary modeling of new mutations, suggests that complex traits would be orders of magnitude less polygenic if not for the influence of negative selection. We also determined that heritability is spread more evenly within functionally important regions in proportion to their heritability enrichment; functionally important regions do not harbor common SNPs with greatly increased causal effect sizes, due to selective constraint. Our results suggest that for most complex traits, the genes and loci with the most critical biological effects often differ from those with the strongest common-variant associations.
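As a rough illustration of what an "effective number of independently associated SNPs" measures, the sketch below computes a kurtosis-based quantity from a vector of per-SNP effect sizes. It assumes the form M_e = 3M/κ with κ = E[β⁴]/E[β²]², ignores LD entirely, and takes the effects as known; the abstract does not state the formula, so this form is an assumption, and the S-LD4M estimator itself (which works from summary statistics and LD) is not reproduced here. The function name is hypothetical.

```python
def effective_num_snps(effects):
    """Kurtosis-based evenness measure, assumed form M_e = 3 * M / kappa.

    kappa = E[b^4] / E[b^2]^2 is taken over all M SNPs, including those
    with zero effect. LD is ignored in this toy version.
    """
    m = len(effects)
    if m == 0:
        raise ValueError("need at least one SNP")
    second = sum(b ** 2 for b in effects) / m
    fourth = sum(b ** 4 for b in effects) / m
    if second == 0.0:
        raise ValueError("all effects are zero; M_e is undefined")
    kappa = fourth / second ** 2
    return 3.0 * m / kappa


# Heritability concentrated in 10 of 100 SNPs versus spread over 50 of 100.
concentrated = [1.0] * 10 + [0.0] * 90
spread_out = [1.0] * 50 + [0.0] * 50
```

Under this assumed definition, spreading the same kind of effect over more SNPs lowers the kurtosis and raises M_e, which matches the abstract's description of M_e as a measure of how evenly heritability is distributed.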
Affiliation(s)
- Luke J O'Connor
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Bioinformatics and Integrative Genomics, Harvard Graduate School of Arts and Sciences, Boston, MA 02115, USA.
| | - Armin P Schoech
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Farhad Hormozdiari
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Nick Patterson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| |
|
27
|
Nagpal S, Meng X, Epstein MP, Tsoi LC, Patrick M, Gibson G, De Jager PL, Bennett DA, Wingo AP, Wingo TS, Yang J. TIGAR: An Improved Bayesian Tool for Transcriptomic Data Imputation Enhances Gene Mapping of Complex Traits. Am J Hum Genet 2019; 105:258-266. [PMID: 31230719 PMCID: PMC6698804 DOI: 10.1016/j.ajhg.2019.05.018] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Received: 12/27/2018] [Accepted: 05/23/2019] [Indexed: 12/22/2022]
Abstract
Transcriptome-wide association studies (TWASs), which test for association between the study trait and gene expression levels imputed from cis-acting expression quantitative trait loci (cis-eQTL) genotypes, have successfully enhanced the discovery of genetic risk loci for complex traits. By using gene expression imputation models fitted from reference datasets that have both genetic and transcriptomic data, TWASs facilitate gene-based tests with GWAS data while accounting for the reference transcriptomic data. Existing TWAS tools such as PrediXcan and FUSION use parametric imputation models that have limitations for modeling the complex genetic architecture of transcriptomic data. To improve on this, we employ a nonparametric Bayesian method that was originally proposed for genetic prediction of complex traits and that assumes a data-driven nonparametric prior for cis-eQTL effect sizes. The nonparametric Bayesian method is flexible and general because it includes both of the parametric imputation models used by PrediXcan and FUSION as special cases. Our simulation studies showed that the nonparametric Bayesian model improved both imputation R² for transcriptomic data and TWAS power over PrediXcan when ≥1% of cis-SNPs co-regulate gene expression and gene expression heritability is ≤0.2. In real applications, the nonparametric Bayesian method fitted transcriptomic imputation models for 57.8% more genes than PrediXcan, thus improving the power of follow-up TWASs. We implement both the parametric PrediXcan and the nonparametric Bayesian methods in a convenient software tool, "TIGAR" (Transcriptome-Integrated Genetic Association Resource), which imputes transcriptomic data and performs subsequent TWASs using individual-level or summary-level GWAS data.
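The summary-level step mentioned at the end of this abstract can be sketched with the commonly used burden-type statistic Z = wᵀz / sqrt(wᵀRw), where w holds the eQTL weights, z the per-SNP GWAS Z-scores, and R the LD correlation matrix of standardized genotypes. This is a minimal sketch of that general form, not TIGAR's actual implementation; the function names and input layout are hypothetical.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Burden-type TWAS statistic w'z / sqrt(w'Rw) for one gene.

    weights: eQTL effect-size weights (standardized genotypes assumed)
    gwas_z:  single-SNP GWAS Z-scores for the same SNPs, same order
    ld:      LD correlation matrix R as a list of rows
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching sizes")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0.0:
        raise ValueError("w'Rw must be positive")
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With two equally weighted SNPs that each have Z = 2, the gene-level statistic is about 2.83 when the SNPs are independent and exactly 2 when they are in perfect LD, which shows why the LD term in the denominator matters.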
Affiliation(s)
- Sini Nagpal
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Xiaoran Meng
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA 30322, USA; Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Michael P Epstein
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA 30322, USA; Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Lam C Tsoi
- Department of Dermatology; Department of Computational Medicine & Bioinformatics; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Matthew Patrick
- Department of Dermatology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - Greg Gibson
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Philip L De Jager
- Medical Center Neurological Institute, Columbia University, New York, NY 10032, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
| | - Aliza P Wingo
- Division of Mental Health, Atlanta VA Medical Center, Decatur, GA, USA; Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Thomas S Wingo
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA; Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA.
| |
|
28
|
Wei J, Xie W, Li R, Wang S, Qu H, Ma R, Zhou X, Jia Z. Analysis of trait heritability in functionally partitioned rice genomes. Heredity (Edinb) 2019; 124:485-498. [PMID: 31253955 DOI: 10.1038/s41437-019-0244-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/26/2019] [Revised: 06/05/2019] [Accepted: 06/08/2019] [Indexed: 01/10/2023]
Abstract
Knowledge of the genetic architecture of agronomically important traits can speed up genetic improvement in cultivated rice (Oryza sativa L.). Many recent investigations have leveraged genome-wide association studies (GWAS) to identify single nucleotide polymorphisms (SNPs) associated with agronomic traits in various rice populations. The reported trait-relevant SNPs appear to be arbitrarily distributed along the genome, including genic and nongenic regions. Whether the SNPs in different genomic regions play different roles in trait heritability, and which region is more responsible for phenotypic variation, remains unclear. We analyzed a natural rice population of 524 accessions with 3,616,597 SNPs to compare the genetic contributions of functionally distinct genomic regions for five agronomic traits, i.e., yield, heading date, plant height, grain length, and grain width. An analysis of heritability in the functionally partitioned rice genome showed that regulatory or intergenic regions account for the most trait heritability. A close look at the trait-associated SNPs (TASs) indicated that the majority of the TASs are located in nongenic regions, and the genetic effects of the TASs in nongenic regions are generally greater than those in genic regions. We further compared the predictabilities using the genetic variants from genic regions with those using nongenic regions. The results revealed that nongenic regions play a more important role than genic regions in trait heritability in rice, which is consistent with findings in humans and maize. This conclusion not only offers clues for basic research into the genetics behind these agronomic traits, but also provides a new perspective to facilitate genomic selection in rice.
Affiliation(s)
- Julong Wei
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu, China; Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Weibo Xie
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Ruidong Li
- Department of Botany & Plant Sciences, University of California (Riverside), Riverside, CA, USA
| | - Shibo Wang
- Department of Botany & Plant Sciences, University of California (Riverside), Riverside, CA, USA
| | - Han Qu
- Department of Botany & Plant Sciences, University of California (Riverside), Riverside, CA, USA
| | - Renyuan Ma
- Department of Botany & Plant Sciences, University of California (Riverside), Riverside, CA, USA; Department of Mathematics, Bowdoin College, Brunswick, ME, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Zhenyu Jia
- Department of Botany & Plant Sciences, University of California (Riverside), Riverside, CA, USA.
| |
|
29
|
Zhao Y, Zhu H, Lu Z, Knickmeyer RC, Zou F. Structured Genome-Wide Association Studies with Bayesian Hierarchical Variable Selection. Genetics 2019; 212:397-415. [PMID: 31010934 PMCID: PMC6553832 DOI: 10.1534/genetics.119.301906] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/07/2019] [Accepted: 04/08/2019] [Indexed: 02/04/2023]
Abstract
It is becoming increasingly important to use genome-wide association studies (GWAS) to select important genetic information associated with qualitative or quantitative traits. Currently, the discovery of biological associations among SNPs motivates various strategies to construct SNP-sets along the genome and to incorporate such set information into the selection procedure for higher selection power, while facilitating more biologically meaningful results. The aim of this paper is to propose a novel Bayesian framework for hierarchical variable selection at both the SNP-set (group) level and the SNP (within-group) level. We overcome a key limitation of the existing posterior updating schemes in most Bayesian variable selection methods by proposing a novel sampling scheme that explicitly accommodates the ultrahigh dimensionality of genetic data. Specifically, by constructing an auxiliary variable selection model at the SNP-set level, the new procedure uses the posterior samples of the auxiliary model to guide the subsequent posterior inference for the targeted hierarchical selection model. We apply the proposed method to a variety of simulation studies and show that our method is computationally efficient and achieves substantially better performance than competing approaches in both SNP-set and SNP selection. Applying the method to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, we identify biologically meaningful genetic factors under several neuroimaging volumetric phenotypes. Our method is general and can readily be applied to a wide range of biomedical studies.
Affiliation(s)
- Yize Zhao
- Department of Healthcare Policy and Research, Cornell University Weill Cornell, New York, New York 10065
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599
| | - Zhaohua Lu
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
| | - Rebecca C Knickmeyer
- Department of Pediatrics and Human Development, Michigan State University, East Lansing, Michigan 48824
| | - Fei Zou
- Department of Biostatistics, University of Florida, Gainesville, Florida 32611
| |
|
30
|
Fan CC, Smeland OB, Schork AJ, Chen CH, Holland D, Lo MT, Sundar VS, Frei O, Jernigan TL, Andreassen OA, Dale AM. Beyond heritability: improving discoverability in imaging genetics. Hum Mol Genet 2019. [PMID: 29522091 DOI: 10.1093/hmg/ddy082] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 02/03/2023]
Abstract
Structural neuroimaging measures based on magnetic resonance imaging have been at the forefront of imaging genetics. Global efforts to ensure homogeneity of measurements across study sites have enabled large-scale imaging genetic projects, accumulating nearly 50K samples for genome-wide association studies (GWAS). However, not many novel genetic variants have been identified by these GWAS, despite the high heritability of structural neuroimaging measures. Here, we discuss the limitations of using heritability as guidance for assessing the statistical power of GWAS, and highlight the importance of discoverability, which is the power to detect genetic variants for a given phenotype depending on its unique genomic architecture and GWAS sample size. Further, we present newly developed methods that boost genetic discovery in imaging genetics. By redefining imaging measures independent of traditional anatomical conventions, it is possible to improve discoverability, enabling identification of more genetic effects. Moreover, by leveraging enrichment priors from genomic annotations and independent GWAS of pleiotropic traits, we can better characterize effect size distributions and identify reliable and replicable loci associated with structural neuroimaging measures. Statistical tools leveraging novel insights into the genetic discoverability of human traits promise to accelerate the identification of the genetic underpinnings of brain structural variation.
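The dependence of detection power on per-variant effect size and sample size, rather than on overall heritability, can be illustrated with a textbook normal approximation for a single-variant test. The sketch below assumes a standardized quantitative trait, an additive model, and Hardy-Weinberg equilibrium, so that a variant explains q² = 2·maf·(1−maf)·β² of the variance; it is not the discoverability model described by the authors, and the function name and defaults are hypothetical.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a two-sided single-variant association test.

    n:     GWAS sample size
    maf:   minor allele frequency
    beta:  per-allele effect on a standardized trait (assumed additive)
    alpha: significance threshold (5e-8 is the usual genome-wide level)
    """
    q2 = 2.0 * maf * (1.0 - maf) * beta ** 2  # variance explained
    if not 0.0 <= q2 < 1.0:
        raise ValueError("variance explained must be in [0, 1)")
    ncp_root = (n * q2 / (1.0 - q2)) ** 0.5  # sqrt of the non-centrality
    std = NormalDist()
    crit = std.inv_cdf(1.0 - alpha / 2.0)
    return std.cdf(ncp_root - crit) + std.cdf(-ncp_root - crit)
```

For a fixed small effect, power under this approximation rises steeply with sample size, so two traits with the same heritability can differ greatly in how many variants are detectable at, say, n = 50,000 if one trait's heritability is spread over many more variants of smaller effect.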
Affiliation(s)
- Chun Chieh Fan
- Center for Multimodal Imaging and Genetics, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Olav B Smeland
- NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Andrew J Schork
- Institute for Biological Psychiatry, Mental Health Center Sct. Hans, Capital Region of Denmark, Denmark
| | - Chi-Hua Chen
- Center for Multimodal Imaging and Genetics, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Radiology, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA
| | - Dominic Holland
- Department of Neurosciences, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA
| | - Min-Tzu Lo
- Center for Multimodal Imaging and Genetics, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Radiology, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA
| | - V S Sundar
- Center for Multimodal Imaging and Genetics, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Radiology, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA
| | - Oleksandr Frei
- NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Terry L Jernigan
- Center for Human Development, University of California San Diego, La Jolla, CA 92093, USA
| | - Ole A Andreassen
- NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Anders M Dale
- Center for Multimodal Imaging and Genetics, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Radiology, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA; Department of Neurosciences, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA
| |
|
31
|
Léveillard T, Philp NJ, Sennlaub F. Is Retinal Metabolic Dysfunction at the Center of the Pathogenesis of Age-related Macular Degeneration? Int J Mol Sci 2019; 20:ijms20030762. [PMID: 30754662 PMCID: PMC6387069 DOI: 10.3390/ijms20030762] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 12/21/2018] [Revised: 02/04/2019] [Accepted: 02/05/2019] [Indexed: 01/12/2023]
Abstract
The retinal pigment epithelium (RPE) forms the outer blood-retina barrier and facilitates the transepithelial transport of glucose into the outer retina via GLUT1. Glucose is metabolized in photoreceptors via the tricarboxylic acid cycle (TCA) and oxidative phosphorylation (OXPHOS) but also by aerobic glycolysis to generate glycerol for the synthesis of phospholipids for the renewal of their outer segments. Aerobic glycolysis in the photoreceptors also leads to a high rate of production of lactate, which is transported out of the subretinal space to the choroidal circulation by the RPE. Lactate taken up by the RPE is converted to pyruvate and metabolized via OXPHOS. Excess lactate in the RPE is transported across the basolateral membrane to the choroid. The uptake of glucose by cone photoreceptor cells is enhanced by rod-derived cone viability factor (RdCVF) secreted by rods and by insulin signaling. Together, the three cells act as symbiotes: the RPE supplies the glucose from the choroidal circulation to the photoreceptors, the rods help the cones, and both produce lactate to feed the RPE. In age-related macular degeneration this delicate ménage à trois is disturbed by the chronic infiltration of inflammatory macrophages. These immune cells also rely on aerobic glycolysis and compete for glucose and produce lactate. Here we review glucose metabolism in the homeostasis of the outer retina and in macrophages, and hypothesize what happens when the metabolism of photoreceptors and the RPE is disturbed by chronic inflammation.
Affiliation(s)
- Thierry Léveillard
- Department of Genetics, Sorbonne Université, INSERM, CNRS, Institut de la Vision, 17 rue Moreau, F-75012 Paris, France.
| | - Nancy J Philp
- Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA 19107, USA.
| | - Florian Sennlaub
- Department of Therapeutics, Sorbonne Université, INSERM, CNRS, Institut de la Vision, 17 rue Moreau, F-75012 Paris, France.
| |
|
32
|
Awany D, Allali I, Dalvie S, Hemmings S, Mwaikono KS, Thomford NE, Gomez A, Mulder N, Chimusa ER. Host and Microbiome Genome-Wide Association Studies: Current State and Challenges. Front Genet 2019; 9:637. [PMID: 30723493 PMCID: PMC6349833 DOI: 10.3389/fgene.2018.00637] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 08/24/2018] [Accepted: 11/27/2018] [Indexed: 12/20/2022]
Abstract
The involvement of the microbiome in health and disease is well established. Microbiome genome-wide association studies (mGWAS) are used to elucidate the interaction of host genetic variation with the microbiome. The emergence of this relatively new field has been facilitated by the advent of next generation sequencing technologies that enable the investigation of the complex interaction between host genetics and microbial communities. In this paper, we review recent studies investigating host-microbiome interactions using mGWAS. Additionally, we highlight the marked disparity in the sampling population of mGWAS carried out to date and draw attention to the critical need for inclusion of diverse populations.
Affiliation(s)
- Denis Awany
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Imane Allali
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Shareefa Dalvie
- Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
| | - Sian Hemmings
- Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Kilaza S Mwaikono
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Nicholas E Thomford
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Andres Gomez
- Department of Animal Science, University of Minnesota-Twin Cities, St. Paul, MN, United States
| | - Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Emile R Chimusa
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| |
|
33
|
Liu L, Sanderford MD, Patel R, Chandrashekar P, Gibson G, Kumar S. Biological relevance of computationally predicted pathogenicity of noncoding variants. Nat Commun 2019; 10:330. [PMID: 30659175 PMCID: PMC6338804 DOI: 10.1038/s41467-018-08270-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 04/19/2018] [Accepted: 12/19/2018] [Indexed: 11/15/2022]
Abstract
Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotation of genomic, functional and evolutionary attributes into a single score. Here, we evaluate if the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. A significant disconnect is found to exist between the statistical modelling and biological performance of predictive approaches. We discuss fundamental reasons underlying these deficiencies and suggest that future improvements of computational predictions need to address confounding of allelic, positional and regional effects as well as imbalance of the proportion of true positive variants in candidate lists. Researchers can make use of a variety of computational tools to prioritize genetic variants and predict their pathogenicity. Here, the authors evaluate the performance of six of these tools in three typical biological tasks and find generally low concordance of predictions and experimental confirmation.
Affiliation(s)
- Li Liu
- College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Maxwell D Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
| | - Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA
| | - Pramod Chandrashekar
- College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Greg Gibson
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA.
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA.
| |
|
34
|
Kichaev G, Bhatia G, Loh PR, Gazal S, Burch K, Freund MK, Schoech A, Pasaniuc B, Price AL. Leveraging Polygenic Functional Enrichment to Improve GWAS Power. Am J Hum Genet 2019; 104:65-75. [PMID: 30595370 DOI: 10.1101/222265] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/24/2018] [Accepted: 11/14/2018] [Indexed: 05/28/2023]
Abstract
Functional genomics data has the potential to increase GWAS power by identifying SNPs that have a higher prior probability of association. Here, we introduce a method that leverages polygenic functional enrichment to incorporate coding, conserved, regulatory, and LD-related genomic annotations into association analyses. We show via simulations with real genotypes that the method, functionally informed novel discovery of risk loci (FINDOR), correctly controls the false-positive rate at null loci and attains a 9%-38% increase in the number of independent associations detected at causal loci, depending on trait polygenicity and sample size. We applied FINDOR to 27 independent complex traits and diseases from the interim UK Biobank release (average N = 130K). Averaged across traits, we attained a 13% increase in genome-wide significant loci detected (including a 20% increase for disease traits) compared to unweighted raw p values that do not use functional data. We replicated the additional loci in independent UK Biobank and non-UK Biobank data, yielding a highly statistically significant replication slope (0.66-0.69) in each case. Finally, we applied FINDOR to the full UK Biobank release (average N = 416K), attaining smaller relative improvements (consistent with simulations) but larger absolute improvements, detecting an additional 583 GWAS loci. In conclusion, leveraging functional enrichment using our method robustly increases GWAS power.
Affiliation(s)
- Gleb Kichaev
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA.
| | - Gaurav Bhatia
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
| | - Po-Ru Loh
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
| | - Steven Gazal
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
| | - Kathryn Burch
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA
| | - Malika K Freund
- Department of Human Genetics, University of California, Los Angeles, CA 90095, USA
| | - Armin Schoech
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
| | - Bogdan Pasaniuc
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, CA 90095, USA; Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA 90095, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA; Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.
| |
|
35
|
Kichaev G, Bhatia G, Loh PR, Gazal S, Burch K, Freund MK, Schoech A, Pasaniuc B, Price AL. Leveraging Polygenic Functional Enrichment to Improve GWAS Power. Am J Hum Genet 2019; 104:65-75. [PMID: 30595370 PMCID: PMC6323418 DOI: 10.1016/j.ajhg.2018.11.008] [Citation(s) in RCA: 621] [Impact Index Per Article: 103.5] [Received: 04/24/2018] [Accepted: 11/14/2018] [Indexed: 12/24/2022]
|
36
|
|
37
|
Fine-mapping and functional studies highlight potential causal variants for rheumatoid arthritis and type 1 diabetes. Nat Genet 2018; 50:1366-1374. [PMID: 30224649 DOI: 10.1038/s41588-018-0216-7] [Citation(s) in RCA: 107] [Impact Index Per Article: 15.3] [Received: 05/26/2017] [Accepted: 07/30/2018] [Indexed: 12/19/2022]
Abstract
To define potentially causal variants for autoimmune disease, we fine-mapped [1,2] 76 rheumatoid arthritis (11,475 cases, 15,870 controls) [3] and type 1 diabetes loci (9,334 cases, 11,111 controls) [4]. After sequencing 799 1-kilobase regulatory (H3K4me3) regions within these loci in 568 individuals, we observed accurate imputation for 89% of common variants. We defined credible sets of ≤5 causal variants at 5 rheumatoid arthritis and 10 type 1 diabetes loci. We identified potentially causal missense variants at DNASE1L3, PTPN22, SH2B3, and TYK2, and noncoding variants at MEG3, CD28-CTLA4, and IL2RA. We also identified potential candidate causal variants at SIRPG and TNFAIP3. Using functional assays, we confirmed allele-specific protein binding and differential enhancer activity for three variants: the CD28-CTLA4 rs117701653 SNP, MEG3 rs34552516 indel, and TNFAIP3 rs35926684 indel.
|
38
|
Inshaw JRJ, Cutler AJ, Burren OS, Stefana MI, Todd JA. Approaches and advances in the genetic causes of autoimmune disease and their implications. Nat Immunol 2018; 19:674-684. [PMID: 29925982 DOI: 10.1038/s41590-018-0129-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 12/17/2017] [Accepted: 04/04/2018] [Indexed: 12/18/2022]
Abstract
Genome-wide association studies are transformative in revealing the polygenetic basis of common diseases, with autoimmune diseases leading the charge. Although the field is just over 10 years old, advances in understanding the underlying mechanistic pathways of these conditions, which result from a dense multifactorial blend of genetic, developmental and environmental factors, have already been informative, including insights into therapeutic possibilities. Nevertheless, the challenge of identifying the actual causal genes and pathways and their biological effects on altering disease risk remains for many identified susceptibility regions. It is this fundamental knowledge that will underpin the revolution in patient stratification, the discovery of therapeutic targets and clinical trial design in the next 20 years. Here we outline recent advances in analytical and phenotyping approaches and the emergence of large cohorts with standardized gene-expression data and other phenotypic data that are fueling a bounty of discovery and improved understanding of human physiology.
Affiliation(s)
- Jamie R J Inshaw
- JDRF/Wellcome Diabetes and Inflammation Laboratory, Wellcome Centre for Human Genetics, Nuffield Department of Medicine, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
| | - Antony J Cutler
- JDRF/Wellcome Diabetes and Inflammation Laboratory, Wellcome Centre for Human Genetics, Nuffield Department of Medicine, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
| | - Oliver S Burren
- Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
| | - M Irina Stefana
- JDRF/Wellcome Diabetes and Inflammation Laboratory, Wellcome Centre for Human Genetics, Nuffield Department of Medicine, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
| | - John A Todd
- JDRF/Wellcome Diabetes and Inflammation Laboratory, Wellcome Centre for Human Genetics, Nuffield Department of Medicine, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK.
| |
|
39
|
Yang J, Chen S, Abecasis G. Improved score statistics for meta-analysis in single-variant and gene-level association studies. Genet Epidemiol 2018; 42:333-343. [PMID: 29696691 DOI: 10.1002/gepi.22123] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/06/2017] [Revised: 03/04/2018] [Accepted: 03/16/2018] [Indexed: 01/09/2023]
Abstract
Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss problem of the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods performing equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration. In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association studies, providing a useful framework for ensuring well-powered, convenient, cross-study analyses.
Affiliation(s)
- Jingjing Yang
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America; Department of Human Genetics, Center for Computational and Quantitative Genetics, Emory University School of Medicine, Atlanta, Georgia, United States of America
| | - Sai Chen
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Gonçalo Abecasis
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| |
|
40
|
Ming J, Dai M, Cai M, Wan X, Liu J, Yang C. LSMM: a statistical approach to integrating functional annotations with genome-wide association studies. Bioinformatics 2018; 34:2788-2796. [DOI: 10.1093/bioinformatics/bty187] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 10/23/2017] [Accepted: 03/27/2018] [Indexed: 01/27/2023]
Affiliation(s)
- Jingsi Ming
- Department of Mathematics, Hong Kong Baptist University, Hong Kong
| | - Mingwei Dai
- School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong
| | - Mingxuan Cai
- Department of Mathematics, Hong Kong Baptist University, Hong Kong
| | - Xiang Wan
- Shenzhen Research Institute of Big Data, Shenzhen, China
| | - Jin Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Can Yang
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong
| |
|
41
|
Hao X, Zeng P, Zhang S, Zhou X. Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies. PLoS Genet 2018; 14:e1007186. [PMID: 29377896 PMCID: PMC5805369 DOI: 10.1371/journal.pgen.1007186] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 09/29/2017] [Revised: 02/08/2018] [Accepted: 01/04/2018] [Indexed: 12/18/2022]
Abstract
Genome-wide association studies (GWASs) have identified many disease-associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome Trust Case Control Consortium study.
Affiliation(s)
- Xingjie Hao
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, Hubei, China
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States of America
| | - Ping Zeng
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States of America
| | - Shujun Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States of America
| |
|