1
King EA, Dunbar F, Davis JW, Degner JF. Estimating colocalization probability from limited summary statistics. BMC Bioinformatics 2021; 22:254. [PMID: 34000989 PMCID: PMC8130535 DOI: 10.1186/s12859-021-04170-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/09/2020] [Accepted: 05/05/2021] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND: Colocalization is a statistical method used in genetics to determine whether the same variant is causal for multiple phenotypes, for example, complex traits and gene expression. It provides stronger mechanistic evidence than shared significance, which can be produced through separate causal variants in linkage disequilibrium. Current colocalization methods require full summary statistics for both traits, limiting their use with the majority of reported GWAS associations (e.g. GWAS Catalog). We propose a new approximation to the popular coloc method that can be applied when limited summary statistics are available. Our method (POint EstiMation of Colocalization, POEMColoc) imputes missing summary statistics for one or both traits using LD structure in a reference panel, and performs colocalization using the imputed summary statistics.
RESULTS: We evaluate the performance of POEMColoc using real (UK Biobank phenotypes and GTEx eQTL) and simulated datasets. We show good correlation between posterior probabilities of colocalization computed from imputed and observed datasets and similar accuracy in simulation. We evaluate scenarios that might reduce performance and show that multiple independent causal variants in a region and imputation from a limited subset of typed variants have a larger effect while mismatched ancestry in the reference panel has a modest effect. Further, we find that POEMColoc is a better approximation of coloc when the imputed association statistics are from a well powered study (e.g., relatively larger sample size or effect size). Applying POEMColoc to estimate colocalization of GWAS Catalog entries and GTEx eQTL, we find evidence for colocalization of 150,000 trait-gene-tissue triplets.
CONCLUSIONS: We find that colocalization analysis performed with full summary statistics can be closely approximated when only the summary statistics of the top SNP are available for one or both traits. When applied to the full GWAS Catalog and GTEx eQTL, we find that colocalized trait-gene pairs are enriched in tissues relevant to disease etiology and for matches to approved drug mechanisms. POEMColoc R package is available at https://github.com/AbbVie-ComputationalGenomics/POEMColoc .
Affiliation(s)
- Emily A King
- AbbVie Genomics Research Center, North Chicago, IL, USA
- Jacob F Degner
- AbbVie Genomics Research Center, North Chicago, IL, USA
2
Santos JD, Chebotarov D, McNally KL, Bartholomé J, Droc G, Billot C, Glaszmann JC. Fine Scale Genomic Signals of Admixture and Alien Introgression among Asian Rice Landraces. Genome Biol Evol 2019; 11:1358-1373. [PMID: 31002105 PMCID: PMC6499253 DOI: 10.1093/gbe/evz084] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Accepted: 04/11/2019] [Indexed: 12/26/2022] Open
Abstract
Modern rice cultivars are adapted to a range of environmental conditions and human preferences. At the root of this diversity is a marked genetic structure, owing to multiple foundation events. Admixture and recurrent introgression from wild sources have played upon this base to produce the myriad adaptations existing today. Genome-wide studies bring support to this idea, but understanding the history and nature of particular genetic adaptations requires the identification of specific patterns of genetic exchange. In this study, we explore the patterns of haplotype similarity along the genomes of a subset of rice cultivars available in the 3,000 Rice Genomes data set. We begin by establishing a custom method of classification based on a combination of dimensionality reduction and kernel density estimation. Through simulations, the behavior of this classifier is studied under scenarios of varying genetic divergence, admixture, and alien introgression. Finally, the method is applied to local haplotypes along the genome of a Core set of Asian Landraces. Taking the Japonica, Indica, and cAus groups as references, we find evidence of reciprocal introgressions covering 2.6% of reference genomes on average. Structured signals of introgression among reference accessions are discussed. We extend the analysis to elucidate the genetic structure of the group circum-Basmati: we delimit regions of Japonica, cAus, and Indica origin, as well as regions outlier to these groups (13% on average). Finally, the approach used highlights regions of partial to complete loss of structure that can be attributed to selective pressures during domestication.
Affiliation(s)
- João D Santos
- UMR AGAP, CIRAD, Montpellier, France
- UMR AGAP, Université de Montpellier, France
- Dmytro Chebotarov
- International Rice Research Institute (IRRI), Los Baños, Philippines
- Kenneth L McNally
- International Rice Research Institute (IRRI), Los Baños, Philippines
- Jérôme Bartholomé
- UMR AGAP, CIRAD, Montpellier, France
- UMR AGAP, Université de Montpellier, France
- International Rice Research Institute (IRRI), Los Baños, Philippines
- Gaëtan Droc
- UMR AGAP, CIRAD, Montpellier, France
- UMR AGAP, Université de Montpellier, France
- Claire Billot
- UMR AGAP, CIRAD, Montpellier, France
- UMR AGAP, Université de Montpellier, France
3
Martin AR, Teferra S, Möller M, Hoal EG, Daly MJ. The critical needs and challenges for genetic architecture studies in Africa. Curr Opin Genet Dev 2018; 53:113-120. [PMID: 30240950 PMCID: PMC6494470 DOI: 10.1016/j.gde.2018.08.005] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 05/08/2018] [Revised: 08/17/2018] [Accepted: 08/31/2018] [Indexed: 12/11/2022]
Abstract
Human genetic studies have long been vastly Eurocentric, raising a key question about the generalizability of these study findings to other populations. Because humans originated in Africa, these populations retain more genetic diversity, and yet individuals of African descent have been tremendously underrepresented in genetic studies. The diversity in Africa affords ample opportunities to improve fine-mapping resolution for associated loci, discover novel genetic associations with phenotypes, build more generalizable genetic risk prediction models, and better understand the genetic architecture of complex traits and diseases subject to varying environmental pressures. Thus, it is both ethically and scientifically imperative that geneticists globally surmount challenges that have limited progress in African genetic studies to date. Additionally, African investigators need to be meaningfully included, as greater inclusivity and enhanced research capacity afford enormous opportunities to accelerate genomic discoveries that translate more effectively to all populations. We review the advantages, challenges, and examples of genetic architecture studies of complex traits and diseases in Africa. For example, with greater genetic diversity comes greater ancestral heterogeneity; this higher level of understudied diversity can yield novel genetic findings, but some methods that assume homogeneous population structure and work well in European populations may work less well in the presence of greater heterogeneity in African populations. Consequently, we advocate for methodological development that will accelerate studies important for all populations, especially those currently underrepresented in genetics.
Affiliation(s)
- Alicia R Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Solomon Teferra
- Department of Psychiatry, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia; Department of Epidemiology, Harvard T. H. Chan School of Public Health, Harvard University, Boston, USA
- Marlo Möller
- DST-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, Cape Town, South Africa
- Eileen G Hoal
- DST-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, Cape Town, South Africa
- Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
4
Zhang H, Wheeler W, Song L, Yu K. Proper joint analysis of summary association statistics requires the adjustment of heterogeneity in SNP coverage pattern. Brief Bioinform 2018; 19:1337-1343. [PMID: 28981575 DOI: 10.1093/bib/bbx072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2017] [Indexed: 11/12/2022] Open
Abstract
As meta-analysis results published by consortia of genome-wide association studies (GWASs) become increasingly available, many association summary statistics-based multi-locus tests have been developed to jointly evaluate multiple single-nucleotide polymorphisms (SNPs) to reveal novel genetic architectures of various complex traits. The validity of these approaches relies on the accurate estimate of z-score correlations at considered SNPs, which in turn requires knowledge on the set of SNPs assessed by each study participating in the meta-analysis. However, this exact SNP coverage information is usually unavailable from the meta-analysis results published by GWAS consortia. In the absence of the coverage information, researchers typically estimate the z-score correlations by making oversimplified coverage assumptions. We show through real studies that such a practice can generate highly inflated type I errors, and we demonstrate the proper way to incorporate correct coverage information into multi-locus analyses. We advocate that consortia should make SNP coverage information available when posting their meta-analysis results, and that investigators who develop analytic tools for joint analyses based on summary data should pay attention to the variation in SNP coverage and adjust for it appropriately.
Affiliation(s)
- Han Zhang
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, USA
- Lei Song
- Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., USA
- Kai Yu
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, USA
5
Abstract
During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.
6
Abstract
Coronary artery disease (or coronary heart disease) is the leading cause of mortality in many developing and developed countries. Cholesterol-enriched plaques in the heart's blood vessels, combined with inflammation, lead to lesion expansion, narrowing of the blood vessels, and reduced blood flow, and may subsequently cause lesion rupture and a heart attack. Even though several environmental risk factors have been established, such as high LDL-cholesterol, diabetes, and high blood pressure, the underlying genetic composition may substantially modify the disease risk; hence, genome composition and gene-environment interactions may be critical for disease progression. Ongoing scientific efforts have seen substantial advancements related to the fields of genetics and genomics, with the major breakthroughs yet to come. As genomics is the most rapidly advancing field in the life sciences, it is important to present a comprehensive overview of current efforts. Here, we present a summary of various genetic and genomics assays and approaches applied to coronary artery disease research.
Affiliation(s)
- Milos Pjanic
- Department of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA, 94305-5233, USA
- Clint L Miller
- Department of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA, 94305-5233, USA
- Robert Wirka
- Department of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA, 94305-5233, USA
- Juyong B Kim
- Department of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA, 94305-5233, USA
- Daniel M DiRenzo
- Department of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA, 94305-5233, USA
- Thomas Quertermous
- Department of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA, 94305-5233, USA
7
Brown BC, Ye CJ, Price AL, Zaitlen N; Asian Genetic Epidemiology Network Type 2 Diabetes Consortium. Transethnic Genetic-Correlation Estimates from Summary Statistics. Am J Hum Genet 2016; 99:76-88. [PMID: 27321947 DOI: 10.1016/j.ajhg.2016.05.001] [Citation(s) in RCA: 168] [Impact Index Per Article: 21.0] [Received: 12/19/2015] [Accepted: 05/03/2016] [Indexed: 11/22/2022] Open
Abstract
The increasing number of genetic association studies conducted in multiple populations provides an unprecedented opportunity to study how the genetic architecture of complex phenotypes varies between populations, a problem important for both medical and population genetics. Here, we have developed a method for estimating the transethnic genetic correlation: the correlation of causal-variant effect sizes at SNPs common in populations. This method takes advantage of the entire spectrum of SNP associations and uses only summary-level data from genome-wide association studies. This avoids the computational costs and privacy concerns associated with genotype-level information while remaining scalable to hundreds of thousands of individuals and millions of SNPs. We applied our method to data on gene expression, rheumatoid arthritis, and type 2 diabetes and overwhelmingly found that the genetic correlation was significantly less than 1. Our method is implemented in a Python package called Popcorn.
8
Lee D, Williamson VS, Bigdeli TB, Riley BP, Webb BT, Fanous AH, Kendler KS, Vladimirov VI, Bacanu SA. JEPEGMIX: gene-level joint analysis of functional SNPs in cosmopolitan cohorts. Bioinformatics 2016; 32:295-7. [PMID: 26428293 PMCID: PMC4708106 DOI: 10.1093/bioinformatics/btv567] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/13/2015] [Revised: 09/01/2015] [Accepted: 09/22/2015] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION: To increase detection power, gene-level analysis methods are used to aggregate weak signals. To greatly increase computational efficiency, most methods use as input summary statistics from genome-wide association studies (GWAS). Subsequently, gene statistics are constructed using linkage disequilibrium (LD) patterns from a relevant reference panel. However, all methods, including our own Joint Effect on Phenotype of eQTL/functional single nucleotide polymorphisms (SNPs) associated with a Gene (JEPEG), assume homogeneous panels, e.g. European. This renders these tools unsuitable for the analysis of large cosmopolitan cohorts.
RESULTS: We propose a JEPEG extension, JEPEGMIX, which, similar to one of our software tools, Direct Imputation of summary STatistics of unmeasured SNPs from MIXed ethnicity cohorts, is capable of estimating accurate LD patterns for cosmopolitan cohorts. JEPEGMIX uses these accurate LD estimates to (i) impute the summary statistics at unmeasured functional variants and (ii) test for the joint effect of all measured and imputed functional variants which are associated with a gene. We illustrate the performance of our tool by analyzing the GWAS meta-analysis summary statistics from the multi-ethnic Psychiatric Genomics Consortium Schizophrenia stage 2 cohort. This practical application supports the immune system being one of the main drivers of the process leading to schizophrenia.
AVAILABILITY AND IMPLEMENTATION: Software, annotation database and examples are available at http://dleelab.github.io/jepegmix/.
CONTACT: donghyung.lee@vcuhealth.org
SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.
Affiliation(s)
- Donghyung Lee
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Vernell S Williamson
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
- T Bernard Bigdeli
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Brien P Riley
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Bradley T Webb
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Ayman H Fanous
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Kenneth S Kendler
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Silviu-Alin Bacanu
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
9
Mefford JA, Zaitlen NA, Witte JS. Comment: A Human Genetics Perspective. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2016.1149404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]