1
Corut AK, Wallace JG. kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS. G3 (Bethesda) 2023; 14:jkad246. [PMID: 37976215 PMCID: PMC10755180 DOI: 10.1093/g3journal/jkad246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 10/15/2023] [Indexed: 11/19/2023]
Abstract
Genome-wide association studies (GWAS) have been widely used to identify genetic variation associated with complex traits. Despite its success and popularity, the traditional GWAS approach comes with a variety of limitations. For this reason, newer methods for GWAS have been developed, including the use of pan-genomes instead of a reference genome and the utilization of markers beyond single-nucleotide polymorphisms, such as structural variations and k-mers. The k-mers-based GWAS approach has especially gained attention from researchers in recent years. However, these new methodologies can be complicated and challenging to implement. Here, we present kGWASflow, a modular, user-friendly, and scalable workflow to perform GWAS using k-mers. We adopted an existing kmersGWAS method into an easier and more accessible workflow using management tools like Snakemake and Conda and eliminated the challenges caused by missing dependencies and version conflicts. kGWASflow increases the reproducibility of the kmersGWAS method by automating each step with Snakemake and using containerization tools like Docker. The workflow encompasses supplemental components such as quality control, read-trimming procedures, and generating summary statistics. kGWASflow also offers post-GWAS analysis options to identify the genomic location and context of trait-associated k-mers. kGWASflow can be applied to any organism and requires minimal programming skills. kGWASflow is freely available on GitHub (https://github.com/akcorut/kGWASflow) and Bioconda (https://anaconda.org/bioconda/kgwasflow).
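To make the idea of k-mers as GWAS markers concrete, the following is a minimal sketch of building a presence/absence table of canonical k-mers across samples. It is not the kmersGWAS or kGWASflow implementation; the function names (`canonical`, `count_kmers`, `presence_absence`) and the `min_count` parameter are assumptions introduced here for illustration only.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers over an iterable of read strings (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def presence_absence(samples, k, min_count=1):
    """Map each k-mer to {sample name: 0 or 1}; min_count is an assumed threshold."""
    per_sample = {name: count_kmers(reads, k) for name, reads in samples.items()}
    all_kmers = sorted(set().union(*per_sample.values()))
    return {
        kmer: {name: int(counts[kmer] >= min_count) for name, counts in per_sample.items()}
        for kmer in all_kmers
    }


if __name__ == "__main__":
    toy_samples = {"s1": ["ACGT"], "s2": ["CGCG"]}  # toy reads, not real data
    for kmer, row in presence_absence(toy_samples, k=2).items():
        print(kmer, row)
```

Each row of the resulting table could then be tested for association with a phenotype, in the same way a SNP genotype column would be.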
Affiliation(s)
- Adnan Kivanc Corut
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Jason G Wallace
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
  - Institute of Plant Breeding, Genetics, and Genomics, University of Georgia, Athens, GA 30602, USA
  - Department of Crop and Soil Sciences, University of Georgia, Athens, GA 30602, USA
2
Wang QS, Huang H. Methods for statistical fine-mapping and their applications to auto-immune diseases. Semin Immunopathol 2022; 44:101-113. [PMID: 35041074 PMCID: PMC8837575 DOI: 10.1007/s00281-021-00902-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/06/2021] [Accepted: 10/22/2021] [Indexed: 01/07/2023]
Abstract
Although genome-wide association studies (GWAS) have identified thousands of loci in the human genome that are associated with different traits, understanding the biological mechanisms underlying the association signals identified in GWAS remains challenging. Statistical fine-mapping is a method aiming to refine GWAS signals by evaluating which variant(s) are truly causal to the phenotype. Here, we review the types of statistical fine-mapping methods that have been widely used to date, with a focus on recently developed functionally informed fine-mapping (FIFM) methods that utilize functional annotations. We then systematically review the applications of statistical fine-mapping in autoimmune disease studies to highlight the value of statistical fine-mapping in biological contexts.
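As a rough illustration of what statistical fine-mapping computes, the sketch below derives posterior inclusion probabilities (PIPs) and a credible set from per-variant z-scores. It assumes exactly one causal variant in the region, equal prior probability for every variant, and a Wakefield-style approximate Bayes factor. It is one of the simplest formulations and is not taken from the review; the function names and the `prior_sd` default are assumptions made for this example.

```python
import math


def posterior_inclusion_probabilities(z_scores, std_errors, prior_sd=0.2):
    """PIPs under a single-causal-variant assumption with equal priors.

    Uses the approximate Bayes factor
        ABF = sqrt(V / (V + W)) * exp(z^2 / 2 * W / (V + W)),
    where V is the squared standard error and W the prior effect variance.
    """
    prior_var = prior_sd ** 2
    log_abfs = []
    for z, se in zip(z_scores, std_errors):
        shrink = prior_var / (se ** 2 + prior_var)
        log_abfs.append(0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink)
    # Normalise in log space to avoid overflow for large z-scores.
    top = max(log_abfs)
    weights = [math.exp(value - top) for value in log_abfs]
    total = sum(weights)
    return [weight / total for weight in weights]


def credible_set(pips, coverage=0.95):
    """Smallest set of variant indices whose PIPs sum to at least `coverage`."""
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, mass = [], 0.0
    for index in order:
        chosen.append(index)
        mass += pips[index]
        if mass >= coverage:
            break
    return chosen


if __name__ == "__main__":
    pips = posterior_inclusion_probabilities([5.0, 1.0, 0.0], [0.1, 0.1, 0.1])
    print(pips, credible_set(pips))
```

Methods that allow multiple causal variants or use functional annotations replace the equal-prior, single-variant assumptions made here.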
Affiliation(s)
- Qingbo S Wang
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hailiang Huang
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
3
LaPierre N, Taraszka K, Huang H, He R, Hormozdiari F, Eskin E. Identifying causal variants by fine mapping across multiple studies. PLoS Genet 2021; 17:e1009733. [PMID: 34543273 PMCID: PMC8491908 DOI: 10.1371/journal.pgen.1009733] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 08/31/2020] [Revised: 10/05/2021] [Accepted: 07/21/2021] [Indexed: 11/18/2022] Open
Abstract
Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of "fine mapping" methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).
Affiliation(s)
- Nathan LaPierre
  - Department of Computer Science, University of California, Los Angeles, California, United States
- Kodi Taraszka
  - Department of Computer Science, University of California, Los Angeles, California, United States
- Helen Huang
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States
- Rosemary He
  - Department of Mathematics, University of California, Los Angeles, California, United States
- Farhad Hormozdiari
  - Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, California, United States
  - Department of Human Genetics, University of California, Los Angeles, California, United States
  - Department of Computational Medicine, University of California, Los Angeles, California, United States
4
Huang CT, Klos KE, Huang YF. Genome-Wide Association Study Reveals the Genetic Architecture of Seed Vigor in Oats. G3 (Bethesda) 2020; 10:4489-4503. [PMID: 33028627 PMCID: PMC7718755 DOI: 10.1534/g3.120.401602] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/27/2020] [Accepted: 10/02/2020] [Indexed: 12/29/2022]
Abstract
Seed vigor is crucial for crop early establishment in the field and is particularly important for forage crop production. Oat (Avena sativa L.) is a nutritious food crop and also a valuable forage crop. However, little is known about the genetics of seed vigor in oats. To investigate seed vigor-related traits and their genetic architecture in oats, we developed an easy-to-implement image-based phenotyping pipeline and applied it to 650 elite oat lines from the Collaborative Oat Research Enterprise (CORE). Root number, root surface area, and shoot length were measured in two replicates. Variables such as growth rate were derived. Using a genome-wide association (GWA) approach, we identified 34 and 16 unique loci associated with root traits and shoot traits, respectively, which corresponded to 41 and 16 unique SNPs at a false discovery rate < 0.1. Nine root-associated loci were organized into four sets of homeologous regions, while nine shoot-associated loci were organized into three sets of homeologous regions. The context sequences of five trait-associated markers matched to the sequences of rice, Brachypodium and maize (E-value < 10⁻¹⁰), including three markers matched to known gene models with potential involvement in seed vigor. These were a glucuronosyltransferase, a mitochondrial carrier protein domain containing protein, and an iron-sulfur cluster protein. This study presents the first GWA study on oat seed vigor and data of this study can provide guidelines and foundation for further investigations.
Affiliation(s)
- Ching-Ting Huang
  - Department of Agronomy, National Taiwan University, Taipei, 10617, Taiwan
- Kathy Esvelt Klos
  - Small Grains and Potato Germplasm Research, USDA, ARS, Aberdeen, ID 83210
- Yung-Fen Huang
  - Department of Agronomy, National Taiwan University, Taipei, 10617, Taiwan
5
Ainsworth HC, Howard TD, Langefeld CD. Intrinsic DNA topology as a prioritization metric in genomic fine-mapping studies. Nucleic Acids Res 2020; 48:11304-11321. [PMID: 33084892 PMCID: PMC7672465 DOI: 10.1093/nar/gkaa877] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/18/2020] [Revised: 08/23/2020] [Accepted: 09/25/2020] [Indexed: 12/15/2022] Open
Abstract
In genomic fine-mapping studies, some approaches leverage annotation data to prioritize likely functional polymorphisms. However, existing annotation resources can present challenges as many lack information for novel variants and/or may be uninformative for non-coding regions. We propose a novel annotation source, sequence-dependent DNA topology, as a prioritization metric for fine-mapping. DNA topology and function are well-intertwined, and as an intrinsic DNA property, it is readily applicable to any genomic region. Here, we constructed and applied Minor Groove Width (MGW) as a prioritization metric. Using an established MGW-prediction method, we generated a MGW census for 199 038 197 SNPs across the human genome. Summarizing a SNP's change in MGW (ΔMGW) as a Euclidean distance, ΔMGW exhibited a strongly right-skewed distribution, highlighting the infrequency of SNPs that generate dissimilar shape profiles. We hypothesized that phenotypically-associated SNPs can be prioritized by ΔMGW. We tested this hypothesis in 116 regions analyzed by a Massively Parallel Reporter Assay and observed enrichment of large ΔMGW for functional polymorphisms (P = 0.0007). To illustrate application in fine-mapping studies, we applied our MGW-prioritization approach to three non-coding regions associated with systemic lupus erythematosus. Together, this study presents the first usage of sequence-dependent DNA topology as a prioritization metric in genomic association studies.
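The ΔMGW summary described above can be sketched as a Euclidean distance between the predicted minor groove width profiles of the reference and alternate alleles. The snippet is only an illustration under stated assumptions: the profile values are made up, the window length is arbitrary, and the helper names `delta_mgw` and `rank_by_delta_mgw` are hypothetical rather than part of the authors' pipeline.

```python
import math


def delta_mgw(ref_profile, alt_profile):
    """Euclidean distance between two MGW profiles of equal length."""
    if len(ref_profile) != len(alt_profile):
        raise ValueError("profiles must cover the same positions")
    return math.dist(ref_profile, alt_profile)


def rank_by_delta_mgw(snp_profiles):
    """Sort SNP identifiers by decreasing ΔMGW.

    snp_profiles maps a SNP identifier to a (ref_profile, alt_profile) pair.
    """
    scores = {snp: delta_mgw(ref, alt) for snp, (ref, alt) in snp_profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Made-up profiles (in ångströms) around two hypothetical SNPs.
    profiles = {
        "snpA": ([5.0, 4.0, 6.0], [5.0, 1.0, 2.0]),
        "snpB": ([5.0, 4.0, 6.0], [5.0, 4.2, 6.1]),
    }
    print(rank_by_delta_mgw(profiles))
```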
Affiliation(s)
- Hannah C Ainsworth
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
  - Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Timothy D Howard
  - Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
  - Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Carl D Langefeld
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
  - Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
  - Comprehensive Cancer Center of Wake Forest Baptist Medical Center, Winston-Salem, NC 27157, USA
6
Sequence-based GWAS, network and pathway analyses reveal genes co-associated with milk cheese-making properties and milk composition in Montbéliarde cows. Genet Sel Evol 2019; 51:34. [PMID: 31262251 PMCID: PMC6604208 DOI: 10.1186/s12711-019-0473-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 12/20/2018] [Accepted: 06/07/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Milk quality in dairy cattle is routinely assessed via analysis of mid-infrared (MIR) spectra; this approach can also be used to predict the milk's cheese-making properties (CMP) and composition. When this method of high-throughput phenotyping is combined with efficient imputations of whole-genome sequence data from cows' genotyping data, it provides a unique and powerful framework with which to carry out genomic analyses. The goal of this study was to use this approach to identify genes and gene networks associated with milk CMP and composition in the Montbéliarde breed. RESULTS Milk cheese yields, coagulation traits, milk pH and contents of proteins, fatty acids, minerals, citrate, and lactose were predicted from MIR spectra. Thirty-six phenotypes from primiparous Montbéliarde cows (1,442,371 test-day records from 189,817 cows) were adjusted for non-genetic effects and averaged per cow. 50 K genotypes, which were available for a subset of 19,586 cows, were imputed at the sequence level using Run6 of the 1000 Bull Genomes Project (comprising 2333 animals). The individual effects of 8.5 million variants were evaluated in a genome-wide association study (GWAS) which led to the detection of 59 QTL regions, most of which had highly significant effects on CMP and milk composition. The results of the GWAS were further subjected to an association weight matrix and the partial correlation and information theory approach and we identified a set of 736 co-associated genes. Among these, the well-known caseins, PAEP and DGAT1, together with dozens of other genes such as SLC37A1, ALPL, MGST1, SEL1L3, GPT, BRI3BP, SCD, GPAT4, FASN, and ANKH, explained from 12 to 30% of the phenotypic variance of CMP traits. We were further able to identify metabolic pathways (e.g., phosphate and phospholipid metabolism and inorganic anion transport) and key regulator genes, such as PPARA, ASXL3, and bta-mir-200c that are functionally linked to milk composition. 
CONCLUSIONS By using an approach that integrated GWAS with network and pathway analyses at the whole-genome sequence level, we propose candidate variants that explain a substantial proportion of the phenotypic variance of CMP traits and could thus be included in genomic evaluation models to improve milk CMP in Montbéliarde cows.
7
Abstract
DNA methylation plays an important role in the regulation of transcription. Genetic control of DNA methylation is a potential candidate for explaining the many identified SNP associations with disease that are not found in coding regions. We replicated 52,916 cis and 2,025 trans DNA methylation quantitative trait loci (mQTL) using methylation from whole blood measured on Illumina HumanMethylation450 arrays in the Brisbane Systems Genetics Study (n = 614 from 177 families) and the Lothian Birth Cohorts of 1921 and 1936 (combined n = 1366). The trans mQTL SNPs were found to be over-represented in 1 Mbp subtelomeric regions, and on chromosomes 16 and 19. There was a significant increase in trans mQTL DNA methylation sites in upstream and 5′ UTR regions. The genetic heritability of a number of complex traits and diseases was partitioned into components due to mQTL and the remainder of the genome. Significant enrichment was observed for height (p = 2.1 × 10⁻¹⁰), ulcerative colitis (p = 2 × 10⁻⁵), Crohn’s disease (p = 6 × 10⁻⁸) and coronary artery disease (p = 5.5 × 10⁻⁶) when compared to a random sample of SNPs with matched minor allele frequency, although this enrichment is explained by the genomic location of the mQTL SNPs.
8
Espin-Garcia O, Craiu RV, Bull SB. Two-phase designs for joint quantitative-trait-dependent and genotype-dependent sampling in post-GWAS regional sequencing. Genet Epidemiol 2017; 42:104-116. [PMID: 29239496 PMCID: PMC5814750 DOI: 10.1002/gepi.22099] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/07/2017] [Revised: 10/23/2017] [Accepted: 10/23/2017] [Indexed: 11/09/2022]
Abstract
We evaluate two‐phase designs to follow‐up findings from genome‐wide association study (GWAS) when the cost of regional sequencing in the entire cohort is prohibitive. We develop novel expectation‐maximization‐based inference under a semiparametric maximum likelihood formulation tailored for post‐GWAS inference. A GWAS‐SNP (where SNP is single nucleotide polymorphism) serves as a surrogate covariate in inferring association between a sequence variant and a normally distributed quantitative trait (QT). We assess test validity and quantify efficiency and power of joint QT‐SNP‐dependent sampling and analysis under alternative sample allocations by simulations. Joint allocation balanced on SNP genotype and extreme‐QT strata yields significant power improvements compared to marginal QT‐ or SNP‐based allocations. We illustrate the proposed method and evaluate the sensitivity of sample allocation to sampling variation using data from a sequencing study of systolic blood pressure.
Affiliation(s)
- Osvaldo Espin-Garcia
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Radu V Craiu
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Shelley B Bull
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
9
Lopdell TJ, Tiplady K, Struchalin M, Johnson TJJ, Keehan M, Sherlock R, Couldrey C, Davis SR, Snell RG, Spelman RJ, Littlejohn MD. DNA and RNA-sequence based GWAS highlights membrane-transport genes as key modulators of milk lactose content. BMC Genomics 2017; 18:968. [PMID: 29246110 PMCID: PMC5731188 DOI: 10.1186/s12864-017-4320-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 05/23/2017] [Accepted: 11/21/2017] [Indexed: 12/30/2022] Open
Abstract
Background: Lactose provides an easily-digested energy source for neonates, and is the primary carbohydrate in milk in most species. Bovine lactose is also a key component of many human food products. However, compared to analyses of other milk components, the genetic control of lactose has been little studied. Here we present the first GWAS focussed on analysis of milk lactose traits. Results: Using a discovery population of 12,000 taurine dairy cattle, we detail 27 QTL for lactose concentration and yield, and subsequently validate the effects of 26 of these loci in a distinct population of 18,000 cows. We next present data implicating causative genes and variants for these QTL. Fine mapping of these regions using imputed, whole genome sequence-resolution genotypes reveals protein-coding candidate causative variants affecting the ABCG2, DGAT1, STAT5B, KCNH4, NPFFR2 and RNF214 genes. Eleven of the remaining QTL appear to be driven by regulatory effects, suggested by the presence of co-locating, co-segregating eQTL discovered using mammary RNA sequence data from a population of 357 lactating cows. Pathway analysis of genes representing all lactose-associated loci shows significant enrichment of genes located in the endoplasmic reticulum, with functions related to ion channel activity mediated through the LRRC8C, P2RX4, KCNJ2 and ANKH genes. A number of the validated QTL are also found to be associated with additional milk volume, fat and protein phenotypes. Conclusions: Overall, these findings highlight novel candidate genes and variants involved in milk lactose regulation, whose impacts on membrane transport mechanisms reinforce the key osmo-regulatory roles of lactose in milk.
Affiliation(s)
- Thomas J Lopdell
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
  - School of Biological Sciences, University of Auckland, Symonds Street, Auckland, New Zealand
- Kathryn Tiplady
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
- Maksim Struchalin
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
- Thomas J J Johnson
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
- Michael Keehan
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
- Ric Sherlock
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
- Christine Couldrey
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
- Stephen R Davis
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
- Russell G Snell
  - School of Biological Sciences, University of Auckland, Symonds Street, Auckland, New Zealand
- Richard J Spelman
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
- Mathew D Littlejohn
  - Research and Development, Livestock Improvement Corporation, Ruakura Road, Newstead, Hamilton, New Zealand
10
Bull SB, Andrulis IL, Paterson AD. Statistical challenges in high-dimensional molecular and genetic epidemiology. Can J Stat 2017. [DOI: 10.1002/cjs.11342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Shelley B. Bull
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada M5T 3L9
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada M5T 3M7
- Irene L. Andrulis
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada M5T 3L9
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Andrew D. Paterson
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada M5T 3M7
  - Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada M5G 0A4
11
Grinde KE, Arbet J, Green A, O'Connell M, Valcarcel A, Westra J, Tintle N. Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association. Front Genet 2017; 8:117. [PMID: 28959274 PMCID: PMC5603735 DOI: 10.3389/fgene.2017.00117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/24/2017] [Accepted: 08/25/2017] [Indexed: 11/13/2022] Open
Abstract
To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication studies and characterizing the genetic architecture of the locus. However, we illustrate that straightforward single-marker association statistics can suffer from substantial bias introduced by conditioning on gene-based test significance, due to the phenomenon often referred to as "winner's curse." We illustrate the ramifications of this bias on variant effect size estimation and variant prioritization/ranking approaches, outline parameters of genetic architecture that affect this bias, and propose a bootstrap resampling method to correct for this bias. We find that our correction method significantly reduces the bias due to winner's curse (average two-fold decrease in bias, p < 2.2 × 10⁻⁶) and, consequently, substantially improves mean squared error and variant prioritization/ranking. The method is particularly helpful in adjustment for winner's curse effects when the initial gene-based test has low power and for relatively more common, non-causal variants. Adjustment for winner's curse is recommended for all post-hoc estimation and ranking of variants after a gene-based test. Further work is necessary to continue seeking ways to reduce bias and improve inference in post-hoc analysis of gene-based tests under a wide variety of genetic architectures.
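The general idea of a bootstrap correction for winner's curse can be sketched as follows. This is a generic in-bag versus out-of-bag scheme applied to a single mean effect and a simple z-test, not the authors' gene-based procedure: each bootstrap replicate that passes the significance filter contributes the gap between its in-bag estimate (selected, hence inflated) and the out-of-bag estimate (independent of the selection), and the average gap is subtracted from the naive estimate. All names, the threshold, and the defaults are assumptions made for this example.

```python
import math
import random
import statistics


def z_stat(values):
    """One-sample z statistic: mean divided by its standard error."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.fmean(values) / se if se > 0 else 0.0


def bootstrap_corrected_effect(values, z_threshold=1.96, n_boot=500, seed=0):
    """Naive mean effect minus a bootstrap estimate of selection bias."""
    rng = random.Random(seed)
    n = len(values)
    naive = statistics.fmean(values)
    gaps = []
    for _ in range(n_boot):
        indices = [rng.randrange(n) for _ in range(n)]
        drawn = set(indices)
        in_bag = [values[i] for i in indices]
        out_of_bag = [values[i] for i in range(n) if i not in drawn]
        if not out_of_bag:
            continue  # no held-out observations in this replicate
        # Mimic the selection step: keep only replicates that look significant.
        if abs(z_stat(in_bag)) > z_threshold:
            gaps.append(statistics.fmean(in_bag) - statistics.fmean(out_of_bag))
    bias = statistics.fmean(gaps) if gaps else 0.0
    return naive - bias


if __name__ == "__main__":
    rng = random.Random(42)
    weak_signal = [rng.gauss(0.2, 1.0) for _ in range(40)]  # simulated data
    print(statistics.fmean(weak_signal), bootstrap_corrected_effect(weak_signal))
```

When the signal is strong, nearly every replicate passes the filter and the estimated bias is close to zero; the correction matters most for weak, barely significant signals, in line with the abstract's remark about low-power settings.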
Affiliation(s)
- Kelsey E Grinde
  - Department of Biostatistics, University of Washington, Seattle, WA, United States
- Jaron Arbet
  - Department of Biostatistics, University of Minnesota, Minneapolis, MN, United States
- Alden Green
  - Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, United States
- Michael O'Connell
  - Department of Biostatistics, University of Minnesota, Minneapolis, MN, United States
- Alessandra Valcarcel
  - Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, United States
- Jason Westra
  - Department of Statistics, Iowa State University, Ames, IA, United States
  - Department of Mathematics, Statistics, and Computer Science, Dordt College, Sioux Center, IA, United States
- Nathan Tintle
  - Department of Mathematics, Statistics, and Computer Science, Dordt College, Sioux Center, IA, United States
12
Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat Genet 2017; 49:1421-1427. [PMID: 28892061 DOI: 10.1038/ng.3954] [Citation(s) in RCA: 322] [Impact Index Per Article: 40.3] [Received: 10/19/2016] [Accepted: 08/16/2017] [Indexed: 12/14/2022]
Abstract
Recent work has hinted at the linkage disequilibrium (LD)-dependent architecture of human complex traits, where SNPs with low levels of LD (LLD) have larger per-SNP heritability. Here we analyzed summary statistics from 56 complex traits (average N = 101,401) by extending stratified LD score regression to continuous annotations. We determined that SNPs with low LLD have significantly larger per-SNP heritability and that roughly half of this effect can be explained by functional annotations negatively correlated with LLD, such as DNase I hypersensitivity sites (DHSs). The remaining signal is largely driven by our finding that more recent common variants tend to have lower LLD and to explain more heritability (P = 2.38 × 10⁻¹⁰⁴); the youngest 20% of common SNPs explain 3.9 times more heritability than the oldest 20%, consistent with the action of negative selection. We also inferred jointly significant effects of other LD-related annotations and confirmed via forward simulations that they jointly predict deleterious effects.
13
Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet 2017; 18:117-127. [PMID: 27840428 PMCID: PMC5449190 DOI: 10.1038/nrg.2016.142] [Citation(s) in RCA: 276] [Impact Index Per Article: 34.5] [Indexed: 02/07/2023]
Abstract
During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.
Affiliation(s)
- Bogdan Pasaniuc
  - Departments of Human Genetics, and Pathology and Laboratory Medicine, University of California, Los Angeles, California 90095, USA
- Alkes L Price
  - Departments of Epidemiology and Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts 02142, USA
14
Sequence-based Association Analysis Reveals an MGST1 eQTL with Pleiotropic Effects on Bovine Milk Composition. Sci Rep 2016; 6:25376. [PMID: 27146958 PMCID: PMC4857175 DOI: 10.1038/srep25376] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 01/29/2016] [Accepted: 04/15/2016] [Indexed: 11/08/2022] Open
Abstract
The mammary gland is a prolific lipogenic organ, synthesising copious amounts of triglycerides for secretion into milk. The fat content of milk varies widely both between and within species, and recent independent genome-wide association studies have highlighted a milk fat percentage quantitative trait locus (QTL) of large effect on bovine chromosome 5. Although both EPS8 and MGST1 have been proposed to underlie these signals, the causative status of these genes has not been functionally confirmed. To investigate this QTL in detail, we report genome sequence-based imputation and association mapping in a population of 64,244 taurine cattle. This analysis reveals a cluster of 17 non-coding variants spanning MGST1 that are highly associated with milk fat percentage, and a range of other milk composition traits. Further, we exploit a high-depth mammary RNA sequence dataset to conduct expression QTL (eQTL) mapping in 375 lactating cows, revealing a strong MGST1 eQTL underpinning these effects. These data demonstrate the utility of DNA and RNA sequence-based association mapping, and implicate MGST1, a gene with no obvious mechanistic relationship to milk composition regulation, as causally involved in these processes.
15
Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat Methods 2016; 13:366-70. [PMID: 26950747 DOI: 10.1038/nmeth.3799] [Citation(s) in RCA: 209] [Impact Index Per Article: 23.2] [Received: 07/23/2015] [Accepted: 01/26/2016] [Indexed: 12/22/2022]
Abstract
Mapping perturbed molecular circuits that underlie complex diseases remains a great challenge. We developed a comprehensive resource of 394 cell type- and tissue-specific gene regulatory networks for human, each specifying the genome-wide connectivity among transcription factors, enhancers, promoters and genes. Integration with 37 genome-wide association studies (GWASs) showed that disease-associated genetic variants--including variants that do not reach genome-wide significance--often perturb regulatory modules that are highly specific to disease-relevant cell types or tissues. Our resource opens the door to systematic analysis of regulatory programs across hundreds of human cell types and tissues (http://regulatorycircuits.org).
|
16
|
Stell L, Sabatti C. Genetic Variant Selection: Learning Across Traits and Sites. Genetics 2016; 202:439-55. [PMID: 26680660 PMCID: PMC4788227 DOI: 10.1534/genetics.115.184572] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/06/2015] [Accepted: 11/30/2015] [Indexed: 11/18/2022]
Abstract
We consider resequencing studies of associated loci and the problem of prioritizing sequence variants for functional follow-up. Working within the multivariate linear regression framework helps us to account for the joint effects of multiple genes; and adopting a Bayesian approach leads to posterior probabilities that coherently incorporate all information about the variants' function. We describe two novel prior distributions that facilitate learning the role of each variable site by borrowing evidence across phenotypes and across mutations in the same gene. We illustrate their potential advantages with simulations and reanalyzing a data set of sequencing variants.
Affiliation(s)
- Laurel Stell
- Department of Health Research and Policy, Stanford University, Stanford, California 94305
- Chiara Sabatti
- Department of Health Research and Policy, Stanford University, Stanford, California 94305; Department of Statistics, Stanford University, Stanford, California 94305
|
17
|
Li MJ, Liu Z, Wang P, Wong MP, Nelson MR, Kocher JPA, Yeager M, Sham PC, Chanock SJ, Xia Z, Wang J. GWASdb v2: an update database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res 2015; 44:D869-76. [PMID: 26615194 PMCID: PMC4702921 DOI: 10.1093/nar/gkv1317] [Citation(s) in RCA: 151] [Impact Index Per Article: 15.1] [Received: 08/21/2015] [Accepted: 11/10/2015] [Indexed: 12/19/2022]
Abstract
Genome-wide association studies (GWASs), now a routine approach to study single-nucleotide polymorphism (SNP)-trait association, have uncovered over ten thousand significant trait/disease associated SNPs (TASs). Here, we updated GWASdb (GWASdb v2, http://jjwanglab.org/gwasdb) which provides comprehensive data curation and knowledge integration for GWAS TASs. These updates include: (i) Up to August 2015, we collected 2479 unique publications from PubMed and other resources; (ii) We further curated moderate SNP-trait associations (P-value < 1.0×10⁻³) from each original publication, and generated a total of 252 530 unique TASs in all GWASdb v2 collected studies; (iii) We manually mapped 1610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms. For each ontology term, we also predicted the putative causal genes; (iv) We curated the detailed sub-populations and related sample size for each study; (v) Importantly, we performed extensive function annotation for each TAS by incorporating gene-based information, ENCODE ChIP-seq assays, eQTL, population haplotype, functional prediction across multiple biological domains, evolutionary signals and disease-related annotation; (vi) Additionally, we compiled a SNP-drug response association dataset for 650 pharmacogenetic studies involving 257 drugs in this update; (vii) Last, we improved the user interface of the website.
Affiliation(s)
- Mulin Jun Li
- Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Zipeng Liu
- Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Panwen Wang
- Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Maria P Wong
- Department of Pathology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Matthew R Nelson
- Quantitative Sciences, GlaxoSmithKline, Research Triangle Park, NC, USA
- Jean-Pierre A Kocher
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, USA
- Meredith Yeager
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Pak Chung Sham
- Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; State Key Laboratory of Brain and Cognitive Sciences and Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Stephen J Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Zhengyuan Xia
- Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Junwen Wang
- Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
|
18
|
Poirier JG, Faye LL, Dimitromanolakis A, Paterson AD, Sun L, Bull SB. Resampling to Address the Winner's Curse in Genetic Association Analysis of Time to Event. Genet Epidemiol 2015; 39:518-28. [PMID: 26411674 PMCID: PMC4609263 DOI: 10.1002/gepi.21920] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/23/2014] [Revised: 06/10/2015] [Accepted: 07/17/2015] [Indexed: 01/27/2023]
Abstract
The “winner's curse” is a subtle and difficult problem in interpretation of genetic association, in which association estimates from large‐scale gene detection studies are larger in magnitude than those from subsequent replication studies. This is practically important because use of a biased estimate from the original study will yield an underestimate of sample size requirements for replication, leaving the investigators with an underpowered study. Motivated by investigation of the genetics of type 1 diabetes complications in a longitudinal cohort of participants in the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) Genetics Study, we apply a bootstrap resampling method in analysis of time to nephropathy under a Cox proportional hazards model, examining 1,213 single‐nucleotide polymorphisms (SNPs) in 201 candidate genes custom genotyped in 1,361 white probands. Among 15 top‐ranked SNPs, bias reduction in log hazard ratio estimates ranges from 43.1% to 80.5%. In simulation studies based on the observed DCCT/EDIC genotype data, genome‐wide bootstrap estimates for false‐positive SNPs and for true‐positive SNPs with low‐to‐moderate power are closer to the true values than uncorrected naïve estimates, but tend to overcorrect SNPs with high power. This bias‐reduction technique is generally applicable for complex trait studies including quantitative, binary, and time‐to‐event traits.
Affiliation(s)
- Julia G Poirier
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Canada
- Laura L Faye
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Andrew D Paterson
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada; Hospital for Sick Children Research Institute, Toronto, Canada
- Lei Sun
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada; Department of Statistical Sciences, University of Toronto, Toronto, Canada
- Shelley B Bull
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
|
19
|
Grünewald TGP, Bernard V, Gilardi-Hebenstreit P, Raynal V, Surdez D, Aynaud MM, Mirabeau O, Cidre-Aranaz F, Tirode F, Zaidi S, Perot G, Jonker AH, Lucchesi C, Le Deley MC, Oberlin O, Marec-Bérard P, Véron AS, Reynaud S, Lapouble E, Boeva V, Rio Frio T, Alonso J, Bhatia S, Pierron G, Cancel-Tassin G, Cussenot O, Cox DG, Morton LM, Machiela MJ, Chanock SJ, Charnay P, Delattre O. Chimeric EWSR1-FLI1 regulates the Ewing sarcoma susceptibility gene EGR2 via a GGAA microsatellite. Nat Genet 2015. [PMID: 26214589 DOI: 10.1038/ng.3363] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Indexed: 12/16/2022]
Abstract
Deciphering the ways in which somatic mutations and germline susceptibility variants cooperate to promote cancer is challenging. Ewing sarcoma is characterized by fusions between EWSR1 and members of the ETS gene family, usually EWSR1-FLI1, leading to the generation of oncogenic transcription factors that bind DNA at GGAA motifs. A recent genome-wide association study identified susceptibility variants near EGR2. Here we found that EGR2 knockdown inhibited proliferation, clonogenicity and spheroidal growth in vitro and induced regression of Ewing sarcoma xenografts. Targeted germline deep sequencing of the EGR2 locus in affected subjects and controls identified 291 Ewing-associated SNPs. At rs79965208, the A risk allele connected adjacent GGAA repeats by converting an interspaced GGAT motif into a GGAA motif, thereby increasing the number of consecutive GGAA motifs and thus the EWSR1-FLI1-dependent enhancer activity of this sequence, with epigenetic characteristics of an active regulatory element. EWSR1-FLI1 preferentially bound to the A risk allele, which increased global and allele-specific EGR2 expression. Collectively, our findings establish cooperation between a dominant oncogene and a susceptibility variant that regulates a major driver of Ewing sarcomagenesis.
Affiliation(s)
- Thomas G P Grünewald
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France
- Virginie Bernard
- Institut Curie Genomics of Excellence (ICGex) Platform, Institut Curie Research Center, Paris, France
- Pascale Gilardi-Hebenstreit
- École Normale Supérieure (ENS), Institut de Biologie de l'ENS (IBENS), INSERM U1024, CNRS UMR8197, Paris, France
- Virginie Raynal
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France; Institut Curie Genomics of Excellence (ICGex) Platform, Institut Curie Research Center, Paris, France
- Didier Surdez
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France
- Marie-Ming Aynaud
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France
- Olivier Mirabeau
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France
- Florencia Cidre-Aranaz
- Instituto de Investigación de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
- Franck Tirode
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France
- Sakina Zaidi
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France
- Gaëlle Perot
- INSERM U916 Biology of Sarcomas, Institut Bergonié, Bordeaux, France
- Anneliene H Jonker
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France
- Carlo Lucchesi
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France
- Marie-Cécile Le Deley
- Département d'Epidémiologie et de Biostatistiques, Institut Gustave Roussy, Villejuif, France
- Odile Oberlin
- Département de Pédiatrie, Institut Gustave Roussy, Villejuif, France
- Perrine Marec-Bérard
- Institute for Pediatric Hematology and Oncology, Leon-Bérard Cancer Center, University of Lyon, Lyon, France
- Amélie S Véron
- INSERM U1052, Léon-Bérard Cancer Centre, Cancer Research Center of Lyon, Lyon, France
- Stephanie Reynaud
- Unité Génétique Somatique (UGS), Institut Curie Centre Hospitalier, Paris, France
- Eve Lapouble
- Unité Génétique Somatique (UGS), Institut Curie Centre Hospitalier, Paris, France
- Valentina Boeva
- INSERM U900, Bioinformatics, Biostatistics, Epidemiology and Computational Systems Biology of Cancer, Institut Curie Research Center, Paris, France; Mines ParisTech, Fontainebleau, France
- Thomas Rio Frio
- Institut Curie Genomics of Excellence (ICGex) Platform, Institut Curie Research Center, Paris, France
- Javier Alonso
- Instituto de Investigación de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
- Smita Bhatia
- Institute for Cancer Outcomes and Survivorship, School of Medicine, University of Alabama, Birmingham, Alabama, USA
- Gaëlle Pierron
- Unité Génétique Somatique (UGS), Institut Curie Centre Hospitalier, Paris, France
- Geraldine Cancel-Tassin
- Centre de Recherche sur les Pathologies Prostatiques (CeRePP)-Laboratory for Urology, Research Team 2, UPMC, Hôpital Tenon, Paris, France
- Olivier Cussenot
- Centre de Recherche sur les Pathologies Prostatiques (CeRePP)-Laboratory for Urology, Research Team 2, UPMC, Hôpital Tenon, Paris, France
- David G Cox
- INSERM U1052, Léon-Bérard Cancer Centre, Cancer Research Center of Lyon, Lyon, France
- Lindsay M Morton
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NCI), Bethesda, Maryland, USA
- Mitchell J Machiela
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NCI), Bethesda, Maryland, USA
- Stephen J Chanock
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NCI), Bethesda, Maryland, USA
- Patrick Charnay
- École Normale Supérieure (ENS), Institut de Biologie de l'ENS (IBENS), INSERM U1024, CNRS UMR8197, Paris, France
- Olivier Delattre
- Genetics and Biology of Cancers Unit, Institut Curie, PSL Research University, Paris, France; INSERM U830, Institut Curie Research Center, Paris, France; Institut Curie Genomics of Excellence (ICGex) Platform, Institut Curie Research Center, Paris, France; Unité Génétique Somatique (UGS), Institut Curie Centre Hospitalier, Paris, France
|
20
|
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics 2015; 200:719-36. [PMID: 25948564 DOI: 10.1534/genetics.115.176107] [Citation(s) in RCA: 155] [Impact Index Per Article: 15.5] [Received: 03/06/2015] [Accepted: 05/04/2015] [Indexed: 01/08/2023]
Abstract
Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf.
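The idea of a Bayes factor built only from marginal test statistics and SNP correlations can be illustrated with a small sketch. It is not the CAVIARBF software: it assumes z-scores that are multivariate normal with the LD matrix as covariance under the null, an independent normal prior on each causal effect with an illustrative variance `prior_var`, a per-SNP prior causal probability `prior_causal`, and exhaustive enumeration of at most `max_causal` causal variants. All function names and default values are hypothetical.

```python
import itertools

import numpy as np


def log_bayes_factor(z, ld, causal, prior_var):
    """Log Bayes factor of 'SNPs in `causal` are causal' versus the null model.

    Uses only the z-scores and LD entries of the causal SNPs:
    BF = |I + w R_cc|^(-1/2) * exp(0.5 * z_c' (I/w + R_cc)^(-1) z_c),
    where w is the assumed prior variance and R_cc the LD submatrix.
    """
    idx = list(causal)
    if not idx:
        return 0.0  # the null model compared with itself
    z_c = np.asarray(z, dtype=float)[idx]
    ld_cc = np.asarray(ld, dtype=float)[np.ix_(idx, idx)]
    eye = np.eye(len(idx))
    _, logdet = np.linalg.slogdet(eye + prior_var * ld_cc)
    quad = z_c @ np.linalg.solve(eye / prior_var + ld_cc, z_c)
    return float(-0.5 * logdet + 0.5 * quad)


def posterior_inclusion(z, ld, prior_var=25.0, max_causal=2, prior_causal=0.01):
    """Posterior inclusion probability per SNP, enumerating small causal sets."""
    m = len(z)
    configs, log_post = [], []
    for k in range(max_causal + 1):
        for c in itertools.combinations(range(m), k):
            log_prior = k * np.log(prior_causal) + (m - k) * np.log1p(-prior_causal)
            configs.append(c)
            log_post.append(log_prior + log_bayes_factor(z, ld, c, prior_var))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # subtract the max for stability
    post /= post.sum()
    pip = np.zeros(m)
    for c, p in zip(configs, post):
        for i in c:
            pip[i] += p
    return pip


# Three uncorrelated SNPs; the first carries the strongest signal.
pip = posterior_inclusion([5.0, 1.0, 0.0], np.eye(3))
print(pip.round(3))
```

For a single SNP the expression reduces to the familiar one-variant approximate Bayes factor, which is a convenient sanity check on the matrix form.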
|
21
|
del Rosario RCH, Poschmann J, Rouam SL, Png E, Khor CC, Hibberd ML, Prabhakar S. Sensitive detection of chromatin-altering polymorphisms reveals autoimmune disease mechanisms. Nat Methods 2015; 12:458-64. [PMID: 25799442 DOI: 10.1038/nmeth.3326] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 11/29/2014] [Accepted: 02/06/2015] [Indexed: 12/30/2022]
Abstract
Most disease associations detected by genome-wide association studies (GWAS) lie outside coding genes, but very few have been mapped to causal regulatory variants. Here, we present a method for detecting regulatory quantitative trait loci (QTLs) that does not require genotyping or whole-genome sequencing. The method combines deep, long-read chromatin immunoprecipitation-sequencing (ChIP-seq) with a statistical test that simultaneously scores peak height correlation and allelic imbalance: the genotype-independent signal correlation and imbalance (G-SCI) test. We performed histone acetylation ChIP-seq on 57 human lymphoblastoid cell lines and used the resulting reads to call 500,066 single-nucleotide polymorphisms de novo within regulatory elements. The G-SCI test annotated 8,764 of these as histone acetylation QTLs (haQTLs)—an order of magnitude larger than the set of candidates detected by expression QTL analysis. Lymphoblastoid haQTLs were highly predictive of autoimmune disease mechanisms. Thus, our method facilitates large-scale regulatory variant detection in any moderately sized cohort for which functional profiling data can be generated, thereby simplifying identification of causal variants within GWAS loci.
Affiliation(s)
- Jeremie Poschmann
- Computational and Systems Biology Group, Genome Institute of Singapore, Singapore
- Sigrid Laure Rouam
- Computational and Systems Biology Group, Genome Institute of Singapore, Singapore
- Eileen Png
- Infectious Diseases Group, Genome Institute of Singapore, Singapore
- Chiea Chuen Khor
- [1] Human Genetics Group, Genome Institute of Singapore, Singapore. [2] Singapore Eye Research Institute, Singapore. [3] Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Martin Lloyd Hibberd
- [1] Infectious Diseases Group, Genome Institute of Singapore, Singapore. [2] Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, London, UK
- Shyam Prabhakar
- Computational and Systems Biology Group, Genome Institute of Singapore, Singapore
|
22
|
Vrijenhoek T, Kraaijeveld K, Elferink M, de Ligt J, Kranendonk E, Santen G, Nijman IJ, Butler D, Claes G, Costessi A, Dorlijn W, van Eyndhoven W, Halley DJJ, van den Hout MCGN, van Hove S, Johansson LF, Jongbloed JDH, Kamps R, Kockx CEM, de Koning B, Kriek M, Lekanne Dit Deprez R, Lunstroo H, Mannens M, Mook OR, Nelen M, Ploem C, Rijnen M, Saris JJ, Sinke R, Sistermans E, van Slegtenhorst M, Sleutels F, van der Stoep N, van Tienhoven M, Vermaat M, Vogel M, Waisfisz Q, Marjan Weiss J, van den Wijngaard A, van Workum W, Ijntema H, van der Zwaag B, van IJcken WFJ, den Dunnen J, Veltman JA, Hennekam R, Cuppen E. Next-generation sequencing-based genome diagnostics across clinical genetics centers: implementation choices and their effects. Eur J Hum Genet 2015; 23:1142-50. [PMID: 25626705 PMCID: PMC4538197 DOI: 10.1038/ejhg.2014.279] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 02/18/2014] [Revised: 11/26/2014] [Accepted: 11/28/2014] [Indexed: 12/30/2022]
Abstract
Implementation of next-generation DNA sequencing (NGS) technology into routine diagnostic genome care requires strategic choices. Instead of theoretical discussions on the consequences of such choices, we compared NGS-based diagnostic practices in eight clinical genetic centers in the Netherlands, based on genetic testing of nine pre-selected patients with cardiomyopathy. We highlight critical implementation choices, including the specific contributions of laboratory and medical specialists, bioinformaticians and researchers to diagnostic genome care, and how these affect interpretation and reporting of variants. Reported pathogenic mutations were consistent for all but one patient. Of the two centers that were inconsistent in their diagnosis, one reported to have found 'no causal variant', thereby underdiagnosing this patient. The other provided an alternative diagnosis, identifying a different variant as causal than the other centers did. Ethical and legal analysis showed that informed consent procedures in all centers were generally adequate for diagnostic NGS applications that target a limited set of genes, but not for exome- and genome-based diagnosis. We propose changes to further improve and align these procedures, taking into account the blurring boundary between diagnostics and research, and specific counseling options for exome- and genome-based diagnostics. We conclude that alternative diagnoses may imply a certain level of 'greediness' to come to a positive diagnosis in interpreting sequencing results. Moreover, there is an increasing interdependence of clinic, diagnostics and research departments for comprehensive diagnostic genome care. Therefore, we invite clinical geneticists, physicians, researchers, bioinformatics experts and patients to reconsider their role and position in future diagnostic genome care.
Affiliation(s)
- Terry Vrijenhoek
- Department of Medical Genetics, Centre for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Ken Kraaijeveld
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Martin Elferink
- Department of Medical Genetics, Centre for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Joep de Ligt
- Department of Human Genetics, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands
- Elcke Kranendonk
- Department of Public Health, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Gijs Santen
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Isaac J Nijman
- Department of Medical Genetics, Centre for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Godelieve Claes
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
- Wim Dorlijn
- Agilent Technologies Netherlands B.V., Amstelveen, The Netherlands
- Dicky J J Halley
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, The Netherlands
- Lennart F Johansson
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Jan D H Jongbloed
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Rick Kamps
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
- Christel E M Kockx
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, The Netherlands
- Bart de Koning
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
- Marjolein Kriek
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Ronald Lekanne Dit Deprez
- Department of Human Genetics, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Marcel Mannens
- Department of Human Genetics, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Olaf R Mook
- Department of Human Genetics, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Marcel Nelen
- Department of Human Genetics, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands
- Corrette Ploem
- Department of Public Health, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Marco Rijnen
- Life Technologies Europe B.V., Bleiswijk, The Netherlands
- Jasper J Saris
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, The Netherlands
- Richard Sinke
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Erik Sistermans
- Department of Clinical Genetics, VU University Medical Center, Amsterdam, The Netherlands
- Frank Sleutels
- Center for Biomics, Erasmus Medical Center, Rotterdam, The Netherlands
- Nienke van der Stoep
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Martijn Vermaat
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Maartje Vogel
- Department of Medical Genetics, Centre for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Quinten Waisfisz
- Department of Clinical Genetics, VU University Medical Center, Amsterdam, The Netherlands
- Janneke Marjan Weiss
- Department of Clinical Genetics, VU University Medical Center, Amsterdam, The Netherlands
- Arthur van den Wijngaard
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
- Helger Ijntema
- Department of Human Genetics, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands
- Bert van der Zwaag
- Department of Medical Genetics, Centre for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Johan den Dunnen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Joris A Veltman
- Department of Human Genetics, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands
- Raoul Hennekam
- [1] Department of Human Genetics, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands [2] Department of Pediatrics, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Edwin Cuppen
- Department of Medical Genetics, Centre for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
|
23
|
Roux PF, Marthey S, Djari A, Moroldo M, Esquerré D, Estellé J, Klopp C, Lagarrigue S, Demeure O. Comparison of whole-genome (13X) and capture (87X) resequencing methods for SNP and genotype callings. Anim Genet 2014; 46:82-6. [PMID: 25515399 DOI: 10.1111/age.12248] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 10/03/2014] [Indexed: 12/30/2022]
Abstract
The number of polymorphisms identified with next-generation sequencing approaches depends directly on the sequencing depth and therefore on the experimental cost. Although higher levels of depth ensure more sensitive and more specific SNP calls, economic constraints limit the increase of depth for whole-genome resequencing (WGS). For this reason, capture resequencing is used for studies focusing on only some specific regions of the genome. However, several biases in capture resequencing are known to have a negative impact on the sensitivity of SNP detection. Within this framework, the aim of this study was to compare the accuracy of WGS and capture resequencing on SNP detection and genotype calling, which differ in terms of both sequencing depth and biases. Indeed, we have evaluated the SNP calling and genotyping accuracy in a WGS dataset (13X) and in a capture resequencing dataset (87X) performed on 11 individuals. The percentage of SNPs not identified due to a sevenfold sequencing depth decrease was estimated at 7.8% using a down-sampling procedure on the capture sequencing dataset. A comparison of the 87X capture sequencing dataset with the WGS dataset revealed that capture-related biases led to the loss of 5.2% of SNPs detected with WGS. Nevertheless, when considering the SNPs detected by both approaches, capture sequencing appears to achieve far better SNP genotyping, with about 4.4% of the WGS genotypes that can be considered as erroneous and even 10% when focusing on heterozygous genotypes. In conclusion, WGS and capture deep sequencing can be considered equivalent strategies for SNP detection, as the rate of SNPs not identified because of a low sequencing depth in the former is quite similar to SNPs missed because of method biases of the latter. On the other hand, capture deep sequencing clearly appears better suited to studies requiring great accuracy in genotyping.
Affiliation(s)
- P F Roux
- INRA, UMR1348 PEGASE, Saint-Gilles, F-35590, France; Agrocampus Ouest, UMR1348 PEGASE, Rennes, F-35000, France; Université Européenne de Bretagne, Rennes, France
|
24
|
Hormozdiari F, Kostem E, Kang EY, Pasaniuc B, Eskin E. Identifying causal variants at loci with multiple signals of association. Genetics 2014; 198:497-508. [PMID: 25104515 PMCID: PMC4196608 DOI: 10.1534/genetics.114.167908] [Citation(s) in RCA: 302] [Impact Index Per Article: 27.5] [Received: 05/29/2014] [Accepted: 07/18/2014] [Indexed: 12/22/2022]
Abstract
Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20-50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/.
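The notion of a variant set that contains all causal variants with a chosen confidence can be sketched as follows. This is only an illustration, not the CAVIAR implementation: it assumes the posterior probability of every causal configuration is already available as a dictionary, and it grows the set greedily by marginal inclusion probability, which is a choice made for this sketch. The function name and the toy posterior values are hypothetical.

```python
def confidence_set(config_post, n_variants, rho=0.95):
    """Greedy set that contains all causal variants with probability >= rho.

    config_post maps a tuple of causal variant indices to its posterior
    probability; the probabilities are assumed to sum to 1.
    """
    # Marginal inclusion probability of each variant, used only for ranking.
    pip = [0.0] * n_variants
    for config, prob in config_post.items():
        for i in config:
            pip[i] += prob
    order = sorted(range(n_variants), key=lambda i: pip[i], reverse=True)

    chosen = set()
    for i in order:
        # Probability mass of configurations fully contained in the set.
        covered = sum(p for c, p in config_post.items() if set(c) <= chosen)
        if covered >= rho:
            break
        chosen.add(i)
    return sorted(chosen)


# Toy posterior over configurations of three variants (illustrative numbers).
toy_post = {(): 0.01, (0,): 0.50, (1,): 0.04, (0, 1): 0.44, (2,): 0.01}
print(confidence_set(toy_post, n_variants=3))
```

In the toy example no single variant reaches 95% coverage because much of the posterior mass sits on the two-variant configuration, which is the situation a single-causal-variant assumption cannot represent.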
Affiliation(s)
- Farhad Hormozdiari
  - Department of Computer Science, University of California, Los Angeles, California 90095
- Emrah Kostem
  - Department of Computer Science, University of California, Los Angeles, California 90095
- Eun Yong Kang
  - Department of Computer Science, University of California, Los Angeles, California 90095
- Bogdan Pasaniuc
  - Department of Human Genetics, University of California, Los Angeles, California 90095
  - Department of Pathology and Laboratory Medicine, University of California, Los Angeles, California 90095
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, California 90095
  - Department of Human Genetics, University of California, Los Angeles, California 90095
|
25
|
Kichaev G, Yang WY, Lindstrom S, Hormozdiari F, Eskin E, Price AL, Kraft P, Pasaniuc B. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet 2014; 10:e1004722. [PMID: 25357204 PMCID: PMC4214605 DOI: 10.1371/journal.pgen.1004722] [Citation(s) in RCA: 363] [Impact Index Per Article: 33.0] [Received: 05/24/2014] [Accepted: 09/01/2014] [Indexed: 11/18/2022]
Abstract
Standard statistical approaches for prioritization of variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations starting from the 1000 Genomes data, we find that our framework consistently outperforms the current state-of-the-art fine-mapping methods, reducing the number of variants that need to be selected to capture 90% of the causal variants from an average of 13.3 to 10.4 SNPs per locus (as compared to the next-best performing strategy). Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays and assess its performance using real and simulated data. We validate our findings using a large-scale meta-analysis of four blood lipid traits and find that the relative probability for causality is increased for variants in exons and transcription start sites and decreased in repressed genomic regions at the risk loci of these traits. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in these data.
Affiliation(s)
- Gleb Kichaev
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, California, United States of America
- Wen-Yun Yang
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Sara Lindstrom
  - Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Farhad Hormozdiari
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Eleazar Eskin
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Alkes L. Price
  - Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America
  - Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Peter Kraft
  - Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America
  - Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
|
26
|
Chen Z, Craiu RV, Bull SB. A note on the efficiencies of sampling strategies in two-stage Bayesian regional fine mapping of a quantitative trait. Genet Epidemiol 2014; 38:599-609. [PMID: 25132153 DOI: 10.1002/gepi.21845] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/21/2013] [Revised: 06/12/2014] [Accepted: 06/16/2014] [Indexed: 11/09/2022]
Abstract
In focused studies designed to follow up associations detected in a genome-wide association study (GWAS), investigators can proceed to fine-map a genomic region by targeted sequencing or dense genotyping of all variants in the region, aiming to identify a functional sequence variant. For the analysis of a quantitative trait, we consider a Bayesian approach to fine-mapping study design that incorporates stratification according to a promising GWAS tag SNP in the same region. Improved cost-efficiency can be achieved when the fine-mapping phase incorporates a two-stage design, with identification of a smaller set of more promising variants in a subsample taken in stage 1, followed by their evaluation in an independent stage 2 subsample. To avoid the potential negative impact of genetic model misspecification on inference we incorporate genetic model selection based on posterior probabilities for each competing model. Our simulation study shows that, compared to simple random sampling that ignores genetic information from GWAS, tag-SNP-based stratified sample allocation methods reduce the number of variants continuing to stage 2 and are more likely to promote the functional sequence variant into confirmation studies.
Affiliation(s)
- Zhijian Chen
  - Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, Ontario, Canada
|
27
|
Thomas DC, Yang Z, Yang F. Two-phase and family-based designs for next-generation sequencing studies. Front Genet 2013; 4:276. [PMID: 24379824 PMCID: PMC3861783 DOI: 10.3389/fgene.2013.00276] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 09/28/2013] [Accepted: 11/19/2013] [Indexed: 12/21/2022]
Abstract
The cost of next-generation sequencing is now approaching that of early GWAS panels, but is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genome-wide association studies using unrelated individuals; and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an ongoing study of second breast cancers following radiotherapy. While the sharing of variants within families means that family-based designs are less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal ones. Furthermore, by enriching for family history, the yield of causal variants can be improved, and use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. While associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.
Affiliation(s)
- Duncan C Thomas
  - Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
- Zhao Yang
  - Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
- Fan Yang
  - Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
|