1. Walters K, Yaacob H. Bayesian multivariant fine mapping using the Laplace prior. Genet Epidemiol 2023; 47:249-260. [PMID: 36739616 DOI: 10.1002/gepi.22517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 01/13/2023] [Accepted: 01/24/2023] [Indexed: 02/07/2023]
Abstract
Currently, the only effect size prior that is routinely implemented in a Bayesian fine-mapping multi-single-nucleotide polymorphism (SNP) analysis is the Gaussian prior. Here, we show how the Laplace prior can be deployed in Bayesian multi-SNP fine mapping studies. We compare the ranking performance of the posterior inclusion probability (PIP) using a Laplace prior with the ranking performance of the corresponding Gaussian prior and FINEMAP. Our results indicate that, for the simulation scenarios we consider here, the Laplace prior can lead to higher PIPs than either the Gaussian prior or FINEMAP, particularly for moderately sized fine-mapping studies. The Laplace prior also appears to have better worst-case scenario properties. We reanalyse the iCOGS case-control data from the CASP8 region on Chromosome 2. Even though this study has a total sample size of nearly 90,000 individuals, there are still some differences in the top few ranked SNPs if the Laplace prior is used rather than the Gaussian prior. R code to implement the Laplace (and Gaussian) prior is available at https://github.com/Kevin-walters/lapmapr.
Affiliation(s)
- Kevin Walters: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Hannuun Yaacob: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK; Department of Economics and Applied Statistics, Faculty of Business and Economics, Universiti Malaya, Kuala Lumpur, Malaysia
2. Semi-parametric empirical Bayes factor for genome-wide association studies. Eur J Hum Genet 2021; 29:800-807. [PMID: 33495595 PMCID: PMC8110551 DOI: 10.1038/s41431-020-00800-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/25/2020] [Revised: 11/02/2020] [Accepted: 12/09/2020] [Indexed: 01/29/2023] Open
Abstract
Bayes factor analysis has the attractive property of accommodating the risks of both false negatives and false positives when identifying susceptibility gene variants in genome-wide association studies (GWASs). For a particular SNP, the critical aspect of this analysis is that it incorporates the probability of obtaining the observed value of a statistic on disease association under the alternative hypotheses of non-null association. An approximate Bayes factor (ABF) was proposed by Wakefield (Genetic Epidemiology 2009;33:79-86) based on a normal prior for the underlying effect-size distribution. However, misspecification of the prior can lead to failure in incorporating the probability under the alternative hypothesis. In this paper, we propose a semi-parametric, empirical Bayes factor (SP-EBF) based on a nonparametric effect-size distribution estimated from the data. Analysis of several GWAS datasets revealed the presence of substantial numbers of SNPs with small effect sizes, and the SP-EBF attributed much greater significance to such SNPs than the ABF. Overall, the SP-EBF incorporates an effect-size distribution that is estimated from the data, and it has the potential to improve the accuracy of Bayes factor analysis in GWASs.
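As an illustration of the normal-prior ABF that this abstract contrasts with the SP-EBF, here is a minimal sketch of the standard closed form, written on the log scale and oriented in favour of association (the reciprocal of the ratio as Wakefield states it). The function name `log_abf` and the default prior standard deviation of 0.2 are assumptions made for this example, not values taken from the abstract.

```python
import math


def log_abf(beta_hat: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor for H1 (effect ~ N(0, W)) versus H0 (effect = 0).

    beta_hat : estimated log odds ratio for one SNP
    se       : its standard error, so V = se**2
    prior_sd : prior standard deviation, so W = prior_sd**2 (assumed default)
    """
    v = se ** 2
    w = prior_sd ** 2
    z_squared = (beta_hat / se) ** 2
    shrinkage = w / (v + w)  # weight given to the data relative to the prior
    # log BF = 0.5 * log(V / (V + W)) + (z^2 / 2) * W / (V + W)
    return 0.5 * math.log(1.0 - shrinkage) + 0.5 * z_squared * shrinkage


if __name__ == "__main__":
    # Hypothetical summary statistics for a single SNP.
    print(log_abf(beta_hat=0.15, se=0.03))
```

Because the prior variance `W` is fixed in advance, a poorly chosen value changes the Bayes factor for every SNP, which is the misspecification concern the abstract raises.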
3. Walters K, Cox A, Yaacob H. The utility of the Laplace effect size prior distribution in Bayesian fine-mapping studies. Genet Epidemiol 2021; 45:386-401. [PMID: 33410201 DOI: 10.1002/gepi.22375] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/18/2020] [Revised: 11/28/2020] [Accepted: 12/16/2020] [Indexed: 11/10/2022]
Abstract
The Gaussian distribution is usually the default causal single-nucleotide polymorphism (SNP) effect size prior in Bayesian population-based fine-mapping association studies, but a recent study showed that the heavier-tailed Laplace prior distribution provided a better fit to breast cancer top hits identified in genome-wide association studies. We investigate the utility of the Laplace prior as an effect size prior in univariate fine-mapping studies. We consider ranking SNPs using Bayes factors and other summaries of the effect size posterior distribution, the effect of prior choice on credible set size based on the posterior probability of causality, and on the noteworthiness of SNPs in univariate analyses. Across a wide range of fine-mapping scenarios the Laplace prior generally leads to larger 90% credible sets than the Gaussian prior. These larger credible sets for the Laplace prior are due to relatively high prior mass around zero which can yield many noncausal SNPs with relatively large Bayes factors. If using conventional credible sets, the Gaussian prior generally yields a better trade off between including the causal SNP with high probability and keeping the set size reasonable. Interestingly when using the less well utilised measure of noteworthiness, the Laplace prior performs well, leading to causal SNPs being declared noteworthy with high probability, whilst generally declaring fewer than 5% of noncausal SNPs as being noteworthy. In contrast, the Gaussian prior leads to the causal SNP being declared noteworthy with very low probability.
Affiliation(s)
- Kevin Walters: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Angela Cox: Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Sheffield, UK
- Hannuun Yaacob: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
4. Hutchinson A, Asimit J, Wallace C. Fine-mapping genetic associations. Hum Mol Genet 2020; 29:R81-R88. [PMID: 32744321 PMCID: PMC7733401 DOI: 10.1093/hmg/ddaa148] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/04/2020] [Revised: 06/04/2020] [Accepted: 07/09/2020] [Indexed: 02/07/2023] Open
Abstract
Whilst thousands of genetic variants have been associated with human traits, identifying the subset of those variants that are causal requires a further 'fine-mapping' step. We review the basic fine-mapping approach, which is computationally fast and requires only summary data, but depends on an assumption of a single causal variant per associated region which is recognized as biologically unrealistic. We discuss different ways that the approach has been built upon to accommodate multiple causal variants in a region and to incorporate additional layers of functional annotation data. We further review methods for simultaneous fine-mapping of multiple datasets, either exploiting different linkage disequilibrium (LD) structures across ancestries or borrowing information between distinct but related traits. Finally, we look to the future and the opportunities that will be offered by increasingly accurate maps of causal variants for a multitude of human traits.
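The single-causal-variant approach described in this abstract can be sketched in a few lines. Assuming exactly one causal variant in the region and equal prior probabilities for every SNP, each SNP's posterior probability of being causal is its Bayes factor divided by the sum of all Bayes factors, and a credible set is the smallest set of top-ranked SNPs whose probabilities reach a chosen coverage. The function names and the 90% default are assumptions made for this example.

```python
import math


def posterior_probs(log_bfs):
    """Posterior probability of causality per SNP, given per-SNP log Bayes factors.

    Assumes one causal variant in the region and equal prior probabilities.
    """
    largest = max(log_bfs)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [math.exp(value - largest) for value in log_bfs]
    total = sum(weights)
    return [weight / total for weight in weights]


def credible_set(probs, coverage=0.9):
    """Indices of the smallest top-ranked set whose probabilities sum to >= coverage."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cumulative = [], 0.0
    for index in order:
        chosen.append(index)
        cumulative += probs[index]
        if cumulative >= coverage:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical log Bayes factors for five SNPs in one region.
    probs = posterior_probs([1.2, 6.5, 5.9, 0.3, 2.0])
    print(probs)
    print(credible_set(probs))
```

Only per-SNP summary statistics enter the calculation, which is why the approach is fast; the cost is the single-causal-variant assumption that the abstract calls biologically unrealistic.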
Affiliation(s)
- Anna Hutchinson: MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge Institute of Public Health, Cambridge CB2 0SR, UK
- Jennifer Asimit: MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge Institute of Public Health, Cambridge CB2 0SR, UK
- Chris Wallace: MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge Institute of Public Health, Cambridge CB2 0SR, UK; Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, CB2 0AW, UK; Department of Medicine, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, CB2 2QQ, UK
5. Xu J, Xu W, Briollais L. A Bayes factor approach with informative prior for rare genetic variant analysis from next generation sequencing data. Biometrics 2020; 77:316-328. [PMID: 32277476 DOI: 10.1111/biom.13278] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/24/2018] [Revised: 02/15/2020] [Accepted: 04/01/2020] [Indexed: 11/28/2022]
Abstract
The discovery of rare genetic variants through next generation sequencing is a very challenging issue in the field of human genetics. We propose a novel region-based statistical approach based on a Bayes Factor (BF) to assess evidence of association between a set of rare variants (RVs) located on the same genomic region and a disease outcome in the context of case-control design. Marginal likelihoods are computed under the null and alternative hypotheses assuming a binomial distribution for the RV count in the region and a beta or mixture of Dirac and beta prior distribution for the probability of RV. We derive the theoretical null distribution of the BF under our prior setting and show that a Bayesian control of the false Discovery Rate can be obtained for genome-wide inference. Informative priors are introduced using prior evidence of association from a Kolmogorov-Smirnov test statistic. We use our simulation program, sim1000G, to generate RV data similar to the 1000 genomes sequencing project. Our simulation studies showed that the new BF statistic outperforms standard methods (SKAT, SKAT-O, Burden test) in case-control studies with moderate sample sizes and is equivalent to them under large sample size scenarios. Our real data application to a lung cancer case-control study found enrichment for RVs in known and novel cancer genes. It also suggests that using the BF with informative prior improves the overall gene discovery compared to the BF with noninformative prior.
Affiliation(s)
- Jingxiong Xu: Dalla Lana School of Public Health, University of Toronto, Toronto, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Canada
- Wei Xu: Dalla Lana School of Public Health, University of Toronto, Toronto, Canada; Princess Margaret Cancer Center, Toronto, Canada
- Laurent Briollais: Dalla Lana School of Public Health, University of Toronto, Toronto, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Canada
6. Vsevolozhskaya OA, Zaykin DV. Quantifying posterior effect size distribution of susceptibility loci by common summary statistics. Genet Epidemiol 2020; 44:339-351. [PMID: 32100375 DOI: 10.1002/gepi.22286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2019] [Revised: 12/25/2019] [Accepted: 01/27/2020] [Indexed: 11/06/2022]
Abstract
Testing millions of single nucleotide polymorphisms (SNPs) in genetic association studies has become a standard routine for disease gene discovery. In light of recent re-evaluation of statistical practice, it has been suggested that p-values are unfit as summaries of statistical evidence. Despite this criticism, p-values contain information that can be utilized to address the concerns about their flaws. We present a new method for utilizing evidence summarized by p-values for estimating odds ratio (OR) based on its approximate posterior distribution. In our method, only p-values, sample size, and standard deviation for ln(OR) are needed as summaries of data, accompanied by a suitable prior distribution for ln(OR) that can assume any shape. The parameter of interest, ln(OR), is the only parameter with a specified prior distribution, hence our model is a mix of classical and Bayesian approaches. We show that our method retains the main advantages of the Bayesian approach: it yields direct probability statements about hypotheses for OR and is resistant to biases caused by selection of top-scoring SNPs. Our method enjoys greater flexibility than similarly inspired methods in the assumed distribution for the summary statistic and in the form of the prior for the parameter of interest. We illustrate our method by presenting interval estimates of effect size for reported genetic associations with lung cancer. Although we focus on OR, the method is not limited to this particular measure of effect size and can be used broadly for assessing reliability of findings in studies testing multiple predictors.
Affiliation(s)
- Dmitri V Zaykin: Biostatistics and Computational Biology, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina
7. Alenazi AA, Cox A, Juarez M, Lin W, Walters K. Bayesian variable selection using partially observed categorical prior information in fine‐mapping association studies. Genet Epidemiol 2019; 43:690-703. [DOI: 10.1002/gepi.22213] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/26/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/22/2022]
Affiliation(s)
- Abdulaziz A. Alenazi: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK; Department of Mathematics, Northern Border University, Arar, Saudi Arabia
- Angela Cox: Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Sheffield, UK
- Miguel Juarez: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Wei‐Yu Lin: Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Sheffield, UK; Northern Institute for Cancer Research, Medical School, University of Newcastle, Newcastle, UK
- Kevin Walters: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
8. Walters K, Cox A, Yaacob H. Using GWAS top hits to inform priors in Bayesian fine‐mapping association studies. Genet Epidemiol 2019; 43:675-689. [DOI: 10.1002/gepi.22212] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/16/2019] [Revised: 04/04/2019] [Accepted: 05/07/2019] [Indexed: 11/07/2022]
Affiliation(s)
- Kevin Walters: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Angela Cox: Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Sheffield, UK
- Hannuun Yaacob: School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
9. Spencer AV, Cox A, Lin W, Easton DF, Michailidou K, Walters K. Incorporating Functional Genomic Information in Genetic Association Studies Using an Empirical Bayes Approach. Genet Epidemiol 2016; 40:176-87. [PMID: 26833494 PMCID: PMC4832271 DOI: 10.1002/gepi.21956] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/24/2015] [Revised: 12/04/2015] [Accepted: 12/14/2015] [Indexed: 01/01/2023]
Abstract
There is a large amount of functional genetic data available, which can be used to inform fine-mapping association studies (in diseases with well-characterised disease pathways). Single nucleotide polymorphism (SNP) prioritization via Bayes factors is attractive because prior information can inform the effect size or the prior probability of causal association. This approach requires the specification of the effect size. If the information needed to estimate a priori the probability density for the effect sizes for causal SNPs in a genomic region isn't consistent or isn't available, then specifying a prior variance for the effect sizes is challenging. We propose both an empirical method to estimate this prior variance, and a coherent approach to using SNP-level functional data, to inform the prior probability of causal association. Through simulation we show that when ranking SNPs by our empirical Bayes factor in a fine-mapping study, the causal SNP rank is generally as high or higher than the rank using Bayes factors with other plausible values of the prior variance. Importantly, we also show that assigning SNP-specific prior probabilities of association based on expert prior functional knowledge of the disease mechanism can lead to improved causal SNPs ranks compared to ranking with identical prior probabilities of association. We demonstrate the use of our methods by applying the methods to the fine mapping of the CASP8 region of chromosome 2 using genotype data from the Collaborative Oncological Gene-Environment Study (COGS) Consortium. The data we analysed included approximately 46,000 breast cancer case and 43,000 healthy control samples.
Affiliation(s)
- Amy V. Spencer: Advanced Analytics Centre, Global Medicines Development, AstraZeneca, Alderley Park, Macclesfield, United Kingdom; School of Mathematics and Statistics, University of Sheffield, Sheffield, United Kingdom
- Angela Cox: Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Beech Hill Road, Sheffield, United Kingdom
- Wei‐Yu Lin: Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Beech Hill Road, Sheffield, United Kingdom; Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Douglas F. Easton: Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom; Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
- Kyriaki Michailidou: Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
- Kevin Walters: School of Mathematics and Statistics, University of Sheffield, Sheffield, United Kingdom