1
Xu S, Williams J, Ferreira MAR. BG2: Bayesian variable selection in generalized linear mixed models with nonlocal priors for non-Gaussian GWAS data. BMC Bioinformatics 2023; 24:343. [PMID: 37715138 PMCID: PMC10503129 DOI: 10.1186/s12859-023-05468-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 09/05/2023] [Indexed: 09/17/2023] Open
Abstract
BACKGROUND Genome-wide association studies (GWASes) aim to identify single nucleotide polymorphisms (SNPs) associated with a given phenotype. A common approach for the analysis of GWAS is single marker analysis (SMA) based on linear mixed models (LMMs). However, LMM-based SMA usually yields a large number of false discoveries and cannot be directly applied to non-Gaussian phenotypes such as count data. RESULTS We present a novel Bayesian method to find SNPs associated with non-Gaussian phenotypes. To that end, we use generalized linear mixed models (GLMMs) and, thus, call our method Bayesian GLMMs for GWAS (BG2). To deal with the high dimensionality of GWAS analysis, we propose novel nonlocal priors specifically tailored for GLMMs. In addition, we develop related fast approximate Bayesian computations. BG2 uses a two-step procedure: first, BG2 screens for candidate SNPs; second, BG2 performs model selection that considers all screened candidate SNPs as possible regressors. A simulation study shows favorable performance of BG2 when compared to GLMM-based SMA. We illustrate the usefulness and flexibility of BG2 with three case studies on cocaine dependence (binary data), alcohol consumption (count data), and number of root-like structures in a model plant (count data).
Affiliation(s)
- Shuangshuang Xu
  - Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA
- Jacob Williams
  - Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA
2
Karhunen V, Launonen I, Järvelin MR, Sebert S, Sillanpää MJ. Genetic fine-mapping from summary data using a nonlocal prior improves the detection of multiple causal variants. Bioinformatics 2023; 39:btad396. [PMID: 37348543 PMCID: PMC10326304 DOI: 10.1093/bioinformatics/btad396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 06/09/2023] [Accepted: 06/20/2023] [Indexed: 06/24/2023] Open
Abstract
MOTIVATION Genome-wide association studies (GWAS) have been successful in identifying genomic loci associated with complex traits. Genetic fine-mapping aims to detect independent causal variants from the GWAS-identified loci, adjusting for linkage disequilibrium patterns. RESULTS We present "FiniMOM" (fine-mapping using a product inverse-moment prior), a novel Bayesian fine-mapping method for summarized genetic associations. For causal effects, the method uses a nonlocal inverse-moment prior, which is a natural prior distribution to model non-null effects in finite samples. A beta-binomial prior is set for the number of causal variants, with a parameterization that can be used to control for potential misspecifications in the linkage disequilibrium reference. Results of simulation studies designed to mimic a typical GWAS on circulating protein levels show improved credible set coverage and power of the proposed method over the current state-of-the-art fine-mapping method SuSiE, especially when a locus contains multiple causal variants. AVAILABILITY AND IMPLEMENTATION https://vkarhune.github.io/finimom/.
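To illustrate what "nonlocal" means for the inverse-moment prior mentioned in this abstract, here is a minimal sketch of the inverse-moment (iMOM) density for a single effect size, in the form commonly given in the nonlocal-prior literature. The function name and the default values of `tau` and `r` are illustrative assumptions, not FiniMOM's settings; the package's exact parameterization should be checked against the paper and its documentation.

```python
import math


def imom_density(beta: float, tau: float = 1.0, r: float = 1.0) -> float:
    """Inverse-moment (iMOM) prior density for one effect size `beta`.

    Density: tau^(r/2) / Gamma(r/2) * |beta|^-(r+1) * exp(-tau / beta^2).
    `tau` (scale) and `r` (shape) defaults are hypothetical, chosen only
    for illustration.
    """
    if tau <= 0.0 or r <= 0.0:
        raise ValueError("tau and r must be positive")
    b2 = beta * beta
    if b2 == 0.0:
        # The density tends to 0 as beta -> 0; this vanishing at the null
        # value is the defining property of a nonlocal prior.
        return 0.0
    # Work on the log scale to avoid overflow for very small |beta|.
    log_dens = (
        0.5 * r * math.log(tau)
        - math.lgamma(0.5 * r)
        - (r + 1.0) * math.log(abs(beta))
        - tau / b2
    )
    return math.exp(log_dens)


if __name__ == "__main__":
    for b in (0.0, 0.1, 0.5, 1.0, 2.0):
        print(f"beta={b:4.1f}  density={imom_density(b):.6f}")
```

The density is exactly zero at the null value and places its mass away from it, which is why such priors penalize spurious small effects more sharply than local priors (for example, a normal prior centered at zero).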
Affiliation(s)
- Ville Karhunen
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, P.O.Box 8000, FI-90014, Finland
  - Research Unit of Population Health, University of Oulu, Oulu, Finland
- Ilkka Launonen
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, P.O.Box 8000, FI-90014, Finland
- Marjo-Riitta Järvelin
  - Research Unit of Population Health, University of Oulu, Oulu, Finland
  - Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
  - Department of Life Sciences, College of Health and Life Sciences, Brunel University, London, United Kingdom
- Sylvain Sebert
  - Research Unit of Population Health, University of Oulu, Oulu, Finland
- Mikko J Sillanpää
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, P.O.Box 8000, FI-90014, Finland
3
Williams J, Xu S, Ferreira MAR. BGWAS: Bayesian variable selection in linear mixed models with nonlocal priors for genome-wide association studies. BMC Bioinformatics 2023; 24:194. [PMID: 37170185 PMCID: PMC10176706 DOI: 10.1186/s12859-023-05316-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 04/30/2023] [Indexed: 05/13/2023] Open
Abstract
BACKGROUND Genome-wide association studies (GWAS) seek to identify single nucleotide polymorphisms (SNPs) that cause observed phenotypes. However, with highly correlated SNPs, correlated observations, and the number of SNPs being two orders of magnitude larger than the number of observations, GWAS procedures often suffer from high false positive rates. RESULTS We propose BGWAS, a novel Bayesian variable selection method based on nonlocal priors for linear mixed models specifically tailored for genome-wide association studies. Our proposed method BGWAS uses a novel nonlocal prior for linear mixed models (LMMs). BGWAS has two steps: screening and model selection. The screening step scans through all the SNPs fitting one LMM for each SNP and then uses Bayesian false discovery control to select a set of candidate SNPs. After that, a model selection step searches through the space of LMMs that may have any number of SNPs from the candidate set. A simulation study shows that, when compared to popular GWAS procedures, BGWAS greatly reduces false positives while maintaining the same ability to detect true positive SNPs. We show the utility and flexibility of BGWAS with two case studies: a case study on salt stress in plants, and a case study on alcohol use disorder. CONCLUSIONS BGWAS maintains and in some cases increases the recall of true SNPs while drastically lowering the number of false positives compared to popular SMA procedures.
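The screening step described in this abstract relies on Bayesian false discovery control. As a minimal sketch, the function below applies one common Bayesian FDR rule: rank SNPs by posterior probability of association and keep the largest top-ranked set whose average posterior null probability stays at or below a target level. The abstract does not say which rule BGWAS uses, so the rule, the function name, the input probabilities, and the threshold `alpha` are all assumptions made for illustration.

```python
from typing import List, Sequence


def bayes_fdr_select(post_prob_assoc: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return indices of candidate SNPs under a simple Bayesian FDR rule.

    post_prob_assoc[i] is an assumed, precomputed posterior probability that
    SNP i is associated with the phenotype (for example, from fitting one
    model per SNP). The returned set is the largest top-k set, ranked by that
    probability, whose mean posterior null probability is <= alpha.
    """
    # Rank SNP indices from most to least probable association.
    order = sorted(
        range(len(post_prob_assoc)),
        key=lambda i: post_prob_assoc[i],
        reverse=True,
    )
    selected: List[int] = []
    null_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        null_sum += 1.0 - post_prob_assoc[idx]
        # The running mean is non-decreasing in rank, so stop at the first
        # violation of the target level.
        if null_sum / rank > alpha:
            break
        selected = order[:rank]
    return selected


if __name__ == "__main__":
    probs = [0.99, 0.95, 0.60, 0.10]  # hypothetical posterior probabilities
    print(bayes_fdr_select(probs, alpha=0.05))
```

In a two-step procedure of the kind the abstract describes, the indices returned here would form the candidate set passed on to the model selection step.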
Affiliation(s)
- Jacob Williams
  - Department of Statistics, Virginia Tech, Blacksburg, 24061, USA
- Shuangshuang Xu
  - Department of Statistics, Virginia Tech, Blacksburg, 24061, USA
4
Williams J, Ferreira MAR, Ji T. BICOSS: Bayesian iterative conditional stochastic search for GWAS. BMC Bioinformatics 2022; 23:475. [DOI: 10.1186/s12859-022-05030-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 10/31/2022] [Indexed: 11/15/2022] Open
Abstract
Background
Single marker analysis (SMA) with linear mixed models for genome wide association studies has uncovered the contribution of genetic variants to many observed phenotypes. However, SMA has weak false discovery control. In addition, when a few variants have large effect sizes, SMA has low statistical power to detect small and medium effect sizes, leading to low recall of true causal single nucleotide polymorphisms (SNPs).
Results
We present the Bayesian Iterative Conditional Stochastic Search (BICOSS) method that controls false discovery rate and increases recall of variants with small and medium effect sizes. BICOSS iterates between a screening step and a Bayesian model selection step. A simulation study shows that, when compared to SMA, BICOSS dramatically reduces false discovery rate and allows for smaller effect sizes to be discovered. Finally, two real world applications show the utility and flexibility of BICOSS.
Conclusions
When compared to widely used SMA, BICOSS provides higher recall of true SNPs while dramatically reducing false discovery rate.
5
Mallick H, Alhamzawi R, Paul E, Svetnik V. The reciprocal Bayesian LASSO. Stat Med 2021; 40:4830-4849. [PMID: 34126655 DOI: 10.1002/sim.9098] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/11/2020] [Revised: 05/19/2021] [Accepted: 05/27/2021] [Indexed: 11/08/2022]
Abstract
A reciprocal LASSO (rLASSO) regularization employs a decreasing penalty function as opposed to conventional penalization approaches that use increasing penalties on the coefficients, leading to stronger parsimony and superior model selection relative to traditional shrinkage methods. Here we consider a fully Bayesian formulation of the rLASSO problem, which is based on the observation that the rLASSO estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the regression parameters are assigned independent inverse Laplace priors. Bayesian inference from this posterior is possible using an expanded hierarchy motivated by a scale mixture of double Pareto or truncated normal distributions. On simulated and real datasets, we show that the Bayesian formulation outperforms its classical cousin in estimation, prediction, and variable selection across a wide range of scenarios while offering the advantage of posterior inference. Finally, we discuss other variants of this new approach and provide a unified framework for variable selection using flexible reciprocal penalties. All methods described in this article are publicly available as an R package at: https://github.com/himelmallick/BayesRecipe.
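The contrast between an increasing and a decreasing penalty can be made concrete with a short sketch. It assumes the reciprocal L1 form usually associated with rLASSO, in which each nonzero coefficient contributes `lam / |beta_j|` and zero coefficients contribute nothing; the function names and example values are hypothetical, and this is not code from the BayesRecipe package.

```python
from typing import Sequence


def lasso_penalty(beta: Sequence[float], lam: float) -> float:
    """Conventional LASSO penalty: grows with the size of each coefficient."""
    return lam * sum(abs(b) for b in beta)


def reciprocal_lasso_penalty(beta: Sequence[float], lam: float) -> float:
    """Reciprocal L1 penalty (assumed form): lam / |b| for each nonzero b.

    Coefficients that are exactly zero cost nothing, while small nonzero
    coefficients are penalized most heavily.
    """
    return sum(lam / abs(b) for b in beta if b != 0.0)


if __name__ == "__main__":
    beta = [0.0, 4.0, -0.5]  # hypothetical coefficient vector
    print(lasso_penalty(beta, lam=1.0))             # 4.5
    print(reciprocal_lasso_penalty(beta, lam=1.0))  # 2.25
```

Under this form a coefficient must either be exactly zero or large enough to justify its penalty, which is consistent with the abstract's description of stronger parsimony than increasing penalties provide.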
Affiliation(s)
- Himel Mallick
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
- Rahim Alhamzawi
  - Department of Statistics, University of Al-Qadisiyah, Al Diwaniyah, Iraq
  - Center for Scientific Research and Development, Nawroz University, Duhok, Iraq
- Erina Paul
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
- Vladimir Svetnik
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
6
Ren W, Liang Z, He S, Xiao J. Hybrid of Restricted and Penalized Maximum Likelihood Method for Efficient Genome-Wide Association Study. Genes (Basel) 2020; 11:genes11111286. [PMID: 33138126 PMCID: PMC7692801 DOI: 10.3390/genes11111286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/08/2020] [Revised: 10/26/2020] [Accepted: 10/27/2020] [Indexed: 11/16/2022] Open
Abstract
In genome-wide association studies, linear mixed models (LMMs) have been widely used to explore the molecular mechanisms of complex traits. However, typical association approaches suffer from several important drawbacks: estimating variance components in LMMs for a large number of individuals is computationally slow, and a single-locus model handles complex confounding poorly and loses statistical power. To address these issues, we propose an efficient two-stage method based on a hybrid of restricted and penalized maximum likelihood, named HRePML. First, we performed restricted maximum likelihood (REML) on a single-locus LMM to remove unrelated markers, using spectral decomposition of the covariance matrix to estimate variance components quickly. Second, we carried out penalized maximum likelihood (PML) on a multi-locus LMM for markers with reasonably large effects. To validate the effectiveness of HRePML, we conducted a series of simulation studies and real data analyses. Our method always had the highest average statistical power compared with multi-locus mixed-model (MLMM), fixed and random model circulating probability unification (FarmCPU), and genome-wide efficient mixed model association (GEMMA). More importantly, HRePML can provide more accurate estimates of marker effects. HRePML also identified 41 previously reported genes associated with development traits in Arabidopsis, more than were detected by the other methods.
Affiliation(s)
- Wenlong Ren
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
- Zhikai Liang
  - Plant and Microbial Biology Department, University of Minnesota, Saint Paul, MN 55108, USA
- Shu He
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
- Jing Xiao
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
7
Cao X, Xing L, He H, Zhang X. Views on GWAS statistical analysis. Bioinformation 2020; 16:393-397. [PMID: 32831520 PMCID: PMC7434950 DOI: 10.6026/97320630016393] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/02/2020] [Revised: 04/15/2020] [Accepted: 04/17/2020] [Indexed: 11/23/2022] Open
Abstract
Genome-wide association study (GWAS) is a popular approach to investigate relationships between genetic information and diseases. Many associations are tested in a single study, and the results are often corrected using multiple-testing adjustment methods. GWAS are often observed to lack adequate statistical power for reliable results. Hence, we document known models for reliability assessment using improved statistical power in GWAS analysis.
Affiliation(s)
- Xiaowen Cao
  - Department of Mathematics, Hebei University of Technology, Tianjin, China
  - Department of Mathematics and Statistics, University of Victoria, BC, Canada
- Li Xing
  - Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada
- Hua He
  - Department of Mathematics, Hebei University of Technology, Tianjin, China
- Xuekui Zhang
  - Department of Mathematics and Statistics, University of Victoria, BC, Canada
8
Kaplan A, Lock EF, Fiecas M, for the Alzheimer’s Disease Neuroimaging Initiative. Bayesian GWAS with Structured and Non-Local Priors. Bioinformatics 2020; 36:17-25. [PMID: 31651034 PMCID: PMC6956774 DOI: 10.1093/bioinformatics/btz518] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/04/2019] [Revised: 05/20/2019] [Accepted: 06/18/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The flexibility of a Bayesian framework is promising for GWAS, but current approaches can benefit from more informative prior models. We introduce a novel Bayesian approach to GWAS, called Structured and Non-Local Priors (SNLPs) GWAS, that improves over existing methods in two important ways. First, we describe a model that allows for a marker's gene-parent membership and other characteristics to influence its probability of association with an outcome. Second, we describe a non-local alternative model for differential minor allele rates at each marker, in which the null and alternative hypotheses have no common support. RESULTS We employ a non-parametric model that allows for clustering of the genes in tandem with a regression model for marker-level covariates, and demonstrate how incorporating these additional characteristics can improve power. We further demonstrate that our non-local alternative model gives symmetric rates of convergence for the null and alternative hypotheses, whereas commonly used local alternative models have asymptotic rates that favor the alternative hypothesis over the null. We demonstrate the robustness and flexibility of our structured and non-local model for different data generating scenarios and signal-to-noise ratios. We apply our Bayesian GWAS method to single nucleotide polymorphism data collected from a pool of Alzheimer's disease and cognitively normal patients from the Alzheimer's Disease Neuroimaging Initiative. AVAILABILITY AND IMPLEMENTATION R code to perform the SNLPs method is available at https://github.com/lockEF/BayesianScreening.
Affiliation(s)
- Adam Kaplan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Eric F Lock
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Mark Fiecas
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
9
Xu Y, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Sci Rep 2019; 9:13686. [PMID: 31548641 PMCID: PMC6757104 DOI: 10.1038/s41598-019-50229-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/04/2019] [Accepted: 09/09/2019] [Indexed: 12/18/2022] Open
Abstract
Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). The traditional SNP-wise approach with multiple testing adjustment is over-conservative and lacks power in many GWASs. In this article, we propose a model-based clustering method that transforms the challenging high-dimension-small-sample-size problem into a low-dimension-large-sample-size problem and borrows information across SNPs by grouping SNPs into three clusters. We pre-specify the patterns of clusters by minor allele frequencies of SNPs between cases and controls, and enforce the patterns with prior distributions. In the simulation studies, our proposed model outperforms the traditional SNP-wise approach, showing better control of the false discovery rate (FDR) and higher sensitivity. We re-analyzed two real studies to identify SNPs associated with severe bortezomib-induced peripheral neuropathy (BiPN) in patients with multiple myeloma (MM). The original analysis in the literature failed to identify SNPs after FDR adjustment. Our proposed method not only detected the reported SNPs after FDR adjustment but also discovered a novel BiPN-associated SNP, rs4351714, that has been reported to be related to MM in another study.
Affiliation(s)
- Yan Xu
  - Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Li Xing
  - Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada
- Jessica Su
  - Channing Division of Network Medicine, Brigham and Women's Hospital/Harvard Medical School, Boston, MA, USA
- Xuekui Zhang
  - Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Weiliang Qiu
  - Channing Division of Network Medicine, Brigham and Women's Hospital/Harvard Medical School, Boston, MA, USA