1
|
Deng Y, He Y, Xu G, Pan W. Speeding up Monte Carlo simulations for the adaptive sum of powered score test with importance sampling. Biometrics 2022; 78:261-273. [PMID: 33215683 PMCID: PMC8134502 DOI: 10.1111/biom.13407] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2019] [Revised: 08/30/2020] [Accepted: 10/29/2020] [Indexed: 12/21/2022]
Abstract
A central but challenging problem in genetic studies is to test for (usually weak) associations between a complex trait (e.g., a disease status) and sets of multiple genetic variants. Because no uniformly most powerful test exists, data-adaptive tests, such as the adaptive sum of powered score (aSPU) test, are advantageous in maintaining high power against a wide range of alternatives. However, many adaptive tests such as aSPU have no closed-form expression for calculating p-values accurately and analytically, so Monte Carlo (MC) simulations are often used instead; these can be time-consuming at the stringent significance levels (e.g., 5e-8) used in genome-wide association studies (GWAS). Estimating such a small p-value requires a huge number of MC simulations (e.g., 1e+10). As an alternative, we propose using importance sampling to speed up such calculations. We develop some theory to motivate a proposed algorithm for the aSPU test, and show that the proposed method is computationally more efficient than the standard MC simulations. Using both simulated and real data, we demonstrate the superior performance of the new method over the standard MC simulations.
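The contrast between standard MC and importance sampling can be illustrated on a toy problem. The sketch below is not the authors' algorithm for aSPU; it assumes a much simpler target, the standard normal tail probability P(Z > 5), and an assumed proposal distribution N(5, 1). All function names are hypothetical and chosen for illustration only.

```python
import math
import random


def normal_tail_exact(t):
    """Exact P(Z > t) for Z ~ N(0, 1); used only as a reference value."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def tail_naive_mc(t, n, rng):
    """Standard MC: fraction of N(0, 1) draws that exceed t."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > t)
    return hits / n


def tail_importance_sampling(t, n, rng):
    """Importance sampling with the (assumed) proposal N(t, 1).

    Each proposal draw x is reweighted by the density ratio
    phi(x) / phi(x - t) = exp(-t * x + t**2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)
        if x > t:
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n


rng = random.Random(0)
t = 5.0        # threshold; the true tail probability is about 2.9e-7
n = 20000      # same simulation budget for both estimators

exact = normal_tail_exact(t)
naive = tail_naive_mc(t, n, rng)
is_est = tail_importance_sampling(t, n, rng)

print(f"exact               = {exact:.3e}")
print(f"standard MC         = {naive:.3e}")
print(f"importance sampling = {is_est:.3e}")
```

With this budget, standard MC expects far fewer than one draw beyond the threshold, so its estimate is usually exactly zero, whereas roughly half of the proposal draws land in the tail and contribute a weighted term. The choice of proposal is the design decision that matters; a poorly matched proposal can make the weights highly variable.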
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Mathematics, University of North Texas, Denton, TX 76203, USA
- Yinqiu He
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA (corresponding author)
|
2
|
Yang T, Kim J, Wu C, Ma Y, Wei P, Pan W. An adaptive test for meta-analysis of rare variant association studies. Genet Epidemiol 2020; 44:104-116. [PMID: 31830326 PMCID: PMC6980317 DOI: 10.1002/gepi.22273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/15/2019] [Revised: 11/12/2019] [Accepted: 11/25/2019] [Indexed: 01/02/2023]
Abstract
Single genome-wide studies may be underpowered to detect trait-associated rare variants with moderate or weak effect sizes. As a viable alternative, meta-analysis is widely used to increase power by combining different studies. The power of meta-analysis critically depends on the underlying association patterns and heterogeneity levels, which are unknown and vary from locus to locus. However, existing methods mainly focus on one or only a few combinations of the association pattern and heterogeneity level, and thus may lose power in many situations. To address this issue, we propose a general and unified framework that combines a class of tests including and beyond some existing ones, leading to high power across a wide range of scenarios. We demonstrate in simulation studies that the proposed test is more powerful than some existing methods, and then illustrate its performance with the NHLBI Exome-Sequencing Project (ESP) data. One gene (B4GALNT2) was found by our proposed test, but not by others, to be statistically significantly associated with plasma triglyceride. The signal was driven by African-ancestry subjects, but the gene was previously reported to be associated with coronary artery disease among European-ancestry subjects. We implemented our method in an R package aSPUmeta, publicly available at https://github.com/ytzhong/metaRV and will be on CRAN soon.
Affiliation(s)
- Tianzhong Yang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Junghi Kim
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Yiding Ma
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
|
3
|
Abstract
Background: We propose a gene-level association test that accounts for individual relatedness and population structures in pedigree data in the framework of linear mixed models (LMMs). Our method data-adaptively combines the results across a class of score-based tests and requires fitting only a single null model (under the null hypothesis) for the whole genome, making it computationally efficient. Results: We applied our approach to test for association with the post- to pretreatment ratio of high-density lipoprotein (HDL) in GAW20 data. Using an LMM similar to that used by Aslibekyan et al. (PLoS One, 7:48663, 2012), our method identified 2 nearly significant genes (APOA5 and ZNF259) near rs964184, whereas neither the other gene-level tests nor the standard test on each individual single-nucleotide polymorphism (SNP) detected any significant gene in a genome-wide scan. Conclusions: Gene-level association testing can be a complementary approach to SNP-level association testing, and our method is adaptive and efficient compared to several other existing gene-level association tests.
Affiliation(s)
- Jun Young Park
- Division of Biostatistics, University of Minnesota, 420 Delaware Street SE, Minneapolis, MN, 55455, USA
- Chong Wu
- Division of Biostatistics, University of Minnesota, 420 Delaware Street SE, Minneapolis, MN, 55455, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, 420 Delaware Street SE, Minneapolis, MN, 55455, USA
|
4
|
Wu C, Pan W. Integrating eQTL data with GWAS summary statistics in pathway-based analysis with application to schizophrenia. Genet Epidemiol 2018; 42:303-316. [PMID: 29411426 PMCID: PMC5851843 DOI: 10.1002/gepi.22110] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/19/2017] [Revised: 01/04/2018] [Accepted: 01/04/2018] [Indexed: 12/11/2022]
Abstract
Many genetic variants affect complex traits through gene expression, which can be exploited to boost statistical power and enhance interpretation in genome-wide association studies (GWASs) as demonstrated by the transcriptome-wide association study (TWAS) approach. Furthermore, due to polygenic inheritance, a complex trait is often affected by multiple genes with similar functions as annotated in gene pathways. Here, we extend TWAS from gene-based analysis to pathway-based analysis: we integrate public pathway collections, expression quantitative trait locus (eQTL) data and GWAS summary association statistics (or GWAS individual-level data) to identify gene pathways associated with complex traits. The basic idea is to weight the SNPs of the genes in a pathway based on their estimated cis-effects on gene expression, then adaptively test for association of the pathway with a GWAS trait by effectively aggregating possibly weak association signals across the genes in the pathway. The P values can be calculated analytically and thus quickly. We applied our proposed test with the KEGG and GO pathways to two schizophrenia (SCZ) GWAS summary association data sets, denoted by SCZ1 and SCZ2, with about 20,000 and 150,000 subjects, respectively. Most of the significant pathways identified by analyzing the SCZ1 data were reproduced by the SCZ2 data. Importantly, we identified 15 novel pathways associated with SCZ, such as GABA receptor complex (GO:1902710), which could not be uncovered by the standard single SNP-based analysis or gene-based TWAS. The newly identified pathways may help us gain insights into the biological mechanism underlying SCZ. Our results showcase the power of incorporating gene expression information and gene functional annotations into pathway-based association testing for GWAS.
Affiliation(s)
- Chong Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
|
5
|
Park JY, Wu C, Basu S, McGue M, Pan W. Adaptive SNP-Set Association Testing in Generalized Linear Mixed Models with Application to Family Studies. Behav Genet 2018; 48:55-66. [PMID: 29150721 PMCID: PMC5754233 DOI: 10.1007/s10519-017-9883-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/12/2017] [Accepted: 11/07/2017] [Indexed: 10/18/2022]
Abstract
In genome-wide association studies (GWAS), it has been increasingly recognized that, as a complementary approach to standard single SNP analyses, it may be beneficial to analyze a group of functionally related SNPs together. Among the existing population-based SNP-set association tests, two adaptive tests, the aSPU test and the aSPUpath test, offer a powerful and general approach at the gene and pathway levels by data-adaptively combining the results across multiple SNPs (and genes) such that high statistical power can be maintained across a wide range of scenarios. We extend the aSPU and aSPUpath tests to familial data under the framework of generalized linear mixed models (GLMMs), which can take account of both subject relatedness and possible population structure. As in population-based GWAS, the proposed aSPU and aSPUpath tests require fitting only a single, common GLMM (under the null hypothesis) for all the SNPs, and thus are computationally efficient and feasible for large GWAS data. We illustrate our approaches in identifying genes and pathways associated with alcohol dependence in the Minnesota Twin Family Study. The aSPU test detected a gene associated with the trait, whereas the standard single SNP analysis detected none. Our aSPU test also controlled Type I errors satisfactorily in a small simulation study. We provide R code to conduct the aSPU and aSPUpath tests for familial and other correlated data.
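The adaptive combination described in this abstract can be sketched for the simplest population-based case. The code below is an illustrative assumption, not the authors' R code and not the GLMM extension: it assumes independent subjects, no covariates, a permutation null, and the commonly used form SPU(γ) = Σ_j U_j^γ with SPU(∞) = max_j |U_j|, taking the minimum p-value over γ as the aSPU statistic. Function names, the γ grid, and the simulated data are hypothetical.

```python
import math
import random


def score_vector(y, x):
    """U_j = sum_i (y_i - mean(y)) * x_ij: null score for SNP j, no covariates."""
    ybar = sum(y) / len(y)
    return [sum((yi - ybar) * row[j] for yi, row in zip(y, x))
            for j in range(len(x[0]))]


def spu(u, gamma):
    """Sum of powered scores; gamma = inf gives the maximum absolute score."""
    if gamma == math.inf:
        return max(abs(v) for v in u)
    return sum(v ** gamma for v in u)


def aspu_permutation(y, x, gammas=(1, 2, 3, 4, math.inf), n_perm=200, seed=0):
    """Return ({gamma: SPU p-value}, aSPU p-value) from one set of permutations."""
    rng = random.Random(seed)
    k_range = range(len(gammas))
    obs = [abs(spu(score_vector(y, x), g)) for g in gammas]

    # Null statistics: permute the trait, which breaks any SNP-trait association.
    null = []
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        u_b = score_vector(y_perm, x)
        null.append([abs(spu(u_b, g)) for g in gammas])

    # p-value of each SPU(gamma) test for the observed data.
    p_obs = [(1 + sum(1 for row in null if row[k] >= obs[k])) / (n_perm + 1)
             for k in k_range]

    def min_p(b):
        # p-values permutation b would receive against the other permutations,
        # minimised over gamma; reuses the same null draws (no second layer).
        return min(
            (1 + sum(1 for c, row in enumerate(null)
                     if c != b and row[k] >= null[b][k])) / n_perm
            for k in k_range)

    t_obs = min(p_obs)                       # aSPU statistic: smallest p-value
    t_null = [min_p(b) for b in range(n_perm)]
    p_aspu = (1 + sum(1 for t in t_null if t <= t_obs)) / (n_perm + 1)
    return dict(zip(gammas, p_obs)), p_aspu


# Hypothetical null data: 100 subjects, 5 SNPs coded 0/1/2, binary trait.
data_rng = random.Random(1)
x = [[data_rng.choice((0, 1, 2)) for _ in range(5)] for _ in range(100)]
y = [data_rng.choice((0, 1)) for _ in range(100)]
spu_p, p_aspu = aspu_permutation(y, x)
print(spu_p, p_aspu)
```

Reusing one set of permutations for both the individual SPU p-values and the null distribution of their minimum avoids a nested simulation. The smallest attainable p-value is 1/(n_perm + 1), which is why genome-wide thresholds require very large simulation counts.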
Affiliation(s)
- Jun Young Park
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN, 55455, USA
- Chong Wu
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN, 55455, USA
- Saonli Basu
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN, 55455, USA
- Matt McGue
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN, 55455, USA
|
6
|
Kim J, Pan W. Adaptive testing for multiple traits in a proportional odds model with applications to detect SNP-brain network associations. Genet Epidemiol 2017; 41:259-277. [PMID: 28191669 DOI: 10.1002/gepi.22033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/05/2016] [Revised: 10/07/2016] [Accepted: 10/31/2016] [Indexed: 12/15/2022]
Abstract
There has been increasing interest in developing more powerful and flexible statistical tests to detect genetic associations with multiple traits, as arising from neuroimaging genetic studies. Most existing methods treat a single trait or multiple traits as the response while treating an SNP as a predictor coded under an additive inheritance mode. In this paper, we follow an earlier approach in treating an SNP as an ordinal response while treating traits as predictors in a proportional odds model (POM). In this way, it is not only easier to handle mixed types of traits, e.g., some quantitative and some binary, but it is also potentially more robust to the commonly adopted additive inheritance mode. More importantly, we develop an adaptive test in a POM so that it can maintain high power across many possible situations. Compared to the existing methods treating multiple traits as responses, e.g., in a generalized estimating equation (GEE) approach, the proposed method can be applied to a high-dimensional setting where the number of phenotypes (p) can be larger than the sample size (n), in addition to the usual small-p setting. The promising performance of the proposed method was demonstrated with applications to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, in which either structural MRI-derived phenotypes or resting-state functional MRI (rs-fMRI) derived brain functional connectivity measures were used as phenotypes. The applications led to the identification of several top SNPs of biological interest. Furthermore, simulation studies showed competitive performance of the new method, especially for p>n.
Affiliation(s)
- Junghi Kim
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
|