1
Liang X, Sun H. Weighted Selection Probability to Prioritize Susceptible Rare Variants in Multi-Phenotype Association Studies with Application to a Soybean Genetic Data Set. J Comput Biol 2023; 30:1075-1088. [PMID: 37871292 DOI: 10.1089/cmb.2022.0487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Rare variant association studies with multiple traits or diseases have drawn considerable attention, since association signals of rare variants can be boosted if more than one phenotype outcome is associated with the same rare variants. Most existing statistical methods for identifying rare variants associated with multiple phenotypes are based on a group test, where pre-specified genetic regions are tested one at a time. However, these methods are not designed to locate susceptible rare variants within the genetic region. In this article, we propose a new statistical method to prioritize rare variants within a genetic region when a group test for that region identifies a statistical association with multiple phenotypes. The method computes the weighted selection probability (WSP) of each rare variant and ranks the variants from largest to smallest WSP. In simulation studies, we demonstrated that the proposed method outperforms other statistical methods in terms of true positive selection when multiple phenotypes are correlated with each other. We also applied it to our soybean single nucleotide polymorphism (SNP) data with 13 highly correlated amino acids, where we identified some potentially susceptible rare variants on chromosome 19.
Affiliation(s)
- Xianglong Liang
  - Department of Statistics, Pusan National University, Busan, Korea
- Hokeun Sun
  - Department of Statistics, Pusan National University, Busan, Korea
2
Blumhagen RZ, Schwartz DA, Langefeld CD, Fingerlin TE. Identification of Influential Variants in Significant Aggregate Rare Variant Tests. Hum Hered 2021; 85:1-13. [PMID: 33567433 PMCID: PMC8353006 DOI: 10.1159/000513290] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2020] [Accepted: 11/19/2020] [Indexed: 12/17/2022]
Abstract
INTRODUCTION Studies that examine the role of rare variants in both simple and complex disease are increasingly common. Though the usual approach of testing rare variants in aggregate sets is more powerful than testing individual variants, it is of interest to identify the variants that are plausible drivers of the association. We present a novel method for prioritization of rare variants after a significant aggregate test by quantifying the influence of each variant on the aggregate test of association.
METHODS In addition to providing a measure used to rank variants, we use outlier detection methods to present the computationally efficient Rare Variant Influential Filtering Tool (RIFT) to identify a subset of variants that influence the disease association. We evaluated several outlier detection methods that vary based on the underlying variance measure: interquartile range (Tukey fences), median absolute deviation, and SD. We performed 1,000 simulations for 50 regions of size 3 kb and compared the true and false positive rates. We compared RIFT using the inner Tukey fences to 2 existing methods: adaptive combination of p values (ADA) and a Bayesian hierarchical model (BeviMed). Finally, we applied this method to data from our targeted resequencing study in idiopathic pulmonary fibrosis (IPF).
RESULTS All outlier detection methods showed higher sensitivity for detecting uncommon variants (0.001 < minor allele frequency [MAF] < 0.03) than for very rare variants (MAF < 0.001). For uncommon variants, RIFT had a lower median false positive rate than ADA. ADA and RIFT had significantly higher true positive rates than BeviMed. When applied to 2 regions previously found to be associated with IPF, together including 100 rare variants, we identified 6 polymorphisms with the greatest evidence for influencing the association with IPF.
DISCUSSION In summary, RIFT has a high true positive rate while maintaining a low false positive rate for identifying polymorphisms influencing rare variant association tests. This work provides an approach to obtain greater resolution of the rare variant signals within significant aggregate sets; this information can provide an objective measure to prioritize variants for follow-up experimental studies and insight into the biological pathways involved.
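The outlier-filtering step described in this abstract can be illustrated with a short sketch. This is not the RIFT implementation: the per-variant influence scores are hypothetical inputs (for example, the change in an aggregate test statistic when one variant is left out), and the function name `tukey_outliers` and the conventional inner-fence multiplier k = 1.5 are assumptions made for illustration.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return indices of values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles of the scores
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical per-variant influence scores (one value per rare variant).
influence = [0.1, -0.2, 0.05, 0.0, 0.15, -0.1, 4.2, 0.08, -0.05, 0.12]
flagged = tukey_outliers(influence)
print(flagged)  # indices of variants whose influence is an outlier
```

Replacing the interquartile range with the median absolute deviation or the SD would give the other two fence definitions the abstract mentions.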
Affiliation(s)
- Rachel Z Blumhagen
  - Center for Genes, Environment and Health, National Jewish Health, Denver, Colorado, USA
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, Colorado, USA
- David A Schwartz
  - School of Medicine, University of Colorado, Aurora, Colorado, USA
- Carl D Langefeld
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
  - Comprehensive Cancer Center, Wake Forest Baptist Medical Center, Winston-Salem, North Carolina, USA
  - Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Tasha E Fingerlin
  - Center for Genes, Environment and Health, National Jewish Health, Denver, Colorado, USA
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, Colorado, USA
  - School of Medicine, University of Colorado, Aurora, Colorado, USA
3
Marceau West R, Lu W, Rotroff DM, Kuenemann MA, Chang SM, Wu MC, Wagner MJ, Buse JB, Motsinger-Reif AA, Fourches D, Tzeng JY. Identifying individual risk rare variants using protein structure guided local tests (POINT). PLoS Comput Biol 2019; 15:e1006722. [PMID: 30779729 PMCID: PMC6396946 DOI: 10.1371/journal.pcbi.1006722] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/28/2018] [Revised: 03/01/2019] [Accepted: 12/17/2018] [Indexed: 01/08/2023]
Abstract
Rare variants are of increasing interest to genetic association studies because of their etiological contributions to human complex diseases. Due to the rarity of the mutant events, rare variants are routinely analyzed on an aggregate level. While aggregation analyses improve the detection of global-level signal, they are not able to pinpoint causal variants within a variant set. To perform inference on a localized level, additional information, e.g., biological annotation, is often needed to boost the information content of a rare variant. Following the observation that important variants are likely to cluster together on functional domains, we propose a protein structure guided local test (POINT) to provide variant-specific association information using structure-guided aggregation of signal. Constructed under a kernel machine framework, POINT performs local association testing by borrowing information from neighboring variants in the 3-dimensional protein space in a data-adaptive fashion. Besides merely providing a list of promising variants, POINT assigns each variant a p-value to permit variant ranking and prioritization. We assess the selection performance of POINT using simulations and illustrate how it can be used to prioritize individual rare variants in PCSK9, ANGPTL4 and CETP in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial data.
Affiliation(s)
- Rachel Marceau West
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Daniel M. Rotroff
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Melaine A. Kuenemann
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Sheng-Mao Chang
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
- Michael C. Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Michael J. Wagner
  - Center for Pharmacogenomics and Individualized Therapy, University of North Carolina, Chapel Hill, North Carolina, United States of America
- John B. Buse
  - Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, North Carolina, United States of America
- Alison A. Motsinger-Reif
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Denis Fourches
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Chemistry, North Carolina State University, Raleigh, North Carolina, United States of America
- Jung-Ying Tzeng
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
4
Zhu B, Mirabello L, Chatterjee N. A subregion-based burden test for simultaneous identification of susceptibility loci and subregions within. Genet Epidemiol 2018; 42:673-683. [PMID: 29931698 PMCID: PMC6185783 DOI: 10.1002/gepi.22134] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/11/2018] [Revised: 04/14/2018] [Accepted: 05/04/2018] [Indexed: 01/08/2023]
Abstract
In rare variant association studies, aggregating rare and/or low-frequency variants may increase statistical power for detection of the underlying susceptibility gene or region. However, it is unclear which variants, or classes of variants, in a gene contribute most to the association. We proposed a subregion-based burden test (REBET) to simultaneously select susceptibility genes and identify important underlying subregions. The subregions are predefined by shared common biologic characteristics, such as the protein domain or functional impact. Based on a subset-based approach considering local correlations between combinations of test statistics of subregions, REBET is able to properly control the type I error rate while adjusting for multiple comparisons in a computationally efficient manner. Simulation studies show that REBET can achieve power competitive with alternative methods when rare variants cluster within subregions. In two case studies, REBET is able to identify known disease susceptibility genes and, more importantly, pinpoint previously unreported most susceptible subregions, which represent protein domains essential for gene function. R package REBET is available at https://dceg.cancer.gov/tools/analysis/rebet.
Affiliation(s)
- Bin Zhu
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Lisa Mirabello
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Nilanjan Chatterjee
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
  - Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
5
Abstract
In human genome research, genetic association studies of rare variants have been widely conducted since the advent of high-throughput DNA sequencing platforms. However, detection of outcome-related rare variants remains a statistically challenging problem because observed genetic mutations are extremely rare. Recently, a power set-based statistical selection procedure has been proposed to locate both risk and protective rare variants within outcome-related genes or genetic regions. Although it can perform an individual selection of rare variants, the procedure has the limitation that it cannot measure the certainty of the selected rare variants. In this article, we propose a selection probability of individual rare variants, where selection frequencies of rare variants are computed based on bootstrap resampling. It can therefore quantify the certainty of both selected and unselected rare variants. In addition, a new selection approach using a threshold on the selection probability is introduced and compared with some existing selection procedures in extensive simulation studies and a real sequencing data analysis. We have demonstrated that the proposed approach outperforms the existing methods in terms of selection power.
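The bootstrap-based selection probability described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: `selection_probability` and `toy_select` are hypothetical names, and `toy_select` is a deliberately simple stand-in for a selection procedure (it is not the power set-based procedure the abstract refers to).

```python
import random

def selection_probability(genotypes, phenotype, select, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each variant is selected.

    genotypes: list of per-subject lists of rare-allele counts
    phenotype: list of per-subject outcomes
    select:    callable(genotypes, phenotype) -> set of selected variant indices
    """
    rng = random.Random(seed)
    n, p = len(genotypes), len(genotypes[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        chosen = select([genotypes[i] for i in idx], [phenotype[i] for i in idx])
        for j in chosen:
            counts[j] += 1
    return [c / n_boot for c in counts]

def toy_select(genotypes, phenotype, threshold=0.5):
    """Hypothetical selector: keep variants whose carriers and non-carriers
    differ in mean outcome by more than `threshold`."""
    selected = set()
    for j in range(len(genotypes[0])):
        carriers = [y for g, y in zip(genotypes, phenotype) if g[j] > 0]
        others = [y for g, y in zip(genotypes, phenotype) if g[j] == 0]
        if carriers and others:
            diff = sum(carriers) / len(carriers) - sum(others) / len(others)
            if abs(diff) > threshold:
                selected.add(j)
    return selected

# Toy data: variant 0 tracks the outcome, variant 1 is unrelated to it.
genotypes = [[1 if i < 10 else 0, 1 if (i < 5 or 10 <= i < 25) else 0] for i in range(40)]
phenotype = [1.0 if i < 10 else 0.0 for i in range(40)]
probs = selection_probability(genotypes, phenotype, toy_select)
```

Variants can then be ranked by `probs`, or kept only when their selection probability exceeds a chosen threshold.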
Affiliation(s)
- Gira Lee
  - Department of Statistics, Pusan National University, Busan, Korea
- Hokeun Sun
  - Department of Statistics, Pusan National University, Busan, Korea
6
Kim S, Lee K, Sun H. Statistical selection strategy for risk and protective rare variants associated with complex traits. J Comput Biol 2015; 22:1034-43. [PMID: 26469994 DOI: 10.1089/cmb.2015.0091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/05/2023]
Abstract
In genetic association studies with deep sequencing data, it is a challenging statistical problem to precisely locate rare variants associated with complex diseases or traits due to the limited number of observed genetic mutations. In particular, both risk and protective rare variants can be present in the same gene or genetic region. There currently exist very few statistical methods to separate causal rare variants from noncausal variants within a disease/trait-related gene or genetic region, while there are relatively many statistical tests to detect a phenotypic association of a group of rare variants such as a gene or a genetic region. In this article, we propose a new statistical selection strategy that is able to locate causal rare variants within a disease/trait-related gene or genetic region. The proposed procedure linearly combines potential risk and protective variants in order to find the combination of rare variants with the strongest association signal. It is also computationally very efficient since the procedure is based on forward selection. In simulation studies, we demonstrate that the proposed procedure has greater selection power than other existing methods when both risk and protective variants are present. We also applied it to real sequencing data on the ANGPTL gene family from the Dallas Heart Study.
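A forward selection over signed variant combinations, in the spirit of the procedure described in this abstract, might look like the sketch below. It is not the authors' algorithm: the squared Pearson correlation is used here as a stand-in association statistic, and the names `squared_corr` and `forward_select` are hypothetical.

```python
def squared_corr(score, y):
    """Squared Pearson correlation between a combined score and the phenotype."""
    n = len(y)
    ms, my = sum(score) / n, sum(y) / n
    sxy = sum((s - ms) * (t - my) for s, t in zip(score, y))
    sxx = sum((s - ms) ** 2 for s in score)
    syy = sum((t - my) ** 2 for t in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def forward_select(genotypes, y):
    """Greedily add variants with sign +1 (risk) or -1 (protective) while the
    association of the signed linear combination with y keeps improving."""
    n, p = len(genotypes), len(genotypes[0])
    score, chosen, best = [0.0] * n, {}, 0.0
    while True:
        candidate = None
        for j in range(p):
            if j in chosen:
                continue
            for sign in (1, -1):
                trial = [s + sign * g[j] for s, g in zip(score, genotypes)]
                stat = squared_corr(trial, y)
                if stat > best + 1e-12:  # require a strict improvement
                    best, candidate = stat, (j, sign, trial)
        if candidate is None:
            break
        j, sign, score = candidate
        chosen[j] = sign
    # Orient the signs so that the combined score is positively associated with y.
    my = sum(y) / n
    if sum(s * (t - my) for s, t in zip(score, y)) < 0:
        chosen = {j: -s for j, s in chosen.items()}
    return chosen, best

# Toy data: variant 0 raises the trait, variant 1 lowers it, variant 2 is noise.
genotypes = [[int(i < 3), int(3 <= i < 6), int(i in (0, 3, 6))] for i in range(12)]
y = [1, 1, 1, -1, -1, -1, 0, 0, 0, 0, 0, 0]
chosen, best = forward_select(genotypes, y)
```

Each step evaluates at most two candidates per remaining variant, which is why a forward search of this kind stays cheap compared with enumerating all subsets.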
Affiliation(s)
- Sera Kim
  - Department of Statistics, Pusan National University, Busan, Korea
- Kyeongjun Lee
  - Department of Statistics, Pusan National University, Busan, Korea
- Hokeun Sun
  - Department of Statistics, Pusan National University, Busan, Korea
7
Lee W, Lee D, Pawitan Y. Likelihood ratio and score burden tests for detecting disease-associated rare variants. Stat Appl Genet Mol Biol 2015; 14:481-95. [DOI: 10.1515/sagmb-2014-0089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
This paper presents two simple rare variant (RV) burden tests based on the likelihood ratio test (LRT) and score statistics. LRT is one of the commonly used tests in practical data analysis, and we show here that there is no reason to ignore it in testing RV associations. With the Bartlett correction, we have numerically shown that the LRT-based test can have a reliable distribution. Our simulation study indicates that if the non-null variants are as common as the null variants, then the LRT and score statistics have comparable performance to the C-alpha test, and if the former are rarer than the null variants, then they outperform the C-alpha test.
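For orientation, a textbook score test for a burden variable with a binary outcome can be written in a few lines. This is a generic sketch, not necessarily the statistic defined in the paper: it assumes an intercept-only logistic null model, takes the burden to be the per-subject count of rare alleles, and uses the hypothetical name `burden_score_test`.

```python
import math

def burden_score_test(genotypes, y):
    """Score test for a rare-variant burden with a binary outcome (0/1).

    Under the intercept-only logistic null, U = sum (y_i - ybar) * b_i and
    Var(U) = ybar * (1 - ybar) * sum (b_i - bbar)^2, so U^2 / Var(U) is
    approximately chi-square with 1 degree of freedom.
    """
    burden = [sum(g) for g in genotypes]  # rare-allele count per subject
    n = len(y)
    ybar, bbar = sum(y) / n, sum(burden) / n
    u = sum((yi - ybar) * bi for yi, bi in zip(y, burden))
    v = ybar * (1 - ybar) * sum((bi - bbar) ** 2 for bi in burden)
    if v == 0:
        raise ValueError("outcome or burden has no variation")
    stat = u * u / v
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
    return stat, p_value

# Toy data: 4 cases followed by 4 controls, two rare variants per subject.
genotypes = [[1, 0], [0, 1], [1, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
stat, p_value = burden_score_test(genotypes, y)
```

With so few subjects the chi-square approximation is rough; the example only shows how the statistic is assembled.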