1
Waring A, Harper A, Salatino S, Kramer C, Neubauer S, Thomson K, Watkins H, Farrall M. Data-driven modelling of mutational hotspots and in silico predictors in hypertrophic cardiomyopathy. J Med Genet 2021; 58:556-564. [PMID: 32732227 PMCID: PMC8327322 DOI: 10.1136/jmedgenet-2020-106922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2020] [Revised: 06/17/2020] [Accepted: 06/20/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND Although rare missense variants in Mendelian disease genes often cluster in specific regions of proteins, it is unclear how to consider this when evaluating the pathogenicity of a gene or variant. Here we introduce methods for gene association and variant interpretation that use this powerful signal. METHODS We present statistical methods to detect missense variant clustering (BIN-test) combined with burden information (ClusterBurden). We introduce a flexible generalised additive modelling (GAM) framework to identify mutational hotspots using burden and clustering information (hotspot model) and supplemented by in silico predictors (hotspot+ model). The methods were applied to synthetic data and a case-control dataset, comprising 5338 hypertrophic cardiomyopathy patients and 125 748 population reference samples over 34 putative cardiomyopathy genes. RESULTS In simulations, the BIN-test was almost twice as powerful as the Anderson-Darling or Kolmogorov-Smirnov tests; ClusterBurden was computationally faster and more powerful than alternative position-informed methods. For 6/8 sarcomeric genes with strong clustering, ClusterBurden showed enhanced power over burden alone, equivalent to increasing the sample size by 50%. Hotspot+ models that combine burden, clustering and in silico predictors outperform generic pathogenicity predictors and effectively integrate ACMG criteria PM1 and PP3 to yield strong or moderate evidence of pathogenicity for 31.8% of examined variants of uncertain significance. CONCLUSION GAMs represent a unified statistical modelling framework to combine burden, clustering and functional information. Hotspot models can refine maps of regional burden and hotspot+ models can be powerful predictors of variant pathogenicity. The BIN-test is a fast, powerful approach to detect missense variant clustering that, when combined with burden information (ClusterBurden), may enhance disease-gene discovery.
Affiliation(s)
- Adam Waring
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Andrew Harper
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Silvia Salatino
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Christopher Kramer
  - Department of Medicine, University of Virginia, Charlottesville, Virginia, USA
- Stefan Neubauer
  - Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Kate Thomson
  - Oxford Medical Genetics Laboratories, Churchill Hospital, Oxford, UK
- Hugh Watkins
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Martin Farrall
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - Radcliffe Department of Medicine, University of Oxford, Oxford, UK
2
Marceau West R, Lu W, Rotroff DM, Kuenemann MA, Chang SM, Wu MC, Wagner MJ, Buse JB, Motsinger-Reif AA, Fourches D, Tzeng JY. Identifying individual risk rare variants using protein structure guided local tests (POINT). PLoS Comput Biol 2019; 15:e1006722. [PMID: 30779729 PMCID: PMC6396946 DOI: 10.1371/journal.pcbi.1006722] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/28/2018] [Revised: 03/01/2019] [Accepted: 12/17/2018] [Indexed: 01/08/2023]
Abstract
Rare variants are of increasing interest to genetic association studies because of their etiological contributions to human complex diseases. Due to the rarity of the mutant events, rare variants are routinely analyzed on an aggregate level. While aggregation analyses improve the detection of global-level signal, they are not able to pinpoint causal variants within a variant set. To perform inference on a localized level, additional information, e.g., biological annotation, is often needed to boost the information content of a rare variant. Following the observation that important variants are likely to cluster together on functional domains, we propose a protein structure guided local test (POINT) to provide variant-specific association information using structure-guided aggregation of signal. Constructed under a kernel machine framework, POINT performs local association testing by borrowing information from neighboring variants in the 3-dimensional protein space in a data-adaptive fashion. Besides merely providing a list of promising variants, POINT assigns each variant a p-value to permit variant ranking and prioritization. We assess the selection performance of POINT using simulations and illustrate how it can be used to prioritize individual rare variants in PCSK9, ANGPTL4 and CETP in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial data.
Affiliation(s)
- Rachel Marceau West
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Daniel M. Rotroff
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Melaine A. Kuenemann
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Sheng-Mao Chang
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
- Michael C. Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Michael J. Wagner
  - Center for Pharmacogenomics and Individualized Therapy, University of North Carolina, Chapel Hill, North Carolina, United States of America
- John B. Buse
  - Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, North Carolina, United States of America
- Alison A. Motsinger-Reif
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Denis Fourches
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Chemistry, North Carolina State University, Raleigh, North Carolina, United States of America
- Jung-Ying Tzeng
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
3
The impact of a fine-scale population stratification on rare variant association test results. PLoS One 2018; 13:e0207677. [PMID: 30521541 PMCID: PMC6283567 DOI: 10.1371/journal.pone.0207677] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 07/27/2018] [Accepted: 11/05/2018] [Indexed: 12/28/2022]
Abstract
Population stratification is a well-known confounding factor in both common and rare variant association analyses. Rare variants tend to be more geographically clustered than common variants because of their more recent origin. However, it is not yet clear whether population stratification at a very fine scale (neighboring administrative regions within a country) would lead to statistical bias in rare variant analyses. Because convenience controls from external studies are commonly included to increase the power to detect genetic associations, this problem is important. We studied through simulation the impact of fine-scale population structure on different rare variant association strategies, assessing type I error and power. We showed that principal component analysis (PCA) based methods of adjustment for population stratification adequately corrected type I error inflation at the largest geographical scales, but not at the finest scales. We also showed in our simulations that adding controls increased power, but to a considerably lesser extent when the controls were drawn from another population.
4
Kwon M, Leem S, Yoon J, Park T. GxGrare: gene-gene interaction analysis method for rare variants from high-throughput sequencing data. BMC Syst Biol 2018; 12:19. [PMID: 29560826 PMCID: PMC5861485 DOI: 10.1186/s12918-018-0543-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Background With the rapid advancement of array-based genotyping techniques, genome-wide association studies (GWAS) have successfully identified common genetic variants associated with common complex diseases. However, it has been shown that only a small proportion of the genetic etiology of complex diseases could be explained by the genetic factors identified from GWAS. This missing heritability could possibly be explained by gene-gene interaction (epistasis) and rare variants. There has been an exponential growth of gene-gene interaction analysis for common variants in terms of methodological developments and practical applications. Also, the recent advancement of high-throughput sequencing technologies makes it possible to conduct rare variant analysis. However, little progress has been made in gene-gene interaction analysis for rare variants. Results Here, we propose GxGrare, a new gene-gene interaction method for rare variants in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of three steps: 1) collapsing the rare variants, 2) MDR analysis of the collapsed rare variants, and 3) detecting the top candidate interaction pairs. GxGrare can be used to detect not only gene-gene interactions but also interactions within a single gene. The proposed method is illustrated with 1080 whole-exome sequencing samples from the Korean population in order to identify causal gene-gene interactions among rare variants for type 2 diabetes. Conclusion The proposed GxGrare performs well for gene-gene interaction detection with collapsing of rare variants. GxGrare is available at http://bibs.snu.ac.kr/software/gxgrare which contains simulation data and documentation. Supported operating systems include Linux and OS X.
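The three steps named in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GxGrare implementation: collapsing is assumed to mean a per-gene carrier indicator, each gene pair is scored with a single-pass MDR balanced accuracy (no cross-validation), and the names `collapse`, `mdr_balanced_accuracy` and `top_pairs` are hypothetical.

```python
from itertools import combinations


def collapse(genotypes_by_gene):
    """Step 1 (assumed form): per gene, 1 if the individual carries any rare variant."""
    return {gene: [int(any(row)) for row in rows]
            for gene, rows in genotypes_by_gene.items()}


def mdr_balanced_accuracy(a, b, is_case):
    """Step 2 (simplified MDR): label each genotype cell high or low risk,
    then score how well that labelling separates cases from controls."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    cells = {}  # (a, b) -> [case count, control count]
    for x, y, c in zip(a, b, is_case):
        counts = cells.setdefault((x, y), [0, 0])
        counts[0 if c else 1] += 1
    # A cell is high risk when its case:control ratio exceeds the overall ratio.
    tp = sum(ca for ca, co in cells.values() if ca * n_ctrl > co * n_case)
    tn = sum(co for ca, co in cells.values() if ca * n_ctrl <= co * n_case)
    return 0.5 * (tp / n_case + tn / n_ctrl)


def top_pairs(genotypes_by_gene, is_case, k=1):
    """Step 3: rank all gene pairs by balanced accuracy and keep the top k."""
    collapsed = collapse(genotypes_by_gene)
    scored = [(mdr_balanced_accuracy(collapsed[g], collapsed[h], is_case), g, h)
              for g, h in combinations(sorted(collapsed), 2)]
    return sorted(scored, reverse=True)[:k]


# Toy data (hypothetical): 4 cases then 4 controls; rows are per-individual
# rare-variant genotypes within each gene.
toy_genes = {
    "A": [[1, 0], [0, 1], [1, 0], [1, 1], [0, 1], [1, 0], [0, 0], [0, 0]],
    "B": [[1], [1], [1], [1], [0], [0], [1], [1]],
    "C": [[1], [0], [1], [0], [1], [0], [1], [0]],
}
toy_status = [1, 1, 1, 1, 0, 0, 0, 0]
print(top_pairs(toy_genes, toy_status, k=1))
```

In the toy data every case carries rare variants in both A and B while no control does, so the A-B pair ranks first. A fuller treatment would add cross-validation and a permutation-based significance assessment, neither of which is shown here.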
Affiliation(s)
- Minseok Kwon
  - Department of Biomedical Informatics, Harvard Medical School, Boston, 02115, MA, USA
- Sangseob Leem
  - Department of Statistics, Seoul National University, Seoul, 08826, South Korea
- Joon Yoon
  - Interdisciplinary program, Seoul National University, Seoul, 08826, South Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, 08826, South Korea
5
Radder JE, Zhang Y, Gregory AD, Yu S, Kelly NJ, Leader JK, Kaminski N, Sciurba FC, Shapiro SD. Extreme Trait Whole-Genome Sequencing Identifies PTPRO as a Novel Candidate Gene in Emphysema with Severe Airflow Obstruction. Am J Respir Crit Care Med 2017; 196:159-171. [PMID: 28199135 DOI: 10.1164/rccm.201606-1147oc] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/16/2022]
Abstract
RATIONALE Genetic association studies in chronic obstructive pulmonary disease have primarily tested for association with common variants, the results of which explain only a portion of disease heritability. Because rare variation is also likely to contribute to susceptibility, we used whole-genome sequencing of subjects with clinically extreme phenotypes to identify genomic regions enriched for rare variation contributing to chronic obstructive pulmonary disease susceptibility. OBJECTIVES To identify regions of rare genetic variation contributing to emphysema with severe airflow obstruction. METHODS We identified heavy smokers that were resistant (n = 65) or susceptible (n = 64) to emphysema with severe airflow obstruction in the Pittsburgh Specialized Center of Clinically Oriented Research cohort. We filtered whole-genome sequencing results to include only rare variants and conducted single variant tests, region-based tests across the genome, gene-based tests, and exome-wide tests. MEASUREMENTS AND MAIN RESULTS We identified several suggestive associations with emphysema with severe airflow obstruction, including a suggestive association of all rare variation in a region within the gene ZNF816 (19q13.41; P = 4.5 × 10⁻⁶), and a suggestive association of nonsynonymous coding rare variation in the gene PTPRO (P = 4.0 × 10⁻⁵). Association of rs61754411, a rare nonsynonymous variant in PTPRO, with emphysema and obstruction was demonstrated in all non-Hispanic white individuals in the Pittsburgh Specialized Center of Clinically Oriented Research cohort. We found that cells containing this variant have decreased signaling in cellular pathways necessary for survival and proliferation. CONCLUSIONS PTPRO is a novel candidate gene in emphysema with severe airflow obstruction, and rs61754411 is a previously unreported rare variant contributing to emphysema susceptibility. Other suggestive candidate genes, such as ZNF816, are of interest for future studies.
Affiliation(s)
- Josiah E Radder
  - 1 Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, and
- Yingze Zhang
  - 1 Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, and
- Alyssa D Gregory
  - 1 Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, and
- Shibing Yu
  - 1 Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, and
- Neil J Kelly
  - 1 Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, and
- Joseph K Leader
  - 2 Department of Radiology, University of Pittsburgh, Pittsburgh, Pennsylvania; and
- Naftali Kaminski
  - 3 Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale University, New Haven, Connecticut
- Frank C Sciurba
  - 1 Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, and
- Steven D Shapiro
  - 1 Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, and
6
Persyn E, Karakachoff M, Le Scouarnec S, Le Clézio C, Campion D, Consortium FE, Schott JJ, Redon R, Bellanger L, Dina C. DoEstRare: A statistical test to identify local enrichments in rare genomic variants associated with disease. PLoS One 2017; 12:e0179364. [PMID: 28742119 PMCID: PMC5524342 DOI: 10.1371/journal.pone.0179364] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/26/2017] [Accepted: 05/29/2017] [Indexed: 01/01/2023]
Abstract
Next-generation sequencing technologies made it possible to assay the effect of rare variants on complex diseases. As an extension of the "common disease-common variant" paradigm, rare variant studies are necessary to get a more complete insight into the genetic architecture of human traits. Association studies of these rare variations pose new challenges in terms of statistical analysis. Due to their low frequency, rare variants must be tested in groups. This approach is then hindered by the fact that an unknown proportion of the variants could be neutral. The risk level of a rare variation may be determined by its impact but also by its position in the protein sequence. More generally, the molecular mechanisms underlying the disease architecture may involve specific protein domains or inter-genic regulatory regions. While a large variety of methods optimize functionality weights for each single marker, few evaluate variant position differences between cases and controls. Here, we propose a test called DoEstRare, which aims to simultaneously detect clusters of disease risk variants and global allele frequency differences in genomic regions. This test estimates, for cases and controls, variant position densities in the genetic region by a kernel method, weighted by a function of allele frequencies. We compared DoEstRare with previously published strategies through simulation studies as well as re-analysis of real datasets. In simulations under various scenarios, DoEstRare was the only method to consistently show the highest performance in terms of type I error and power, whether or not variants were clustered. DoEstRare was also applied to Brugada syndrome and early-onset Alzheimer's disease data and provided complementary results to other existing tests. DoEstRare, by integrating variant position information, gives new opportunities to explain disease susceptibility. DoEstRare is implemented in a user-friendly R package.
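The idea of comparing weighted position densities between cases and controls can be illustrated with a short Python sketch. It is not the DoEstRare statistic or its R package: the Gaussian kernel, the fixed bandwidth, the L1 distance between the two densities, and the names `weighted_density` and `density_distance` are all assumptions made for illustration, and the weights are simply passed in rather than derived from allele frequencies.

```python
import math


def weighted_density(positions, weights, grid, bandwidth):
    """Gaussian kernel estimate of a weighted variant-position density on a grid."""
    norm = sum(weights) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(w * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
            for p, w in zip(positions, weights)) / norm
        for x in grid
    ]


def density_distance(case_pos, case_w, ctrl_pos, ctrl_w,
                     start, end, n_grid=200, bandwidth=10.0):
    """Approximate L1 distance between case and control position densities
    over the region [start, end] (illustrative statistic, an assumption)."""
    step = (end - start) / (n_grid - 1)
    grid = [start + i * step for i in range(n_grid)]
    f = weighted_density(case_pos, case_w, grid, bandwidth)
    g = weighted_density(ctrl_pos, ctrl_w, grid, bandwidth)
    return sum(abs(a - b) for a, b in zip(f, g)) * step


# Toy example (hypothetical positions): case variants cluster near position 100,
# control variants are spread across a 300-residue region.
clustered = density_distance([100, 102, 105], [1, 1, 1],
                             [20, 150, 280], [1, 1, 1], 0, 300)
identical = density_distance([20, 150, 280], [1, 1, 1],
                             [20, 150, 280], [1, 1, 1], 0, 300)
print(clustered, identical)
```

The distance is zero when both groups have the same weighted positions and grows as case variants concentrate in a region where control variants are sparse. Assessing significance would need a reference distribution, for example from permuting case-control labels, which this sketch omits.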
Affiliation(s)
- Elodie Persyn
  - INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- Matilde Karakachoff
  - INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
  - CHU Nantes, l’institut du thorax, Nantes, France
- Camille Le Clézio
  - Inserm U1079, Rouen University, Normandy Center for Genomic Medicine and Personalized Medicine, Normandy University, Rouen, France
- Dominique Campion
  - Inserm U1079, Rouen University, Normandy Center for Genomic Medicine and Personalized Medicine, Normandy University, Rouen, France
- Jean-Jacques Schott
  - INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
  - CHU Nantes, l’institut du thorax, Nantes, France
- Richard Redon
  - INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
  - CHU Nantes, l’institut du thorax, Nantes, France
- Lise Bellanger
  - Laboratoire de Mathématiques Jean Leray, UMR CNRS 6629, Nantes, France
- Christian Dina
  - INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
  - CHU Nantes, l’institut du thorax, Nantes, France
7
Lin WY, Liang YC. Conditioning adaptive combination of P-values method to analyze case-parent trios with or without population controls. Sci Rep 2016; 6:28389. [PMID: 27341039 PMCID: PMC4920030 DOI: 10.1038/srep28389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2016] [Accepted: 06/02/2016] [Indexed: 11/24/2022]
Abstract
Detection of rare causal variants can help uncover the etiology of complex diseases. Recruiting case-parent trios is a popular study design in family-based studies. If researchers can obtain data from population controls, utilizing them in trio analyses can improve the power of methods. The transmission disequilibrium test (TDT) is a well-known method to analyze case-parent trio data. It has been extended to rare-variant association testing (abbreviated as "rvTDT"), with the flexibility to incorporate population controls. The rvTDT method is robust to population stratification. However, power loss may occur in the conditioning process. Here we propose a "conditioning adaptive combination of P-values method" (abbreviated as "conADA"), to analyze trios with/without unrelated controls. By first truncating the variants with larger P-values, we decrease the vulnerability of conADA to the inclusion of neutral variants. Moreover, because the test statistic is developed by conditioning on parental genotypes, conADA generates valid statistical inference in the presence of population stratification. With regard to statistical methods for next-generation sequencing data analyses, validity may be hampered by population stratification, whereas power may be affected by the inclusion of neutral variants. We recommend conADA for its robustness to these two factors (population stratification and the inclusion of neutral variants).
Affiliation(s)
- Wan-Yu Lin
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
  - Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
- Yun-Chieh Liang
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
8
Lin WY. Beyond Rare-Variant Association Testing: Pinpointing Rare Causal Variants in Case-Control Sequencing Study. Sci Rep 2016; 6:21824. [PMID: 26903168 PMCID: PMC4763184 DOI: 10.1038/srep21824] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/16/2015] [Accepted: 02/01/2016] [Indexed: 12/31/2022]
Abstract
Rare-variant association testing usually requires some method of aggregation. The next important step is to pinpoint individual rare causal variants among a large number of variants within a genetic region. Recently, Ionita-Laza et al. proposed a backward elimination (BE) procedure that can identify individual causal variants among the many variants in a gene. The BE procedure removes a variant if excluding it leads to a smaller P-value for the BURDEN test (referred to as "BE-BURDEN") or the SKAT test (referred to as "BE-SKAT"). Here we use the adaptive combination of P-values (ADA) method to pinpoint causal variants. Unlike most gene-based association tests, the ADA statistic is built upon per-site P-values of individual variants. It is straightforward to select important variants given the optimal P-value truncation threshold found by ADA. We performed comprehensive simulations to compare ADA with BE-SKAT and BE-BURDEN. Ranking these three approaches according to positive predictive value (PPV), the percentage of truly causal variants among the total selected variants, we found ADA > BE-SKAT > BE-BURDEN across all simulation scenarios. We therefore recommend using ADA to pinpoint plausible rare causal variants in a gene.
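The backward elimination idea described above (drop a variant when doing so lowers the gene-level P-value) can be sketched in Python. This is an illustrative greedy version, not the procedure of Ionita-Laza et al.: the burden test is assumed to be a one-sided Fisher exact test on carrier counts, the stopping rule is an assumption, and the names `burden_pvalue` and `backward_elimination` are hypothetical.

```python
from math import comb


def burden_pvalue(genotypes, is_case, variants):
    """One-sided Fisher exact P-value for enrichment of carriers among cases.
    genotypes[i][v] is 1 if individual i carries variant v, else 0."""
    if not variants:
        return 1.0
    carrier = [any(row[v] for v in variants) for row in genotypes]
    n, n_case, k = len(genotypes), sum(is_case), sum(carrier)
    x = sum(1 for c, y in zip(carrier, is_case) if c and y)
    # Hypergeometric upper tail: P(at least x carriers fall among the cases).
    tail = sum(comb(n_case, j) * comb(n - n_case, k - j)
               for j in range(x, min(k, n_case) + 1))
    return tail / comb(n, k)


def backward_elimination(genotypes, is_case, variants):
    """Greedily remove the variant whose exclusion most reduces the burden
    P-value; stop when no single removal gives a smaller P-value."""
    kept = list(variants)
    best = burden_pvalue(genotypes, is_case, kept)
    while len(kept) > 1:
        trials = [(burden_pvalue(genotypes, is_case, [u for u in kept if u != v]), v)
                  for v in kept]
        p, v = min(trials)
        if p >= best:
            break
        kept.remove(v)
        best = p
    return kept, best


# Toy data (hypothetical): variant 0 is carried by 3 of 4 cases,
# variant 1 by 2 of 4 controls.
toy_genotypes = [[1, 0], [1, 0], [1, 0], [0, 0],
                 [0, 1], [0, 1], [0, 0], [0, 0]]
toy_status = [1, 1, 1, 1, 0, 0, 0, 0]
print(backward_elimination(toy_genotypes, toy_status, [0, 1]))
```

In the toy data, removing the control-only variant 1 lowers the burden P-value from 0.5 to 4/56, so only variant 0 is retained.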
Affiliation(s)
- Wan-Yu Lin
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
  - Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
9
Page CM, Baranzini SE, Mevik BH, Bos SD, Harbo HF, Andreassen BK. Assessing the Power of Exome Chips. PLoS One 2015; 10:e0139642. [PMID: 26437075 PMCID: PMC4593624 DOI: 10.1371/journal.pone.0139642] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/08/2015] [Accepted: 09/14/2015] [Indexed: 12/20/2022]
Abstract
Genotyping chips for rare and low-frequency variants have recently gained popularity with the introduction of exome chips, but the utility of these chips remains unclear. These chips were designed using exome sequencing data from mainly American-European individuals, enriched for a narrow set of common diseases. In addition, it is well known that the statistical power to detect associations with rare and low-frequency variants is much lower than in studies exclusively involving common variants. We developed a simulation program adaptable to any exome chip design to empirically evaluate the power of the exome chips. We implemented the main properties of the Illumina HumanExome BeadChip array. The simulated data sets were used to assess the power of exome chip based studies for varying effect sizes and causal variant scenarios. We applied two widely used statistical approaches for rare and low-frequency variants, which collapse the variants into genetic regions or genes. Under optimal conditions, we found that a sample size of 20,000 to 30,000 individuals was needed to detect modest effect sizes (0.5% < PAR < 1%) with 80% power. For small effect sizes (PAR < 0.5%), 60,000–100,000 individuals were needed in the presence of non-causal variants. In conclusion, we found that at least tens of thousands of individuals are necessary to detect modest effects under optimal conditions. In addition, when using rare variant chips on cohorts or diseases they were not originally designed for, the identification of associated variants or genes will be even more challenging.
Affiliation(s)
- Christian Magnus Page
  - Institute of Clinical Medicine, University of Oslo, 0316, Oslo, Norway
  - Department of Neurology, Oslo University Hospital, 0424, Oslo, Norway
- Sergio E. Baranzini
  - Department of Neurology, University of California San Francisco, San Francisco, California, 94158, United States of America
- Bjørn-Helge Mevik
  - University Center for Information Technology, University of Oslo, 0316, Oslo, Norway
- Steffan Daniel Bos
  - Institute of Clinical Medicine, University of Oslo, 0316, Oslo, Norway
  - Department of Neurology, Oslo University Hospital, 0424, Oslo, Norway
- Hanne F. Harbo
  - Institute of Clinical Medicine, University of Oslo, 0316, Oslo, Norway
  - Department of Neurology, Oslo University Hospital, 0424, Oslo, Norway
- Bettina Kulle Andreassen
  - Institute of Clinical Medicine, University of Oslo, 0316, Oslo, Norway
  - Department of Research, Cancer Registry of Norway, 0304, Oslo, Norway
10
Fouladi R, Bessonov K, Van Lishout F, Van Steen K. Model-Based Multifactor Dimensionality Reduction for Rare Variant Association Analysis. Hum Hered 2015. [PMID: 26201701 DOI: 10.1159/000381286] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Abstract
Genome-wide association studies have revealed a vast number of common loci associated with human complex diseases. Still, a large proportion of heritability remains unexplained. The extent to which rare genetic variants (RVs) are able to explain a relevant portion of the genetic heritability for complex traits leaves room for several debates and paves the way to the collection of RV databases and the development of novel analytic tools to analyze these. To date, several statistical methods have been proposed to uncover the association of RVs with complex diseases, but none of them is the clear winner in all possible scenarios of study design and assumed underlying disease model. The latter may involve differences in the distributions of effect sizes, proportions of causal variants, and ratios of protective to deleterious variants at distinct regions throughout the genome. Therefore, there is a need for robust, scalable methods with acceptable overall performance in terms of power and type I error under various realistic scenarios. In this paper, we propose a novel RV association analysis strategy, which satisfies several of the desired properties that an RV analysis tool should exhibit.
Affiliation(s)
- Ramouna Fouladi
  - Systems and Modeling Unit, Montefiore Institute, and Bioinformatics and Modeling, GIGA-R, University of Liège, Liège, Belgium
11
Lin WY. Adaptive combination of P-values for family-based association testing with sequence data. PLoS One 2014; 9:e115971. [PMID: 25541952 PMCID: PMC4277421 DOI: 10.1371/journal.pone.0115971] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/19/2014] [Accepted: 12/01/2014] [Indexed: 12/24/2022]
Abstract
Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an 'adaptive combination of P-values method' (abbreviated as 'ADA'). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
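The truncation idea behind ADA (combine only per-site P-values below a threshold, and choose the threshold adaptively) can be sketched in Python. This is a simplified illustration rather than the published method: it is unweighted, the candidate thresholds are hypothetical defaults, the permuted per-site P-values are assumed to be supplied by the caller (for family data they would come from a pedigree-aware permutation not shown here), and the names `truncated_stat` and `ada` are assumptions.

```python
import math


def truncated_stat(pvals, threshold):
    """Sum of -log(p) over variants whose per-site P-value is at or below the threshold."""
    return sum(-math.log(p) for p in pvals if p <= threshold)


def ada(observed, permuted, thresholds=(0.10, 0.15, 0.20)):
    """observed: per-site P-values; permuted: one list of per-site P-values per permutation.
    Returns (adjusted P-value, chosen threshold, indices of selected variants)."""
    n_perm = len(permuted)
    # stats[b][j]: statistic of replicate b at threshold j; replicate 0 is the observed data.
    stats = [[truncated_stat(p, t) for t in thresholds] for p in [observed] + permuted]

    def min_p(b):
        # Permutation P-value of replicate b at each threshold (the other permuted
        # replicates serve as the reference set), then the minimum across thresholds.
        best = None
        denom = n_perm - (1 if b > 0 else 0)
        for j in range(len(thresholds)):
            ge = sum(1 for c in range(1, n_perm + 1)
                     if c != b and stats[c][j] >= stats[b][j])
            p = (ge + 1) / (denom + 1)
            if best is None or p < best[0]:
                best = (p, j)
        return best

    obs_p, obs_j = min_p(0)
    null_min = [min_p(b)[0] for b in range(1, n_perm + 1)]
    # Adjust for having searched over thresholds by comparing minimum P-values.
    adjusted = (sum(1 for q in null_min if q <= obs_p) + 1) / (n_perm + 1)
    selected = [i for i, p in enumerate(observed) if p <= thresholds[obs_j]]
    return adjusted, thresholds[obs_j], selected


# Toy input (hypothetical P-values; real use needs many more permutations).
toy_observed = [0.001, 0.5, 0.12, 0.9]
toy_permuted = [[0.3, 0.6, 0.8, 0.4], [0.18, 0.7, 0.5, 0.9], [0.5, 0.5, 0.5, 0.5]]
print(ada(toy_observed, toy_permuted))
```

Variants with P-values above the chosen threshold contribute nothing to the statistic, which is how neutral variants are kept from diluting the signal. The retained indices also give a candidate list of variants.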
Affiliation(s)
- Wan-Yu Lin
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan