1
|
Chien LC. Testing for association between ordinal traits and genetic variants in pedigree-structured samples by collapsing and kernel methods. Int J Biostat 2023; 0:ijb-2022-0123. [PMID: 37743670 DOI: 10.1515/ijb-2022-0123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 07/28/2023] [Indexed: 09/26/2023]
Abstract
In genome-wide association studies (GWAS), logistic regression is one of the most popular analytic methods for binary traits. Multinomial regression is an extension of binary logistic regression that allows for multiple categories. However, many GWAS methods have been limited in application to binary traits. These methods have often been improperly used to account for ordinal traits, which causes inappropriate type I error rates and poor statistical power. Owing to the lack of analysis methods, GWAS of ordinal traits is known to be problematic and has been gaining attention. In this paper, we develop a general framework for identifying ordinal traits associated with genetic variants in pedigree-structured samples by collapsing and kernel methods. We use the local odds ratios GEE technology to account for complicated correlation structures between family members and ordered categorical traits. We use the retrospective idea to treat the genetic markers as random variables for calculating genetic correlations among markers. The proposed genetic association method can accommodate ordinal traits and allow for covariate adjustment. We conduct simulation studies to compare the proposed tests with the existing models for analyzing ordered categorical data under various configurations. We illustrate the application of the proposed tests by simultaneously analyzing a family study and a cross-sectional study from the Genetic Analysis Workshop 19 (GAW19) data.
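The abstract refers to local odds ratios for ordered categories. As a minimal sketch of that one ingredient, the snippet below computes the standard local odds ratios of a two-way table whose rows and columns are ordered categories. It does not reproduce the paper's GEE estimating equations; the function name and the counts are hypothetical and chosen only for illustration.

```python
import numpy as np

def local_odds_ratios(table):
    """Local odds ratios of an R x C count table with ordered rows and columns.

    theta[i, j] = n[i, j] * n[i+1, j+1] / (n[i, j+1] * n[i+1, j]),
    i.e. the odds ratio of each 2 x 2 block formed by adjacent categories.
    """
    n = np.asarray(table, dtype=float)
    return (n[:-1, :-1] * n[1:, 1:]) / (n[:-1, 1:] * n[1:, :-1])

# Hypothetical counts: rows are the ordinal trait of one relative,
# columns are the ordinal trait of another relative (3 ordered levels each).
counts = [[20, 10, 5],
          [10, 20, 10],
          [5, 10, 20]]
theta = local_odds_ratios(counts)
print(theta)
```

An R x C table yields (R-1) x (C-1) local odds ratios; values above 1 indicate that adjacent higher categories tend to occur together.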
Affiliation(s)
- Li-Chu Chien
- Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
| |
|
2
|
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Over the past years, progress made in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have been recently proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic loci) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
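To make the distinction between the burden and variance-component classes concrete, here is a small sketch under simplifying assumptions: no covariates, unit weights, and unstandardized statistics (no p-values are computed, since those require the null distribution). The function names and the toy genotypes are hypothetical. The burden statistic squares the weighted sum of per-variant scores, whereas the kernel-type statistic sums the squared weighted scores.

```python
import numpy as np

def score_vector(G, y):
    """Per-variant score S_j = sum_i G[i, j] * (y[i] - mean(y))."""
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    return G.T @ (y - y.mean())

def burden_stat(G, y, w):
    """Burden-type statistic: (sum_j w_j * S_j) ** 2."""
    return float(np.dot(w, score_vector(G, y)) ** 2)

def kernel_stat(G, y, w):
    """Variance-component (linear kernel) statistic: sum_j (w_j * S_j) ** 2."""
    return float(np.sum((np.asarray(w, dtype=float) * score_vector(G, y)) ** 2))

# Toy data: 3 cases then 3 controls; variant 1 is seen only in cases,
# variant 2 only in controls, so the two effects point in opposite directions.
G = [[1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
w = [1.0, 1.0]

burden = burden_stat(G, y, w)
kernel = kernel_stat(G, y, w)
print(burden, kernel)
```

In this toy example the opposite-direction scores cancel in the burden statistic but not in the kernel statistic, which is the usual motivation for variance-component and omnibus tests.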
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
| | - Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
| | - Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
| |
|
3
|
Su J, Yuan J, Xu L, Xing S, Sun M, Yao Y, Ma Y, Chen F, Jiang L, Li K, Yu X, Xue Z, Zhang Y, Fan D, Zhang J, Liu H, Liu X, Zhang G, Wang H, Zhou M, Lyu F, An G, Yu X, Xue Y, Yang J, Qu J. Sequencing of 19,219 exomes identifies a low-frequency variant in FKBP5 promoter predisposing to high myopia in a Han Chinese population. Cell Rep 2023; 42:112510. [PMID: 37171956 DOI: 10.1016/j.celrep.2023.112510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 12/13/2022] [Accepted: 04/28/2023] [Indexed: 05/14/2023] Open
Abstract
High myopia (HM) is one of the leading causes of visual impairment and blindness worldwide. Here, we report a whole-exome sequencing (WES) study in 9,613 HM cases and 9,606 controls of Han Chinese ancestry to pinpoint HM-associated risk variants. Single-variant association analysis identified three newly identified genetic loci associated with HM, including an East Asian ancestry-specific low-frequency variant (rs533280354) in FKBP5. Multi-ancestry meta-analysis with WES data of 2,696 HM cases and 7,186 controls of European ancestry from the UK Biobank discerned a newly identified European ancestry-specific rare variant in FOLH1. Functional experiments revealed a mechanism whereby a single G-to-A transition at rs533280354 disrupted the binding of transcription activator KLF15 to the promoter of FKBP5, resulting in decreased transcription of FKBP5. Furthermore, burden tests showed a significant excess of rare protein-truncating variants among HM cases involved in retinal blood vessel morphogenesis and neurotransmitter transport.
Affiliation(s)
- Jianzhong Su
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Wenzhou 325101, Zhejiang, China; Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325011, China.
| | - Jian Yuan
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Liangde Xu
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Shilai Xing
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; Institute of PSI Genomics, Wenzhou 325024, China
| | - Mengru Sun
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100190, China
| | - Yinghao Yao
- Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Wenzhou 325101, Zhejiang, China; Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325011, China
| | - Yunlong Ma
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Fukun Chen
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Longda Jiang
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310030, China
| | - Kai Li
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325011, China
| | - Xiangyi Yu
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Zhengbo Xue
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Yaru Zhang
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Dandan Fan
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Ji Zhang
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Hui Liu
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Xinting Liu
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Guosi Zhang
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Hong Wang
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Meng Zhou
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Fan Lyu
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Wenzhou 325101, Zhejiang, China
| | - Gang An
- Institute of PSI Genomics, Wenzhou 325024, China
| | - Xiaoguang Yu
- Institute of PSI Genomics, Wenzhou 325024, China
| | - Yuanchao Xue
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100190, China.
| | - Jian Yang
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310030, China; Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China.
| | - Jia Qu
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Wenzhou 325101, Zhejiang, China; Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325011, China.
| |
|
4
|
Menard GN, Eastmond PJ. Burden tests can be used to map causal genes for a simple metabolic trait in an exome-sequenced polyploid mutant population. Plant Biotechnol J 2022; 20:1850-1852. [PMID: 35810345 PMCID: PMC9491453 DOI: 10.1111/pbi.13890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 06/30/2022] [Indexed: 06/15/2023]
|
5
|
Sun R, Zhu L, Li Y, Yasui Y, Robison L. Inference for set-based effects in genetic association studies with interval-censored outcomes. Biometrics 2022. [PMID: 35165890 DOI: 10.1111/biom.13636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Revised: 01/24/2022] [Accepted: 02/07/2022] [Indexed: 11/28/2022]
Abstract
The rapid acceleration of genetic data collection in biomedical settings has recently resulted in the rise of genetic compendiums filled with rich longitudinal disease data. One common feature of these datasets is their plethora of interval-censored outcomes. However, very few tools are available for the analysis of genetic datasets with interval-censored outcomes, and in particular, there is a lack of methodology available for set-based inference. Set-based inference is used to associate a gene, biological pathway, or other genetic construct with outcomes and is one of the most popular strategies in genetics research. This work develops three such tests for interval-censored settings beginning with a variance components test for interval-censored outcomes, the interval censored sequence kernel association test (ICSKAT). We also provide the interval-censored version of the Burden test, and then we integrate ICSKAT and Burden to construct the interval censored sequence kernel association test - optimal (ICSKATO) combination. These tests unlock set-based analysis of interval-censored datasets with analogs of three highly popular set-based tools commonly applied to continuous and binary outcomes. Simulation studies illustrate the advantages of the developed methods over ad-hoc alternatives, including protection of the type I error rate at very low levels and increased power. The proposed approaches are applied to the investigation that motivated this study, an examination of the genes associated with bone mineral density deficiency and fracture risk.
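For orientation, the usual way a kernel test and a burden test are integrated in SKAT-O-type methods is through a one-parameter family of statistics. The display below shows that standard form; whether ICSKATO uses exactly this construction is an assumption here, since the abstract does not spell it out, and the symbols are introduced only for illustration.

```latex
Q_\rho = (1-\rho)\,Q_{\mathrm{SKAT}} + \rho\,Q_{\mathrm{burden}}, \qquad 0 \le \rho \le 1
```

Setting \rho = 0 recovers the pure variance-component statistic and \rho = 1 the pure burden statistic; the optimal combination is typically based on the smallest p-value over a grid of \rho values, with the final p-value adjusted for that search.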
Affiliation(s)
- Ryan Sun
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
| | - Liang Zhu
- Division of Clinical and Translational Sciences, Department of Internal Medicine, University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
| | - Yimei Li
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee, 38105, USA
| | - Yutaka Yasui
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee, 38105, USA
| | - Leslie Robison
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee, 38105, USA
| |
|
6
|
Fan Q, Sun S, Li YJ. Precisely modeling zero-inflated count phenotype for rare variants. Genet Epidemiol 2022; 46:73-86. [PMID: 34779034 PMCID: PMC9615426 DOI: 10.1002/gepi.22438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 08/12/2021] [Accepted: 10/11/2021] [Indexed: 02/03/2023]
Abstract
Count data with excessive zeros are increasingly ubiquitous in genetic association studies, such as neuritic plaques in brain pathology for Alzheimer's disease. Here, we developed gene-based association tests to model such data by a mixture of two distributions, one for the structural zeros contributed by the Binomial distribution, and the other for the counts from the Poisson distribution. We derived the score statistics of the corresponding parameter of the rare variants in the zero-inflated Poisson regression model, and then constructed burden (ZIP-b) and kernel (ZIP-k) tests for the association tests. We evaluated omnibus tests that combined both ZIP-b and ZIP-k tests. Through simulated sequence data, we illustrated the potential power gain of our proposed method over a two-stage method that analyzes binary and non-zero continuous data separately for both burden and kernel tests. The ZIP burden test outperformed the kernel test as expected in all scenarios except for the scenario of variants with a mixture of directions in the genetic effects. We further demonstrated its applications to analyses of the neuritic plaque data in the ROSMAP cohort. We expect our proposed test to be useful in practice as more powerful than or complementary to the two-stage method.
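The two-part mixture described above can be written down directly. The following sketch evaluates the probability mass function of a zero-inflated Poisson variable, where a structural zero occurs with probability `pi` and otherwise the count follows a Poisson distribution with mean `lam`. The function and parameter names are hypothetical, and the sketch covers only the distribution itself, not the ZIP-b or ZIP-k score tests.

```python
import math

def zip_pmf(k, pi, lam):
    """P(Y = k) for a zero-inflated Poisson variable.

    With probability pi the outcome is a structural zero; with probability
    1 - pi it is drawn from Poisson(lam), which can also produce zeros.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    structural = pi if k == 0 else 0.0
    return structural + (1.0 - pi) * poisson

def zip_mean(pi, lam):
    """Expected value of the mixture: (1 - pi) * lam."""
    return (1.0 - pi) * lam

# Illustrative parameter values only.
pi, lam = 0.3, 2.0
probs = [zip_pmf(k, pi, lam) for k in range(60)]
print(probs[0], sum(probs), zip_mean(pi, lam))
```

The probability of observing a zero is inflated relative to a plain Poisson model, from exp(-lam) to pi + (1 - pi) * exp(-lam), which is why fitting an ordinary Poisson regression to such data tends to misrepresent the count process.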
Affiliation(s)
- Qiao Fan
- Duke-NUS Medical School, Centre for Quantitative Medicine, National University of Singapore, Singapore, Singapore
| | - Shuming Sun
- Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, North Carolina, USA
| | - Yi-Ju Li
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina, USA
| |
|
7
|
Clarelli F, Barizzone N, Mangano E, Zuccalà M, Basagni C, Anand S, Sorosina M, Mascia E, Santoro S, Guerini FR, Virgilio E, Gallo A, Pizzino A, Comi C, Martinelli V, Comi G, De Bellis G, Leone M, Filippi M, Esposito F, Bordoni R, Martinelli Boneschi F, D'Alfonso S. Contribution of Rare and Low-Frequency Variants to Multiple Sclerosis Susceptibility in the Italian Continental Population. Front Genet 2022; 12:800262. [PMID: 35047017 PMCID: PMC8762330 DOI: 10.3389/fgene.2021.800262] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/22/2021] [Accepted: 11/17/2021] [Indexed: 12/15/2022] Open
Abstract
Genome-wide association studies identified over 200 risk loci for multiple sclerosis (MS) focusing on common variants, which account for about 50% of disease heritability. The goal of this study was to investigate whether low-frequency and rare functional variants, located in MS-established associated loci, may contribute to disease risk in a relatively homogeneous population, testing their cumulative effect (burden) with gene-wise tests. We sequenced 98 genes in 588 Italian patients with MS and 408 matched healthy controls (HCs). Variants were selected using different filtering criteria based on allelic frequency and in silico functional impacts. Genes showing a significant burden (n = 17) were sequenced in an independent cohort of 504 MS and 504 HC. The highest signal in both cohorts was observed for the disruptive variants (stop-gain, stop-loss, or splicing variants) located in EFCAB13, a gene coding for a protein of an unknown function (p < 10⁻⁴). Among these variants, the minor allele of a stop-gain variant showed a significantly higher frequency in MS versus HC in both sequenced cohorts (p = 0.0093 and p = 0.025), confirmed by a meta-analysis on a third independent cohort of 1298 MS and 1430 HC (p = 0.001) assayed with an SNP array. Real-time PCR on 14 heterozygous individuals for this variant did not evidence the presence of the stop-gain allele, suggesting a transcript degradation by non-sense mediated decay, supported by the evidence that the carriers of the stop-gain variant had a lower expression of this gene (p = 0.0184). In conclusion, we identified a novel low-frequency functional variant associated with MS susceptibility, suggesting the possible role of rare/low-frequency variants in MS as reported for other complex diseases.
Affiliation(s)
- Ferdinando Clarelli
- Laboratory of Human Genetics of Neurological Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Nadia Barizzone
- Department of Health Sciences, UPO, University of Eastern Piedmont, and CAAD (Center for Translational Research on Autoimmune and Allergic Disease), Novara, Italy
| | - Eleonora Mangano
- Institute for Biomedical Technologies, National Research Council of Italy, Segrate, Italy
| | - Miriam Zuccalà
- Department of Health Sciences, UPO, University of Eastern Piedmont, and CAAD (Center for Translational Research on Autoimmune and Allergic Disease), Novara, Italy
| | - Chiara Basagni
- Department of Health Sciences, UPO, University of Eastern Piedmont, and CAAD (Center for Translational Research on Autoimmune and Allergic Disease), Novara, Italy
| | - Santosh Anand
- Department of Informatics, Systems and Communications (DISCo), University of Milano-Bicocca, Milan, Italy
| | - Melissa Sorosina
- Laboratory of Human Genetics of Neurological Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Elisabetta Mascia
- Laboratory of Human Genetics of Neurological Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Silvia Santoro
- Laboratory of Human Genetics of Neurological Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | | | | | | | - Eleonora Virgilio
- Department of Translational Medicine, Section of Neurology and IRCAD, UNIUPO, Novara, Italy
| | - Antonio Gallo
- MS Center, I Division of Neurology, Department of Advanced Medical and Surgical Sciences (DAMSS), University of Campania "Luigi Vanvitelli", Naples, Italy
| | - Alessandro Pizzino
- Department of Health Sciences, UPO, University of Eastern Piedmont, and CAAD (Center for Translational Research on Autoimmune and Allergic Disease), Novara, Italy
| | - Cristoforo Comi
- Department of Translational Medicine, Section of Neurology and IRCAD, UNIUPO, Novara, Italy
| | - Vittorio Martinelli
- Neurology Unit and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | | | - Gianluca De Bellis
- Institute for Biomedical Technologies, National Research Council of Italy, Segrate, Italy
| | - Maurizio Leone
- Neurology Unit, Fondazione IRCCS Casa Sollievo Della Sofferenza, San Giovanni Rotondo, Italy
| | - Massimo Filippi
- Neurology Unit and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy; Vita-Salute San Raffaele University, Milan, Italy; Neuroimaging Research Unit, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy; Neurophysiology Service, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Federica Esposito
- Laboratory of Human Genetics of Neurological Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy; Neurology Unit and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Roberta Bordoni
- Institute for Biomedical Technologies, National Research Council of Italy, Segrate, Italy
| | - Filippo Martinelli Boneschi
- Department of Pathophysiology and Transplantation (DEPT), Dino Ferrari Centre, Neuroscience Section, University of Milan, Milan, Italy; Neurology Unit, MS Centre, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
| | - Sandra D'Alfonso
- Department of Health Sciences, UPO, University of Eastern Piedmont, and CAAD (Center for Translational Research on Autoimmune and Allergic Disease), Novara, Italy
| |
|
8
|
Malik R, Beaufort N, Frerich S, Gesierich B, Georgakis MK, Rannikmäe K, Ferguson AC, Haffner C, Traylor M, Ehrmann M, Sudlow CLM, Dichgans M. Whole-exome sequencing reveals a role of HTRA1 and EGFL8 in brain white matter hyperintensities. Brain 2021; 144:2670-2682. [PMID: 34626176 PMCID: PMC8557338 DOI: 10.1093/brain/awab253] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/30/2021] [Revised: 06/01/2021] [Accepted: 06/19/2021] [Indexed: 11/13/2022] Open
Abstract
White matter hyperintensities (WMH) are among the most common radiological abnormalities in the ageing population and an established risk factor for stroke and dementia. While common variant association studies have revealed multiple genetic loci with an influence on their volume, the contribution of rare variants to the WMH burden in the general population remains largely unexplored. We conducted a comprehensive analysis of this burden in the UK Biobank using publicly available whole-exome sequencing data (n up to 17 830) and found a splice-site variant in GBE1, encoding 1,4-alpha-glucan branching enzyme 1, to be associated with lower white matter burden on an exome-wide level [c.691+2T>C, β = -0.74, standard error (SE) = 0.13, P = 9.7 × 10⁻⁹]. Applying whole-exome gene-based burden tests, we found damaging missense and loss-of-function variants in HTRA1 (frequency of 1 in 275 in the UK Biobank population) to associate with an increased WMH volume (P = 5.5 × 10⁻⁶, false discovery rate = 0.04). HTRA1 encodes a secreted serine protease implicated in familial forms of small vessel disease. Domain-specific burden tests revealed that the association with WMH volume was restricted to rare variants in the protease domain (amino acids 204-364; β = 0.79, SE = 0.14, P = 9.4 × 10⁻⁸). The frequency of such variants in the UK Biobank population was 1 in 450. The WMH volume was brought forward by ∼11 years in carriers of a rare protease domain variant. A comparison with the effect size of established risk factors for WMH burden revealed that the presence of a rare variant in the HTRA1 protease domain corresponded to a larger effect than meeting the criteria for hypertension (β = 0.26, SE = 0.02, P = 2.9 × 10⁻⁵⁹) or being in the upper 99.8% percentile of the distribution of a polygenic risk score based on common genetic variants (β = 0.44, SE = 0.14, P = 0.002).
In biochemical experiments, most (6/9) of the identified protease domain variants resulted in markedly reduced protease activity. We further found EGFL8, which showed suggestive evidence for association with WMH volume (P = 1.5 × 10⁻⁴, false discovery rate = 0.22) in gene burden tests, to be a direct substrate of HTRA1 and to be preferentially expressed in cerebral arterioles and arteries. In a phenome-wide association study mapping ICD-10 diagnoses to 741 standardized Phecodes, rare variants in the HTRA1 protease domain were associated with multiple neurological and non-neurological conditions including migraine with aura (odds ratio = 12.24, 95% CI: 2.54-35.25; P = 8.3 × 10⁻⁵). Collectively, these findings highlight an important role of rare genetic variation and the HTRA1 protease in determining WMH burden in the general population.
Affiliation(s)
- Rainer Malik
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, 81377 Munich, Germany
| | - Nathalie Beaufort
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, 81377 Munich, Germany
| | - Simon Frerich
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, 81377 Munich, Germany
| | - Benno Gesierich
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, 81377 Munich, Germany
| | - Marios K Georgakis
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, 81377 Munich, Germany
| | - Kristiina Rannikmäe
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh EH16 4TL, UK
| | - Amy C Ferguson
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh EH16 4TL, UK
| | - Christof Haffner
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, 81377 Munich, Germany
| | - Matthew Traylor
- Clinical Pharmacology, William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
- The Barts Heart Centre and NIHR Barts Biomedical Research Centre - Barts Health NHS Trust, The William Harvey Research Institute, Queen Mary University of London, London, UK
| | - Michael Ehrmann
- Center of Medical Biotechnology, Faculty of Biology, University Duisburg-Essen, Essen 45141, Germany
- School of Biosciences, Cardiff University, Cardiff CF10 3AX, UK
| | - Cathie L M Sudlow
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh EH16 4TL, UK
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh EH16 4TL, UK
- Health Data Research UK Scotland, University of Edinburgh, Edinburgh EH16 4TL, UK
| | - Martin Dichgans
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, 81377 Munich, Germany
- Munich Cluster for Systems Neurology, Munich 81377, Germany
- German Center for Neurodegenerative Diseases (DZNE), Munich 81377, Germany
| |
|
9
|
Zhu B, Mirabello L, Chatterjee N. A subregion-based burden test for simultaneous identification of susceptibility loci and subregions within. Genet Epidemiol 2018; 42:673-683. [PMID: 29931698 PMCID: PMC6185783 DOI: 10.1002/gepi.22134] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/11/2018] [Revised: 04/14/2018] [Accepted: 05/04/2018] [Indexed: 01/08/2023]
Abstract
In rare variant association studies, aggregating rare and/or low-frequency variants may increase statistical power for detection of the underlying susceptibility gene or region. However, it is unclear which variants, or class of them, in a gene contribute most to the association. We proposed a subregion-based burden test (REBET) to simultaneously select susceptibility genes and identify important underlying subregions. The subregions are predefined by shared common biologic characteristics, such as the protein domain or functional impact. Based on a subset-based approach considering local correlations between combinations of test statistics of subregions, REBET is able to properly control the type I error rate while adjusting for multiple comparisons in a computationally efficient manner. Simulation studies show that REBET can achieve power competitive to alternative methods when rare variants cluster within subregions. In two case studies, REBET is able to identify known disease susceptibility genes, and more importantly pinpoint the unreported most susceptible subregions, which represent protein domains essential for gene function. R package REBET is available at https://dceg.cancer.gov/tools/analysis/rebet.
Affiliation(s)
- Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Lisa Mirabello
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA

10
Wu X, Guan T, Liu DJ, León Novelo LG, Bandyopadhyay D. Adaptive-weight burden test for associations between quantitative traits and genotype data with complex correlations. Ann Appl Stat 2018; 12:1558-1582. [PMID: 30214655 DOI: 10.1214/17-aoas1121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
High-throughput sequencing has often been used to screen samples from pedigrees or with population structure, producing genotype data with complex correlations rendered from both familial relation and linkage disequilibrium. With such data, it is critical to account for these genotypic correlations when assessing the contribution of variants by gene or pathway. Recognizing the limitations of existing association testing methods, we propose Adaptive-weight Burden Test (ABT), a retrospective, mixed-model test for genetic association of quantitative traits on genotype data with complex correlations. This method makes full use of genotypic correlations across both samples and variants, and adopts "data-driven" weights to improve power. We derive the ABT statistic and its explicit distribution under the null hypothesis, and demonstrate through simulation studies that it is generally more powerful than the fixed-weight burden test and family-based SKAT in various scenarios, controlling for the type I error rate. Further investigation reveals the connection of ABT with kernel tests, as well as the adaptability of its weights to the direction of genetic effects. The application of ABT is illustrated by a whole genome analysis of genes with common and rare variants associated with fasting glucose from the NHLBI "Grand Opportunity" Exome Sequencing Project.
Affiliation(s)
- Xiaowei Wu
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
- Ting Guan
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
- Dajiang J Liu
- Department of Public Health Sciences, Hershey Institute of Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Luis G León Novelo
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center, Houston, TX 77030, USA

11
Russo A, Di Gaetano C, Cugliari G, Matullo G. Advances in the Genetics of Hypertension: The Effect of Rare Variants. Int J Mol Sci 2018; 19:E688. [PMID: 29495593 PMCID: PMC5877549 DOI: 10.3390/ijms19030688] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 01/31/2018] [Revised: 02/19/2018] [Accepted: 02/26/2018] [Indexed: 12/22/2022]
Abstract
Worldwide, hypertension still represents a serious health burden with nine million people dying as a consequence of hypertension-related complications. Essential hypertension is a complex trait supported by multifactorial genetic inheritance together with environmental factors. The heritability of blood pressure (BP) is estimated to be 30-50%. A great effort was made to find genetic variants affecting BP levels through Genome-Wide Association Studies (GWAS). This approach relies on the "common disease-common variant" hypothesis and led to the identification of multiple genetic variants which explain, in aggregate, only 2-3% of the genetic variance of hypertension. Part of the missing genetic information could be caused by variants too rare to be detected by GWAS. The use of exome chips and Next-Generation Sequencing facilitated the discovery of causative variants. Here, we report the advances in the detection of novel rare variants, genes, and/or pathways through the most promising approaches, and the recent statistical tests that have emerged to handle rare variants. We also discuss the need to further support rare novel variants with replication studies within larger consortia and with deeper functional studies to better understand how new genes might improve patient care and the stratification of the response to antihypertensive treatments.
Affiliation(s)
- Alessia Russo
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
- Cornelia Di Gaetano
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
- Giovanni Cugliari
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
- Giuseppe Matullo
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.

12
Bourcier R, Le Scouarnec S, Bonnaud S, Karakachoff M, Bourcereau E, Heurtebise-Chrétien S, Menguy C, Dina C, Simonet F, Moles A, Lenoble C, Lindenbaum P, Chatel S, Isidor B, Génin E, Deleuze JF, Schott JJ, Le Marec H, Loirand G, Desal H, Redon R, Desal H, Bourcier R, Daumas-Duport B, Isidor B, Connault J, Lebranchu P, Le Tourneau T, Viarouge MP, Papagiannaki C, Piotin M, Redjem H, Mazighi M, Desilles JP, Naggara O, Trystram D, Edjlali-Goujon M, Rodriguez C, Ben Hassen W, Saleme S, Mounayer C, Levrier O, Aguettaz P, Combaz X, Pasco A, Berthier E, Bintner M, Molho M, Gauthier P, Chivot C, Costalat V, Darganzil C, Bonafé A, Januel AC, Michelozzi C, Cognard C, Bonneville F, Tall P, Darcourt J, Biondi A, Iosif C, Pomero E, Ferre JC, Gauvrit JY, Eugene F, Raoult H, Gentric JC, Ognard J, Anxionnat R, Bracard S, Derelle AL, Tonnelet R, Spelle L, Ikka L, Fahed R, Rouchaud A, Ozanne A, Caroff J, Ben Achour N, Moret J, Chabert E, Berge J, Marnat G, Barreau X, Gariel F, Clarencon F, Aggour M, Ricolfi F, Chavent A, Thouant P, Lebidinsky P, Lemogne B, Herbreteau D, Bibi R, Pierot L, Soize S, Labeyrie MA, Vandendries C, Houdart E, Kazemi A, Leclerc X, Pruvo JP, Gallas S, Velasco S. Rare Coding Variants in ANGPTL6 Are Associated with Familial Forms of Intracranial Aneurysm. Am J Hum Genet 2018; 102:133-141. [PMID: 29304371 DOI: 10.1016/j.ajhg.2017.12.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/28/2017] [Accepted: 12/05/2017] [Indexed: 10/18/2022]
Abstract
Intracranial aneurysms (IAs) are acquired cerebrovascular abnormalities characterized by localized dilation and wall thinning in intracranial arteries, possibly leading to subarachnoid hemorrhage and severe outcome in case of rupture. Here, we identified one rare nonsense variant (c.1378A>T) in the last exon of ANGPTL6 (Angiopoietin-Like 6), which encodes a circulating pro-angiogenic factor mainly secreted from the liver; the variant was shared by the four tested affected members of a large pedigree with multiple IA-affected case subjects. We showed a 50% reduction of ANGPTL6 serum concentration in individuals heterozygous for the c.1378A>T allele (p.Lys460Ter) compared to relatives homozygous for the normal allele, probably due to the non-secretion of the truncated protein produced by the c.1378A>T transcripts. Sequencing ANGPTL6 in a series of 94 additional index case subjects with familial IA identified three other rare coding variants in five case subjects. Overall, we detected a significant enrichment (p = 0.023) in rare coding variants within this gene among the 95 index case subjects with familial IA, compared to a reference population of 404 individuals with French ancestry. Among the 6 recruited families, 12 out of 13 (92%) individuals carrying IA also carry such variants in ANGPTL6, versus 15 out of 41 (37%) unaffected ones. We observed a higher rate of individuals with a history of high blood pressure among affected versus healthy individuals carrying ANGPTL6 variants, suggesting that ANGPTL6 could trigger cerebrovascular lesions when combined with other risk factors such as hypertension. Altogether, our results indicate that rare coding variants in ANGPTL6 are causally related to familial forms of IA.

13
Grinde KE, Arbet J, Green A, O'Connell M, Valcarcel A, Westra J, Tintle N. Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association. Front Genet 2017; 8:117. [PMID: 28959274 PMCID: PMC5603735 DOI: 10.3389/fgene.2017.00117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/24/2017] [Accepted: 08/25/2017] [Indexed: 11/13/2022]
Abstract
To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication studies and characterizing the genetic architecture of the locus. However, we illustrate that straightforward single-marker association statistics can suffer from substantial bias introduced by conditioning on gene-based test significance, due to the phenomenon often referred to as "winner's curse." We illustrate the ramifications of this bias on variant effect size estimation and variant prioritization/ranking approaches, outline parameters of genetic architecture that affect this bias, and propose a bootstrap resampling method to correct for this bias. We find that our correction method significantly reduces the bias due to winner's curse (average two-fold decrease in bias, p < 2.2 × 10⁻⁶) and, consequently, substantially improves mean squared error and variant prioritization/ranking. The method is particularly helpful in adjustment for winner's curse effects when the initial gene-based test has low power and for relatively more common, non-causal variants. Adjustment for winner's curse is recommended for all post-hoc estimation and ranking of variants after a gene-based test. Further work is necessary to continue seeking ways to reduce bias and improve inference in post-hoc analysis of gene-based tests under a wide variety of genetic architectures.
Affiliation(s)
- Kelsey E Grinde
- Department of Biostatistics, University of Washington, Seattle, WA, United States
- Jaron Arbet
- Department of Biostatistics, University of Minnesota, Minneapolis, MN, United States
- Alden Green
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, United States
- Michael O'Connell
- Department of Biostatistics, University of Minnesota, Minneapolis, MN, United States
- Alessandra Valcarcel
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, United States
- Jason Westra
- Department of Statistics, Iowa State University, Ames, IA, United States; Department of Mathematics, Statistics, and Computer Science, Dordt College, Sioux Center, IA, United States
- Nathan Tintle
- Department of Mathematics, Statistics, and Computer Science, Dordt College, Sioux Center, IA, United States

14
Leslie EJ, Carlson JC, Shaffer JR, Buxó CJ, Castilla EE, Christensen K, Deleyiannis FWB, Field LL, Hecht JT, Moreno L, Orioli IM, Padilla C, Vieira AR, Wehby GL, Feingold E, Weinberg SM, Murray JC, Marazita ML. Association studies of low-frequency coding variants in nonsyndromic cleft lip with or without cleft palate. Am J Med Genet A 2017; 173:1531-1538. [PMID: 28425186 DOI: 10.1002/ajmg.a.38210] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 08/12/2016] [Revised: 02/05/2017] [Accepted: 02/15/2017] [Indexed: 11/10/2022]
Abstract
Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is a group of common human birth defects with complex etiology. Although genome-wide association studies have successfully identified a number of risk loci, these loci only account for about 20% of the heritability of orofacial clefts. The "missing" heritability may be found in rare variants, copy number variants, or interactions. In this study, we investigated the role of low-frequency variants genotyped in 1995 cases and 1626 controls on the Illumina HumanCore + Exome chip. We performed two statistical tests, the Sequence Kernel Association Test (SKAT) and the Combined Multivariate and Collapsing (CMC) method, using two minor allele frequency cutoffs (1% and 5%). We found that a burden of low-frequency coding variants in N4BP2, CDSN, PRTG, and AHRR was associated with increased risk of NSCL/P. Low-frequency variants in other genes were associated with decreased risk of NSCL/P. These results demonstrate that low-frequency variants contribute to the genetic etiology of NSCL/P.
Affiliation(s)
- Elizabeth J Leslie
- Center for Craniofacial and Dental Genetics, Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
- Jenna C Carlson
- Center for Craniofacial and Dental Genetics, Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania; Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- John R Shaffer
- Center for Craniofacial and Dental Genetics, Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania; Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- Carmen J Buxó
- School of Dental Medicine, University of Puerto Rico, San Juan, Puerto Rico
- Eduardo E Castilla
- CEMIC: Center for Medical Education and Clinical Research, Buenos Aires, Argentina; ECLAMC (Latin American Collaborative Study of Congenital Malformations) at INAGEMP (National Institute of Population Medical Genetics), Rio de Janeiro, Brazil; Laboratory of Congenital Malformation Epidemiology, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Brazil
- Kaare Christensen
- Department of Epidemiology, Institute of Public Health, University of Southern Denmark, Odense, Denmark
- Frederic W B Deleyiannis
- Department of Surgery, Plastic and Reconstructive Surgery, University of Colorado School of Medicine, Denver, Colorado
- Leigh L Field
- Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- Jacqueline T Hecht
- Department of Pediatrics, McGovern Medical School and School of Dentistry UT Health at Houston, Houston, Texas
- Lina Moreno
- Department of Orthodontics, College of Dentistry, University of Iowa, Iowa City, Iowa
- Ieda M Orioli
- ECLAMC (Latin American Collaborative Study of Congenital Malformations) at INAGEMP (National Institute of Population Medical Genetics), Rio de Janeiro, Brazil; Department of Genetics, Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Carmencita Padilla
- Department of Pediatrics, College of Medicine; and Institute of Human Genetics, National Institutes of Health, University of the Philippines Manila, Manila, The Philippines; Philippine Genome Center, University of the Philippines System, Manila, The Philippines
- Alexandre R Vieira
- Center for Craniofacial and Dental Genetics, Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania; Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- George L Wehby
- Department of Health Management and Policy, College of Public Health, University of Iowa, Iowa City, Iowa
- Eleanor Feingold
- Center for Craniofacial and Dental Genetics, Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania; Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania; Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- Seth M Weinberg
- Center for Craniofacial and Dental Genetics, Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
- Jeffrey C Murray
- Department of Pediatrics, Carver College of Medicine, University of Iowa, Iowa City, Iowa
- Mary L Marazita
- Center for Craniofacial and Dental Genetics, Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania; Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania; Clinical and Translational Science, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania

15
Abstract
Over the past few years, interest in the identification of rare variants that influence human phenotype has led to the development of many statistical methods for testing for association between sets of rare variants and binary or quantitative traits. Here, I review some of the most important ideas that underlie these methods and the most relevant issues when choosing a method for analysis. In addition to the tests for association, I review crucial issues in performing a rare variant study, from experimental design to interpretation and validation. I also discuss the many challenges of these studies, some of their limitations, and future research directions.
Affiliation(s)
- Dan L Nicolae
- Departments of Medicine and Statistics, University of Chicago, Chicago, Illinois 60637

16
Abstract
For biological and statistical reasons it makes sense to combine information from variants at the level of the gene. One may wish to give more weight to variants which are rare and those that are more likely to affect function. A combined weighting scheme, implemented in the SCOREASSOC program, was applied to whole exome sequence data for 1392 subjects with schizophrenia and 982 with obesity from the UK10K project. Results conformed fairly well with null hypothesis expectations and no individual gene was strongly implicated. However, a number of the higher ranked genes appear plausible candidates as being involved in one or other phenotype and may warrant further investigation. These include MC4R, NLGN2, CRP, DONSON, GTF3A, IL36B, ADCYAP1R1, ARSA, DLG1, SIK2, SLAIN1, UBE2Q2, ZNF507, CRHR1, MUSK, NSF, SNORD115, GDF3 and HIBADH. Some individual variants in these genes have different frequencies between cohorts and could be genotyped in additional subjects. For other genes, there is a general excess of variants at many different sites so attempts at replication would be more difficult. Overall, the weighted burden test provides a convenient method for using sequence data to highlight genes of interest.
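The idea of weighting rarer variants more heavily when combining them at the gene level can be sketched as follows. This is only an illustration under stated assumptions: the inverse-standard-deviation weight `1 / sqrt(p * (1 - p))` and the function names are hypothetical choices for this sketch, not the weighting scheme implemented in SCOREASSOC, and the functional-impact component of the combined weight is omitted.

```python
import math
from typing import List


def allele_frequency(genotypes: List[int]) -> float:
    """Frequency of the coded allele from 0/1/2 allele counts (diploid subjects)."""
    return sum(genotypes) / (2 * len(genotypes))


def variant_weight(freq: float) -> float:
    """Hypothetical frequency weight: larger for rarer variants, 0 for monomorphic sites."""
    if freq <= 0.0 or freq >= 1.0:
        return 0.0
    return 1.0 / math.sqrt(freq * (1.0 - freq))


def weighted_burden(genotype_matrix: List[List[int]]) -> List[float]:
    """Per-subject weighted sum of allele counts; rows are subjects, columns are variants."""
    n_variants = len(genotype_matrix[0])
    weights = [
        variant_weight(allele_frequency([row[j] for row in genotype_matrix]))
        for j in range(n_variants)
    ]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotype_matrix]
```

A functional-impact weight, if available, could be multiplied into `variant_weight` per variant; the resulting per-subject scores would then be compared between cohorts.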
Affiliation(s)
- David Curtis
- UCL Genetics Institute, UCL, Darwin Building, Gower Street, London, WC1E 6BT, UK

17
Dering C, König IR, Ramsey LB, Relling MV, Yang W, Ziegler A. A comprehensive evaluation of collapsing methods using simulated and real data: excellent annotation of functionality and large sample sizes required. Front Genet 2014; 5:323. [PMID: 25309579 PMCID: PMC4164031 DOI: 10.3389/fgene.2014.00323] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/01/2014] [Accepted: 08/28/2014] [Indexed: 01/23/2023]
Abstract
The advent of next generation sequencing (NGS) technologies enabled the investigation of the rare variant-common disease hypothesis in unrelated individuals, even on the genome-wide level. Analysis of this hypothesis requires tailored statistical methods, as single marker tests fail on rare variants. An entire class of statistical methods collapses rare variants from a genomic region of interest (ROI), thereby aggregating rare variants. In an extensive simulation study using data from the Genetic Analysis Workshop 17, we compared the performance of 15 collapsing methods by means of a variety of pre-defined ROIs regarding minor allele frequency thresholds and functionality. Findings of the simulation study were additionally confirmed by a real data set investigating the association between methotrexate clearance and the SLCO1B1 gene in patients with acute lymphoblastic leukemia. Our analyses showed substantially inflated type I error levels for many of the proposed collapsing methods. Only four approaches yielded valid type I errors in all considered scenarios. None of the statistical tests was able to detect true associations over a substantial proportion of replicates in the simulated data. Detailed annotation of functionality of variants is crucial to detect true associations. These findings were confirmed in the analysis of the real data. Recent theoretical work showed that large power is achieved in gene-based analyses only if large sample sizes are available and a substantial proportion of causal rare variants is present in the gene-based analysis. Many of the investigated statistical approaches use permutation, which requires high computational cost. There is a clear need for valid, powerful, and fast-to-calculate test statistics for studies investigating rare variants.
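As a minimal sketch of what collapsing rare variants within a region of interest means, the following aggregates, per subject, the minor-allele counts over all variants below a minor allele frequency (MAF) threshold. It is a generic illustration, not any one of the 15 methods compared in the study; the function names, the default cutoff, and the assumption that genotypes are coded as counts of the minor allele are all choices made for this sketch.

```python
from typing import List


def minor_allele_freq(genotypes: List[int]) -> float:
    """MAF of one variant, assuming 0/1/2 counts of the minor allele in diploid subjects."""
    return sum(genotypes) / (2 * len(genotypes))


def collapse_rare(genotype_matrix: List[List[int]], maf_cutoff: float = 0.05) -> List[int]:
    """Per-subject count of minor alleles across variants with MAF below the cutoff.

    Rows are subjects and columns are variants within the region of interest.
    """
    n_variants = len(genotype_matrix[0])
    # Select the columns that qualify as rare under the chosen threshold.
    rare = [
        j for j in range(n_variants)
        if minor_allele_freq([row[j] for row in genotype_matrix]) < maf_cutoff
    ]
    # Sum allele counts over the selected columns for each subject.
    return [sum(row[j] for j in rare) for row in genotype_matrix]
```

The collapsed score per subject would then be tested for association with the trait in a single regression, so the choice of `maf_cutoff` and of which variants enter the region directly shapes the test.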
Affiliation(s)
- Carmen Dering
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany
- Inke R König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany
- Laura B Ramsey
- Pharmaceutical Department, St. Jude Children's Research Hospital, Memphis, TN, USA
- Mary V Relling
- Pharmaceutical Department, St. Jude Children's Research Hospital, Memphis, TN, USA
- Wenjian Yang
- Pharmaceutical Department, St. Jude Children's Research Hospital, Memphis, TN, USA
- Andreas Ziegler
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany; Zentrum für Klinische Studien, Universität zu Lübeck, Lübeck, Germany; School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, South Africa

18
Zhang Q, Wang L, Koboldt D, Borecki IB, Province MA. Adjusting family relatedness in data-driven burden test of rare variants. Genet Epidemiol 2014; 38:722-7. [PMID: 25169066 DOI: 10.1002/gepi.21848] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/28/2014] [Revised: 07/01/2014] [Accepted: 07/16/2014] [Indexed: 11/08/2022]
Abstract
Family data represent a rich resource for detecting association between rare variants (RVs) and human traits. However, most RV association analysis methods developed in recent years are data-driven burden tests, which can adaptively learn weights from data but require permutation to evaluate significance. They are therefore not readily applicable to family data, because random permutation destroys family structure. Direct application of these methods to family data may result in a significant inflation of false positives. To overcome this issue, we have developed a generalized, weighted sum mixed model (WSMM) and corresponding computational techniques that can incorporate family information into data-driven burden tests and allow adaptive and efficient permutation tests in family data. Using simulated and real datasets, we demonstrate that the WSMM method can be used to appropriately adjust for genetic relatedness among family members and controls the inflation of false positives well. We compare WSMM with a nondata-driven, family-based Sequence Kernel Association Test (famSKAT), showing that WSMM has significantly higher power in some cases. WSMM provides a generalized, flexible framework for adapting different data-driven burden tests to analyze data with any family structure, and it can be extended to binary and time-to-onset traits, with or without covariates.
Affiliation(s)
- Qunyuan Zhang
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, Missouri, United States of America

19
Saad M, Wijsman EM. Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees. Genet Epidemiol 2014; 38:579-90. [PMID: 25132070 DOI: 10.1002/gepi.21844] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 02/28/2014] [Revised: 05/24/2014] [Accepted: 06/27/2014] [Indexed: 12/27/2022]
Abstract
In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant identification. Because of the high cost of sequencing technologies, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), is able to handle large pedigrees and accurately impute rare variants, but does less well for common variants where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC and several other existing burden and kernel association tests. In simulated pedigree sequence data, our results show an increase of imputation accuracy from use of our combining approach. Also, they show an increase of power of the association tests with this approach over the use of either family- or population-based imputation methods alone, in the context of rare and common variants. Moreover, our results show better performance of famSKAT-RC compared to the other considered tests, in most scenarios investigated here.
Affiliation(s)
- Mohamad Saad
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, United States of America; Department of Biostatistics, University of Washington, Seattle, Washington, United States of America

20
Moutsianas L, Morris AP. Methodology for the analysis of rare genetic variation in genome-wide association and re-sequencing studies of complex human traits. Brief Funct Genomics 2014; 13:362-70. [PMID: 24916163 PMCID: PMC4168660 DOI: 10.1093/bfgp/elu012] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
Genome-wide association studies have been successful in identifying common variants that impact complex human traits and diseases. However, despite this success, the joint effects of these variants explain only a small proportion of the genetic variance in these phenotypes, leading to speculation that rare genetic variation might account for much of the ‘missing heritability’. Consequently, there has been an exciting period of research and development into the methodology for the analysis of rare genetic variants, typically by considering their joint effects on complex traits within the same functional unit or genomic region. In this review, we describe a general framework for modelling the joint effects of rare genetic variants on complex traits in association studies of unrelated individuals. We summarise a range of widely used association tests that have been developed from this model and provide an overview of the relative performance of these approaches from published simulation studies.

21
Saad M, Wijsman EM. Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes. Genet Epidemiol 2013; 38:1-9. [PMID: 24243664 DOI: 10.1002/gepi.21776] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 08/12/2013] [Revised: 09/30/2013] [Accepted: 10/15/2013] [Indexed: 01/09/2023]
Abstract
Recently, the "Common Disease-Multiple Rare Variants" hypothesis has received much attention, especially with the current availability of next-generation sequencing. Family-based designs are well suited for discovery of rare variants, with large and carefully selected pedigrees enriching for multiple copies of such variants. However, sequencing a large number of samples is still prohibitively expensive. Here, we evaluate a cost-effective strategy (pseudosequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which is able to efficiently handle large pedigrees and accurately impute rare variants. We used burden and kernel association tests, famWS and famSKAT, both of which account for family relationships; famSKAT additionally accounts for heterogeneity of allelic effects. We simulated pedigree sequence data and compared the power of association tests for pseudosequence data, a subset of sequence data used for imputation, and all subjects sequenced. We also compared, within the pseudosequence data, the power of association tests using best-guess genotypes and allelic dosages. Our results show that the pseudosequencing strategy considerably improves the power to detect association with rare variants. They also show that the use of allelic dosages results in much higher power than the use of best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered.
Affiliation(s)
- Mohamad Saad
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, United States of America; Department of Biostatistics, University of Washington, Seattle, Washington, United States of America