1. Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits. Sci Rep 2021; 11:7431. [PMID: 33795796 PMCID: PMC8016937 DOI: 10.1038/s41598-021-86871-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2020] [Accepted: 03/22/2021] [Indexed: 11/30/2022] [Open access]
Abstract
After the genome-wide association studies (GWAS) era, whole-genome sequencing is widely used to identify associations between complex traits and rare variants. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such a kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve the effects of genetic and environmental factors and the complex interactions between genes and the environment. Therefore, in this research, a novel method is proposed to detect gene and gene-environment interactions in a complex family-based association study with various correlated structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and its superior statistical power. The FGE-SKAT was applied to the whole-genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and identified both concordant and discordant regions compared with methods that do not consider gene-by-environment interactions.
2. Jiang Y, Chiu CY, Yan Q, Chen W, Gorin MB, Conley YP, Lakhal-Chaieb ML, Cook RJ, Amos CI, Wilson AF, Bailey-Wilson JE, McMahon FJ, Vazquez AI, Yuan A, Zhong X, Xiong M, Weeks DE, Fan R. Gene-Based Association Testing of Dichotomous Traits With Generalized Functional Linear Mixed Models Using Extended Pedigrees: Applications to Age-Related Macular Degeneration. J Am Stat Assoc 2020; 116:531-545. [PMID: 34321704 PMCID: PMC8315575 DOI: 10.1080/01621459.2020.1799809] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/03/2017] [Revised: 07/09/2020] [Accepted: 07/17/2020] [Indexed: 10/23/2022]
Abstract
Genetics plays a role in age-related macular degeneration (AMD), a common cause of blindness in the elderly. There is a need for powerful methods for carrying out region-based association tests between a dichotomous trait like AMD and genetic variants on family data. Here, we apply our new generalized functional linear mixed models (GFLMM) developed to test for gene-based association in a set of AMD families. Using common and rare variants, we observe significant association with two known AMD genes: CFH and ARMS2. Using rare variants, we find suggestive signals in four genes: ASAH1, CLEC6A, TMEM63C, and SGSM1. Intriguingly, ASAH1 is down-regulated in AMD aqueous humor, and ASAH1 deficiency leads to retinal inflammation and increased vulnerability to oxidative stress. These findings were made possible by our GFLMM which model the effect of a major gene as a fixed mean, the polygenic contributions as a random variation, and the correlation of pedigree members by kinship coefficients. Simulations indicate that the GFLMM likelihood ratio tests (LRTs) accurately control the Type I error rates. The LRTs have similar or higher power than existing retrospective kernel and burden statistics. Our GFLMM-based statistics provide a new tool for conducting family-based genetic studies of complex diseases. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
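The abstract above says pedigree correlation is modelled through kinship coefficients. As a minimal sketch of what that ingredient looks like, the block below computes a kinship matrix from a pedigree with the standard recursive rule (a child's kinship with an earlier-listed person is the average of the parents' kinships with that person). The function name `kinship_matrix`, the tuple layout, and the requirement that parents are listed before children are assumptions made for illustration; this is not the authors' GFLMM code.

```python
def kinship_matrix(pedigree):
    """Return (ids, phi), where phi[i][j] is the kinship coefficient.

    pedigree: list of (person_id, father_id, mother_id) tuples in which
    every parent appears before its children; founders use None for both
    parents. (Hypothetical input layout, chosen for this sketch.)
    """
    index = {}   # person_id -> row number
    lower = []   # lower-triangular rows: lower[i][j] for j <= i

    def phi(a, b):
        # Symmetric lookup into the lower triangle.
        return lower[a][b] if a >= b else lower[b][a]

    for k, (person, father, mother) in enumerate(pedigree):
        row = [0.0] * (k + 1)
        if father is None and mother is None:
            # Founder: assumed unrelated to everyone else and not inbred.
            row[k] = 0.5
        elif father is None or mother is None:
            raise ValueError(f"{person!r}: give both parents or neither")
        else:
            f, m = index[father], index[mother]
            for j in range(k):
                # Average of the parents' kinships with person j.
                row[j] = 0.5 * (phi(f, j) + phi(m, j))
            # Self-kinship is 0.5 * (1 + inbreeding coefficient).
            row[k] = 0.5 * (1.0 + phi(f, m))
        index[person] = k
        lower.append(row)

    n = len(lower)
    full = [[phi(i, j) for j in range(n)] for i in range(n)]
    return [p[0] for p in pedigree], full


# Nuclear family with two full siblings.
ids, K = kinship_matrix([
    ("dad", None, None),
    ("mom", None, None),
    ("sib1", "dad", "mom"),
    ("sib2", "dad", "mom"),
])
```

In mixed models of this kind the polygenic covariance is commonly taken to be proportional to twice this matrix; whether the GFLMM uses exactly that scaling is not stated in the abstract.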
Affiliation(s)
- Yingda Jiang
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Chi-Yang Chiu
  - Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Qi Yan
  - Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
- Wei Chen
  - Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
- Michael B. Gorin
  - Department of Ophthalmology, David Geffen School of Medicine, UCLA Stein Eye Institute, Los Angeles, CA
- Yvette P. Conley
  - Department of Health Promotion and Development, University of Pittsburgh, Pittsburgh, PA
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Richard J. Cook
  - Department of Statistics and Actuarial Science, Waterloo, ON, Canada
- Alexander F. Wilson
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Joan E. Bailey-Wilson
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Francis J. McMahon
  - Human Genetics Branch and Genetic Basis of Mood and Anxiety Disorders Section, National Institute of Mental Health, NIH, Bethesda, MD
- Ana I. Vazquez
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI
- Ao Yuan
  - Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
- Xiaogang Zhong
  - Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
- Momiao Xiong
  - Human Genetics Center, University of Texas, Houston, TX
- Daniel E. Weeks
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Ruzong Fan
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
  - Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
3. Wheeler NR, Benchek P, Kunkle BW, Hamilton-Nelson KL, Warfe M, Fondran JR, Haines JL, Bush WS. Hadoop and PySpark for reproducibility and scalability of genomic sequencing studies. Pacific Symposium on Biocomputing 2020; 25:523-534. [PMID: 31797624 PMCID: PMC6956992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Modern genomic studies are rapidly growing in scale, and the analytical approaches used to analyze genomic data are increasing in complexity. Genomic data management poses logistic and computational challenges, and analyses are increasingly reliant on genomic annotation resources that create their own data management and versioning issues. As a result, genomic datasets are increasingly handled in ways that limit the rigor and reproducibility of many analyses. In this work, we examine the use of the Spark infrastructure for the management, access, and analysis of genomic data in comparison to traditional genomic workflows on typical cluster environments. We validate the framework by reproducing previously published results from the Alzheimer's Disease Sequencing Project. Using the framework and analyses designed using Jupyter notebooks, Spark provides improved workflows, reduces user-driven data partitioning, and enhances the portability and reproducibility of distributed analyses required for large-scale genomic studies.
Affiliation(s)
- Nicholas R. Wheeler
  - Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- Penelope Benchek
  - Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- Brian W. Kunkle
  - John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, 1501 NW 10th Ave, Miami, FL 33136, USA
- Kara L. Hamilton-Nelson
  - John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, 1501 NW 10th Ave, Miami, FL 33136, USA
- Mike Warfe
  - Cleveland Institute for Computational Biology, Center for Advanced Research Computing, University Technology, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- Jeremy R. Fondran
  - Cleveland Institute for Computational Biology, Center for Advanced Research Computing, University Technology, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- Jonathan L. Haines
  - Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- William S. Bush
  - Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
4. Svishcheva GR. A generalized model for combining dependent SNP-level summary statistics and its extensions to statistics of other levels. Sci Rep 2019; 9:5461. [PMID: 30940856 PMCID: PMC6445108 DOI: 10.1038/s41598-019-41827-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/16/2018] [Accepted: 03/06/2019] [Indexed: 11/12/2022] [Open access]
Abstract
Here I propose a fundamentally new flexible model to reveal the association between a trait and a set of genetic variants in a genomic region/gene. This model was developed for the situation when original individual-level phenotype and genotype data are not available, but the researcher possesses the results of statistical analyses conducted on these data (namely, SNP-level summary Z score statistics and SNP-by-SNP correlations). The new model was analytically derived from the classical multiple linear regression model applied for the region-based association analysis of individual-level phenotype and genotype data by using the linear compression of data, where the SNP-by-SNP correlations are among the explanatory variables, and the summary Z score statistics are categorized as the response variables. I analytically show that the regional association analysis methods developed within the framework of the classical multiple linear regression model with additive effects of genetic variants can be reformulated in terms of the new model without the loss of information. The results obtained from the regional association analysis utilizing the classical model and those derived using the proposed model are identical when SNP-by-SNP correlations and SNP-level statistics are estimated from the same genetic data.
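The "linear compression" described in the abstract above can be illustrated with a short derivation. This is a sketch under simplifying assumptions that the abstract does not state (the columns of the genotype matrix $X$ are standardised, the trait $y$ is centred, and the marginal Z score of SNP $j$ is approximated by $x_j^{\top} y / \sqrt{n}$), so it should not be read as the paper's exact formulation. The symbols $X$, $U$, $\beta$ and $\varepsilon$ are notation introduced here.

```latex
\begin{aligned}
y &= X\beta + e, & e &\sim N(0, \sigma^2 I_n),\\
Z &= \tfrac{1}{\sqrt{n}} X^{\top} y, & U &= \tfrac{1}{n} X^{\top} X,\\
Z &= \sqrt{n}\, U \beta + \varepsilon, & \varepsilon &= \tfrac{1}{\sqrt{n}} X^{\top} e \sim N(0, \sigma^2 U).
\end{aligned}
```

Multiplying the individual-level regression by $X^{\top}/\sqrt{n}$ turns the Z scores into the response and the SNP-by-SNP correlation matrix $U$ into the design matrix, which matches the roles the abstract assigns to them.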
Affiliation(s)
- Gulnara R Svishcheva
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090, Russia
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
5. Hamvas A, Feng R, Bi Y, Wang F, Bhattacharya S, Mereness J, Kaushal M, Cotten CM, Ballard PL, Mariani TJ. Exome sequencing identifies gene variants and networks associated with extreme respiratory outcomes following preterm birth. BMC Genet 2018; 19:94. [PMID: 30342483 PMCID: PMC6195962 DOI: 10.1186/s12863-018-0679-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/20/2017] [Accepted: 10/01/2018] [Indexed: 12/28/2022] [Open access]
Abstract
Background: Previous studies have identified genetic variants associated with bronchopulmonary dysplasia (BPD) in extremely preterm infants. However, findings with genome-wide significance have been rare, and not replicated. We hypothesized that whole exome sequencing (WES) of premature subjects with extremely divergent phenotypic outcomes could facilitate the identification of genetic variants or gene networks contributing to disease risk. Results: The Prematurity and Respiratory Outcomes Program (PROP) recruited a cohort of > 765 extremely preterm infants for the identification of markers of respiratory morbidity. We completed WES on 146 PROP subjects (85 affected, 61 unaffected) representing extreme phenotypes of early respiratory morbidity. We tested for association between disease status and individual common variants, screened for rare variants exclusive to either affected or unaffected subjects, and tested the combined association of variants across gene loci. Pathway analysis was performed and disease-related expression patterns were assessed. Marginal association with BPD was observed for numerous common and rare variants. We identified 345 genes with variants unique to BPD-affected preterm subjects, and 292 genes with variants unique to our unaffected preterm subjects. Of these unique variants, 28 (19 in the affected cohort and 9 in the unaffected cohort) replicate a prior WES study of BPD-associated variants. Pathway analysis of sets of variants, informed by disease-related gene expression, implicated protein kinase A, MAPK and Neuregulin/epidermal growth factor receptor signaling. Conclusions: We identified novel genes and associated pathways that may play an important role in susceptibility/resilience for the development of lung disease in preterm infants.
Affiliation(s)
- Aaron Hamvas
  - Department of Pediatrics, Northwestern University, Chicago, IL, USA
  - Ann and Robert H. Lurie Children's Hospital of Chicago and Northwestern University, Chicago, IL, USA
- Rui Feng
  - Department of Biostatistics, University of Pennsylvania, Philadelphia, PA, USA
- Yingtao Bi
  - Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Fan Wang
  - Department of Biostatistics, University of Pennsylvania, Philadelphia, PA, USA
- Jared Mereness
  - Department of Pediatrics, University of Rochester, Rochester, NY, USA
- Madhurima Kaushal
  - Center for Biomedical Informatics, Washington University, St. Louis, MO, USA
- Philip L Ballard
  - Department of Pediatrics, University of California, San Francisco, CA, USA
- Thomas J Mariani
  - Department of Pediatrics, University of Rochester, Rochester, NY, USA
  - Division of Neonatology and Pediatric Molecular and Personalized Medicine Program, University of Rochester Medical Center, 601 Elmwood Ave, Box 850, Rochester, NY, 14642, USA
6. Analysis of genetic and nongenetic factors influencing triglycerides-lowering drug effects based on paired observations. BMC Proc 2018; 12:46. [PMID: 30275894 PMCID: PMC6157156 DOI: 10.1186/s12919-018-0153-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022] [Open access]
Abstract
Obesity is a risk factor for heart disease, stroke, diabetes, high blood pressure, and other chronic diseases. Some drugs, including fenofibrate, are used to treat obesity or excessive weight by lowering the level of specific triglycerides. However, different groups have different drug sensitivities and, consequently, there are differences in drug effects. In this study, we assessed both genetic and nongenetic factors that influence drug responses and stratified patients into groups based on differential drug effect and sensitivity. Our methodology of investigating genetic factors and nongenetic factors is applicable to studying differential effects of other drugs, such as statins, and provides an approach to the development of personalized medicine.
7. Kirichenko AV, Zorkoltseva IV, Belonogova NM, Axenovich TI. Use of Genotypes of Common Variants for Genome-Wide Regional Association Analysis. Russ J Genet 2018. [DOI: 10.1134/s1022795418010076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
8.
Abstract
Relatedness within a sample can be of ancient (population stratification) or recent (familial structure) origin, and can either be known (pedigree data) or unknown (cryptic relatedness). All of these forms of familial relatedness have the potential to confound the results of genome-wide association studies. This chapter reviews the major methods available to researchers to adjust for the biases introduced by relatedness and maximize power to detect associations. The advantages and disadvantages of different methods are presented with reference to elements of study design, population characteristics, and computational requirements.
Affiliation(s)
- Russell Thomson
  - Centre for Research in Mathematics, School of Computing, Engineering and Mathematics, Western Sydney University, Parramatta, Australia
- Rebekah McWhirter
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia
9.
Abstract
While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions to access rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next generation sequencing makes whole exome and genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TopMed, are likely to significantly advance this field. The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants. One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods have been developed for this. Model performance depends on the genetic architecture of the region of interest.
Affiliation(s)
- Karoline Kuchenbaecker
  - Wellcome Trust Sanger Institute, Cambridge, UK
  - University College London, London, UK
- Emil Vincent Rosenbaum Appel
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Section for Metabolic Genetics, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
10. Wen Y, Burt A, Lu Q. Risk Prediction Modeling on Family-Based Sequencing Data Using a Random Field Method. Genetics 2017; 207:63-73. [PMID: 28679544 PMCID: PMC5586386 DOI: 10.1534/genetics.117.199752] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/09/2017] [Accepted: 06/27/2017] [Indexed: 01/08/2023] [Open access]
Abstract
Family-based design is one of the most popular designs in genetic studies and has many unique features for risk-prediction research. It is robust against genetic heterogeneity, and the relatedness among family members can be informative for predicting an individual's risk for disease with polygenic and shared environmental components of risk. Despite these strengths, family-based designs have been used infrequently in current risk-prediction studies, and their related statistical methods have not been well developed. In this article, we developed a generalized random field (GRF) method for family-based risk-prediction modeling on sequencing data. In GRF, subjects' phenotypes are viewed as stochastic realizations of a random field in a space, and a subject's phenotype is predicted by adjacent subjects, where adjacencies between subjects are determined by their genetic and within-family similarities. Different from existing methods that adjust for familial correlations, the GRF uses this information to form surrogates to further improve prediction accuracy. It also uses within-family information to capture predictors (e.g., rare mutations) that are homogeneous in families. Through simulations, we have demonstrated that the GRF method attained better performance than an existing method by considering additional information from family members and accounting for genetic heterogeneity. We further provided practical recommendations for designing family-based risk prediction studies. Finally, we illustrated the GRF method with an application to a whole-genome exome data set from the Michigan State University Twin Registry study.
Affiliation(s)
- Yalu Wen
  - Institute of Cancer Stem Cell, Dalian Medical University, Liaoning, 116044, China
  - Department of Statistics, University of Auckland, 1010, New Zealand
- Alexandra Burt
  - Department of Psychology, Michigan State University, East Lansing, Michigan 48824
- Qing Lu
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824
11. Minică CC, Genovese G, Hultman CM, Pool R, Vink JM, Neale MC, Dolan CV, Neale BM. The Weighting is the Hardest Part: On the Behavior of the Likelihood Ratio Test and the Score Test Under a Data-Driven Weighting Scheme in Sequenced Samples. Twin Res Hum Genet 2017; 20:108-118. [PMID: 28238293 PMCID: PMC5357183 DOI: 10.1017/thg.2017.7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
Sequence-based association studies are at a critical inflexion point with the increasing availability of exome-sequencing data. A popular test of association is the sequence kernel association test (SKAT). Weights are embedded within SKAT to reflect the hypothesized contribution of the variants to the trait variance. Because the true weights are generally unknown, and so are subject to misspecification, we examined the efficiency of a data-driven weighting scheme. We propose the use of a set of theoretically defensible weighting schemes, of which, we assume, the one that gives the largest test statistic is likely to capture best the allele frequency-functional effect relationship. We show that the use of alternative weights obviates the need to impose arbitrary frequency thresholds. As both the score test and the likelihood ratio test (LRT) may be used in this context, and may differ in power, we characterize the behavior of both tests. The two tests have equal power, if the weights in the set included weights resembling the correct ones. However, if the weights are badly specified, the LRT shows superior power (due to its robustness to misspecification). With this data-driven weighting procedure the LRT detected significant signal in genes located in regions already confirmed as associated with schizophrenia - the PRRC2A (p = 1.020e-06) and the VARS2 (p = 2.383e-06) - in the Swedish schizophrenia case-control cohort of 11,040 individuals with exome-sequencing data. The score test is currently preferred for its computational efficiency and power. Indeed, assuming correct specification, in some circumstances, the score test is the most powerful test. However, LRT has the advantageous properties of being generally more robust and more powerful under weight misspecification. This is an important result given that, arguably, misspecified models are likely to be the rule rather than the exception in weighting-based approaches.
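To make the role of the weights concrete, here is a minimal sketch of a SKAT-type score statistic, Q = Σ_j (w_j S_j)², evaluated under more than one Beta-density weighting of the minor-allele frequency. It assumes unrelated samples, a quantitative trait, and an intercept-only null model; the helper names (`beta_weight`, `skat_q`, `q_by_scheme`) and the toy data are illustrative and are not taken from the paper or from any SKAT software. How the candidate statistics are compared or calibrated is not described in the abstract, so the sketch only reports them.

```python
import math


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at a minor-allele frequency."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * maf ** (a - 1.0) * (1.0 - maf) ** (b - 1.0)


def skat_q(genotypes, phenotypes, weights):
    """Weighted sum of squared per-variant scores.

    genotypes: rows are subjects, columns are variants (0/1/2 counts).
    The null model here is intercept-only, so residuals are y - mean(y).
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    residuals = [y - mean_y for y in phenotypes]
    q = 0.0
    for j, w in enumerate(weights):
        score = sum(genotypes[i][j] * residuals[i] for i in range(n))
        q += (w * score) ** 2
    return q


def q_by_scheme(genotypes, phenotypes, schemes):
    """Compute Q once per (a, b) weighting scheme; returns {name: Q}."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each variant, estimated from the sample itself.
    mafs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    return {
        name: skat_q(genotypes, phenotypes,
                     [beta_weight(p, a, b) for p in mafs])
        for name, (a, b) in schemes.items()
    }


# Toy data: 4 subjects, 2 variants.
G = [[0, 1], [1, 0], [2, 0], [0, 0]]
y = [1.0, 2.0, 3.0, 2.0]
results = q_by_scheme(G, y, {"flat": (1.0, 1.0), "rare_up": (1.0, 25.0)})
```

The raw Q values are on different scales under different weights, so a real analysis would compare p-values or otherwise calibrated statistics rather than the numbers returned here.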
Affiliation(s)
- Camelia C. Minică
  - Department of Biological Psychology, Vrije Universiteit, Amsterdam 1081 BT, The Netherlands
  - The EMGO Institute for Health and Care Research, Amsterdam 1081 BT, The Netherlands
- Giulio Genovese
  - The Stanley Center for Psychiatric Research, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
  - The Program in Medical and Population Genetics, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
  - Department of Genetics, Harvard Medical School, Cambridge, MA 02115, USA
- Christina M. Hultman
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm SE-171 77, Sweden
- René Pool
  - Department of Biological Psychology, Vrije Universiteit, Amsterdam 1081 BT, The Netherlands
  - The EMGO Institute for Health and Care Research, Amsterdam 1081 BT, The Netherlands
- Jacqueline M. Vink
  - Behavioural Science Institute, Radboud University, Nijmegen, The Netherlands
- Michael C. Neale
  - Department of Biological Psychology, Vrije Universiteit, Amsterdam 1081 BT, The Netherlands
  - Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, USA
- Conor V. Dolan
  - Department of Biological Psychology, Vrije Universiteit, Amsterdam 1081 BT, The Netherlands
  - The EMGO Institute for Health and Care Research, Amsterdam 1081 BT, The Netherlands
- Benjamin M. Neale
  - The Stanley Center for Psychiatric Research, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
  - The Program in Medical and Population Genetics, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
  - The Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
12. Zhu H, Wang Z, Wang X, Sha Q. A novel statistical method for rare-variant association studies in general pedigrees. BMC Proc 2016; 10:193-196. [PMID: 27980635 PMCID: PMC5133499 DOI: 10.1186/s12919-016-0029-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] [Open access]
Abstract
Both population-based and family-based designs are commonly used in genetic association studies to identify rare variants that underlie complex diseases. For any type of study design, the statistical power will be improved if rare variants can be enriched in the samples. Family-based designs, with ascertainment based on phenotype, may enrich the sample for causal rare variants and thus can be more powerful than population-based designs. Therefore, it is important to develop family-based statistical methods that can account for ascertainment. In this paper, we develop a novel statistical method for rare-variant association studies in general pedigrees for quantitative traits. This method uses a retrospective view that treats the traits as fixed and the genotypes as random, which allows us to account for complex and undefined ascertainment of families. We then apply the newly developed method to the Genetic Analysis Workshop 19 data set and compare the power of the new method with two other methods for general pedigrees. The results show that the newly proposed method increases power in most of the cases we consider, more than the other two methods.
Affiliation(s)
- Huanhuan Zhu
  - Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
- Zhenchuan Wang
  - Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203-5017 USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
13. Yang XR, Rotunno M, Xiao Y, Ingvar C, Helgadottir H, Pastorino L, van Doorn R, Bennett H, Graham C, Sampson JN, Malasky M, Vogt A, Zhu B, Bianchi-Scarra G, Bruno W, Queirolo P, Fornarini G, Hansson J, Tuominen R, Burdett L, Hicks B, Hutchinson A, Jones K, Yeager M, Chanock SJ, Landi MT, Höiom V, Olsson H, Gruis N, Ghiorzo P, Tucker MA, Goldstein AM. Multiple rare variants in high-risk pancreatic cancer-related genes may increase risk for pancreatic cancer in a subset of patients with and without germline CDKN2A mutations. Hum Genet 2016; 135:1241-1249. [PMID: 27449771 PMCID: PMC5152573 DOI: 10.1007/s00439-016-1715-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 04/05/2016] [Accepted: 07/16/2016] [Indexed: 12/29/2022]
Abstract
The risk of pancreatic cancer (PC) is increased in melanoma-prone families but the causal relationship between germline CDKN2A mutations and PC risk is uncertain, suggesting the existence of non-CDKN2A factors. One genetic possibility involves patients having mutations in multiple high-risk PC-related genes; however, no systematic examination has yet been conducted. We used next-generation sequencing data to examine 24 putative PC-related genes in 43 PC patients with and 23 PC patients without germline CDKN2A mutations and 1001 controls. For each gene and the four pathways in which they occurred, we tested whether PC patients (overall or CDKN2A+ and CDKN2A- cases separately) had an increased number of rare nonsynonymous variants. Overall, we identified 35 missense variants in PC patients, 14 in CDKN2A+ and 21 in CDKN2A- PC cases. We found nominally significant associations for mismatch repair genes (MLH1, MSH2, MSH6, PMS2) in all PC patients and for ATM, CPA1, and PMS2 in CDKN2A- PC patients. Further, nine CDKN2A+ and four CDKN2A- PC patients had rare potentially deleterious variants in multiple PC-related genes. Loss-of-function variants were only observed in CDKN2A- PC patients, with ATM having the most pathogenic variants. Also, ATM variants (n = 5) were only observed in CDKN2A- PC patients with a family history that included digestive system tumors. Our results suggest that a subset of PC patients may have increased risk because of germline mutations in multiple PC-related genes.
Collapse
Affiliation(s)
- Xiaohong R Yang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Melissa Rotunno
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Division of Cancer Control and Population Studies, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Yanzi Xiao
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Hildur Helgadottir
  - Department of Oncology Pathology, Karolinska Institutet and Karolinska University Hospital, Solna, Stockholm, Sweden
- Lorenza Pastorino
  - Department of Internal Medicine and Medical Specialties, University of Genoa, Genoa, Italy
  - Genetics of Rare Cancers, IRCCS AOU San Martino-IST, Genoa, Italy
- Remco van Doorn
  - Department of Dermatology, Leiden University Medical Center, Leiden, The Netherlands
- Hunter Bennett
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Cole Graham
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Joshua N Sampson
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Michael Malasky
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, USA
- Aurelie Vogt
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, USA
- Bin Zhu
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, USA
- Giovanna Bianchi-Scarra
  - Department of Internal Medicine and Medical Specialties, University of Genoa, Genoa, Italy
  - Genetics of Rare Cancers, IRCCS AOU San Martino-IST, Genoa, Italy
- William Bruno
  - Department of Internal Medicine and Medical Specialties, University of Genoa, Genoa, Italy
  - Genetics of Rare Cancers, IRCCS AOU San Martino-IST, Genoa, Italy
- Paola Queirolo
  - Medical Oncology Unit, IRCCS AOU San Martino-IST, Genoa, Italy
- Johan Hansson
  - Department of Oncology Pathology, Karolinska Institutet and Karolinska University Hospital, Solna, Stockholm, Sweden
- Rainer Tuominen
  - Department of Oncology Pathology, Karolinska Institutet and Karolinska University Hospital, Solna, Stockholm, Sweden
- Laurie Burdett
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, USA
- Belynda Hicks
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, USA
- Amy Hutchinson
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, USA
- Kristine Jones
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, USA
- Meredith Yeager
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, USA
- Stephen J Chanock
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Maria Teresa Landi
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Veronica Höiom
  - Department of Oncology Pathology, Karolinska Institutet and Karolinska University Hospital, Solna, Stockholm, Sweden
- Håkan Olsson
  - Department of Oncology, Lund University Hospital, Lund, Sweden
- Nelleke Gruis
  - Department of Dermatology, Leiden University Medical Center, Leiden, The Netherlands
- Paola Ghiorzo
  - Department of Internal Medicine and Medical Specialties, University of Genoa, Genoa, Italy
  - Genetics of Rare Cancers, IRCCS AOU San Martino-IST, Genoa, Italy
- Margaret A Tucker
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Alisa M Goldstein
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA.
  - 9609 Medical Center Dr, Bethesda, MD, 20892-9769, USA.
|
14
|
Belonogova NM, Svishcheva GR, Axenovich TI. FREGAT: an R package for region-based association analysis. Bioinformatics 2016; 32:2392-3. [PMID: 27153598 DOI: 10.1093/bioinformatics/btw160] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/09/2015] [Accepted: 03/20/2016] [Indexed: 11/14/2022]
Abstract
Several approaches to the region-based association analysis of quantitative traits have recently been developed and successively applied. However, no software package has been developed that implements all of these approaches for either independent or structured samples. Here we introduce FREGAT (Family REGional Association Tests), an R package that can handle family and population samples and implements a wide range of region-based association methods including burden tests, functional linear models, and kernel machine-based regression. FREGAT can be used in genome/exome-wide region-based association studies of quantitative traits and candidate gene analysis. FREGAT offers many useful options to empower its users and increase the effectiveness and applicability of region-based association analysis. Availability and implementation: https://cran.r-project.org/web/packages/FREGAT/index.html. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: belon@bionet.nsc.ru.
Affiliation(s)
- Nadezhda M Belonogova
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk
- Gulnara R Svishcheva
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk
  - Vavilov Institute of General Genetics, the Russian Academy of Sciences, Moscow, Russia
- Tatiana I Axenovich
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk
  - Novosibirsk State University, Novosibirsk, Russia
|
15
|
Shirali M, Pong-Wong R, Navarro P, Knott S, Hayward C, Vitart V, Rudan I, Campbell H, Hastie ND, Wright AF, Haley CS. Regional heritability mapping method helps explain missing heritability of blood lipid traits in isolated populations. Heredity (Edinb) 2015; 116:333-8. [PMID: 26696135 PMCID: PMC4751621 DOI: 10.1038/hdy.2015.107] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/13/2015] [Revised: 10/23/2015] [Accepted: 10/26/2015] [Indexed: 11/21/2022] Open
Abstract
Single single-nucleotide polymorphism (SNP) genome-wide association studies (SSGWAS) may fail to identify loci with modest effects on a trait. The recently developed regional heritability mapping (RHM) method can potentially identify such loci. In this study, RHM was compared with the SSGWAS for blood lipid traits (high-density lipoprotein (HDL), low-density lipoprotein (LDL), plasma concentrations of total cholesterol (TC) and triglycerides (TG)). Data comprised 2246 adults from isolated populations genotyped using ∼300 000 SNP arrays. The results were compared with large meta-analyses of these traits for validation. Using RHM, two significant regions affecting HDL on chromosomes 15 and 16 and one affecting LDL on chromosome 19 were identified. These regions covered the most significant SNPs associated with HDL and LDL from the meta-analysis. The chromosome 19 region was identified in our data despite the fact that the most significant SNP in the meta-analysis (or any SNP tagging it) was not genotyped in our SNP array. The SSGWAS identified one SNP associated with HDL on chromosome 16 (the top meta-analysis SNP) and one on chromosome 10 (not reported by RHM or in the meta-analysis and hence possibly a false positive association). The results further confirm that RHM can have better power than SSGWAS in detecting causal regions including regions containing crucial ungenotyped variants. This study suggests that RHM can be a useful tool to explain some of the 'missing heritability' of complex trait variation.
Affiliation(s)
- M Shirali
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- R Pong-Wong
  - Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, UK
- P Navarro
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- S Knott
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- C Hayward
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- V Vitart
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- I Rudan
  - Croatian Centre for Global Health, Faculty of Medicine, University of Split, Split, Croatia
  - Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
- H Campbell
  - Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
- N D Hastie
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- A F Wright
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- C S Haley
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
  - Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, UK
|
16
|
Svishcheva GR, Belonogova NM, Axenovich TI. Region-Based Association Test for Familial Data under Functional Linear Models. PLoS One 2015; 10:e0128999. [PMID: 26111046 PMCID: PMC4481467 DOI: 10.1371/journal.pone.0128999] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/27/2014] [Accepted: 05/04/2015] [Indexed: 12/22/2022] Open
Abstract
Region-based association analysis is a more powerful tool for gene mapping than testing of individual genetic variants, particularly for rare genetic variants. The most powerful methods for regional mapping are based on the functional data analysis approach, which assumes that the regional genome of an individual may be considered as a continuous stochastic function that contains information about both linkage and linkage disequilibrium. Here, we extend this powerful approach, earlier applied only to independent samples, to samples of related individuals. To this end, we additionally include a random polygenic effect in the functional linear model used for testing association between quantitative traits and multiple genetic variants in the region. We compare the statistical power of different methods using Genetic Analysis Workshop 17 mini-exome family data and a wide range of simulation scenarios. Our method increases the power of regional association analysis of quantitative traits compared with burden-based and kernel-based methods for the majority of the scenarios. In addition, we estimate the statistical power of our method using regions with a small number of genetic variants, and show that our method retains its advantage over burden-based and kernel-based methods in this case as well. The new method is implemented as the R-function 'famFLM' using two types of basis functions: the B-spline and Fourier bases. We compare the properties of the new method using models that differ from each other in the type of their function basis. The models based on the Fourier basis functions have an advantage in terms of speed and power over the models that use the B-spline basis functions and those that combine B-spline and Fourier basis functions. The 'famFLM' function is distributed under GPLv3 license and is freely available at http://mga.bionet.nsc.ru/soft/famFLM/.
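As a rough illustration of the functional-linear-model idea described in this abstract, the sketch below evaluates a small Fourier basis at variant positions and projects one individual's genotypes onto it. It is a minimal sketch under stated assumptions, not the 'famFLM' implementation: the names `fourier_basis` and `project_genotypes` are hypothetical, positions are assumed to be rescaled to [0, 1], and the random polygenic effect for related individuals is omitted.

```python
import math


def fourier_basis(t, n_basis):
    """Evaluate the first n_basis (>= 1) Fourier basis functions at t in [0, 1].

    Order: 1, sin(2*pi*t), cos(2*pi*t), sin(4*pi*t), cos(4*pi*t), ...
    """
    values = [1.0]
    k = 1
    while len(values) < n_basis:
        values.append(math.sin(2 * math.pi * k * t))
        if len(values) < n_basis:
            values.append(math.cos(2 * math.pi * k * t))
        k += 1
    return values


def project_genotypes(genotypes, positions, n_basis):
    """Return sum_j g_j * phi_k(t_j) for each basis function phi_k."""
    scores = [0.0] * n_basis
    for g, t in zip(genotypes, positions):
        for k, phi in enumerate(fourier_basis(t, n_basis)):
            scores[k] += g * phi
    return scores


# Hypothetical toy region: two variants at rescaled positions 0.0 and 0.25,
# carrying 1 and 2 minor alleles respectively.
region_scores = project_genotypes([1, 2], [0.0, 0.25], n_basis=3)
```

In a sketch like this, the projected scores would replace the individual variants as regression covariates; a family-based analysis would additionally need a kinship-based random effect, which is not shown here.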
Affiliation(s)
- Gulnara R. Svishcheva
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Nadezhda M. Belonogova
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Tatiana I. Axenovich
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
|
17
|
Feng S, Pistis G, Zhang H, Zawistowski M, Mulas A, Zoledziewska M, Holmen OL, Busonero F, Sanna S, Hveem K, Willer C, Cucca F, Liu DJ, Abecasis GR. Methods for association analysis and meta-analysis of rare variants in families. Genet Epidemiol 2015; 39:227-38. [PMID: 25740221 DOI: 10.1002/gepi.21892] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 10/27/2014] [Revised: 01/03/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits. To ensure power for these rare variant analyses, a variety of association tests that group variants by gene or functional unit have been proposed. Here, we extend these tests to family-based studies. We develop family-based burden tests, variable frequency threshold tests and sequence kernel association tests. Through simulations, we compare the performance of different tests. We describe situations where family-based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values. Broadly speaking, we find that when sample sizes are limited and only a modest fraction of all trait-associated variants can be identified, family samples are more powerful. Finally, we illustrate our approach by analyzing the relationship between coding variants and levels of high-density lipoprotein (HDL) cholesterol in 11,556 individuals from the HUNT and SardiNIA studies, demonstrating association for coding variants in the APOC3, CETP, LIPC, LIPG, and LPL genes and illustrating the value of family samples, meta-analysis, and gene-level tests. Our methods are implemented in freely available C++ code.
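The gene-level grouping mentioned in this abstract can be illustrated with a simple burden score, which collapses the rare variants in a gene into one count per individual. This is a minimal sketch under stated assumptions, not the authors' C++ code: the function name `burden_scores`, the toy genotypes, and the 1% frequency cutoff are hypothetical, and the relatedness adjustment that makes the tests family-based is not modeled.

```python
def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Count minor alleles per individual over variants with MAF <= threshold.

    genotypes: one row per individual, one minor-allele count (0/1/2) per variant.
    mafs: minor allele frequency of each variant, in the same column order.
    """
    rare = [j for j, freq in enumerate(mafs) if freq <= maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


# Hypothetical toy gene with three variants; the second one is common.
toy_genotypes = [
    [0, 1, 2],
    [1, 0, 0],
    [0, 0, 1],
]
toy_mafs = [0.005, 0.2, 0.008]
scores = burden_scores(toy_genotypes, toy_mafs)
```

A variable frequency threshold test would, roughly, repeat this scoring over several values of `maf_threshold` and account for the multiple thresholds when assessing significance.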
Affiliation(s)
- Shuang Feng
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
|
18
|
Lin WY. Adaptive combination of P-values for family-based association testing with sequence data. PLoS One 2014; 9:e115971. [PMID: 25541952 PMCID: PMC4277421 DOI: 10.1371/journal.pone.0115971] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/19/2014] [Accepted: 12/01/2014] [Indexed: 12/24/2022] Open
Abstract
Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an 'adaptive combination of P-values method' (abbreviated as 'ADA'). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
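The core step described in this abstract, discarding variants with large per-site P-values before combining the rest, can be sketched as a truncated sum of -log(P). This is a simplified illustration under stated assumptions and not the full ADA procedure: the function name `truncated_log_p_statistic`, the example P-values, and the 0.10 threshold are hypothetical, and the permutation step that ADA relies on to assess significance is left out.

```python
import math


def truncated_log_p_statistic(p_values, threshold):
    """Sum -log(p) over sites whose P-value is at or below `threshold`.

    Returns the statistic and the number of sites that were kept.
    """
    kept = [p for p in p_values if p <= threshold]
    statistic = sum(-math.log(p) for p in kept)
    return statistic, len(kept)


# Hypothetical per-site P-values for five variants in one region.
per_site_p = [0.001, 0.04, 0.30, 0.75, 0.92]
stat, n_kept = truncated_log_p_statistic(per_site_p, threshold=0.10)
```

Because the statistic depends on the chosen threshold, its null distribution is not a standard one; this is one reason a permutation procedure, with the pedigree requirements noted in the abstract, would be needed in practice.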
Affiliation(s)
- Wan-Yu Lin
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
|