1
Burrows K, Heiskala A, Bradfield JP, Balkhiyarova Z, Ning L, Boissel M, Chan YM, Froguel P, Bonnefond A, Hakonarson H, Alves AC, Lawlor DA, Kaakinen M, Järvelin MR, Grant SF, Tilling K, Prokopenko I, Sebert S, Canouil M, Warrington NM. A framework for conducting time-varying genome-wide association studies: An application to body mass index across childhood in six multiethnic cohorts. medRxiv: the preprint server for health sciences 2024:2024.03.13.24304263. [PMID: 38559031 PMCID: PMC10980110 DOI: 10.1101/2024.03.13.24304263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Genetic effects on changes in human traits over time are understudied and may have important pathophysiological impact. We propose a framework that enables data quality control, implements mixed models to evaluate trajectories of change in traits, and estimates phenotypes to identify age-varying genetic effects in genome-wide association studies (GWASs). Using childhood body mass index (BMI) as an example, we included 71,336 participants from six cohorts and estimated the slope and area under the BMI curve within four time periods (infancy, early childhood, late childhood and adolescence) for each participant, in addition to the age and BMI at the adiposity peak and the adiposity rebound. GWAS on each of the estimated phenotypes identified 28 genome-wide significant variants at 13 loci across the 12 estimated phenotypes, one of which was novel (in DAOA) and had not been previously associated with childhood or adult BMI. Genetic studies of changes in human traits over time could uncover novel biological mechanisms influencing quantitative traits.
Affiliation(s)
- Kimberley Burrows
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Anni Heiskala
  - Research Unit of Population Health, University of Oulu, Oulu, Finland
- Jonathan P. Bradfield
  - Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Quantinuum Research LLC, Wayne, PA, USA
- Zhanna Balkhiyarova
  - Department of Clinical and Experimental Medicine, School of Biosciences and Medicine, University of Surrey, Guildford, UK
  - People-Centred Artificial Intelligence Institute, University of Surrey, Guildford, UK
  - Section of Metabolism, Digestion and Reproduction, Department of Medicine, Imperial College London, London, UK
- Lijiao Ning
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Mathilde Boissel
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Yee-Ming Chan
  - Division of Endocrinology, Department of Pediatrics, Boston Children’s Hospital
  - Department of Pediatrics, Harvard Medical School
- Philippe Froguel
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
  - Department of Metabolism, Digestion and Reproduction, Imperial College London, London, United Kingdom
- Amelie Bonnefond
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
  - Department of Metabolism, Digestion and Reproduction, Imperial College London, London, United Kingdom
- Hakon Hakonarson
  - Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Deborah A Lawlor
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Marika Kaakinen
  - Department of Clinical and Experimental Medicine, School of Biosciences and Medicine, University of Surrey, Guildford, UK
  - People-Centred Artificial Intelligence Institute, University of Surrey, Guildford, UK
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, UK
- Marjo-Riitta Järvelin
  - Research Unit of Population Health, University of Oulu, Oulu, Finland
  - MRC Centre for Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, United Kingdom
  - Department of Life Sciences, College of Health and Life Sciences, Brunel University London, London, United Kingdom
- Struan F.A. Grant
  - Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Divisions of Human Genetics and Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kate Tilling
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
- Inga Prokopenko
  - Department of Clinical and Experimental Medicine, School of Biosciences and Medicine, University of Surrey, Guildford, UK
  - People-Centred Artificial Intelligence Institute, University of Surrey, Guildford, UK
- Sylvain Sebert
  - Research Unit of Population Health, University of Oulu, Oulu, Finland
- Mickaël Canouil
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Nicole M Warrington
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Frazer Institute, University of Queensland, Brisbane, Australia
2
Xu G, Amei A, Wu W, Liu Y, Shen L, Oh EC, Wang Z. Retrospective varying coefficient association analysis of longitudinal binary traits: Application to the identification of genetic loci associated with hypertension. Ann Appl Stat 2024; 18:487-505. [PMID: 38577266 PMCID: PMC10994004 DOI: 10.1214/23-aoas1798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2024]
Abstract
Many genetic studies contain rich information on longitudinal phenotypes that require powerful analytical tools for optimal analysis. Genetic analysis of longitudinal data that incorporates temporal variation is important for understanding the genetic architecture and biological variation of complex diseases. Most of the existing methods assume that the contribution of genetic variants is constant over time and fail to capture the dynamic pattern of disease progression. However, the relative influence of genetic variants on complex traits fluctuates over time. In this study, we propose a retrospective varying coefficient mixed model association test, RVMMAT, to detect time-varying genetic effect on longitudinal binary traits. We model dynamic genetic effect using smoothing splines, estimate model parameters by maximizing a double penalized quasi-likelihood function, design a joint test using a Cauchy combination method, and evaluate statistical significance via a retrospective approach to achieve robustness to model misspecification. Through simulations we illustrated that the retrospective varying-coefficient test was robust to model misspecification under different ascertainment schemes and gained power over the association methods assuming constant genetic effect. We applied RVMMAT to a genome-wide association analysis of longitudinal measure of hypertension in the Multi-Ethnic Study of Atherosclerosis. Pathway analysis identified two important pathways related to G-protein signaling and DNA damage. Our results demonstrated that RVMMAT could detect biologically relevant loci and pathways in a genome scan and provided insight into the genetic architecture of hypertension.
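The abstract names a Cauchy combination method for the joint test but gives no formula. As a minimal sketch, assuming the standard form of that method (each p-value is mapped to a Cauchy variate, the variates are averaged with weights, and the average is mapped back to a p-value), the combination step could look like the following. The function name `cauchy_combination`, the equal default weights, and the example inputs are illustrative assumptions, not details taken from RVMMAT.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination statistic.

    Assumption: equal weights unless given; this is a generic sketch,
    not the RVMMAT implementation. Each p-value should lie strictly
    between 0 and 1.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Map each p-value to a standard Cauchy variate and take the weighted mean.
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical inputs: p-values from two component tests of one variant.
combined = cauchy_combination([0.01, 0.5])
print(round(combined, 3))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the transformation.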
Affiliation(s)
- Gang Xu
  - Department of Mathematical Sciences, University of Nevada
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada
- Weimiao Wu
  - Department of Biostatistics, Yale School of Public Health
- Yunqing Liu
  - Department of Biostatistics, Yale School of Public Health
- Linchuan Shen
  - Department of Mathematical Sciences, University of Nevada
- Edwin C. Oh
  - Department of Internal Medicine, University of Nevada School of Medicine
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health
3
Jin H, Kwak SH, Yoon JW, Lee S, Park KS, Won S, Cho NH. Genome-Wide Association Study on Longitudinal Change in Fasting Plasma Glucose in Korean Population. Diabetes Metab J 2023; 47:255-266. [PMID: 36653889 PMCID: PMC10040618 DOI: 10.4093/dmj.2021.0375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2021] [Accepted: 04/27/2022] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND: Genome-wide association studies (GWAS) on type 2 diabetes mellitus (T2DM) have to date identified more than 400 distinct genetic loci associated with diabetes and nearly 120 loci for fasting plasma glucose (FPG) and fasting insulin level. However, genetic risk factors for the longitudinal deterioration of FPG have not been thoroughly evaluated. We aimed to identify genetic variants associated with longitudinal change of FPG over time. METHODS: We used two prospective cohorts from the Korean population, which included a total of 10,528 individuals without T2DM. A GWAS of repeated measures of FPG was performed using a linear mixed model to investigate the interaction of genetic variants with time, and a meta-analysis was conducted. Genome-wide complex trait analysis was used for heritability calculation. In addition, expression quantitative trait loci (eQTL) analysis was performed using the Genotype-Tissue Expression project. RESULTS: Genome-wide single nucleotide polymorphism (SNP) interaction with time explained a small portion (4%) of the total phenotypic variance of longitudinal change in FPG. A total of four known genetic variants of FPG were associated with repeated measures of FPG levels. One SNP (rs11187850) showed a genome-wide significant association for genetic interaction with time. The variant is an eQTL for the NOC3 like DNA replication regulator (NOC3L) gene in pancreas and adipose tissue. Furthermore, NOC3L is differentially expressed in pancreatic β-cells between subjects with and without T2DM. However, this variant was associated with neither increased risk of T2DM nor elevated FPG level. CONCLUSION: We identified rs11187850, which is an eQTL of NOC3L, to be associated with longitudinal change of FPG in the Korean population.
Affiliation(s)
- Heejin Jin
  - Institute of Health and Environment, Seoul National University, Seoul, Korea
- Soo Heon Kwak
  - Department of Internal Medicine, Seoul National University Hospital, Seoul, Korea
- Ji Won Yoon
  - Department of Internal Medicine, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, Korea
- Sanghun Lee
  - Department of Bioconvergence & Engineering, Dankook University, Yongin, Korea
- Kyong Soo Park
  - Department of Internal Medicine, Seoul National University Hospital, Seoul, Korea
  - Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea
- Sungho Won
  - Institute of Health and Environment, Seoul National University, Seoul, Korea
  - Department of Public Health Sciences, Seoul National University, Seoul, Korea
  - RexSoft Inc., Seoul, Korea
- Nam H. Cho
  - Department of Preventive Medicine, Ajou University School of Medicine, Suwon, Korea
4
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. Plants (Basel, Switzerland) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
The genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed for discovering causal variants in GWAS data. Among them, linear mixed models (LMMs) are widely used to control for confounding factors, including population structure, and they offer improved computational efficiency and statistical power in GWAS. With the growing availability of large-scale GWAS data and relevant phenotype samples, more attention has recently been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods. In this review, we survey the LMM-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. Then, we describe the advantages and weaknesses of LMMs in GWAS. Finally, we discuss future perspectives and conclusions. This review should help researchers quickly select appropriate LMM models and methods for GWAS data analysis.
Affiliation(s)
- Md. Alamin
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xiangyang Lou
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Wenfei Jin
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Haiming Xu
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
5
Degani E, Maestrini L, Toczydłowska D, Wand MP. Sparse linear mixed model selection via streamlined variational Bayes. Electron J Stat 2022. [DOI: 10.1214/22-ejs2063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Emanuele Degani
  - Dipartimento di Scienze Statistiche, Università degli Studi di Padova, Padova, Italy
- Luca Maestrini
  - Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, Australia
- Dorota Toczydłowska
  - School of Mathematical and Physical Sciences, University of Technology Sydney, Sydney, Australia
- Matt P. Wand
  - School of Mathematical and Physical Sciences, University of Technology Sydney, Sydney, Australia
6
A genome-wide association study of the longitudinal course of executive functions. Transl Psychiatry 2021; 11:386. [PMID: 34247186 PMCID: PMC8272719 DOI: 10.1038/s41398-021-01510-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/02/2020] [Revised: 06/04/2021] [Accepted: 06/15/2021] [Indexed: 01/13/2023] Open
Abstract
Executive functions are metacognitive capabilities that control and coordinate mental processes. In the transdiagnostic PsyCourse Study, comprising patients of the affective-to-psychotic spectrum and controls, we investigated the genetic basis of the time course of two core executive subfunctions: set-shifting (Trail Making Test, part B (TMT-B)) and updating (Verbal Digit Span backwards) in 1338 genotyped individuals. Time course was assessed at four measurement points, each 6 months apart. Compared to the initial assessment, executive performance improved across diagnostic groups. We performed a genome-wide association study to identify single nucleotide polymorphisms (SNPs) associated with performance change over time by testing for SNP-by-time interactions using linear mixed models. We identified nine genome-wide significant SNPs for TMT-B in strong linkage disequilibrium with each other on chromosome 5. These were associated with decreased performance on the continuous TMT-B score across time. Variant rs150547358 had the lowest P value (7.2 × 10⁻¹⁰), with effect estimate beta = 1.16 (95% CI: 1.11, 1.22). Using data from the FOR2107 consortium (1795 individuals), we replicated these findings for the SNP rs150547358 (P value = 0.015) by analyzing the difference between the two available measurement points, which were two years apart. In the replication study, rs150547358 exhibited a similar effect estimate, beta = 0.85 (95% CI: 0.74, 0.97). Our study demonstrates that longitudinally measured phenotypes have the potential to unmask novel associations, adding time as a dimension to the effects of genomics.
7
Wu W, Wang Z, Xu K, Zhang X, Amei A, Gelernter J, Zhao H, Justice AC, Wang Z. Retrospective Association Analysis of Longitudinal Binary Traits Identifies Important Loci and Pathways in Cocaine Use. Genetics 2019; 213:1225-1236. [PMID: 31591132 PMCID: PMC6893384 DOI: 10.1534/genetics.119.302598] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/03/2019] [Accepted: 10/04/2019] [Indexed: 12/15/2022] Open
Abstract
Longitudinal phenotypes have been increasingly available in genome-wide association studies (GWAS) and electronic health record-based studies for identification of genetic variants that influence complex traits over time. For longitudinal binary data, there remain significant challenges in gene mapping, including misspecification of the model for phenotype distribution due to ascertainment. Here, we propose L-BRAT (Longitudinal Binary-trait Retrospective Association Test), a retrospective, generalized estimating equation-based method for genetic association analysis of longitudinal binary outcomes. We also develop RGMMAT, a retrospective, generalized linear mixed model-based association test. Both tests are retrospective score approaches in which genotypes are treated as random conditional on phenotype and covariates. They allow both static and time-varying covariates to be included in the analysis. Through simulations, we illustrated that retrospective association tests are robust to ascertainment and other types of phenotype model misspecification, and gain power over previous association methods. We applied L-BRAT and RGMMAT to a genome-wide association analysis of repeated measures of cocaine use in a longitudinal cohort. Pathway analysis implicated association with opioid signaling and axonal guidance signaling pathways. Lastly, we replicated important pathways in an independent cocaine dependence case-control GWAS. Our results illustrate that L-BRAT is able to detect important loci and pathways in a genome scan and to provide insights into genetic architecture of cocaine use.
Affiliation(s)
- Weimiao Wu
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Zhong Wang
  - Baker Institute for Animal Health, Cornell University, Ithaca, New York 14850
- Ke Xu
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Xinyu Zhang
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada 89154
- Joel Gelernter
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Amy C Justice
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
  - Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut 06511
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
8
Ierodiakonou D, Coull BA, Zanobetti A, Postma DS, Boezen HM, Vonk JM, McKone EF, Schildcrout JS, Koppelman GH, Croteau-Chonka DC, Lumley T, Koutrakis P, Schwartz J, Gold DR, Weiss ST. Pathway analysis of a genome-wide gene by air pollution interaction study in asthmatic children. Journal of Exposure Science & Environmental Epidemiology 2019; 29:539-547. [PMID: 31028280 PMCID: PMC10730425 DOI: 10.1038/s41370-019-0136-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/09/2018] [Revised: 11/23/2018] [Accepted: 03/08/2019] [Indexed: 05/05/2023]
Abstract
OBJECTIVES: We aimed to investigate the role of genetics in the respiratory response of asthmatic children to air pollution, with a genome-wide analysis of gene by nitrogen dioxide (NO2) and gene by carbon monoxide (CO) interactions on lung function, and to identify the biological pathways involved. METHODS: We used a two-step method for fast linear mixed model computations for genome-wide association studies, exploring whether variants modify the longitudinal relationship between 4-month average pollution and post-bronchodilator FEV1 in 522 Caucasian and 88 African-American asthmatic children. Top hits were confirmed with classic linear mixed-effect models. We used the improved gene set enrichment analysis for GWAS (i-GSEA4GWAS) to identify plausible pathways. RESULTS: Two SNPs near EPHA3 (rs13090972 and rs958144) and one in TXNDC8 (rs7041938) showed significant interactions with NO2 in Caucasians, but we did not replicate this locus in African-Americans. SNP-CO interactions did not reach genome-wide significance. The i-GSEA4GWAS showed a pathway linked to the HO-1/CO system to be associated with CO-related FEV1 changes. For NO2-related FEV1 responses, we identified pathways involved in cellular adhesion, oxidative stress, inflammation, and metabolic responses. CONCLUSION: The host lung function response to long-term exposure to pollution is linked to genes involved in cellular adhesion, oxidative stress, inflammatory, and metabolic pathways.
Affiliation(s)
- Despo Ierodiakonou
  - Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
  - Groningen Research Institute for Asthma and COPD, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Brent A Coull
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Antonella Zanobetti
  - Environmental Epidemiology and Risk Program, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Dirkje S Postma
  - Groningen Research Institute for Asthma and COPD, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
  - Department of Pulmonology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- H Marike Boezen
  - Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
  - Groningen Research Institute for Asthma and COPD, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Judith M Vonk
  - Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
  - Groningen Research Institute for Asthma and COPD, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Edward F McKone
  - Department of Respiratory Medicine, St. Vincent University Hospital, Dublin, Ireland
- Jonathan S Schildcrout
  - Department of Environmental and Occupational Health Sciences, School of Public Health, University of Washington, Seattle, WA, United States
- Gerard H Koppelman
  - Groningen Research Institute for Asthma and COPD, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
  - Department of Pediatric Pulmonology and Pediatric Allergology-Beatrix Children Hospital, University of Groningen, University Medical Center, Groningen, The Netherlands
- Damien C Croteau-Chonka
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Department of Medicine, Harvard Medical School, Boston, MA, United States
- Thomas Lumley
  - Department of Biostatistics, University of Auckland, Auckland, New Zealand
- Petros Koutrakis
  - Environmental Epidemiology and Risk Program, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Joel Schwartz
  - Environmental Epidemiology and Risk Program, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Diane R Gold
  - Environmental Epidemiology and Risk Program, Harvard T.H. Chan School of Public Health, Boston, MA, United States
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Department of Medicine, Harvard Medical School, Boston, MA, United States
- Scott T Weiss
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Department of Medicine, Harvard Medical School, Boston, MA, United States
9
Wang Z, Wang N, Wu R, Wang Z. fGWAS: An R package for genome-wide association analysis with longitudinal phenotypes. J Genet Genomics 2018; 45:411-413. [PMID: 30049619 DOI: 10.1016/j.jgg.2018.06.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/10/2018] [Revised: 06/18/2018] [Accepted: 06/27/2018] [Indexed: 10/28/2022]
Affiliation(s)
- Zhong Wang
  - College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY 14850, USA
- Nating Wang
  - College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Rongling Wu
  - Center for Computational Biology, Beijing Forestry University, Beijing 100083, China
  - Center for Statistical Genetics, Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA 17033, USA
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
10
Varga TV, Kurbasic A, Aine M, Eriksson P, Ali A, Hindy G, Gustafsson S, Luan J, Shungin D, Chen Y, Schulz CA, Nilsson PM, Hallmans G, Barroso I, Deloukas P, Langenberg C, Scott RA, Wareham NJ, Lind L, Ingelsson E, Melander O, Orho-Melander M, Renström F, Franks PW. Novel genetic loci associated with long-term deterioration in blood lipid concentrations and coronary artery disease in European adults. Int J Epidemiol 2018; 46:1211-1222. [PMID: 27864399 DOI: 10.1093/ije/dyw245] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Accepted: 08/04/2016] [Indexed: 11/14/2022] Open
Abstract
Background: Cross-sectional genome-wide association studies have identified hundreds of loci associated with blood lipids and related cardiovascular traits, but few genetic association studies have focused on long-term changes in blood lipids. Methods: Participants from the GLACIER Study (Nmax = 3492) were genotyped with the MetaboChip array, from which 29 387 SNPs (single nucleotide polymorphisms; replication, fine-mapping regions and wildcard SNPs for lipid traits) were extracted for association tests with 10-year change in total cholesterol (ΔTC) and triglycerides (ΔTG). Four additional prospective cohort studies (MDC, PIVUS, ULSAM, MRC Ely; Nmax = 8263 participants) were used for replication. We conducted an in silico look-up for association with coronary artery disease (CAD) in the Coronary ARtery DIsease Genome-wide Replication and Meta-analysis (CARDIoGRAMplusC4D) Consortium (N ∼ 190 000) and functional annotation for the top ranking variants. Results: In total, 956 variants were associated (P < 0.01) with either ΔTC or ΔTG in GLACIER. In GLACIER, chr19:50121999 at APOE was associated with ΔTG and multiple SNPs in the APOA1/A4/C3/A5 region at genome-wide significance (P < 5 × 10⁻⁸), whereas variants in four loci, DOCK7, BRE, SYNE1 and KCNIP1, reached study-wide significance (P < 1.7 × 10⁻⁶). The rs7412 variant at APOE was associated with ΔTC in GLACIER (P < 1.7 × 10⁻⁶). In pooled analyses of all cohorts, 139 SNPs at six and five loci were associated with ΔTC and ΔTG, respectively (P < 10⁻³). Of these, a variant at CAPN3 (P = 1.2 × 10⁻⁴), multiple variants at HPR (Pmin = 1.5 × 10⁻⁶) and a variant at SIX5 (P = 1.9 × 10⁻⁴) showed evidence for association with CAD. Conclusions: We identified seven novel genomic regions associated with long-term changes in blood lipids, of which three also raise CAD risk.
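The abstract does not say how the study-wide significance threshold was derived. As an arithmetic check, and assuming (this is not stated in the source) a Bonferroni correction at a family-wise level of 0.05 over the 29 387 extracted SNPs, the quotient matches the reported figure of 1.7 × 10⁻⁶. The names below are illustrative.

```python
# Assumption: alpha = 0.05 and a Bonferroni correction; neither is stated
# in the abstract. N_SNPS is the number of SNPs reported as extracted.
N_SNPS = 29_387
ALPHA = 0.05


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


study_wide = bonferroni_threshold(ALPHA, N_SNPS)
print(f"{study_wide:.2e}")  # prints 1.70e-06
```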
Collapse
Affiliation(s)
- Tibor V Varga
- Genetic and Molecular Epidemiology Unit, Department of Clinical Sciences, Lund University, Skåne University Hospital, Malmö, Sweden
| | - Azra Kurbasic
- Genetic and Molecular Epidemiology Unit, Department of Clinical Sciences, Lund University, Skåne University Hospital, Malmö, Sweden
| | - Mattias Aine
- Division of Oncology and Pathology, Skåne University Hospital, Lund University, Lund, Sweden
| | - Pontus Eriksson
- Division of Oncology and Pathology, Skåne University Hospital, Lund University, Lund, Sweden
| | - Ashfaq Ali
- Genetic and Molecular Epidemiology Unit, Department of Clinical Sciences, Lund University, Skåne University Hospital, Malmö, Sweden
| | - George Hindy
- Diabetes and Cardiovascular Disease - Genetic Epidemiology, Skåne University Hospital, Malmö, Sweden
| | - Stefan Gustafsson
- Molecular Epidemiology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Jian'an Luan
- Medical Research Council Epidemiology Unit, University of Cambridge, Cambridge, UK
| | - Dmitry Shungin
- Genetic and Molecular Epidemiology Unit, Department of Clinical Sciences, Lund University, Skåne University Hospital, Malmö, Sweden.,Department of Odontology.,Department of Public Health & Clinical Medicine, Umeå University, Umeå, Sweden
| | - Yan Chen
- Genetic and Molecular Epidemiology Unit, Department of Clinical Sciences, Lund University, Skåne University Hospital, Malmö, Sweden
| | | | - Peter M Nilsson
- Department of Clinical Sciences, Lund University, Skåne University Hospital, Malmö, Sweden
| | - Göran Hallmans
- Department of Biobank Research, Umeå University, Umeå, Sweden
| | - Inês Barroso
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.,Metabolic Research Laboratories.,NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge, UK
| | - Panos Deloukas
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, London, UK.,Princess Al-Jawhara Al-Brahim Centre of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Claudia Langenberg
- Medical Research Council Epidemiology Unit, University of Cambridge, Cambridge, UK
| | - Robert A Scott
- Medical Research Council Epidemiology Unit, University of Cambridge, Cambridge, UK
| | - Nicholas J Wareham
- Medical Research Council Epidemiology Unit, University of Cambridge, Cambridge, UK
| | - Lars Lind
- Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Erik Ingelsson
- Molecular Epidemiology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Olle Melander
- Department of Clinical Sciences, Hypertension and Cardiovascular Diseases, Skåne University Hospital, Malmö, Sweden
| | - Marju Orho-Melander
- Diabetes and Cardiovascular Disease - Genetic Epidemiology, Skåne University Hospital, Malmö, Sweden
| | - Frida Renström
- Genetic and Molecular Epidemiology Unit, Department of Clinical Sciences, Lund University, Skåne University Hospital, Malmö, Sweden; Department of Biobank Research, Umeå University, Umeå, Sweden
| | - Paul W Franks
- Genetic and Molecular Epidemiology Unit, Department of Clinical Sciences, Lund University, Skåne University Hospital, Malmö, Sweden; Department of Public Health & Clinical Medicine, Umeå University, Umeå, Sweden; Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| |
|
11
|
Sikorska K, Lesaffre E, Groenen PJF, Rivadeneira F, Eilers PHC. Genome-wide Analysis of Large-scale Longitudinal Outcomes using Penalization - GALLOP algorithm. Sci Rep 2018; 8:6815. [PMID: 29717146 PMCID: PMC5931565 DOI: 10.1038/s41598-018-24578-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/14/2017] [Accepted: 03/22/2018] [Indexed: 11/24/2022] Open
Abstract
Genome-wide association studies (GWAS) with longitudinal phenotypes provide opportunities to identify genetic variations associated with changes in human traits over time. Mixed models are used to correct for the correlated nature of longitudinal data. GWA studies are notorious for their computational challenges, which are considerable when mixed models for thousands of individuals are fitted to millions of SNPs. We present a new algorithm that speeds up a genome-wide analysis of longitudinal data by several orders of magnitude. It solves the equivalent penalized least squares problem efficiently, computing variances in an initial step. Factorizations and transformations are used to avoid inversion of large matrices. Because the system of equations is bordered, we can re-use components, which can be precomputed for the mixed model without a SNP. Two SNP effects (main and its interaction with time) are obtained. Our method completes the analysis a thousand times faster than the R package lme4, providing an almost identical solution for the coefficients and p-values. We provide an R implementation of our algorithm.
Affiliation(s)
- Karolina Sikorska
- Department of Biometrics, Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - Emmanuel Lesaffre
- Leuven Biostatistics and Statistical Bioinformatics Centre, Leuven University, Leuven, Belgium
| | | | - Fernando Rivadeneira
- Department of Internal Medicine, Erasmus Medical Centre, Rotterdam, The Netherlands
| | - Paul H C Eilers
- Department of Biostatistics, Erasmus Medical Centre, Rotterdam, The Netherlands
| |
|
12
|
Staley JR, Suderman M, Simpkin AJ, Gaunt TR, Heron J, Relton CL, Tilling K. Longitudinal analysis strategies for modelling epigenetic trajectories. Int J Epidemiol 2018; 47:516-525. [PMID: 29462323 PMCID: PMC5913606 DOI: 10.1093/ije/dyy012] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Revised: 12/22/2017] [Accepted: 01/24/2018] [Indexed: 12/20/2022] Open
Abstract
Background: DNA methylation levels are known to vary over time, and modelling these trajectories is crucial for our understanding of the biological relevance of these changes over time. However, due to the computational cost of fitting multilevel models across the epigenome, most trajectory modelling efforts to date have focused on a subset of CpG sites identified through epigenome-wide association studies (EWAS) at individual time-points. Methods: We propose using linear regression across the repeated measures, estimating cluster-robust standard errors using a sandwich estimator, as a less computationally intensive strategy than multilevel modelling. We compared these two longitudinal approaches, as well as three approaches based on EWAS (associated at baseline, at any time-point and at all time-points), for identifying epigenetic change over time related to an exposure using simulations and by applying them to blood DNA methylation profiles from the Accessible Resource for Integrated Epigenomics Studies (ARIES). Results: Restricting association testing to EWAS at baseline identified a less complete set of associations than performing EWAS at each time-point or applying the longitudinal modelling approaches to the full dataset. Linear regression models with cluster-robust standard errors identified similar sets of associations with almost identical estimates of effect as the multilevel models, while also being 74 times more efficient. Both longitudinal modelling approaches identified comparable sets of CpG sites in ARIES with an association with prenatal exposure to smoking (>70% agreement). Conclusions: Linear regression with cluster-robust standard errors is an appropriate and efficient approach for longitudinal analysis of DNA methylation data.
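The strategy described in this abstract (ordinary linear regression over all repeated measures, with standard errors from a cluster-robust sandwich estimator) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `ols_cluster_robust`, the simulated data, and the choice of the basic sandwich form without small-sample correction are not taken from the paper, whose own implementation may differ.

```python
import numpy as np


def ols_cluster_robust(X, y, cluster_ids):
    """Return OLS coefficients and cluster-robust (basic sandwich) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)

    bread = np.linalg.inv(X.T @ X)          # (X'X)^-1
    beta = bread @ X.T @ y                  # ordinary least-squares fit
    resid = y - X @ beta

    # "Meat": sum over clusters of the outer product of X_g' e_g.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster_ids):
        rows = cluster_ids == g
        score = X[rows].T @ resid[rows]
        meat += np.outer(score, score)

    cov = bread @ meat @ bread              # sandwich covariance of beta
    return beta, np.sqrt(np.diag(cov))


# Simulated repeated measures (hypothetical data, not ARIES).
rng = np.random.default_rng(0)
n_subjects, n_visits = 200, 4
subject = np.repeat(np.arange(n_subjects), n_visits)
time = np.tile(np.arange(n_visits, dtype=float), n_subjects)
exposure = np.repeat(rng.integers(0, 2, n_subjects), n_visits).astype(float)
subject_effect = np.repeat(rng.normal(0.0, 1.0, n_subjects), n_visits)
y = (1.0 + 0.5 * exposure + 0.2 * time + 0.3 * exposure * time
     + subject_effect + rng.normal(0.0, 0.5, subject.size))

# Columns: intercept, exposure, time, exposure-by-time interaction.
X = np.column_stack([np.ones_like(time), exposure, time, exposure * time])
beta, se = ols_cluster_robust(X, y, subject)
print("exposure-by-time coefficient:", beta[3], "cluster-robust SE:", se[3])
```

The exposure-by-time coefficient is the quantity of interest for change over time. Clustering on subject lets the standard errors account for within-person correlation without fitting random effects.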
Affiliation(s)
- James R Staley
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| | - Matthew Suderman
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| | - Andrew J Simpkin
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| | - Tom R Gaunt
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| | - Jon Heron
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| | - Caroline L Relton
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| | - Kate Tilling
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| |
|
13
|
Use of Selective Serotonin Reuptake Inhibitors and Bone Mineral Density Change: A Population-Based Longitudinal Study in Middle-Aged and Elderly Individuals. J Clin Psychopharmacol 2017; 37:524-530. [PMID: 28816927 DOI: 10.1097/jcp.0000000000000756] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022]
Abstract
BACKGROUND Longitudinal studies showed conflicting results regarding the association between use of selective serotonin reuptake inhibitors (SSRIs) and bone mineral density (BMD). Therefore, we investigate the association between (duration of) SSRI use and BMD, and change in BMD (ΔBMD). METHODS Data from the population-based Rotterdam Study cohort (1991-2008) were used. In total, 4915 men and 5831 postmenopausal women, aged 45 years and older, were included, having measurement visits at 4- to 5-year intervals. Multivariable linear mixed models were applied to examine the association between SSRI use, based on pharmacy records, duration of SSRI use, and repeated measures of BMD, and changes in BMD, compared with nonuse. Femoral neck BMD (grams per square centimeter) was measured at 4 visits, comprising 19,861 BMD measurements. Three ΔBMD periods were examined, comprising 7897 ΔBMD values. Change in BMD was expressed as the annual percentage ΔBMD between 2 consecutive visits. RESULTS In men and women, we observed no association between SSRI and BMD when compared with nonuse (women: mean difference, 0.007 g/cm²; 95% confidence interval, -0.002 to 0.017; P = 0.123). We did not find an association between duration of SSRI use and ΔBMD (women: annual percentage change, -0.081; 95% confidence interval, -0.196 to 0.033; P = 0.164). CONCLUSIONS In conclusion, use of SSRIs is not associated with BMD or ΔBMD, after taking duration of treatment into account, in middle-aged and elderly individuals. Therefore, our results question previously raised concerns on the adverse effects of SSRIs on BMD.
|
14
|
Qian J, Nunez S, Kim S, Reilly MP, Foulkes AS. A score test for genetic class-level association with nonlinear biomarker trajectories. Stat Med 2017; 36:3075-3091. [PMID: 28543585 DOI: 10.1002/sim.7314] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/12/2016] [Revised: 01/12/2017] [Accepted: 03/22/2017] [Indexed: 11/06/2022]
Abstract
Emerging data suggest that the genetic regulation of the biological response to inflammatory stress may be fundamentally different to the genetic underpinning of the homeostatic control (resting state) of the same biological measures. In this paper, we interrogate this hypothesis using a single-SNP score test and a novel class-level testing strategy to characterize protein-coding gene and regulatory element-level associations with longitudinal biomarker trajectories in response to stimulus. Using the proposed class-level association score statistic for longitudinal data, which accounts for correlations induced by linkage disequilibrium, the genetic underpinnings of evoked dynamic changes in repeatedly measured biomarkers are investigated. The proposed method is applied to data on two biomarkers arising from the Genetics of Evoked Responses to Niacin and Endotoxemia study, a National Institutes of Health-sponsored investigation of the genomics of inflammatory and metabolic responses during low-grade endotoxemia. Our results suggest that the genetic basis of evoked inflammatory response is different than the genetic contributors to resting state, and several potentially novel loci are identified. A simulation study demonstrates appropriate control of type-1 error rates, relative computational efficiency, and power. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Jing Qian
- Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, U.S.A
| | - Sara Nunez
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
| | - Soohyun Kim
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
| | | | - Andrea S Foulkes
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
| |
|
15
|
Wang Z, Xu K, Zhang X, Wu X, Wang Z. Longitudinal SNP-set association analysis of quantitative phenotypes. Genet Epidemiol 2016; 41:81-93. [PMID: 27859628 DOI: 10.1002/gepi.22016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/15/2016] [Revised: 08/10/2016] [Accepted: 09/19/2016] [Indexed: 02/06/2023]
Abstract
Many genetic epidemiological studies collect repeated measurements over time. This design not only provides a more accurate assessment of disease condition, but allows us to explore the genetic influence on disease development and progression. Thus, it is of great interest to study the longitudinal contribution of genes to disease susceptibility. Most association testing methods for longitudinal phenotypes are developed for single variant, and may have limited power to detect association, especially for variants with low minor allele frequency. We propose Longitudinal SNP-set/sequence kernel association test (LSKAT), a robust, mixed-effects method for association testing of rare and common variants with longitudinal quantitative phenotypes. LSKAT uses several random effects to account for the within-subject correlation in longitudinal data, and allows for adjustment for both static and time-varying covariates. We also present a longitudinal trait burden test (LBT), where we test association between the trait and the burden score in linear mixed models. In simulation studies, we demonstrate that LBT achieves high power when variants are almost all deleterious or all protective, while LSKAT performs well in a wide range of genetic models. By making full use of trait values from repeated measures, LSKAT is more powerful than several tests applied to a single measurement or average over all time points. Moreover, LSKAT is robust to misspecification of the covariance structure. We apply the LSKAT and LBT methods to detect association with longitudinally measured body mass index in the Framingham Heart Study, where we are able to replicate association with a circadian gene NR1D2.
Affiliation(s)
- Zhong Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Baker Institute for Animal Health, Cornell University, Ithaca, NY, USA; Center for Computational Biology, Beijing Forestry University, Beijing, China
| | - Ke Xu
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA
| | - Xinyu Zhang
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA
| | - Xiaowei Wu
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
| | - Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| |
|
16
|
Rönnegård L, McFarlane SE, Husby A, Kawakami T, Ellegren H, Qvarnström A. Increasing the power of genome wide association studies in natural populations using repeated measures - evaluation and implementation. Methods Ecol Evol 2016; 7:792-799. [PMID: 27478587 PMCID: PMC4950150 DOI: 10.1111/2041-210x.12535] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 08/08/2015] [Accepted: 12/12/2015] [Indexed: 12/03/2022]
Abstract
Genomewide association studies (GWAS) enable detailed dissections of the genetic basis for organisms' ability to adapt to a changing environment. In long‐term studies of natural populations, individuals are often marked at one point in their life and then repeatedly recaptured. It is therefore essential that a method for GWAS includes the process of repeated sampling. In a GWAS, the effects of thousands of single‐nucleotide polymorphisms (SNPs) need to be fitted and any model development is constrained by the computational requirements. A method is therefore required that can fit a highly hierarchical model and at the same time is computationally fast enough to be useful. Our method fits fixed SNP effects in a linear mixed model that can include both random polygenic effects and permanent environmental effects. In this way, the model can correct for population structure and model repeated measures. The covariance structure of the linear mixed model is first estimated and subsequently used in a generalized least squares setting to fit the SNP effects. The method was evaluated in a simulation study based on observed genotypes from a long‐term study of collared flycatchers in Sweden. The method we present here was successful in estimating permanent environmental effects from simulated repeated measures data. Additionally, we found that especially for variable phenotypes having large variation between years, the repeated measurements model has a substantial increase in power compared to a model using average phenotypes as a response. The method is available in the R package RepeatABEL. It increases the power in GWAS having repeated measures, especially for long‐term studies of natural populations, and the R implementation is expected to facilitate modelling of longitudinal data for studies of both animal and human populations.
Affiliation(s)
- Lars Rönnegård
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, SE-75007 Uppsala, Sweden
| | - S Eryn McFarlane
- Department of Animal Ecology, Evolutionary Biology Centre (EBC), Uppsala University, Norbyvägen 18D, SE-75236 Uppsala, Sweden
| | - Arild Husby
- Department of Biosciences, Metapopulation Research Centre, University of Helsinki, PO Box 65, FI-00014 Helsinki, Finland; Department of Biology, Centre for Biodiversity Dynamics, Norwegian University of Science and Technology, N-7491 Trondheim, Norway
| | - Takeshi Kawakami
- Department of Evolutionary Biology, Evolutionary Biology Centre (EBC), Uppsala University, Norbyvägen 18D, SE-75236 Uppsala, Sweden
| | - Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre (EBC), Uppsala University, Norbyvägen 18D, SE-75236 Uppsala, Sweden
| | - Anna Qvarnström
- Department of Animal Ecology, Evolutionary Biology Centre (EBC), Uppsala University, Norbyvägen 18D, SE-75236 Uppsala, Sweden
| |
|
17
|
Sung Y, Feng Z, Subedi S. A genome-wide association study of multiple longitudinal traits with related subjects. Stat (Int Stat Inst) 2016; 5:22-44. [PMID: 27134745 DOI: 10.1002/sta4.102] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Pleiotropy is a phenomenon that a single gene inflicts multiple correlated phenotypic effects, often characterized as traits, involving multiple biological systems. We propose a two-stage method to identify pleiotropic effects on multiple longitudinal traits from a family-based data set. The first stage analyzes each longitudinal trait via a three-level mixed-effects model. Random effects at the subject-level and at the family-level measure the subject-specific genetic effects and between-subjects intraclass correlations within families, respectively. The second stage performs a simultaneous association test between a single nucleotide polymorphism and all subject-specific effects for multiple longitudinal traits. This is performed using a quasi-likelihood scoring method in which the correlation structure among related subjects is adjusted. Two simulation studies for the proposed method are undertaken to assess both the type I error control and the power. Furthermore, we demonstrate the utility of the two-stage method in identifying pleiotropic genes or loci by analyzing the Genetic Analysis Workshop 16 Problem 2 cohort data drawn from the Framingham Heart Study and illustrate an example of the kind of complexity in data that can be handled by the proposed approach. We establish that our two-stage method can identify pleiotropic effects whilst accommodating varying data types in the model.
Affiliation(s)
- Yubin Sung
- Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
| | - Zeny Feng
- Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
| | - Sanjeena Subedi
- Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
| |
|
18
|
Li Z, Sillanpää MJ. Dynamic Quantitative Trait Locus Analysis of Plant Phenomic Data. Trends Plant Sci 2015; 20:822-833. [PMID: 26482958 DOI: 10.1016/j.tplants.2015.08.012] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 03/18/2015] [Revised: 08/12/2015] [Accepted: 08/26/2015] [Indexed: 05/27/2023]
Abstract
Advanced platforms have recently become available for automatic and systematic quantification of plant growth and development. These new techniques can efficiently produce multiple measurements of phenotypes over time, and introduce time as an extra dimension to quantitative trait locus (QTL) studies. Functional mapping utilizes a class of statistical models for identifying QTLs associated with the growth characteristics of interest. A major benefit of functional mapping is that it integrates information over multiple timepoints, and therefore could increase the statistical power for QTL detection. We review the current development of computationally efficient functional mapping methods which provide invaluable tools for analyzing large-scale timecourse data that are readily available in our post-genome era.
Affiliation(s)
- Zitong Li
- Biocenter Oulu, Oulu, Finland; Department of Mathematical Sciences and Department of Biology, University of Oulu, 90014 Oulu, Finland
| | - Mikko J Sillanpää
- Biocenter Oulu, Oulu, Finland; Department of Mathematical Sciences and Department of Biology, University of Oulu, 90014 Oulu, Finland.
| |
|
19
|
Warrington NM, Kemp JP, Tilling K, Tobias JH, Evans DM. Genetic variants in adult bone mineral density and fracture risk genes are associated with the rate of bone mineral density acquisition in adolescence. Hum Mol Genet 2015; 24:4158-66. [PMID: 25941325 PMCID: PMC4476449 DOI: 10.1093/hmg/ddv143] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/05/2015] [Revised: 04/09/2015] [Accepted: 04/19/2015] [Indexed: 01/27/2023] Open
Abstract
Previous studies have identified 63 single-nucleotide polymorphisms (SNPs) associated with bone mineral density (BMD) in adults. These SNPs are thought to reflect variants that influence bone maintenance and/or loss in adults. It is unclear whether they affect the rate of bone acquisition during adolescence. Bone measurements and genetic data were available on 6397 individuals from the Avon Longitudinal Study of Parents and Children at up to five follow-up clinics. Linear mixed effects models with smoothing splines were used for longitudinal modelling of BMD and its components bone mineral content (BMC) and bone area (BA), from 9 to 17 years. Genotype data from the 63 adult BMD associated SNPs were investigated individually and as a genetic risk score in the longitudinal model. Each additional BMD lowering allele of the genetic risk score was associated with lower BMD at age 13 [per allele effect size, 0.002 g/cm² (SE = 0.0001, P = 1.24 × 10⁻³⁸)] and decreased BMD acquisition from 9 to 17 years (P = 9.17 × 10⁻⁷). This association was driven by changes in BMC rather than BA. The genetic risk score explained ∼2% of the variation in BMD at 9 and 17 years, a third of that explained in adults (6%). Genetic variants that putatively affect bone maintenance and/or loss in adults appear to have a small influence on the rate of bone acquisition through adolescence.
Affiliation(s)
- Nicole M Warrington
- The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, Australia,
| | - John P Kemp
- The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, Australia, MRC Integrative Epidemiology Unit, School of Social and Community Medicine and
| | - Kate Tilling
- MRC Integrative Epidemiology Unit, School of Social and Community Medicine and
| | | | - David M Evans
- The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, Australia, MRC Integrative Epidemiology Unit, School of Social and Community Medicine and
| |
|
20
|
GWAS with longitudinal phenotypes: performance of approximate procedures. Eur J Hum Genet 2015; 23:1384-91. [PMID: 25712081 DOI: 10.1038/ejhg.2015.1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/03/2014] [Revised: 12/09/2014] [Accepted: 12/12/2014] [Indexed: 11/08/2022] Open
Abstract
Analysis of genome-wide association studies with longitudinal data using standard procedures, such as linear mixed model (LMM) fitting, leads to discouragingly long computation times. There is a need to speed up the computations significantly. In our previous work (Sikorska et al: Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 2012; 32.1: 165-180), we proposed the conditional two-step (CTS) approach as a fast method providing an approximation to the P-value for the longitudinal single-nucleotide polymorphism (SNP) effect. In the first step a reduced conditional LMM is fit, omitting all the SNP terms. In the second step, the estimated random slopes are regressed on SNPs. The CTS has been applied to the bone mineral density data from the Rotterdam Study and proved to work very well even in unbalanced situations. In another article (Sikorska et al: GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14: 166), we suggested semi-parallel computations, greatly speeding up fitting many linear regressions. Combining CTS with fast linear regression reduces the computation time from several weeks to a few minutes on a single computer. Here, we explore further the properties of the CTS both analytically and by simulations. We investigate the performance of our proposal in comparison with a related but different approach, the two-step procedure. It is analytically shown that for the balanced case, under mild assumptions, the P-value provided by the CTS is the same as from the LMM. For unbalanced data and in realistic situations, simulations show that the CTS method does not inflate the type I error rate and implies only a minimal loss of power.
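The general two-step idea mentioned in this abstract (first estimate a slope per subject, then regress those slopes on SNP genotype) can be sketched as follows. This is a deliberately simplified stand-in and not the CTS method itself: step 1 here uses a separate ordinary least-squares slope for each subject instead of random slopes from a reduced conditional LMM, and all names (`per_subject_slopes`, `regress_on_snp`) and the simulated data are hypothetical.

```python
import numpy as np


def per_subject_slopes(time, value, subject):
    """Step 1 (simplified): OLS slope of the trait on time within each subject."""
    ids = np.unique(subject)
    slopes = np.empty(ids.size)
    for i, s in enumerate(ids):
        rows = subject == s
        t = time[rows] - time[rows].mean()
        slopes[i] = t @ (value[rows] - value[rows].mean()) / (t @ t)
    return ids, slopes


def regress_on_snp(slopes, dosage):
    """Step 2: simple linear regression of subject slopes on SNP dosage."""
    d = dosage - dosage.mean()
    centred = slopes - slopes.mean()
    beta = d @ centred / (d @ d)
    resid = centred - beta * d
    se = np.sqrt(resid @ resid / (slopes.size - 2) / (d @ d))
    return beta, se


# Simulated balanced longitudinal data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n_subjects, n_visits = 300, 5
subject = np.repeat(np.arange(n_subjects), n_visits)
time = np.tile(np.arange(n_visits, dtype=float), n_subjects)
dosage = rng.binomial(2, 0.3, n_subjects).astype(float)   # allele count 0/1/2
true_slope = 0.1 + 0.2 * dosage + rng.normal(0.0, 0.1, n_subjects)
intercept = 2.0 + rng.normal(0.0, 1.0, n_subjects)
value = (np.repeat(intercept, n_visits) + np.repeat(true_slope, n_visits) * time
         + rng.normal(0.0, 0.3, subject.size))

ids, slopes = per_subject_slopes(time, value, subject)
beta_snp, se_snp = regress_on_snp(slopes, dosage)   # ids are 0..n-1, matching dosage order
print("SNP effect on slope:", beta_snp, "SE:", se_snp)
```

Because step 1 is run once and step 2 is a plain linear regression, the second step can be repeated cheaply for every SNP, which is the source of the speed-up the abstract describes.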
|
21
|
Wu K, Gamazon ER, Im HK, Geeleher P, White SR, Solway J, Clemmer GL, Weiss ST, Tantisira KG, Cox NJ, Ratain MJ, Huang RS. Genome-wide interrogation of longitudinal FEV1 in children with asthma. Am J Respir Crit Care Med 2014; 190:619-27. [PMID: 25221879 DOI: 10.1164/rccm.201403-0460oc] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022] Open
Abstract
RATIONALE Most genomic studies of lung function have used phenotypic data derived from a single time-point (e.g., presence/absence of disease) without considering the dynamic progression of a chronic disease. OBJECTIVES To characterize lung function change over time in subjects with asthma and identify genetic contributors to a longitudinal phenotype. METHODS We present a method that models longitudinal FEV1 data, collected from 1,041 children with asthma who participated in the Childhood Asthma Management Program. This longitudinal progression model was built using population-based nonlinear mixed-effects modeling with an exponential structure and the determinants of age and height. MEASUREMENTS AND MAIN RESULTS We found ethnicity was a key covariate for FEV1 level. Budesonide-treated children with asthma had a slight but significant effect on FEV1 when compared with those treated with placebo or nedocromil (P < 0.001). A genome-wide association study identified seven single-nucleotide polymorphisms nominally associated with longitudinal lung function phenotypes in 581 white Childhood Asthma Management Program subjects (P < 10⁻⁴ in the placebo ["discovery"] and P < 0.05 in the nedocromil treatment ["replication"] group). Using ChIP-seq and RNA-seq data, we found that some of the associated variants were in strong enhancer regions in human lung fibroblasts and may affect gene expression in human lung tissue. Genetic mapping restricted to genome-wide enhancer single-nucleotide polymorphisms in lung fibroblasts revealed a highly significant variant (rs6763931; P = 4 × 10⁻⁶; false discovery rate < 0.05). CONCLUSIONS This study offers a strategy to explore the genetic determinants of longitudinal phenotypes, provide a comprehensive picture of disease pathophysiology, and suggest potential treatment targets.
|
22
|
Functional multi-locus QTL mapping of temporal trends in Scots pine wood traits. G3 (Bethesda) 2014; 4:2365-79. [PMID: 25305041 PMCID: PMC4267932 DOI: 10.1534/g3.114.014068] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 11/18/2022]
Abstract
Quantitative trait loci (QTL) mapping of wood properties in conifer species has focused on single time point measurements or on trait means based on heterogeneous wood samples (e.g., increment cores), thus ignoring systematic within-tree trends. In this study, functional QTL mapping was performed for a set of important wood properties in increment cores from a 17-yr-old Scots pine (Pinus sylvestris L.) full-sib family with the aim of detecting wood trait QTL for general intercepts (means) and for linear slopes by increasing cambial age. Two multi-locus functional QTL analysis approaches were proposed and their performances were compared on trait datasets comprising 2 to 9 time points, 91 to 455 individual tree measurements and genotype datasets of amplified length polymorphisms (AFLP), and single nucleotide polymorphism (SNP) markers. The first method was a multilevel LASSO analysis whereby trend parameter estimation and QTL mapping were conducted consecutively; the second method was our Bayesian linear mixed model whereby trends and underlying genetic effects were estimated simultaneously. We also compared several different hypothesis testing methods under either the LASSO or the Bayesian framework to perform QTL inference. In total, five and four significant QTL were observed for the intercepts and slopes, respectively, across wood traits such as earlywood percentage, wood density, radial fiberwidth, and spiral grain angle. Four of these QTL were represented by candidate gene SNPs, thus providing promising targets for future research in QTL mapping and molecular function. Bayesian and LASSO methods both detected similar sets of QTL given datasets that comprised large numbers of individuals.
|
23
|
Xu Z, Shen X, Pan W. Longitudinal analysis is more powerful than cross-sectional analysis in detecting genetic association with neuroimaging phenotypes. PLoS One 2014; 9:e102312. [PMID: 25098835 PMCID: PMC4123854 DOI: 10.1371/journal.pone.0102312] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 04/15/2014] [Accepted: 06/17/2014] [Indexed: 01/08/2023] Open
Abstract
Most existing genome-wide association analyses are cross-sectional, utilizing only phenotypic data at a single time point, e.g. baseline. On the other hand, longitudinal studies, such as Alzheimer's Disease Neuroimaging Initiative (ADNI), collect phenotypic information at multiple time points. In this article, as a case study, we conducted both longitudinal and cross-sectional analyses of the ADNI data with several brain imaging (not clinical diagnosis) phenotypes, demonstrating the power gains of longitudinal analysis over cross-sectional analysis. Specifically, we scanned genome-wide single nucleotide polymorphisms (SNPs) with 56 brain-wide imaging phenotypes processed by FreeSurfer on 638 subjects. At the genome-wide significance level (P < 1.8 × 10⁻⁹) or a less stringent level (e.g. P < 10⁻⁷), longitudinal analysis of the phenotypic data from the baseline to month 48 identified more SNP-phenotype associations than cross-sectional analysis of only the baseline data. In particular, at the genome-wide significance level, both SNP rs429358 in gene APOE and SNP rs2075650 in gene TOMM40 were confirmed to be associated with various imaging phenotypes in multiple regions of interest (ROIs) by both analyses, though longitudinal analysis detected more regional phenotypes associated with the two SNPs and indicated another significant SNP rs439401 in gene APOE. In light of the power advantage of longitudinal analysis, we advocate its use in current and future longitudinal neuroimaging studies.
Affiliation(s)
- Zhiyuan Xu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
|
24
|
Howe LD, Parmar PG, Paternoster L, Warrington NM, Kemp JP, Briollais L, Newnham JP, Timpson NJ, Smith GD, Ring SM, Evans DM, Tilling K, Pennell CE, Beilin LJ, Palmer LJ, Lawlor DA. Genetic influences on trajectories of systolic blood pressure across childhood and adolescence. Circ Cardiovasc Genet 2013; 6:608-14. [PMID: 24200906 DOI: 10.1161/circgenetics.113.000197] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/16/2022]
Abstract
BACKGROUND: Blood pressure (BP) tends to increase across childhood and adolescence, but the genetic influences on rates of BP change are not known. Potentially important genetic influences could include genetic variants identified in genome-wide association studies of adults as being associated with BP, height, and body mass index. Understanding the contribution of these genetic variants to changes in BP across childhood and adolescence could yield understanding into the life course development of cardiovascular risk. METHODS AND RESULTS: Pooling data from 2 cohorts (the Avon Longitudinal Study of Parents and Children [n=7013] and the Western Australian Pregnancy Cohort [n=1459]), we examined the associations of allelic scores of 29 single-nucleotide polymorphisms (SNPs) for adult BP, 180 height SNPs, and 32 body mass index SNPs, with trajectories of systolic BP (SBP) from 6 to 17 years of age, using linear spline multilevel models. The allelic scores of BP and body mass index SNPs were associated with SBP at 6 years of age (per-allele effect sizes, 0.097 mm Hg [SE, 0.039 mm Hg] and 0.107 mm Hg [SE, 0.037 mm Hg]); associations with age-related changes in SBP between 6 and 17 years of age were of small magnitude and imprecisely estimated. The allelic score of height SNPs was only weakly associated with SBP changes. No sex or cohort differences in genetic effects were observed. CONCLUSIONS: Allelic scores of BP and body mass index SNPs demonstrated associations with SBP at 6 years of age with a similar magnitude but were not strongly associated with changes in SBP with age between 6 and 17 years. Further work is required to identify variants associated with changes with age in BP.
Affiliation(s)
- Laura D Howe
- MRC Integrative Epidemiology Unit at the University of Bristol
|
25
|
Sikorska K, Lesaffre E, Groenen PFJ, Eilers PHC. GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14:166. [PMID: 23711206 PMCID: PMC3695771 DOI: 10.1186/1471-2105-14-166] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 12/21/2012] [Accepted: 04/24/2013] [Indexed: 11/27/2022]
Abstract
Background: Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many gigabytes) which cannot be directly loaded into the software memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code. Results: We improve speed: computation time from 6 hours is reduced to 10-15 minutes. Our approach can handle essentially an unlimited number of covariates efficiently, using projections. Data files in GWAS are vast and reading them into computer memory becomes an important issue. However, much improvement can be made if the data is structured beforehand in a way allowing for easy access to blocks of SNPs. We propose several solutions based on the R packages ff and ncdf. We adapted the semi-parallel computations for logistic regression. We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are a few hundred times faster than standard procedures. Conclusions: We provide very fast algorithms for GWAS written in pure R code. We also show how to rearrange SNP data for fast access.
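Editorial illustration (not part of the abstract): the idea of fitting many per-SNP linear regressions with a few matrix operations and projections can be sketched as follows. This is a minimal sketch in Python with NumPy rather than the authors' R code; the function name `semi_parallel_linear` and its arguments are hypothetical, and it assumes a complete (no missing values) genotype block in which every SNP still varies after adjusting for the covariates.

```python
import numpy as np


def semi_parallel_linear(y, C, G):
    """Per-SNP slope, standard error and t-statistic for all SNPs at once.

    y : (n,)   phenotype
    C : (n, k) covariates, including an intercept column
    G : (n, m) genotype block, one column per SNP
    (Hypothetical helper; assumes no missing data and no SNP that is
    constant after covariate adjustment.)
    """
    n, k = C.shape
    # Orthonormal basis for the covariate space; subtracting the projection
    # onto it residualizes the phenotype and every SNP on the covariates.
    Q, _ = np.linalg.qr(C)
    y_res = y - Q @ (Q.T @ y)
    G_res = G - Q @ (Q.T @ G)
    # By the Frisch-Waugh-Lovell theorem, the simple regression of y_res on
    # each residualized SNP gives the same slope as the full model.
    sxx = np.einsum("ij,ij->j", G_res, G_res)  # per-SNP sum of squares
    sxy = G_res.T @ y_res                      # per-SNP cross-products
    beta = sxy / sxx
    # Residual sum of squares per SNP; df accounts for covariates plus the SNP.
    rss = y_res @ y_res - beta * sxy
    se = np.sqrt(rss / (n - k - 1) / sxx)
    return beta, se, beta / se
```

Calling `semi_parallel_linear(y, C, G)` on a block of SNP columns returns three arrays of length m. The projection is computed once per block, which is why adding covariates costs little compared with refitting a full model per SNP.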
Affiliation(s)
- Karolina Sikorska
- Department of Biostatistics, Erasmus MC, Rotterdam, The Netherlands.
|
26
|
Benke KS, Wu Y, Fallin DM, Maher B, Palmer LJ. Strategy to control type I error increases power to identify genetic variation using the full biological trajectory. Genet Epidemiol 2013; 37:419-30. [PMID: 23633177 DOI: 10.1002/gepi.21733] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/07/2013] [Revised: 03/21/2013] [Accepted: 04/02/2013] [Indexed: 01/18/2023]
Abstract
Genome-wide association studies have been successful in identifying loci that underlie continuous traits measured at a single time point. To additionally consider continuous traits longitudinally, it is desirable to look at SNP effects at baseline and over time using linear-mixed effects models. Estimation and interpretation of two coefficients in the same model raises concern regarding the optimal control of type I error. To investigate this issue, we calculate type I error and power under an alternative for joint tests, including the two degree of freedom likelihood ratio test, and compare this to single degree of freedom tests for each effect separately at varying alpha levels. We show which joint tests are the optimal way to control the type I error and also illustrate that information can be gained by joint testing in situations where either or both SNP effects are underpowered. We also show that closed form power calculations can approximate simulated power for the case of balanced data, provide reasonable approximations for imbalanced data, but overestimate power for complicated residual error structures. We conclude that a two degree of freedom test is an attractive strategy in a hypothesis-free genome-wide setting and recommend its use for genome-wide studies employing linear-mixed effects models.
Affiliation(s)
- K S Benke
- Johns Hopkins Bloomberg School of Public Health, Mental Health Department, Baltimore, Maryland 21205, USA.
|